CN115439470A - Polyp image segmentation method, computer-readable storage medium, and computer device - Google Patents



Publication number
CN115439470A
CN115439470A
Authority
CN
China
Prior art keywords
channel
image
feature map
polyp
semantic information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211261125.0A
Other languages
Chinese (zh)
Other versions
CN115439470B (en
Inventor
施连焘
李正国
王玉峰
李建阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Polytechnic
Original Assignee
Shenzhen Polytechnic
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Polytechnic filed Critical Shenzhen Polytechnic
Priority to CN202211261125.0A priority Critical patent/CN115439470B/en
Publication of CN115439470A publication Critical patent/CN115439470A/en
Application granted granted Critical
Publication of CN115439470B publication Critical patent/CN115439470B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T 7/0012: Biomedical image inspection
    • G06N 3/02, 3/08: Neural networks; learning methods
    • G06T 3/4038: Scaling the whole image or part thereof for image mosaicing
    • G06T 3/4046: Scaling the whole image or part thereof using neural networks
    • G06T 5/50: Image enhancement or restoration by the use of more than one image
    • G06T 7/11: Region-based segmentation
    • G06T 2200/32: Indexing scheme involving image mosaicing
    • G06T 2207/20021: Dividing image into blocks, subimages or windows
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • G06T 2207/30096: Tumor; Lesion

Abstract

The application provides a polyp image segmentation method, a computer-readable storage medium, and a computer device. The method comprises: inputting a polyp image to be segmented into a multi-scale semantic fusion model to obtain a semantic information image, down-sampling the semantic information image, inputting the down-sampled semantic information image into the multi-scale semantic fusion model again followed by down-sampling, and repeating multiple times to obtain a high-dimensional semantic information image; inputting the high-dimensional semantic information image into a context-aware pyramid aggregation model and outputting a fused feature map; up-sampling the fused feature map and extracting features through convolution, then up-sampling the feature map after feature extraction again and extracting features through convolution, repeating multiple times until a final feature map with the same number of channels and size as the polyp image to be segmented is obtained. The method can adapt to polyp variation and thereby achieve accurate polyp segmentation.

Description

Polyp image segmentation method, computer-readable storage medium, and computer device
Technical Field
The present application relates to the field of image segmentation, and more particularly, to a polyp image segmentation method, a computer-readable storage medium, and a computer device.
Background
Colorectal cancer develops over a long period and for many reasons, but its earliest stage is the formation of polyps (raised masses in the gastrointestinal tract) in the intestine. If polyps can be detected and resected through early intervention, colorectal cancer can be prevented. The most effective method for screening and diagnosing colorectal cancer is colorectal endoscopy, which is currently the mainstream method for diagnosing the disease.
However, although the current diagnostic method is advanced and accurate, some problems remain. According to some professional research reports, roughly one in four polyps is missed during endoscopy, causing incomplete resection and leaving hidden risks. In addition, polyps differ in shape and are changeable, so fine judgments are difficult to make with the naked eye, especially when a polyp is hardly distinguishable from the gastrointestinal background. Finally, rapid identification cannot be achieved by human observation alone: it demands considerable time and energy and adds a heavy workload for gastroenterologists under the current medical system.
Disclosure of Invention
The application aims to provide a polyp image segmentation method, a computer-readable storage medium, and a computer device, so as to solve the problem that polyps missed during endoscopy lead to incomplete resection and leave hidden risks.
In a first aspect, the present application provides a polyp image segmentation method, comprising:
acquiring a polyp image to be segmented;
inputting the polyp image to be segmented into a multi-scale semantic fusion model to obtain a semantic information image, down-sampling the semantic information image to obtain a down-sampled semantic information image, inputting the down-sampled semantic information image into the multi-scale semantic fusion model again followed by down-sampling, and repeating multiple times to obtain a high-dimensional semantic information image; the multi-scale semantic fusion model extracts features from the polyp image to be segmented to obtain an initial feature map of the same size as the polyp image to be segmented, divides the initial feature map into 4 feature maps with the same number of channels, selects 3 of the feature maps to pass through a convolution and batch normalization and splices them with the remaining feature map in sequence along the channel dimension, and fuses the spliced feature map, via a residual connection, with the polyp image to be segmented to obtain the semantic information image;
inputting the high-dimensional semantic information image into a context-aware pyramid aggregation model and outputting a fused feature map; up-sampling the fused feature map and extracting features through convolution, then up-sampling the feature map after feature extraction again and extracting features through convolution, repeating multiple times until a final feature map with the same number of channels and size as the polyp image to be segmented is obtained;
the context-aware pyramid aggregation model performs pooling operations of multiple different scales on the input high-dimensional semantic information image, extracting four feature maps with an unchanged number of channels and different resolutions; after dimension reduction, the four feature maps are up-sampled in sequence to obtain up-sampled feature maps of the same size as the high-dimensional semantic information image, which are spliced along the channel dimension to obtain a spliced feature map; a convolution reduces the channel dimension of the spliced feature map, a Sigmoid activation function yields an attention weight map, attention matrix multiplication is performed with the attention weight map, and the weights of the spliced feature map are reshaped to obtain a feature map based on a spatial attention mechanism; features are extracted from the spliced feature map and input into a channel attention mechanism to obtain channel weights and a feature map based on the channel attention mechanism; and the feature map based on the spatial attention mechanism and the feature map based on the channel attention mechanism are fused to obtain the fused feature map.
Further, the specific process of the multi-scale semantic fusion model is as follows:
The polyp image to be segmented is defined as X ∈ R^(C×H×W). Passing it through W_1(·) for feature extraction yields an initial feature map X′ of the same size as the polyp image to be segmented, X′ ∈ R^(C×H×W);
W_1(·) comprises a 1×1 convolution, batch normalization, and a ReLU nonlinear activation function;
along the channel dimension, the initial feature map X′ ∈ R^(C×H×W) is divided into 4 feature maps with the same number of channels, X_0, X_1, X_2, X_3 ∈ R^((C/4)×H×W);
3 of the feature maps, X_1, X_2, X_3, are transformed by W_2(·), and the transformed feature maps W_2(X_1), W_2(X_2), W_2(X_3) are spliced with the remaining feature map X_0 in sequence along the channel dimension to obtain a spliced feature map X_Cat with the same number of channels as the polyp image to be segmented, namely:
X_Cat = CONCAT(W_2(X_1), W_2(X_2), W_2(X_3), X_0);
W_2(·) comprises a 3×3 convolution and batch normalization;
the spliced feature map is passed through a residual connection and fused with the polyp image to be segmented, outputting the semantic information image X_Out, namely:
X_Out = W_3(X_Cat) ⊕ X;
where R denotes a three-dimensional array image; C, H, and W denote the number of channels, height, and width of the image, respectively; ⊕ denotes pixel-wise addition; CONCAT denotes splicing along the channel dimension; and W_3(·) comprises a 1×1 convolution, batch normalization, and a ReLU nonlinear activation function.
Further, the context-aware pyramid aggregation model includes a context-aware fusion model and an attention correction model.
Further, the specific operation flow of the context-aware fusion model is as follows:
The input high-dimensional semantic information image is defined as D ∈ R^(C×H×W). Several pooling operations of different scales extract four feature maps with an unchanged number of channels and different resolutions, respectively: D_0 ∈ R^(C×6×6), D_1 ∈ R^(C×3×3), D_2 ∈ R^(C×2×2), and D_3 ∈ R^(C×1×1);
the four feature maps are each reduced in dimension by a 1×1 convolution, batch normalization, and a ReLU nonlinear activation function, compressing the number of channels to one quarter, namely:
D′_i = CBR(D_i), with D′_0 ∈ R^((C/4)×6×6), D′_1 ∈ R^((C/4)×3×3), D′_2 ∈ R^((C/4)×2×2), and D′_3 ∈ R^((C/4)×1×1);
the dimension-reduced feature maps are then up-sampled to obtain up-sampled feature maps D″_i of the same size as the high-dimensional semantic information image D, namely:
D″_i = Up(D′_i; β_i);
the up-sampled feature maps are spliced along the channel dimension to obtain the spliced feature map D_Cat, namely:
D_Cat = CONCAT(D″_0, D″_1, D″_2, D″_3);
where D″_i ∈ R^((C/4)×H×W); i denotes a natural number; β_i denotes a correlation coefficient; Up is conventional bilinear interpolation up-sampling; and CONCAT is splicing along the channel dimension.
Further, the specific operation flow of the attention correction model is as follows:
A 1×1 convolution is applied to the spliced feature map to reduce the channel dimension, a Sigmoid activation function yields an attention weight map, attention matrix multiplication is performed with the attention weight map, and the weights of the spliced feature map are reshaped, modeling a spatial attention mechanism and obtaining the feature map D_Spatial based on the spatial attention mechanism, namely:
D_Spatial = σ(S_0(D_Cat; α)) ⊗ D_Cat;
where ⊗ denotes attention matrix multiplication; σ(·) is the Sigmoid activation function; S_0 denotes a 1×1 convolution operation; and α is a coefficient associated with S_0;
features are extracted from the spliced feature map to obtain an extracted feature map, which is input into a channel attention mechanism to obtain channel weights, acquiring the feature map based on the channel attention mechanism, namely:
D_Channel = σ(F_Adaptive(G(D_Cat); θ)) ⊗ D_Cat;
where F_Adaptive(·) achieves local cross-channel information interaction with different convolution kernel sizes, and G(·) denotes global average pooling,
G(D_Cat) = (1/(H′×W′)) Σ_(i=1..H′) Σ_(j=1..W′) D_Cat(i, j);
H′ and W′ refer to the pixel spatial dimensions; D_Channel denotes the channel-dimension attention mechanism; i and j denote natural numbers; and θ is a coefficient associated with G(·);
the feature map based on the spatial attention mechanism and the feature map based on the channel attention mechanism are fused to obtain the fused feature map D_Out, namely:
D_Out = D_Spatial ⊕ D_Channel.
in a second aspect, the present application provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the polyp image segmentation method.
In a third aspect, the present application provides a computer device comprising: one or more processors, a memory, and one or more computer programs, the processors and the memory being connected by a bus, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, which when executing the computer programs implement the steps of the polyp image segmentation method.
In the application, a multi-scale semantic fusion model is designed that collects semantic information images of different scales through multiple filters to improve representation capability, thereby adapting to changes in polyp size (in particular, smaller polyps are handled internally at finer granularity levels), and the receptive field of the network is enlarged by extracting features with convolution kernels of different scales. A context-aware pyramid aggregation model is designed that guides the fusion of feature information from different regions; the dual attention mechanism it contains further strengthens important features and effectively suppresses features of unimportant regions, achieving accurate polyp segmentation while maintaining real-time performance.
Drawings
Fig. 1 is a flowchart of a polyp image segmentation method according to an embodiment of the present application.
Fig. 2 is a flowchart of another polyp image segmentation method according to an embodiment of the present application.
Fig. 3 is a flowchart of a multi-scale semantic fusion model provided in an embodiment of the present application.
Fig. 4 is a flowchart of a context-aware fusion model according to an embodiment of the present application.
Fig. 5 is a flowchart of an attention correction model according to an embodiment of the present application.
Fig. 6 is a table of data analysis provided by an embodiment of the present application in contrast to current advanced polyp image segmentation methods.
Fig. 7 is a block diagram illustrating a specific structure of a computer device according to an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present application more clearly understood, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the application and are not intended to limit it.
In order to explain the technical solution described in the present application, the following description will be given by way of specific examples.
Referring to fig. 1, a polyp image segmentation method according to an embodiment of the present application includes the following steps. Note that the method is not limited to the flow sequence shown in fig. 1 if substantially the same result is obtained.
S101, obtaining a polyp image to be segmented;
S102, inputting the polyp image to be segmented into a multi-scale semantic fusion model to obtain a semantic information image, down-sampling the semantic information image to obtain a down-sampled semantic information image, inputting the down-sampled semantic information image into the multi-scale semantic fusion model again followed by down-sampling, and repeating multiple times to obtain a high-dimensional semantic information image; the multi-scale semantic fusion model extracts features from the polyp image to be segmented to obtain an initial feature map of the same size as the polyp image to be segmented, divides the initial feature map into 4 feature maps with the same number of channels, selects 3 of the feature maps to pass through a convolution and batch normalization and splices them with the remaining feature map in sequence along the channel dimension, and fuses the spliced feature map, via a residual connection, with the polyp image to be segmented to obtain the semantic information image;
S103, inputting the high-dimensional semantic information image into a context-aware pyramid aggregation model and outputting a fused feature map; up-sampling the fused feature map and extracting features through convolution, then up-sampling the feature map after feature extraction again and extracting features through convolution, repeating multiple times until a final feature map with the same number of channels and size as the polyp image to be segmented is obtained;
s104, performing pooling operation on an input high-dimensional semantic information image in multiple different scales by using the context-aware pyramid aggregation model, extracting four feature maps with unchanged channel number and different resolutions, performing dimensionality reduction on the four feature maps, then sequentially performing upsampling on the feature maps to obtain an upsampled feature map with the same size as the high-dimensional semantic information image, and splicing the upsampled feature maps by using channel dimensionality to obtain a spliced feature map; performing convolution on the spliced feature graph to reduce the dimension of a channel, obtaining an attention weight graph by using a Sigmoid activation function, performing attention moment matrix multiplication on the attention weight graph, and reshaping the weight of the spliced feature graph to obtain a feature graph based on a space attention mechanism; performing feature extraction on the spliced feature map, and inputting the feature map into a channel attention mechanism to obtain channel weight and a feature map based on the channel attention mechanism; and fusing the characteristic diagram based on the space attention mechanism and the characteristic diagram based on the channel attention mechanism to obtain a fused characteristic diagram.
Referring to fig. 2, 001 denotes the multi-scale semantic fusion model, 002 down-sampling, 003 the context-aware pyramid aggregation model, and 004 up-sampling; CAF denotes the context-aware fusion model, and APO the attention correction model; 005 denotes convolutional feature extraction. The left and right sides are symmetrical: the left area is the encoding region, the right the decoding region, and dashed arrows denote skip-connection operations.
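As a rough illustration of this symmetric encoder-decoder flow, the following NumPy sketch tracks only tensor shapes through the repeated downsample/upsample loop; `msfm` is a shape-preserving placeholder for the multi-scale semantic fusion model, and the pooling, upsampling, stage count, and input size are illustrative assumptions, not the patented implementation:

```python
import numpy as np

def msfm(x):
    # placeholder for the multi-scale semantic fusion model: shape-preserving
    return x

def downsample(x):
    # 2x2 max pooling halves the spatial resolution
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample(x):
    # nearest-neighbour upsampling doubles the spatial resolution
    return x.repeat(2, axis=1).repeat(2, axis=2)

def encode(x, stages=4):
    # repeat: fuse semantics, then downsample
    for _ in range(stages):
        x = downsample(msfm(x))
    return x

def decode(x, stages=4):
    # repeat: upsample (followed by a convolution in the real model)
    for _ in range(stages):
        x = upsample(x)
    return x

x = np.random.rand(3, 96, 96)   # polyp image to be segmented, C×H×W
high_dim = encode(x)            # high-dimensional semantic information image
print(high_dim.shape)           # (3, 6, 6)
out = decode(high_dim)          # final map matches the input spatial size
print(out.shape)                # (3, 96, 96)
```

After four stages the spatial size shrinks from 96×96 to 6×6, which is consistent with the 6×6 largest pooling scale used later in the context-aware fusion model.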
Referring to fig. 3, in an embodiment of the present application, a specific process of the multi-scale semantic fusion model (i.e., MSFM) is as follows:
The polyp image to be segmented is defined as X ∈ R^(C×H×W). Passing it through W_1(·) for feature extraction yields an initial feature map X′ of the same size as the polyp image to be segmented, X′ ∈ R^(C×H×W);
W_1(·) comprises a 1×1 convolution, batch normalization, and a ReLU nonlinear activation function;
along the channel dimension, the initial feature map X′ ∈ R^(C×H×W) is divided into 4 feature maps with the same number of channels, X_0, X_1, X_2, X_3 ∈ R^((C/4)×H×W);
3 of the feature maps, X_1, X_2, X_3, are transformed by W_2(·), and the transformed feature maps W_2(X_1), W_2(X_2), W_2(X_3) are spliced with the remaining feature map X_0 in sequence along the channel dimension to obtain a spliced feature map X_Cat with the same number of channels as the polyp image to be segmented, namely:
X_Cat = CONCAT(W_2(X_1), W_2(X_2), W_2(X_3), X_0);
W_2(·) comprises a 3×3 convolution and batch normalization;
the spliced feature map is passed through a residual connection and fused with the polyp image to be segmented, outputting the semantic information image X_Out, namely:
X_Out = W_3(X_Cat) ⊕ X;
where R denotes a three-dimensional array image; C, H, and W denote the number of channels, height, and width of the image, respectively; ⊕ denotes pixel-wise addition; CONCAT denotes splicing along the channel dimension; and W_3(·) comprises a 1×1 convolution, batch normalization, and a ReLU nonlinear activation function.
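The channel bookkeeping of the MSFM split-and-splice step can be checked with a small NumPy sketch; `w2` is a shape-preserving stand-in for the 3×3 convolution plus batch normalization, and the sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 8, 16, 16
X_prime = rng.random((C, H, W))   # initial feature map X' after W_1

# divide along the channel dimension into 4 maps with C/4 channels each
X0, X1, X2, X3 = np.split(X_prime, 4, axis=0)
assert X1.shape == (C // 4, H, W)

def w2(x):
    # stand-in for the 3x3 conv + batch-norm branch W_2: shape-preserving
    return x

# splice the three transformed maps with the remaining one, channel-wise
X_cat = np.concatenate([w2(X1), w2(X2), w2(X3), X0], axis=0)
print(X_cat.shape)                # back to (8, 16, 16), i.e. C×H×W
```

The split/concat pair restores the original channel count, which is why X_Cat can be residually fused with the input image of the same shape.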
In an embodiment of the present application, the context-aware pyramid aggregation model (i.e., CPAM) includes a context-aware fusion model and an attention-correction model.
In an embodiment of the present application, a specific operation flow of the context-aware fusion model is as follows:
The input high-dimensional semantic information image is defined as D ∈ R^(C×H×W). Several pooling operations of different scales extract four feature maps with an unchanged number of channels and different resolutions, respectively: D_0 ∈ R^(C×6×6), D_1 ∈ R^(C×3×3), D_2 ∈ R^(C×2×2), and D_3 ∈ R^(C×1×1);
the four feature maps are each reduced in dimension by a 1×1 convolution, batch normalization, and a ReLU nonlinear activation function, compressing the number of channels to one quarter, namely:
D′_i = CBR(D_i), with D′_0 ∈ R^((C/4)×6×6), D′_1 ∈ R^((C/4)×3×3), D′_2 ∈ R^((C/4)×2×2), and D′_3 ∈ R^((C/4)×1×1);
the dimension-reduced feature maps are then up-sampled to obtain up-sampled feature maps D″_i of the same size as the high-dimensional semantic information image D, namely:
D″_i = Up(D′_i; β_i);
the up-sampled feature maps are spliced along the channel dimension to obtain the spliced feature map D_Cat, namely:
D_Cat = CONCAT(D″_0, D″_1, D″_2, D″_3);
where D″_i ∈ R^((C/4)×H×W); i denotes a natural number; β_i denotes a correlation coefficient; Up is conventional bilinear interpolation up-sampling; and CONCAT is splicing along the channel dimension.
Referring to FIG. 4, CBR represents a 1×1 convolution, batch normalization, and a ReLU nonlinear activation function.
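The pyramid pooling, channel compression, and splicing described above can be sketched in NumPy; nearest-neighbour upsampling and channel-group averaging are admittedly crude stand-ins for bilinear Up(·) and the learned CBR block, and the sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 8, 12, 12
D = rng.random((C, H, W))         # high-dimensional semantic information image

def adaptive_avg_pool(x, size):
    # average-pool each channel down to size×size
    c, h, w = x.shape
    return x.reshape(c, size, h // size, size, w // size).mean(axis=(2, 4))

def reduce_channels(x):
    # stand-in for the CBR block that compresses channels to C/4:
    # here, average groups of 4 channels
    c, h, w = x.shape
    return x.reshape(c // 4, 4, h, w).mean(axis=1)

def upsample_to(x, h, w):
    # nearest-neighbour stand-in for bilinear Up(·)
    return x.repeat(h // x.shape[1], axis=1).repeat(w // x.shape[2], axis=2)

# four pooling scales; then reduce, upsample, and splice channel-wise
branches = []
for size in (6, 3, 2, 1):
    d = adaptive_avg_pool(D, size)   # C × size × size
    d = reduce_channels(d)           # C/4 × size × size
    branches.append(upsample_to(d, H, W))
D_cat = np.concatenate(branches, axis=0)
print(D_cat.shape)                   # (8, 12, 12): back to C×H×W
```

Because each branch carries C/4 channels, splicing the four branches restores the original channel count C at full resolution.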
In an embodiment of the present application, referring to fig. 5, a specific operation flow of the attention correction model includes:
A 1×1 convolution is applied to the spliced feature map to reduce the channel dimension, a Sigmoid activation function yields an attention weight map, attention matrix multiplication is performed with the attention weight map, and the weights of the spliced feature map are reshaped, modeling a spatial attention mechanism and obtaining the feature map D_Spatial based on the spatial attention mechanism, namely:
D_Spatial = σ(S_0(D_Cat; α)) ⊗ D_Cat;
where ⊗ denotes attention matrix multiplication; σ(·) is the Sigmoid activation function; S_0 denotes a 1×1 convolution operation; and α is a coefficient associated with S_0;
features are extracted from the spliced feature map to obtain an extracted feature map, which is input into a channel attention mechanism to obtain channel weights, acquiring the feature map based on the channel attention mechanism, namely:
D_Channel = σ(F_Adaptive(G(D_Cat); θ)) ⊗ D_Cat;
where F_Adaptive(·) achieves local cross-channel information interaction with different convolution kernel sizes, and G(·) denotes global average pooling,
G(D_Cat) = (1/(H′×W′)) Σ_(i=1..H′) Σ_(j=1..W′) D_Cat(i, j);
H′ and W′ refer to the pixel spatial dimensions; D_Channel denotes the channel-dimension attention mechanism; i and j denote natural numbers; and θ is a coefficient associated with G(·);
the feature map based on the spatial attention mechanism and the feature map based on the channel attention mechanism are fused to obtain the fused feature map D_Out, namely:
D_Out = D_Spatial ⊕ D_Channel.
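The dual attention correction can be sketched in NumPy as follows; the channel-mean and global-average-pooling stand-ins replace the learned S_0 and F_Adaptive operators, so this only illustrates the data flow, not the trained behaviour:

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 8, 12, 12
D_cat = rng.random((C, H, W))     # spliced feature map

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# spatial branch: collapse channels (stand-in for the 1x1 conv S_0),
# apply Sigmoid to get an H×W attention weight map, reweight every position
weight_map = sigmoid(D_cat.mean(axis=0))            # H × W
D_spatial = weight_map[None, :, :] * D_cat          # broadcast over channels

# channel branch: global average pooling G, then Sigmoid channel weights
# (stand-in for the adaptive cross-channel interaction F_Adaptive)
channel_weights = sigmoid(D_cat.mean(axis=(1, 2)))  # C
D_channel = channel_weights[:, None, None] * D_cat

# fuse the two attention-corrected maps by pixel-wise addition
D_out = D_spatial + D_channel
print(D_out.shape)                # (8, 12, 12)
```

The spatial branch reweights positions uniformly across channels while the channel branch reweights whole channels uniformly across positions; their sum gives the fused feature map D_Out.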
Fig. 6 is a data analysis table, provided by an embodiment of the present application, comparing the method with current advanced polyp image segmentation methods; it shows the various performance indicators more intuitively.
An embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of a polyp image segmentation method as provided by an embodiment of the present application.
Fig. 7 shows a specific structural block diagram of a computer device provided in an embodiment of the present application, where a computer device 100 includes: one or more processors 101, a memory 102, and one or more computer programs, wherein the processors 101 and the memory 102 are connected by a bus, the one or more computer programs being stored in the memory 102 and configured to be executed by the one or more processors 101, the processor 101 implementing the steps of the polyp image segmentation method as provided by an embodiment of the present application when executing the computer programs.
The computer equipment comprises a server, a terminal and the like. The computer device may be a desktop computer, a mobile terminal or a vehicle-mounted device, and the mobile terminal includes at least one of a mobile phone, a tablet computer, a personal digital assistant or a wearable device.
In the embodiment of the application, a multi-scale semantic fusion model is designed that collects semantic information images of different scales through multiple filters to improve representation capability, thereby adapting to changes in polyp size (in particular, smaller polyps are handled internally at finer granularity levels), and the receptive field of the network is enlarged by extracting features with convolution kernels of different scales. A context-aware pyramid aggregation model is designed that guides the fusion of feature information from different regions; the dual attention mechanism it contains further strengthens important features and effectively suppresses features of unimportant regions, achieving accurate polyp segmentation while maintaining real-time performance.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable storage medium; the storage medium may include: Read-Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (7)

1. A method of segmenting a polyp image, comprising:
acquiring a polyp image to be segmented;
inputting the polyp image to be segmented into a multi-scale semantic fusion model to obtain a semantic information image, down-sampling the semantic information image to obtain a down-sampled semantic information image, inputting the down-sampled semantic information image into the multi-scale semantic fusion model again followed by down-sampling, and repeating multiple times to obtain a high-dimensional semantic information image; wherein the multi-scale semantic fusion model extracts features from the polyp image to be segmented to obtain an initial feature map of the same size as the polyp image to be segmented, divides the initial feature map into 4 feature maps with the same number of channels, selects 3 of the feature maps to pass through a convolution and batch normalization and splices them with the remaining feature map in sequence along the channel dimension, and fuses the spliced feature map, via a residual connection, with the polyp image to be segmented to obtain the semantic information image;
inputting the high-dimensional semantic information image into a context-aware pyramid aggregation model to output a fused feature map, up-sampling the fused feature map and extracting features through convolution, then up-sampling the resulting feature map again and extracting features through convolution, repeating this multiple times until a final feature map with the same size and number of channels as the polyp image to be segmented is obtained;
wherein the context-aware pyramid aggregation model performs pooling operations at multiple different scales on the input high-dimensional semantic information image to extract four feature maps with an unchanged number of channels and different resolutions, reduces their dimensionality and up-samples each of the four feature maps in turn to obtain up-sampled feature maps of the same size as the high-dimensional semantic information image, and concatenates the up-sampled feature maps along the channel dimension to obtain a concatenated feature map; performs convolution on the concatenated feature map to reduce the channel dimension, obtains an attention weight map using a Sigmoid activation function, performs attention matrix multiplication with the attention weight map to reshape the weights of the concatenated feature map, obtaining a feature map based on a spatial attention mechanism; performs feature extraction on the concatenated feature map and inputs the result into a channel attention mechanism to obtain channel weights and a feature map based on the channel attention mechanism; and fuses the feature map based on the spatial attention mechanism with the feature map based on the channel attention mechanism to obtain the fused feature map.
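The repeated fuse/down-sample encoder and up-sample/convolve decoder of claim 1 can be wired up roughly as follows. This is a minimal PyTorch sketch, not the patent's implementation: `msf`, `capa` and `dec` are placeholder callables standing in for the multi-scale semantic fusion model, the context-aware pyramid aggregation model and the per-stage decoder convolutions, and the max-pooling choice and the number of stages are assumptions.

```python
import torch
import torch.nn.functional as F

def segment(x, msf, capa, dec, steps=3):
    # msf, capa and dec are stand-ins (assumptions) for the multi-scale
    # semantic fusion model, the context-aware pyramid aggregation model
    # and the per-stage decoder convolution described in claim 1.
    for _ in range(steps):                       # repeated fusion + down-sampling
        x = F.max_pool2d(msf(x), kernel_size=2)
    y = capa(x)                                  # fused high-dimensional feature map
    for _ in range(steps):                       # repeated up-sampling + convolution
        y = dec(F.interpolate(y, scale_factor=2, mode="bilinear",
                              align_corners=False))
    return y
```

With identity stand-ins, an input of size H×W comes back at H×W after `steps` halvings and `steps` doublings, matching the claim's requirement that the final feature map has the same size as the input.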
2. The polyp image segmentation method as set forth in claim 1, wherein the specific flow of the multi-scale semantic fusion model is:
the polyp image X to be segmented is defined as: x is formed by R C×H×W Passing the polyp image to be segmented through W 1 (. The) carries on the characteristic extraction, get an initial characteristic map X' with the same size of the polyp picture to be cut apart: x' is belonged to R C×H×W
W is 1 (. 1) includes a 1 × 1 convolution, a batch regularization algorithm, and a ReLU nonlinear activation function;
according to the channel dimensionDegree is to make the initial characteristic diagram X' be equal to R C×H×W Divided into 4 characteristic graphs with same channel number
Figure FDA0003891592040000021
3 feature maps X in the three 1 ,X 2 ,X 3 Via W 2 (. The) making a transition, and making the transformed characteristic diagram W 2 (X 1 ),W 2 (X 2 ),W 2 (X 3 ) With the remaining one of the feature maps X 0 Sequentially splicing according to the channel dimension to obtain a spliced characteristic diagram X with the same number as the channels of the polyp image to be segmented Cat Namely:
X Cat =CONCAT(W 2 (X 1 ),W 2 (X 2 ),W 2 (X 3 ),X 0 );
the W is 2 (. H) includes 3 × 3 convolution and batch regularization algorithms;
residual errors are connected and spliced to form a characteristic diagram, then the characteristic diagram is fused with a polyp image to be segmented, and a semantic information image X is output Out Namely:
Figure FDA0003891592040000022
wherein, R represents a three-dimensional array image, C, H and W respectively represent the channel number, length and width of the image;
Figure FDA0003891592040000023
the addition and summation operation of the pixel level is represented, and CONCAT represents splicing on the channel dimension; w 3 (. Cndot.) includes a 1 × 1 convolution, a batch regularization algorithm, and a ReLU nonlinear activation function.
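The flow of claim 2 maps onto a compact PyTorch module; the following is a hedged sketch (the class and attribute names are illustrative, and the channel count c is assumed divisible by 4):

```python
import torch
import torch.nn as nn

class MultiScaleSemanticFusion(nn.Module):
    # Sketch of the multi-scale semantic fusion model of claim 2.
    # W1/W3: 1x1 conv + batch norm + ReLU; W2: 3x3 conv + batch norm.
    def __init__(self, c):
        super().__init__()
        self.w1 = nn.Sequential(nn.Conv2d(c, c, 1), nn.BatchNorm2d(c), nn.ReLU())
        self.w2 = nn.ModuleList([
            nn.Sequential(nn.Conv2d(c // 4, c // 4, 3, padding=1),
                          nn.BatchNorm2d(c // 4))
            for _ in range(3)])
        self.w3 = nn.Sequential(nn.Conv2d(c, c, 1), nn.BatchNorm2d(c), nn.ReLU())

    def forward(self, x):
        # split X' into four equal channel groups X_0..X_3
        x0, x1, x2, x3 = torch.chunk(self.w1(x), 4, dim=1)
        # concatenate W2(X_1), W2(X_2), W2(X_3) with X_0 along channels
        x_cat = torch.cat([self.w2[0](x1), self.w2[1](x2),
                           self.w2[2](x3), x0], dim=1)
        return self.w3(x_cat) + x   # residual fusion with the input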
3. The polyp image segmentation method of claim 1, wherein the context-aware pyramid aggregation model comprises a context-aware fusion model and an attention correction model.
4. The polyp image segmentation method as set forth in claim 3, wherein the specific operation flow of the context-aware fusion model is:
the input high-dimensional semantic information image D is defined as D ∈ R^(C×H×W); pooling operations at several different scales are used to extract four feature maps with an unchanged number of channels and different resolutions, respectively:
D_0 ∈ R^(C×6×6), D_1 ∈ R^(C×3×3), D_2 ∈ R^(C×2×2) and D_3 ∈ R^(C×1×1);
the four feature maps are respectively reduced in dimension through a 1×1 convolution, a batch normalization algorithm and a ReLU nonlinear activation function, compressing the number of channels to one fourth, namely:
D′_0 ∈ R^((C/4)×6×6), D′_1 ∈ R^((C/4)×3×3), D′_2 ∈ R^((C/4)×2×2) and D′_3 ∈ R^((C/4)×1×1);
the reduced feature maps are then up-sampled to obtain up-sampled feature maps D″_i of the same size as the high-dimensional semantic information image D, namely:
D″_i = Up(D′_i, β_i);
the up-sampled feature maps are concatenated along the channel dimension to obtain a concatenated feature map D_Cat, namely:
D_Cat = CONCAT(D″_0, D″_1, D″_2, D″_3);
wherein D″_i ∈ R^((C/4)×H×W), i denotes a natural number, β_i denotes the corresponding up-sampling coefficient, Up denotes conventional bilinear interpolation up-sampling, and CONCAT denotes concatenation along the channel dimension.
5. The polyp image segmentation method as set forth in claim 4, wherein the specific operation flow of the attention correction model is:
a 1×1 convolution is used to reduce the channel dimension of the concatenated feature map, an attention weight map is obtained through a Sigmoid activation function, attention matrix multiplication is performed with the attention weight map to reshape the weights of the concatenated feature map, modeling a spatial attention mechanism and obtaining a feature map D_Spatial based on the spatial attention mechanism, namely:
D_Spatial = σ(S_0(D_Cat)) ⊗ D_Cat;
wherein ⊗ denotes attention matrix multiplication, σ(·) is the Sigmoid activation function, S_0 denotes a 1×1 convolution operation, and α is a coefficient associated with S_0;
feature extraction is performed on the concatenated feature map to obtain an extracted feature map, which is input into a channel attention mechanism to obtain channel weights and the feature map based on the channel attention mechanism, namely:
D_Channel = σ(F_Adaptive(G(D_Cat))) ⊗ D_Cat;
wherein F_Adaptive(·) achieves local cross-channel information interaction with different convolution kernel sizes, and G(·) denotes global average pooling:
G(D) = (1/(H′×W′)) Σ_{i=1}^{H′} Σ_{j=1}^{W′} D(i, j);
wherein H′, W′ refer to the pixel spatial coordinates, D_Channel denotes the channel-dimension attention feature map, i, j denote natural numbers, and θ is a coefficient associated with G(·);
the feature map based on the spatial attention mechanism and the feature map based on the channel attention mechanism are fused to obtain the fused feature map D_Out, namely:
D_Out = D_Spatial ⊕ D_Channel.
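The dual-attention flow of claim 5 can be sketched as below. The spatial branch is the 1×1 convolution S_0 followed by a Sigmoid weight map; for the channel branch, an ECA-style 1-D convolution over globally averaged channel descriptors is assumed for F_Adaptive (the kernel size k and the class name are assumptions, not taken from the claim):

```python
import torch
import torch.nn as nn

class AttentionCorrection(nn.Module):
    # Sketch of the attention correction model of claim 5.
    def __init__(self, c, k=3):
        super().__init__()
        self.s0 = nn.Conv2d(c, 1, 1)                        # reduce channels to 1
        self.f_adaptive = nn.Conv1d(1, 1, k, padding=k // 2)  # assumed F_Adaptive

    def forward(self, d_cat):
        # spatial attention: Sigmoid weight map, broadcast-multiplied
        d_spatial = torch.sigmoid(self.s0(d_cat)) * d_cat
        # channel attention: global average pooling G(.), then local
        # cross-channel interaction and Sigmoid channel weights
        g = d_cat.mean(dim=(2, 3))                          # B x C
        w = torch.sigmoid(self.f_adaptive(g.unsqueeze(1))).squeeze(1)
        d_channel = w[:, :, None, None] * d_cat
        return d_spatial + d_channel                        # pixel-level fusion
```

Both branches preserve the input shape, so the fused output D_Out has the same dimensions as D_Cat.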
6. a computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the polyp image segmentation method according to any one of claims 1 to 5.
7. A computer device, comprising:
one or more processors;
a memory; and one or more computer programs, wherein the processor and the memory are connected by a bus, the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and the steps of the polyp image segmentation method according to any one of claims 1 to 5 are implemented when the computer programs are executed by the processors.
CN202211261125.0A 2022-10-14 2022-10-14 Polyp image segmentation method, computer readable storage medium and computer device Active CN115439470B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211261125.0A CN115439470B (en) 2022-10-14 2022-10-14 Polyp image segmentation method, computer readable storage medium and computer device


Publications (2)

Publication Number Publication Date
CN115439470A true CN115439470A (en) 2022-12-06
CN115439470B CN115439470B (en) 2023-05-26

Family

ID=84250185

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211261125.0A Active CN115439470B (en) 2022-10-14 2022-10-14 Polyp image segmentation method, computer readable storage medium and computer device

Country Status (1)

Country Link
CN (1) CN115439470B (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112465827A (en) * 2020-12-09 2021-03-09 北京航空航天大学 Contour perception multi-organ segmentation network construction method based on class-by-class convolution operation
CN113506300A (en) * 2021-06-25 2021-10-15 江苏大学 Image semantic segmentation method and system based on rainy complex road scene
CN113538313A (en) * 2021-07-22 2021-10-22 深圳大学 Polyp segmentation method and device, computer equipment and storage medium
CN114170167A (en) * 2021-11-29 2022-03-11 深圳职业技术学院 Polyp segmentation method and computer device based on attention-guided context correction
CN114581662A (en) * 2022-02-17 2022-06-03 华南理工大学 Method, system, device and storage medium for segmenting brain tumor image
CN115018824A (en) * 2022-07-21 2022-09-06 湘潭大学 Colonoscope polyp image segmentation method based on CNN and Transformer fusion

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIANTAO SHI: "FRCNet: Feature Refining and Context-Guided Network for Efficient Polyp Segmentation", Frontiers in Bioengineering and Biotechnology *
FAN Runze et al.: "Semantic segmentation model for road scenes based on a multi-scale attention mechanism", Computer Engineering *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116486230A (en) * 2023-04-21 2023-07-25 哈尔滨工业大学(威海) Image detection method based on semi-recursion characteristic pyramid structure and storage medium
CN116486230B (en) * 2023-04-21 2024-02-02 哈尔滨工业大学(威海) Image detection method based on semi-recursion characteristic pyramid structure and storage medium

Also Published As

Publication number Publication date
CN115439470B (en) 2023-05-26

Similar Documents

Publication Publication Date Title
TWI728465B (en) Method, device and electronic apparatus for image processing and storage medium thereof
CN107665491B (en) Pathological image identification method and system
CN111369440B (en) Model training and image super-resolution processing method, device, terminal and storage medium
EP3923233A1 (en) Image denoising method and apparatus
CN114170167B (en) Polyp segmentation method and computer device based on attention-guided context correction
AU2021354030B2 (en) Processing images using self-attention based neural networks
CN113012140A (en) Digestive endoscopy video frame effective information region extraction method based on deep learning
CN110866938B (en) Full-automatic video moving object segmentation method
CN112700460A (en) Image segmentation method and system
CN113486890A (en) Text detection method based on attention feature fusion and cavity residual error feature enhancement
CN114004811A (en) Image segmentation method and system based on multi-scale residual error coding and decoding network
CN115439470A (en) Polyp image segmentation method, computer-readable storage medium, and computer device
CN112150470A (en) Image segmentation method, image segmentation device, image segmentation medium, and electronic device
CN112633260B (en) Video motion classification method and device, readable storage medium and equipment
CN114399510A (en) Skin lesion segmentation and classification method and system combining image and clinical metadata
CN113392791A (en) Skin prediction processing method, device, equipment and storage medium
CN117252890A (en) Carotid plaque segmentation method, device, equipment and medium
CN116542988A (en) Nodule segmentation method, nodule segmentation device, electronic equipment and storage medium
CN111369564B (en) Image processing method, model training method and model training device
CN114332574A (en) Image processing method, device, equipment and storage medium
CN112001479B (en) Processing method and system based on deep learning model and electronic equipment
CN114022458A (en) Skeleton detection method and device, electronic equipment and computer readable storage medium
CN111833991A (en) Auxiliary interpretation method and device based on artificial intelligence, terminal and storage medium
Patel et al. Deep Learning in Medical Image Super-Resolution: A Survey
CN115861604B (en) Cervical tissue image processing method, cervical tissue image processing device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant