CN113724276A - Polyp image segmentation method and device - Google Patents
Polyp image segmentation method and device
- Publication number
- CN113724276A CN113724276A CN202110889919.0A CN202110889919A CN113724276A CN 113724276 A CN113724276 A CN 113724276A CN 202110889919 A CN202110889919 A CN 202110889919A CN 113724276 A CN113724276 A CN 113724276A
- Authority
- CN
- China
- Prior art keywords
- feature
- image
- polyp
- shallow
- features
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/12—Edge-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G06T5/70—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0012—Biomedical image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/90—Determination of colour characteristics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30096—Tumor; Lesion
Abstract
The embodiment of the invention discloses a method and a device for segmenting a polyp image, comprising the following steps: acquiring a polyp image to be input; selecting a reference image with a color different from that of the polyp image from a preset training set, and exchanging the colors of the reference image and the polyp image; extracting a shallow feature and a deep feature from the polyp image after color exchange, suppressing background noise of the shallow feature by using a shallow attention model, and fusing the shallow feature and the deep feature; and carrying out predictive response value re-equalization processing on the fused features by adopting a probability correction strategy model to obtain a polyp feature image with clear edges. The method can accurately and efficiently segment the polyp region from the image and generalizes well across a variety of complex real-world scenes.
Description
Technical Field
The embodiment of the invention relates to the technical field of image processing, in particular to a method and a device for segmenting a polyp image.
Background
Polyps are susceptible to carcinogenesis, especially multiple polyps, and early screening and treatment of polyps is therefore essential. Polyp Segmentation (Polyp Segmentation) is a computer vision task, which can automatically segment Polyp parts in images or videos, greatly reduce the workload of doctors, and therefore, establishing a set of accurate Polyp Segmentation models has great significance for clinical medical diagnosis.
Currently, PraNet, based on a parallel reverse-attention network, is the most commonly used prior art. PraNet first uses a Res2Net neural network to extract features at different semantic levels from a polyp image, and then uses a parallel decoder to aggregate the high-semantic-level features to obtain the global context of the image; however, because high-semantic-level features lose too much detail, the resulting polyp segmentation is relatively rough. To further mine polyp boundary cues, PraNet uses a reverse attention module to model the relationship between polyp regions and polyp boundaries. Through continuous mutual supplementation between polyp regions and polyp boundaries, PraNet can obtain more accurate polyp segmentation predictions.
Although PraNet can obtain relatively accurate results, it has two important drawbacks: (1) its segmentation of small polyps is poor. Small polyps lose too much information in high-semantic-level features and are difficult to recover directly; in addition, the boundary annotations of small polyps carry large errors, which strongly affect the final segmentation result. (2) It disregards the color bias present in the data set. In general, polyp images acquired under different conditions differ greatly in color, which can interfere with the training of polyp segmentation models; especially when training images are few, the model easily overfits to polyp color, resulting in a significant drop in generalization capability in practical application scenarios.
Disclosure of Invention
To solve the above technical problem, an embodiment of the present invention provides a method for segmenting a polyp image, including:
acquiring a polyp image to be input;
selecting a reference image with a color different from that of the polyp image from a preset training set, and exchanging the colors of the reference image and the polyp image;
extracting a shallow feature and a deep feature from the polyp image after color exchange, suppressing background noise of the shallow feature by using a shallow attention model, and fusing the shallow feature and the deep feature;
and carrying out predictive response value re-equalization processing on the fused features by adopting a probability correction strategy model to obtain a polyp feature image with clear edges.
Further, said swapping colors of said reference image and said polyp image comprises:
converting the colors of the polyp image X1 and the reference image X2 from an RGB color space to an LAB color space to obtain color values L1 and L2 of the polyp image X1 and the reference image X2 in the LAB color space;
calculating the mean and standard deviation of the channels in LAB color space for the polyp image X1 and the mean and standard deviation of the channels in LAB color space for the reference image X2;
the color value of the polyp image Y1 in the RGB color space and the color value of the reference image Y2 in the RGB color space are obtained by using a preset color conversion formula.
Further, the suppressing the background noise of the shallow feature by using the shallow attention model includes:
the deep features are upsampled by bilinear interpolation, so that the resolution of the sampled deep features is the same as that of the shallow features;
selecting elements larger than 0 from the sampled deep features to determine as the attention map of the shallow features, obtaining the deep features to be fused;
and multiplying the deep layer feature to be fused and the shallow layer feature element by element to obtain the shallow layer feature after background noise suppression.
Further, said fusing the shallow features and the deep features comprises:
extracting first, second and third features of the last three scales when the shallow feature after the background noise suppression is processed by adopting a convolutional neural network;
fusing the first feature and the second feature to obtain a first fused feature;
fusing the second feature and the third feature to obtain a second fused feature;
and splicing the first fusion feature and the second fusion feature according to a channel to obtain a final fusion feature.
Further, the carrying out of predictive response value rebalancing on the fused features by adopting a probability correction strategy model to obtain a polyp feature image with clear edges comprises:
counting the number of pixels with the characteristic response value larger than 0 in the fused polyp characteristic image to obtain a first pixel value;
counting the number of pixels with the characteristic response value smaller than 0 in the fused polyp characteristic image to obtain a second pixel value;
and performing normalization processing on the first pixel value and the second pixel value, dividing the characteristic response value which is greater than 0 in the polyp characteristic image by the normalized first pixel value, and dividing the characteristic response value which is less than 0 in the polyp characteristic image by the normalized second pixel value to obtain a corrected polyp characteristic image.
A polyp image segmentation apparatus comprising:
the acquisition module is used for acquiring a polyp image to be input;
the processing module is used for selecting a reference image with a color different from that of the polyp image from a preset training set and exchanging the colors of the reference image and the polyp image;
the processing module is used for extracting a shallow feature and a deep feature from the polyp image after color exchange, suppressing background noise of the shallow feature by using a shallow attention model, and fusing the shallow feature and the deep feature;
and the execution module is used for carrying out prediction response value re-equalization processing on the fused features by adopting a probability correction strategy model to obtain a polyp feature image with clear edges.
Further, the processing module comprises:
a first processing sub-module, configured to convert colors of the polyp image X1 and the reference image X2 from an RGB color space to an LAB color space, and obtain color values L1 and L2 of the polyp image X1 and the reference image X2 in the LAB color space;
a second processing sub-module for computing the mean and standard deviation of the channels in LAB color space of said polyp image X1 and the mean and standard deviation of the channels in LAB color space of said reference image X2;
and the third processing submodule is used for obtaining the color value of the polyp image Y1 in the RGB color space and the color value of the reference image Y2 in the RGB color space by using a preset color conversion formula.
Further, the processing module comprises:
the fourth processing submodule is used for upsampling the deep features through bilinear interpolation so that the resolution of the sampled deep features is the same as that of the shallow features;
the first acquisition submodule is used for selecting elements larger than 0 from the sampled deep features to determine as the attention map of the shallow feature, so as to obtain the deep features to be fused;
and the first execution submodule is used for multiplying the deep layer features to be fused and the shallow layer features element by element to obtain the shallow layer features after background noise is suppressed.
Further, the processing module comprises:
the second acquisition submodule is used for extracting the first feature, the second feature and the third feature of the last three scales when the shallow feature after the background noise is suppressed is processed by adopting a convolutional neural network;
a fifth processing submodule, configured to fuse the first feature and the second feature to obtain a first fused feature;
a sixth processing submodule, configured to fuse the second feature and the third feature to obtain a second fused feature;
and the second execution submodule is used for splicing the first fusion feature and the second fusion feature according to a channel to obtain a final fusion feature.
Further, the execution module includes:
the third obtaining submodule is used for counting the number of pixels with the characteristic response value larger than 0 in the fused polyp characteristic image to obtain a first pixel value;
the fourth obtaining submodule is used for counting the number of pixels with the characteristic response value smaller than 0 in the fused polyp characteristic image to obtain a second pixel value;
and the third execution submodule is used for carrying out normalization processing on the first pixel value and the second pixel value, dividing the characteristic response value which is greater than 0 in the polyp characteristic image by the normalized first pixel value, and dividing the characteristic response value which is less than 0 in the polyp characteristic image by the normalized second pixel value to obtain a corrected polyp characteristic image.
The embodiment of the invention has the beneficial effects that:
(1) Aiming at the problem of inaccurate segmentation of small polyps, the Shallow Attention Module (SAM) in the invention enhances the model's ability to extract and exploit the shallow features of the neural network, because the shallow features retain more detail for small polyps. Different from the traditional method of fusing multi-layer features by directly performing operations such as addition or splicing, the SAM uses the deep features as guidance and removes background noise from the shallow features by means of an attention mechanism, greatly improving the usability of the shallow features. In addition, the foreground and background pixel distribution of small-polyp images is unbalanced; for this reason, a Probability Correction Strategy (PCS) dynamically and adaptively corrects the response values according to the prediction result at the model inference stage, thereby sharpening the edges of the segmentation target and reducing the influence of the unbalanced foreground-background distribution.
(2) Aiming at the problem of data set color bias, the invention proposes a Color Exchange (CE) operation to eliminate the effect of color bias on model training. Specifically, CE lets the colors of different images migrate to one another, so the same image can take on different colors; this decouples image color from image content, allowing the model to concentrate on image content during training without being distracted by color. A large number of quantitative and qualitative experiments show that the proposed SANet model can accurately and efficiently segment polyp regions from images and generalizes well across a variety of complex real-world scenes.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flow chart of a polyp image segmentation method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating comparison of effects provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating comparison of effects provided by the embodiment of the present invention;
fig. 4 is a schematic structural diagram of a polyp image segmentation apparatus according to an embodiment of the present invention;
fig. 5 is a block diagram of a basic structure of a computer device according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention.
In some of the flows described in the present specification and claims and in the above figures, a number of operations are included that occur in a particular order, but it should be clearly understood that these operations may be performed out of order or in parallel as they occur herein, with the order of the operations being indicated as 101, 102, etc. merely to distinguish between the various operations, and the order of the operations by themselves does not represent any order of performance. Additionally, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that, the descriptions of "first", "second", etc. in this document are used for distinguishing different messages, devices, modules, etc., and do not represent a sequential order, nor limit the types of "first" and "second" to be different.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a diagram illustrating a method for segmenting a polyp image according to an embodiment of the present invention, including:
s1100, acquiring a polyp image to be input;
s1200, selecting a reference image with a color different from that of the polyp image from a preset training set, and exchanging the colors of the reference image and the polyp image;
s1300, extracting a shallow feature and a deep feature from the polyp image after color exchange, utilizing a shallow attention model to restrain background noise of the shallow feature, and fusing the shallow feature and the deep feature;
and S1400, carrying out predictive response value rebalancing treatment on the fused features by adopting a probability correction strategy model to obtain a polyp feature image with clear edges.
The Shallow Attention Module (SAM) in the present invention enhances the model's ability to extract and exploit the shallow features of the neural network, since shallow features retain more detail for small polyps. Different from the traditional method of fusing multi-layer features by directly performing operations such as addition or splicing, the SAM uses the deep features as guidance and removes background noise from the shallow features by means of an attention mechanism, greatly improving the usability of the shallow features. In addition, the foreground and background pixel distribution of small-polyp images is unbalanced; for this reason, a Probability Correction Strategy (PCS) dynamically and adaptively corrects the response values according to the prediction result at the model inference stage, thereby sharpening the edges of the segmentation target and reducing the influence of the unbalanced foreground-background distribution. Furthermore, the present invention proposes a Color Exchange (CE) operation to eliminate the effect of data set color bias on model training. Specifically, CE lets the colors of different images migrate to one another, so the same image can take on different colors; this decouples image color from image content, allowing the model to concentrate on image content during training without being distracted by color. A large number of quantitative and qualitative experiments show that the proposed SANet model can accurately and efficiently segment polyp regions from images and generalizes well across a variety of complex real-world scenes.
The embodiment of the invention comprises three components, CE, SAM and PCS: the CE is used in the data augmentation stage and transfers the colors of other images onto the input image; the SAM fully exploits the potential of shallow features in the feature fusion stage; and the PCS is used in the model inference stage to refine the prediction result.
Specifically, the color swapping operation acts directly on the input images, so that the same input image may exhibit different color styles during model training, as shown in fig. 2. For any input image, an image with different colors is randomly selected from the training set as a reference and its color is transferred to the input image; because the reference image is chosen randomly each time, the same input image can present different color styles while its label remains unchanged, so the model focuses on image content during training without being influenced by image color. Swapping the colors of the reference image and the polyp image comprises:
step one, converting colors of the polyp image X1 and the reference image X2 from an RGB color space to an LAB color space to obtain color values L1 and L2 of the polyp image X1 and the reference image X2 in the LAB color space;
step two, calculating the mean value and standard deviation of the channel of the polyp image X1 in the LAB color space and the mean value and standard deviation of the channel of the reference image X2 in the LAB color space;
and step three, obtaining the color value of the polyp image Y1 in the RGB color space and the color value of the reference image Y2 in the RGB color space by using a preset color conversion formula.
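The three steps above amount to per-channel statistic matching between the two images. A minimal NumPy sketch, assuming a Reinhard-style transfer formula (the patent only speaks of "a preset color conversion formula") and omitting the RGB↔LAB conversions, which in practice would be done with e.g. cv2.cvtColor:

```python
import numpy as np

def exchange_color(img, ref):
    """Transfer the per-channel mean/std of `ref` onto `img`.

    Both inputs are float arrays of shape (H, W, 3), assumed to be
    already in LAB space. Each channel of `img` is standardised by its
    own statistics and re-scaled to the reference statistics.
    """
    mu_i, sd_i = img.mean(axis=(0, 1)), img.std(axis=(0, 1)) + 1e-8
    mu_r, sd_r = ref.mean(axis=(0, 1)), ref.std(axis=(0, 1))
    # move img's channel statistics onto ref's
    return (img - mu_i) / sd_i * sd_r + mu_r
```

Calling it with the arguments swapped yields the other half of the exchange, i.e. the reference image recolored with the polyp image's statistics.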
In the embodiment of the invention, small-polyp images suffer severe information loss during feature downsampling, so making full use of the detail-rich shallow features is important for segmenting small polyp targets; however, due to the limited receptive field, these features contain a large amount of background noise. Therefore, the SAM provided by the invention uses the deep features to suppress the background noise of the shallow features, which substantially improves the usability of the shallow features and boosts the model's segmentation of small polyp targets. Specifically, suppressing background noise of the shallow feature by using a shallow attention model comprises:
step one, upsampling the deep features by bilinear interpolation so that the resolution of the sampled deep features is the same as that of the shallow features;
step two, selecting elements larger than 0 from the sampled deep features as the attention map of the shallow features, to obtain the deep features to be fused;
and step three, multiplying the deep layer feature to be fused and the shallow layer feature element by element to obtain the shallow layer feature after background noise suppression.
According to one embodiment of the invention, the deep feature is upsampled by bilinear interpolation to the same resolution as the shallow feature, its elements that are less than 0 are set to 0, and the result is taken as the attention map; the attention map and the shallow feature are then multiplied element by element to suppress the background noise.
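The embodiment above can be sketched in a few lines of NumPy, assuming the bilinear upsampling has already been applied to the deep feature (in a real pipeline this would be done with e.g. torch.nn.functional.interpolate in bilinear mode); the function name is illustrative:

```python
import numpy as np

def shallow_attention(shallow, deep_up):
    """Shallow Attention Module core, per the steps above.

    `deep_up` is the deep feature map already upsampled to the shallow
    feature's resolution. Elements below 0 are clamped to 0 to form the
    attention map, which then gates the shallow feature element-wise,
    zeroing responses the deep feature considers background.
    """
    attn = np.maximum(deep_up, 0.0)   # elements < 0 are set to 0
    return shallow * attn             # element-wise noise suppression
```

Because the gating is multiplicative, any location where the deep feature responds negatively is removed from the shallow feature entirely, rather than merely attenuated as with additive fusion.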
In embodiments of the invention, the SAM may effectively fuse deep and shallow features together. Wherein fusing the shallow features and the deep features comprises:
extracting first, second and third features of the last three scales when the shallow feature after the background noise suppression is processed by adopting a convolutional neural network;
fusing the first feature and the second feature to obtain a first fused feature;
fusing the second feature and the third feature to obtain a second fused feature;
and splicing the first fusion feature and the second fusion feature according to a channel to obtain a final fusion feature.
In the SANet model, only the features output by stage3, stage4 and stage5 of Res2Net (denoted f3, f4 and f5, respectively) are fused, so as to reduce the computational cost of the model. Based on the SAM, adjacent scales are fused together to take full advantage of the features of each scale: f3 is fused with f4, f4 is fused with f5, and the two newly obtained fusion features are spliced along the channel dimension to obtain the final fused feature.
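The pairwise fusion and channel splicing can be sketched as follows. Element-wise multiplication is used as the pairwise fusion op, which is an assumption — the text only states that adjacent scales are "fused" — and f4/f5 are assumed already upsampled to f3's spatial size:

```python
import numpy as np

def fuse_scales(f3, f4, f5):
    """Fuse the three deepest scales as described above.

    Inputs are channel-first (C, H, W) arrays at a common resolution.
    Adjacent scales are fused pairwise, then the two results are
    spliced along the channel axis.
    """
    f34 = f3 * f4                                # first fused feature
    f45 = f4 * f5                                # second fused feature
    return np.concatenate([f34, f45], axis=0)    # splice along channels
```

The output therefore has twice the channel count of each input, carrying both shallow detail (via f3) and deep semantics (via f5).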
In the embodiment of the invention, polyp images exhibit a severely uneven distribution of foreground and background pixels. The negative samples (background pixels) dominate the model training process, and this prior bias causes the model to assign the positive samples (foreground pixels) lower response values (logits), so the segmentation of target edges suffers. To correct this imbalance, the invention uses the PCS to rebalance the predicted response values at the model inference stage. Adopting the probability correction strategy model to perform rebalancing on the predicted response values of the fused features to obtain a polyp feature image with clear edges comprises:
counting the number of pixels with the characteristic response value larger than 0 in the fused polyp characteristic image to obtain a first pixel value;
counting the number of pixels with the characteristic response value smaller than 0 in the fused polyp characteristic image to obtain a second pixel value;
and performing normalization processing on the first pixel value and the second pixel value, dividing the characteristic response value which is greater than 0 in the polyp characteristic image by the normalized first pixel value, and dividing the characteristic response value which is less than 0 in the polyp characteristic image by the normalized second pixel value to obtain a corrected polyp characteristic image.
According to one embodiment of the invention, the number of pixels with response values greater than 0 (logit > 0) and the number of pixels with response values less than 0 (logit < 0) are counted and normalized; the response values with logit > 0 are then divided by the normalized positive count, and those with logit < 0 by the normalized negative count, yielding the corrected polyp feature image. After the PCS, the deviation of the prediction caused by the unequal numbers of positive and negative samples is eliminated, and the target edge obtains a clearer prediction, as shown in fig. 2, which shows some details of the result obtained with the PCS.
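The counting, normalising and dividing steps above can be sketched directly. This is a minimal reading of the text — normalising the two counts by their sum is an assumption, since the exact normalisation is not spelled out here — and the small epsilon guards against an empty class:

```python
import numpy as np

def probability_correction(logits, eps=1e-8):
    """Probability Correction Strategy applied at inference time.

    Positive logits are divided by the normalised share of positive
    pixels and negative logits by the negative share, so the minority
    class (usually the polyp foreground) is boosted relative to the
    dominant background and edge responses are sharpened.
    """
    n_pos = np.count_nonzero(logits > 0)     # first pixel value
    n_neg = np.count_nonzero(logits < 0)     # second pixel value
    total = max(n_pos + n_neg, 1)
    w_pos = n_pos / total + eps              # normalised positive count
    w_neg = n_neg / total + eps              # normalised negative count
    out = logits.astype(float).copy()
    out[logits > 0] /= w_pos
    out[logits < 0] /= w_neg
    return out
```

Note that because the correction depends only on the counts of the current prediction, it adapts dynamically per image and needs no retraining.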
TABLE 1 quantitative results of different models on data set
In one embodiment of the present invention, table 1 shows the quantitative results of different models on 5 data sets, namely Kvasir, CVC-ClinicDB, CVC-ColonDB, EndoScene and ETIS; it can be seen that the present invention achieves the highest score on all data sets. Fig. 3 shows the qualitative experimental results of different algorithms on specific images, and it can be seen that the present invention obtains more complete and clear polyp regions than previous models. Taken together, these experiments show that the present invention better removes the bias and background noise present in the data sets and therefore performs excellently in polyp segmentation.
As shown in fig. 4, in order to solve the above problem, an embodiment of the present invention further provides a polyp image segmentation apparatus comprising: an acquiring module 2100, a processing module 2200 and an executing module 2300, wherein the acquiring module 2100 is used for acquiring a polyp image to be input; the processing module 2200 is configured to select a reference image with a color different from that of the polyp image from a preset training set and exchange the colors of the reference image and the polyp image; the processing module 2200 is further configured to extract a shallow feature and a deep feature from the color-exchanged polyp image, suppress background noise of the shallow feature using a shallow attention model, and fuse the shallow feature and the deep feature; and the executing module 2300 is used for carrying out predictive response value rebalancing processing on the fused features by adopting a probability correction strategy model to obtain a polyp feature image with clear edges.
In some embodiments, the processing module comprises: a first processing sub-module, configured to convert colors of the polyp image X1 and the reference image X2 from an RGB color space to an LAB color space, and obtain color values L1 and L2 of the polyp image X1 and the reference image X2 in the LAB color space; a second processing sub-module for computing the mean and standard deviation of the channels in LAB color space of said polyp image X1 and the mean and standard deviation of the channels in LAB color space of said reference image X2; and the third processing submodule is used for obtaining the color value of the polyp image Y1 in the RGB color space and the color value of the reference image Y2 in the RGB color space by using a preset color conversion formula.
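The per-channel statistic matching performed by these sub-modules resembles Reinhard-style color transfer; a minimal sketch for a single LAB channel follows (the RGB↔LAB conversion and the patent's exact "preset color conversion formula" are omitted, and the function name is hypothetical):

```python
import statistics

def transfer_channel(src, ref):
    """Match one LAB channel of the source image to the reference image's
    statistics: remove the source mean/std, then re-apply the reference's."""
    mu_s, sd_s = statistics.mean(src), statistics.pstdev(src)
    mu_r, sd_r = statistics.mean(ref), statistics.pstdev(ref)
    if sd_s == 0:  # flat channel: fall back to the reference mean
        return [mu_r] * len(src)
    return [(v - mu_s) * (sd_r / sd_s) + mu_r for v in src]
```

Applying this to all three LAB channels of the polyp image (with the reference image's statistics) and then converting back to RGB would realize the color exchange the sub-modules describe.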
In some embodiments, the processing module comprises: a fourth processing submodule, configured to upsample the deep features by bilinear interpolation so that the resolution of the upsampled deep features is the same as that of the shallow features; a first acquisition submodule, configured to select the elements greater than 0 from the upsampled deep features as the attention map for the shallow feature, obtaining the deep features to be fused; and a first execution submodule, configured to multiply the deep features to be fused and the shallow features element by element, obtaining the shallow features with background noise suppressed.
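The attention step in these sub-modules — keep only the positive deep responses and use them to gate the shallow feature — can be sketched as follows (the bilinear upsampling is assumed to have already been applied, and the nested-list representation is illustrative):

```python
def shallow_attention(deep_up, shallow):
    """Element-wise gate: positive entries of the upsampled deep feature
    act as the attention map; non-positive entries zero out (suppress)
    the corresponding shallow-feature locations."""
    return [[s * d if d > 0 else 0.0 for s, d in zip(srow, drow)]
            for srow, drow in zip(shallow, deep_up)]
```

Locations where the deep feature responds negatively (likely background) are driven to zero in the shallow feature, which is the noise-suppression effect described above.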
In some embodiments, the processing module comprises: a second acquisition submodule, configured to extract the first, second and third features at the last three scales produced when the background-noise-suppressed shallow feature is processed by a convolutional neural network; a fifth processing submodule, configured to fuse the first feature and the second feature to obtain a first fused feature; a sixth processing submodule, configured to fuse the second feature and the third feature to obtain a second fused feature; and a second execution submodule, configured to concatenate the first fused feature and the second fused feature along the channel dimension to obtain the final fused feature.
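A compact sketch of the pairwise fusion and channel-wise concatenation described above (the text does not fix the fusion operator, so element-wise addition is used here as a stand-in, and the features are flattened to per-channel vectors):

```python
def fuse(a, b):
    # hypothetical fusion operator: element-wise addition
    return [x + y for x, y in zip(a, b)]

def final_fusion(f1, f2, f3):
    """Fuse (f1, f2) and (f2, f3), then concatenate the two fused
    results along the channel axis (plain list concatenation here)."""
    return fuse(f1, f2) + fuse(f2, f3)
```

Note that the middle-scale feature f2 contributes to both fused results, so it is weighted more heavily in the final representation.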
In some embodiments, the execution module comprises: the third obtaining submodule is used for counting the number of pixels with the characteristic response value larger than 0 in the fused polyp characteristic image to obtain a first pixel value; the fourth obtaining submodule is used for counting the number of pixels with the characteristic response value smaller than 0 in the fused polyp characteristic image to obtain a second pixel value; and the third execution submodule is used for carrying out normalization processing on the first pixel value and the second pixel value, dividing the characteristic response value which is greater than 0 in the polyp characteristic image by the normalized first pixel value, and dividing the characteristic response value which is less than 0 in the polyp characteristic image by the normalized second pixel value to obtain a corrected polyp characteristic image.
In order to solve the above technical problem, an embodiment of the present invention further provides a computer device. Referring to fig. 5, fig. 5 is a block diagram of a basic structure of a computer device according to the present embodiment.
Fig. 5 schematically shows the internal structure of the computer device. As shown in fig. 5, the computer device includes a processor, a non-volatile storage medium, a memory and a network interface connected through a system bus. The non-volatile storage medium of the computer device stores an operating system, a database and computer readable instructions; the database may store sequences of control information, and the computer readable instructions, when executed by the processor, cause the processor to implement an image processing method. The processor of the computer device provides the computation and control capability that supports the operation of the whole device. The memory of the computer device may store computer readable instructions that, when executed by the processor, cause the processor to perform a method of image processing. The network interface of the computer device is used for connecting and communicating with a terminal. Those skilled in the art will appreciate that the architecture shown in fig. 5 is merely a block diagram of some of the structures relevant to the disclosed aspects and does not limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than shown, combine certain components, or arrange the components differently.
In this embodiment, the processor is configured to execute specific contents of the obtaining module 2100, the processing module 2200, and the executing module 2300 in fig. 4, and the memory stores program codes and various data required for executing the modules. The network interface is used for data transmission to and from a user terminal or a server. The memory in this embodiment stores program codes and data required for executing all the sub-modules in the image processing method, and the server can call the program codes and data of the server to execute the functions of all the sub-modules.
According to the computer device provided by the embodiment of the invention, the reference feature map is obtained by extracting the features of the high-definition image set in the reference pool, and due to the diversification of the images in the high-definition image set, the reference feature map contains all possible local features, so that high-frequency texture information can be provided for each low-resolution image, the feature richness is ensured, and the memory burden is reduced. In addition, the reference feature map is searched according to the low-resolution image, and the selected reference feature map can adaptively shield or enhance various different features, so that the details of the low-resolution image are richer.
The present invention also provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the image processing method according to any one of the above embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and can include the processes of the embodiments of the methods described above when the computer program is executed. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated. Unless explicitly stated herein, there is no strict ordering constraint on these steps, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may comprise multiple sub-steps or stages, which are not necessarily completed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present invention. It should be noted that, for those skilled in the art, various modifications and refinements can be made without departing from the principle of the present invention, and such modifications and refinements shall also fall within the protection scope of the present invention.
Claims (10)
1. A method of segmenting a polyp image, comprising:
acquiring a polyp image to be input;
selecting a reference image with a color different from that of the polyp image from a preset training set, and exchanging the colors of the reference image and the polyp image;
extracting a shallow feature and a deep feature from the polyp image after color exchange, suppressing background noise of the shallow feature by using a shallow attention model, and fusing the shallow feature and the deep feature;
and performing predictive response value re-equalization processing on the fused features by using a probability correction strategy model, to obtain a polyp feature image with clear edges.
2. The segmentation method according to claim 1, wherein said swapping colors of the reference image and the polyp image comprises:
converting the colors of the polyp image X1 and the reference image X2 from an RGB color space to an LAB color space to obtain color values L1 and L2 of the polyp image X1 and the reference image X2 in the LAB color space;
calculating the mean and standard deviation of the channels in LAB color space for the polyp image X1 and the mean and standard deviation of the channels in LAB color space for the reference image X2;
the color value of the polyp image Y1 in the RGB color space and the color value of the reference image Y2 in the RGB color space are obtained by using a preset color conversion formula.
3. The segmentation method according to claim 1, wherein the suppressing the background noise of the shallow feature by using the shallow attention model comprises:
upsampling the deep features by bilinear interpolation, so that the resolution of the upsampled deep features is the same as that of the shallow features;
selecting the elements greater than 0 from the upsampled deep features as the attention map for the shallow features, to obtain the deep features to be fused;
and multiplying the deep layer feature to be fused and the shallow layer feature element by element to obtain the shallow layer feature after background noise suppression.
4. The segmentation method according to claim 3, wherein the fusing the shallow features and the deep features comprises:
extracting first, second and third features at the last three scales produced when the background-noise-suppressed shallow feature is processed by a convolutional neural network;
fusing the first feature and the second feature to obtain a first fused feature;
fusing the second feature and the third feature to obtain a second fused feature;
and splicing the first fusion feature and the second fusion feature according to a channel to obtain a final fusion feature.
5. The segmentation method according to claim 1, wherein the performing the predictive response value re-equalization process on the fused features by using the probability correction strategy model to obtain the polyp feature image with clear edges comprises:
counting the number of pixels with the characteristic response value larger than 0 in the fused polyp characteristic image to obtain a first pixel value;
counting the number of pixels with the characteristic response value smaller than 0 in the fused polyp characteristic image to obtain a second pixel value;
and performing normalization processing on the first pixel value and the second pixel value, dividing the characteristic response values greater than 0 in the polyp characteristic image by the normalized first pixel value, and dividing the characteristic response values less than 0 in the polyp characteristic image by the normalized second pixel value, to obtain a corrected polyp characteristic image.
6. An apparatus for segmenting a polyp image, comprising:
the acquisition module is used for acquiring a polyp image to be input;
the processing module is used for selecting a reference image with a color different from that of the polyp image from a preset training set and exchanging the colors of the reference image and the polyp image;
the processing module is used for extracting a shallow feature and a deep feature from the polyp image after color exchange, suppressing background noise of the shallow feature by using a shallow attention model, and fusing the shallow feature and the deep feature;
and the execution module is used for carrying out prediction response value re-equalization processing on the fused features by adopting a probability correction strategy model to obtain a polyp feature image with clear edges.
7. The segmentation apparatus according to claim 6, wherein the processing module comprises:
a first processing sub-module, configured to convert colors of the polyp image X1 and the reference image X2 from an RGB color space to an LAB color space, and obtain color values L1 and L2 of the polyp image X1 and the reference image X2 in the LAB color space;
a second processing sub-module for computing the mean and standard deviation of the channels in LAB color space of said polyp image X1 and the mean and standard deviation of the channels in LAB color space of said reference image X2;
and the third processing submodule is used for obtaining the color value of the polyp image Y1 in the RGB color space and the color value of the reference image Y2 in the RGB color space by using a preset color conversion formula.
8. The segmentation apparatus as set forth in claim 6, wherein the processing module comprises:
the fourth processing submodule is used for upsampling the deep features by bilinear interpolation so that the resolution of the upsampled deep features is the same as that of the shallow features;
the first acquisition submodule is used for selecting the elements greater than 0 from the upsampled deep features as the attention map for the shallow feature, so as to obtain the deep features to be fused;
and the first execution submodule is used for multiplying the deep layer features to be fused and the shallow layer features element by element to obtain the shallow layer features after background noise is suppressed.
9. The segmentation apparatus as set forth in claim 8, wherein the processing module comprises:
the second acquisition submodule is used for extracting the first, second and third features at the last three scales produced when the background-noise-suppressed shallow feature is processed by a convolutional neural network;
a fifth processing submodule, configured to fuse the first feature and the second feature to obtain a first fused feature;
a sixth processing submodule, configured to fuse the second feature and the third feature to obtain a second fused feature;
and the second execution submodule is used for splicing the first fusion feature and the second fusion feature according to a channel to obtain a final fusion feature.
10. The segmentation apparatus according to claim 6, wherein the execution module includes:
the third obtaining submodule is used for counting the number of pixels with the characteristic response value larger than 0 in the fused polyp characteristic image to obtain a first pixel value;
the fourth obtaining submodule is used for counting the number of pixels with the characteristic response value smaller than 0 in the fused polyp characteristic image to obtain a second pixel value;
and the third execution submodule is used for carrying out normalization processing on the first pixel value and the second pixel value, dividing the characteristic response value which is greater than 0 in the polyp characteristic image by the normalized first pixel value, and dividing the characteristic response value which is less than 0 in the polyp characteristic image by the normalized second pixel value to obtain a corrected polyp characteristic image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110889919.0A CN113724276A (en) | 2021-08-04 | 2021-08-04 | Polyp image segmentation method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110889919.0A CN113724276A (en) | 2021-08-04 | 2021-08-04 | Polyp image segmentation method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113724276A true CN113724276A (en) | 2021-11-30 |
Family
ID=78674791
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110889919.0A Pending CN113724276A (en) | 2021-08-04 | 2021-08-04 | Polyp image segmentation method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113724276A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114972155A (en) * | 2021-12-30 | 2022-08-30 | 昆明理工大学 | Polyp image segmentation method based on context information and reverse attention |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105430295A (en) * | 2015-10-30 | 2016-03-23 | 努比亚技术有限公司 | Device and method for image processing |
WO2018224442A1 (en) * | 2017-06-05 | 2018-12-13 | Siemens Aktiengesellschaft | Method and apparatus for analysing an image |
CN109934789A (en) * | 2019-03-26 | 2019-06-25 | 湖南国科微电子股份有限公司 | Image de-noising method, device and electronic equipment |
CN111383214A (en) * | 2020-03-10 | 2020-07-07 | 苏州慧维智能医疗科技有限公司 | Real-time endoscope enteroscope polyp detection system |
CN111768425A (en) * | 2020-07-23 | 2020-10-13 | 腾讯科技(深圳)有限公司 | Image processing method, device and equipment |
CN111986204A (en) * | 2020-07-23 | 2020-11-24 | 中山大学 | Polyp segmentation method and device and storage medium |
CN112001861A (en) * | 2020-08-18 | 2020-11-27 | 香港中文大学(深圳) | Image processing method and apparatus, computer device, and storage medium |
CN112330688A (en) * | 2020-11-02 | 2021-02-05 | 腾讯科技(深圳)有限公司 | Image processing method and device based on artificial intelligence and computer equipment |
CN112489061A (en) * | 2020-12-09 | 2021-03-12 | 浙江工业大学 | Deep learning intestinal polyp segmentation method based on multi-scale information and parallel attention mechanism |
CN112669197A (en) * | 2019-10-16 | 2021-04-16 | 顺丰科技有限公司 | Image processing method, image processing device, mobile terminal and storage medium |
CN112950461A (en) * | 2021-03-27 | 2021-06-11 | 刘文平 | Global and superpixel segmentation fused color migration method |
US20210192747A1 (en) * | 2019-12-23 | 2021-06-24 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Portrait Segmentation Method, Model Training Method and Electronic Device |
2021-08-04: application CN202110889919.0A filed; publication CN113724276A; status: Pending
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105430295A (en) * | 2015-10-30 | 2016-03-23 | 努比亚技术有限公司 | Device and method for image processing |
WO2018224442A1 (en) * | 2017-06-05 | 2018-12-13 | Siemens Aktiengesellschaft | Method and apparatus for analysing an image |
CN109934789A (en) * | 2019-03-26 | 2019-06-25 | 湖南国科微电子股份有限公司 | Image de-noising method, device and electronic equipment |
CN112669197A (en) * | 2019-10-16 | 2021-04-16 | 顺丰科技有限公司 | Image processing method, image processing device, mobile terminal and storage medium |
US20210192747A1 (en) * | 2019-12-23 | 2021-06-24 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Portrait Segmentation Method, Model Training Method and Electronic Device |
CN111383214A (en) * | 2020-03-10 | 2020-07-07 | 苏州慧维智能医疗科技有限公司 | Real-time endoscope enteroscope polyp detection system |
CN111768425A (en) * | 2020-07-23 | 2020-10-13 | 腾讯科技(深圳)有限公司 | Image processing method, device and equipment |
CN111986204A (en) * | 2020-07-23 | 2020-11-24 | 中山大学 | Polyp segmentation method and device and storage medium |
CN112001861A (en) * | 2020-08-18 | 2020-11-27 | 香港中文大学(深圳) | Image processing method and apparatus, computer device, and storage medium |
CN112330688A (en) * | 2020-11-02 | 2021-02-05 | 腾讯科技(深圳)有限公司 | Image processing method and device based on artificial intelligence and computer equipment |
CN112489061A (en) * | 2020-12-09 | 2021-03-12 | 浙江工业大学 | Deep learning intestinal polyp segmentation method based on multi-scale information and parallel attention mechanism |
CN112950461A (en) * | 2021-03-27 | 2021-06-11 | 刘文平 | Global and superpixel segmentation fused color migration method |
Non-Patent Citations (2)
Title |
---|
ALAIN SÁNCHEZ-GONZÁLEZ: "Automatized colon polyp segmentation via contour region analysis", 《COMPUTERS IN BIOLOGY AND MEDICINE》 * |
刘士臣: "基于无线胶囊内窥镜图像的小肠病变智能检测与识别研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114972155A (en) * | 2021-12-30 | 2022-08-30 | 昆明理工大学 | Polyp image segmentation method based on context information and reverse attention |
CN114972155B (en) * | 2021-12-30 | 2023-04-07 | 昆明理工大学 | Polyp image segmentation method based on context information and reverse attention |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
GB2587841A (en) | Utilizing a neural network having a two-stream encoder architecture to generate composite digital images | |
CN108229591B (en) | Neural network adaptive training method and apparatus, device, program, and storage medium | |
US11538244B2 (en) | Extraction of spatial-temporal feature representation | |
KR20200087808A (en) | Method and apparatus for partitioning instances, electronic devices, programs and media | |
CN110378913B (en) | Image segmentation method, device, equipment and storage medium | |
CN111275034B (en) | Method, device, equipment and storage medium for extracting text region from image | |
WO2023015755A1 (en) | Matting network training method and matting method | |
CN112418216A (en) | Method for detecting characters in complex natural scene image | |
CN110633709A (en) | Characteristic graph processing method based on residual error network | |
CN112800955A (en) | Remote sensing image rotating target detection method and system based on weighted bidirectional feature pyramid | |
CN113313083A (en) | Text detection method and device | |
CN113724276A (en) | Polyp image segmentation method and device | |
CN114565768A (en) | Image segmentation method and device | |
CN111358430B (en) | Training method and device for magnetic resonance imaging model | |
CN113139463B (en) | Method, apparatus, device, medium and program product for training a model | |
CN113610856B (en) | Method and device for training image segmentation model and image segmentation | |
CN114332493A (en) | Cross-dimension interactive significance detection model and detection method thereof | |
EP3032497A2 (en) | Method and apparatus for color correction | |
Li et al. | Inductive Guided Filter: Real-Time Deep Matting with Weakly Annotated Masks on Mobile Devices | |
CN107766863B (en) | Image characterization method and server | |
CN116433674B (en) | Semiconductor silicon wafer detection method, device, computer equipment and medium | |
CN112801082B (en) | Image sampling method and device and electronic equipment | |
CN116823588A (en) | Information processing method and electronic equipment | |
CN114419410A (en) | Target detection method, device, equipment and storage medium | |
CN116580406A (en) | Patent drawing processing method and device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||