CN117132774B - Multi-scale polyp segmentation method and system based on PVT - Google Patents
- Publication number
- CN117132774B (application CN202311097260.0A)
- Authority
- CN
- China
- Legal status: Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/32—Normalisation of the pattern dimensions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/52—Scale-space analysis, e.g. wavelet analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/70—Labelling scene content, e.g. deriving syntactic or semantic representations
Abstract
The invention discloses a PVT-based multi-scale polyp segmentation method and system, relating to the technical field of deep-learning medical image semantic segmentation. The invention comprises the following steps: obtaining a colonoscopy image to be detected and preprocessing it; performing multi-scale feature extraction on the preprocessed image with a PVTv2 backbone network; fusing the original feature maps of different scales generated by the PVT step by step with a parallel Sobel edge decoder to obtain a global prediction map; performing multi-receptive-field feature extraction on the original feature maps with a multi-scale parallel dilated convolution attention module; using the global prediction map to guide the stages one by one and progressively generate multi-stage prediction maps; and comparing the global prediction map and the multi-stage prediction maps with the ground-truth map to obtain the prediction loss, the last-stage prediction map being the final polyp segmentation prediction map. The method can accurately identify and segment polyps in colonoscopy images, providing effective help for doctors in reaching a correct diagnosis.
Description
Technical Field
The invention belongs to the technical field of deep-learning medical image semantic segmentation, and particularly relates to a PVT-based multi-scale polyp segmentation method and system.
Background
Colorectal cancer is a common malignancy, and its early detection and treatment are of great importance in improving patient survival. Since colorectal cancer presents no typical symptoms in its early stages, screening has become increasingly important, and one of the principal screening means is colonoscopy. In colonoscopy images, polyps are similar in color to the surrounding normal tissue, variable in shape, and different in size; small polyps may even stick together, and polyp boundaries are often ambiguous. This poses many challenges for polyp segmentation of colonoscopy imaging results. Traditional medical image segmentation methods, such as threshold segmentation and region growing, often require manual annotation assisted by a professional physician; affected by illumination conditions, physician experience, subjective factors, and the like, the segmentation is time-consuming and labor-intensive and suffers from large errors and instability. How to achieve automatic segmentation of colonoscopy images, and thereby obtain polyp segmentation results with clearer boundaries more efficiently, has become one of the hot topics in medical image segmentation.
In recent years, deep-learning techniques have been widely applied to medical image segmentation. For colonoscopy, polyp segmentation methods based on convolutional neural networks (CNNs) have been widely used, with two main typical architectures: U-shaped structures based on U-Net, and the PraNet architecture. U-Net uses an encoder-decoder architecture and combines low-level and high-level features via skip connections to effectively preserve spatial locality information, but it is susceptible to noise and occlusion. PraNet first uses a parallel partial decoder (PPD) to aggregate high-level features and generate a global map that roughly locates polyps, then uses a reverse attention (RA) module to progressively refine regions and boundaries. However, owing to the inherent limitations of convolutional neural networks, such models still have problems with segmentation accuracy and robustness. There is therefore a need to improve existing models so as to raise polyp segmentation performance on colonoscopy imaging results.
Recently, the success of Transformers in natural language processing (NLP) has inspired computer-vision researchers, leading to applications and developments of Transformers in computer-vision research tasks. Since Transformer-based networks are good at capturing long-range dependencies between image regions through global self-attention, the Transformer can be applied to the polyp segmentation task: it learns the dependencies between different regions in a colonoscopy image, and this information is exploited to improve the segmentation performance and robustness of the model. In addition, advanced optimization algorithms can accelerate model training and improve the convergence rate. By applying these techniques, the segmentation of polyps in colonoscopy images can be further improved, providing clinicians with more accurate and reliable diagnostic results.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a PVT-based multi-scale polyp segmentation method and system, which effectively solve the problem that polyp regions cannot be accurately identified in the prior art, further improve the accuracy and semantic integrity of polyp segmentation boundaries in colonoscopy images, and realize accurate, rapid, and automatic polyp segmentation.
In order to achieve the above object, the present invention provides the following solutions:
a PVT-based multi-scale polyp segmentation method, comprising the following steps:
S1, acquiring a colonoscopy image to be detected and preprocessing it;
S2, performing multi-scale feature extraction on the preprocessed colonoscopy image with a PVTv2 backbone network to obtain original feature maps of different scales;
S3, fusing the original feature maps step by step with a parallel Sobel edge decoder to obtain a global prediction map;
S4, performing multi-receptive-field feature extraction on the original feature maps with a multi-scale parallel dilated convolution attention module;
S5, using the global prediction map to guide, stage by stage, the original feature maps after multi-receptive-field feature extraction, progressively generating multi-stage prediction maps;
S6, comparing the global prediction map and the multi-stage prediction maps with the ground-truth map and calculating the loss; the resulting last-stage prediction map is the final polyp segmentation prediction map.
Preferably, in S1, the method for preprocessing the colonoscopy image comprises:
enhancing the colonoscopy image data with random rotation, vertical flipping, horizontal flipping, and normalization, then uniformly cropping the image to 352×352 and scaling it with a {0.75, 1, 1.25} multi-scale strategy.
Preferably, in S2, the method for performing multi-scale feature extraction on the preprocessed colonoscopy image with the PVTv2 backbone network comprises:
judging whether the preprocessed colonoscopy image input to the PVTv2 backbone network is a 3-channel image; if it is, sending it directly into the network for feature extraction; if it is not, adjusting the number of channels to 3 with a single 1×1 convolution;
performing four-stage feature extraction with a pre-trained PVTv2-B2 model.
Preferably, in S3, the method for fusing the original feature maps step by step with the parallel Sobel edge decoder to obtain the global prediction map comprises:
S31: in the first branch, compressing the feature-map channels with a 1×1 convolution;
S32: in the second branch, first compressing the feature-map channels with a 1×1 convolution, then extracting features once each with 1×3 and 3×1 asymmetric convolutions and a 3×3 convolution with dilation rate 3;
S33: in the third branch, first compressing the feature-map channels with a 1×1 convolution, then extracting features once each with 1×5 and 5×1 asymmetric convolutions and a 3×3 convolution with dilation rate 5;
S34: in the fourth branch, first compressing the feature-map channels with a 1×1 convolution, then extracting features once each with 1×7 and 7×1 asymmetric convolutions and a 3×3 convolution with dilation rate 7;
S35: concatenating the compressed feature map of the first branch with the feature maps of the second, third, and fourth branches after feature extraction along the channel dimension, then compressing the channels with a 1×1 convolution;
S36: adding the compressed concatenated feature map pixel by pixel to the original feature map whose channels were compressed by convolution, passing the sum through a ReLU nonlinear activation function, and then feeding it into the Sobel operation;
S37: adding the feature maps gradient-sharpened by the Sobel operator pixel by pixel, and generating the initial global polyp segmentation prediction map through a 1×1 convolution and a bilinear-interpolation upsampling operation.
Preferably, in S4, the method for performing multi-receptive-field feature extraction on the original feature maps with the multi-scale parallel dilated convolution attention module comprises:
S41: compressing the channels of the four layers of original feature maps from the PVT encoder with a 1×1 convolution to obtain multi-channel feature maps with a reduced number of channels relative to the original feature maps;
S42: dividing the channels of the compressed feature map evenly into groups and sending them into four branches for processing, namely extracting features in the respective branches with 3×3 convolutions of dilation rates 1, 3, 5, and 7, and then concatenating the processing results of the four branches;
S43: applying a 1×1 convolution to the channel-concatenated feature map, followed in turn by batch normalization (BN) and the nonlinear ReLU activation function, to obtain the processed feature map;
S44: sending the processed feature map into a CBAM module for further attention weighting, obtaining a more discriminative feature map.
Preferably, in S5, the method for using the global prediction map to guide, stage by stage, the original feature maps after multi-receptive-field feature extraction and progressively generate the multi-stage prediction maps comprises:
S51: spatially downsampling the global prediction map so that its resolution matches that of the fourth-stage PVT feature map; sending it into the RA module for the reverse-attention operation to generate an attention map; multiplying this element by element with the fourth-stage PVT feature map; then, after feature dimensionality reduction by three 3×3 convolutions, adding the result pixel by pixel to the prediction map of the previous stage to generate the prediction map of the current stage;
S52: sending the prediction map of the current stage into the next stage and performing the same operation as in S51, guiding the generation of the final-stage feature map.
Preferably, in S6, the method for comparing the global prediction map and the multi-stage prediction maps with the ground-truth map and calculating the loss, the resulting last-stage prediction map being the final polyp segmentation prediction map, comprises:
applying a bilinear-interpolation spatial upsampling operation to the global prediction map and the multi-stage prediction maps, resizing all prediction maps to the size of the ground-truth map corresponding to the input image, and calculating the mixed loss of weighted BCE and weighted IoU;
the weighted BCE loss is defined as

$$L_{wBCE} = -\frac{\sum_{(x,y)}\omega(x,y)\left[G(x,y)\log P(x,y)+(1-G(x,y))\log\left(1-P(x,y)\right)\right]}{\sum_{(x,y)}\omega(x,y)},$$

where G denotes the ground-truth map, P denotes the prediction map, and (x, y) denotes any pixel position in the image; the corresponding weighting coefficient ω(x, y) represents the importance of pixel (x, y) and is defined as

$$\omega(x,y) = 1+\gamma\left|\frac{\sum_{(i,j)\in A_{(x,y)}}G(i,j)}{\left|A_{(x,y)}\right|}-G(x,y)\right|,$$

where A_{(x,y)} denotes a local neighborhood around pixel (x, y) and γ is set to 5;
the weighted IoU loss is defined as

$$L_{wIoU} = 1-\frac{\sum_{(x,y)}\omega(x,y)\,G(x,y)\,P(x,y)}{\sum_{(x,y)}\omega(x,y)\left[G(x,y)+P(x,y)-G(x,y)P(x,y)\right]};$$

combining the weighted BCE and weighted IoU losses, the mixed loss of the prediction map relative to the ground-truth map is

$$L_{seg} = L_{wBCE}+L_{wIoU}.$$
the invention also provides a PVT-based multi-scale polyp segmentation system, comprising a preprocessing module, a first feature extraction module, a fusion module, a second feature extraction module, a guiding module, and a prediction module;
the preprocessing module is used for acquiring a colonoscopy image to be detected and preprocessing it;
the first feature extraction module is used for performing multi-scale feature extraction on the preprocessed colonoscopy image with a PVTv2 backbone network to obtain original feature maps of different scales;
the fusion module is used for fusing the original feature maps step by step with a parallel Sobel edge decoder to obtain a global prediction map;
the second feature extraction module is used for performing multi-receptive-field feature extraction on the original feature maps with a multi-scale parallel dilated convolution attention module;
the guiding module is used for using the global prediction map to guide, stage by stage, the original feature maps after multi-receptive-field feature extraction, progressively generating multi-stage prediction maps;
the prediction module is used for comparing the global prediction map and the multi-stage prediction maps with the ground-truth map and calculating the loss; the resulting last-stage prediction map is the final polyp segmentation prediction map.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention adopts PVTv2 as the backbone network in place of the ResNet backbone in PraNet, giving the network better global feature extraction capability.
2. The invention provides a parallel Sobel edge decoder that fuses feature maps at different scales, improving the model's segmentation of polyps of different sizes.
3. The invention also provides a multi-scale parallel dilated convolution attention module, which performs multi-receptive-field feature extraction on feature maps of different scales and uses CBAM to reassign weights to the feature maps and extract the regions of interest.
4. The invention also trains the model with a new loss function that combines weighted BCE loss and weighted IoU loss, eliminating the influence of the unbalanced distribution of positive and negative samples and further improving the segmentation accuracy and robustness of the model.
Drawings
In order to more clearly illustrate the technical solutions of the present invention, the drawings that are needed in the embodiments are briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of an implementation of the PVT-based multi-scale polyp segmentation method provided in an embodiment of the present invention;
Fig. 2 is a schematic diagram of the PraNet network model structure;
Fig. 3 is a schematic diagram of the network model structure of the PVT-based multi-scale polyp segmentation method constructed in an embodiment of the present invention;
Fig. 4 is a block diagram of the parallel Sobel edge decoder of the present invention;
Fig. 5 is a schematic diagram of the RFB operation in the parallel Sobel edge decoder module of the present invention;
Fig. 6 is a schematic diagram of the multi-scale parallel dilated convolution attention module of the present invention;
Fig. 7 is a visual comparison of experimental results of the PVT-based multi-scale polyp segmentation method in an embodiment.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
Example 1
Fig. 1 is a schematic flow chart of an implementation of a PVT-based multi-scale polyp segmentation method according to an embodiment of the present invention. As shown in fig. 1, the PVT-based multi-scale polyp segmentation method comprises the following steps:
Step S1: obtain the colonoscopy images to be detected from the Kvasir-SEG, CVC-ClinicDB, CVC-ColonDB, ETIS, and CVC-T datasets, and preprocess the images.
In implementation, the preprocessing may include random rotation, vertical flipping, horizontal flipping, normalization, and similar processing of the colonoscopy images to be detected, turning them into images that meet the detection requirements. Each image is then uniformly resized to 352 rows × 352 columns and scaled with a {0.75, 1, 1.25} multi-scale strategy. These preprocessing techniques provide more reliable input data for the neural network model and let the network handle polyps of different sizes, improving polyp segmentation in colonoscopy.
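In code, the preprocessing might look as follows (a minimal PyTorch/torchvision sketch; the rotation angle, the ImageNet normalization statistics, and the batch-level application of the multi-scale strategy are assumptions, since the embodiment only names the operations):

```python
import torch
import torch.nn.functional as F
from torchvision import transforms

BASE_SIZE = 352                # images are uniformly resized to 352 × 352
SCALES = (0.75, 1.0, 1.25)     # multi-scale training strategy

# Geometric transforms must be applied identically to image and mask;
# they are shown on the image alone here for brevity.
train_transform = transforms.Compose([
    transforms.Resize((BASE_SIZE, BASE_SIZE)),
    transforms.RandomRotation(degrees=90),            # "random rotation" (angle assumed)
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet stats (assumed)
                         std=[0.229, 0.224, 0.225]),
])

def rescale_batch(images: torch.Tensor, scale: float) -> torch.Tensor:
    """Resize a (B, C, H, W) batch by one of SCALES, keeping the side length
    divisible by 32 so the stride-32 PVTv2 stage stays well defined."""
    side = int(round(BASE_SIZE * scale / 32) * 32)
    return F.interpolate(images, size=(side, side),
                         mode='bilinear', align_corners=False)
```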
Step S2: perform multi-scale feature extraction on the preprocessed image with the PVTv2 backbone network.
Referring to fig. 2, which shows the PraNet network model structure: PraNet is a parallel reverse-attention network that can accurately segment polyps from colonoscopy images. The network first uses a parallel partial decoder (PPD) to aggregate high-level features and generate an initial global prediction map that guides the subsequent steps; it then establishes the relationship between the target region and its boundary with a reverse attention module, fully exploiting the complementarity between edges and regions. However, owing to the limited receptive field of the PraNet backbone, only local information can be captured, while spatial context and global information are ignored. The invention improves on this; fig. 3 is a schematic diagram of the network model structure of the PVT-based multi-scale polyp segmentation method constructed in an embodiment of the invention.
In the embodiment of the invention, the PVTv2 backbone network is used for multi-scale feature extraction. PVTv2 is one of the most advanced pre-trained models currently available; thanks to its Pyramid Vision Transformer architecture, it provides more accurate and robust feature extraction from input images and performs well in a variety of visual tasks, including image classification, object detection, and segmentation. Using a PVTv2 backbone allows polyps of different resolutions in colonoscopy images to be handled better.
To model local continuity information, PVTv2 tokenizes images with overlapping patch embedding: the patch window is enlarged so that adjacent windows overlap by half, and the feature map is zero-padded to preserve resolution. To keep PVTv2 at the same linear complexity as a CNN, the overlapping patch embedding is implemented with a zero-padded convolution, as in the sketch below.
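A minimal sketch of such an overlapping patch embedding (a stage-1 configuration following the PVTv2 paper's defaults, not text recovered from this patent):

```python
import torch.nn as nn

class OverlapPatchEmbed(nn.Module):
    """Overlapping patch embedding: a zero-padded strided convolution
    tokenizes the image while adjacent patch windows overlap by half,
    preserving local continuity at CNN-like linear complexity."""
    def __init__(self, in_chans=3, embed_dim=64, patch_size=7, stride=4):
        super().__init__()
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size,
                              stride=stride, padding=patch_size // 2)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):
        x = self.proj(x)                       # (B, C, H/stride, W/stride)
        _, _, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (B, N, C) token sequence
        return self.norm(tokens), h, w
```

The specific operation of multi-scale feature extraction with the PVTv2 backbone network is as follows: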
S201: first check whether the preprocessed colonoscopy image input to the network is a 3-channel image.
S202: if it is a 3-channel image, send it directly into the network for feature extraction; if it is not, adjust the channels with a single 1×1 convolution so that the number of image channels becomes 3.
S203: the model performs four-stage feature extraction with a pre-trained PVTv2-B2. Across the four stages, the numbers of PVTv2 basic-unit layers are 3, 4, 6, and 3, respectively, and the four layers of generated original feature maps X1–X4 have dimensions (channels × rows × columns): 64×(H/4)×(W/4), 128×(H/8)×(W/8), 320×(H/16)×(W/16), and 512×(H/32)×(W/32).
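Steps S201–S203 can be sketched as a thin wrapper (assuming a timm build that exposes PVT-v2 with features_only; the embodiment does not name a particular implementation):

```python
import timm
import torch.nn as nn

class PVTv2Backbone(nn.Module):
    """Force 3 input channels with a 1×1 convolution when needed (S201–S202),
    then extract the four pyramid stages from pre-trained PVTv2-B2 (S203)."""
    def __init__(self, in_chans: int = 3):
        super().__init__()
        self.adjust = (nn.Identity() if in_chans == 3
                       else nn.Conv2d(in_chans, 3, kernel_size=1))
        self.encoder = timm.create_model('pvt_v2_b2', pretrained=True,
                                         features_only=True)

    def forward(self, x):
        # Returns X1..X4 with 64/128/320/512 channels at strides 4/8/16/32.
        return self.encoder(self.adjust(x))
```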
Step S3: fuse the original feature maps of different scales generated by the PVT step by step with the parallel Sobel edge decoder to obtain the global prediction map. The parallel Sobel edge decoder module of the present invention, shown in fig. 4, operates as follows:
As shown in fig. 4, X1, X2, X3, and X4 are the original feature maps of the four PVT stages. RFB operations are performed on them in parallel; the specific operation is shown in fig. 5. Taking X1 as an example, X1 undergoes convolution operations along four branches, as follows:
S301: in the first branch, the number of channels of the feature map is compressed with a 1×1 convolution; for convenience of computation, the channel counts of all features are compressed to 32.
S302: in the second branch, the number of channels is first compressed with a 1×1 convolution, then features are extracted once each with 1×3 and 3×1 asymmetric convolutions and a 3×3 convolution with dilation rate 3.
S303: in the third branch, the number of channels is first compressed with a 1×1 convolution, then features are extracted once each with 1×5 and 5×1 asymmetric convolutions and a 3×3 convolution with dilation rate 5.
S304: in the fourth branch, the number of channels is first compressed with a 1×1 convolution, then features are extracted once each with 1×7 and 7×1 asymmetric convolutions and a 3×3 convolution with dilation rate 7.
S305: the feature maps of the four branches are concatenated along the channel dimension, and the number of channels of the concatenated map is then compressed with a 1×1 convolution.
S306: this feature map is added pixel by pixel to the original feature map whose channels were compressed with a 1×1 convolution, passed through the nonlinear ReLU activation function, and then fed into the Sobel operation.
The RFB operations on the feature maps X2, X3, and X4 are the same as above.
S307: gradient-sharpen the four processed feature maps with the Sobel operator and add them pixel by pixel; a 1×1 convolution followed by a bilinear-interpolation spatial upsampling operation then generates the initial global polyp segmentation prediction map.
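The whole decoder can be sketched as follows (assumptions: batch normalization after each convolution, fusion at the stride-4 resolution, and sharpening realized as adding the Sobel gradient magnitude back onto the features; the description fixes the branch structure and the 32-channel compression but not these details):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn(in_c, out_c, k, d=1):
    # k is an int or an asymmetric (kh, kw) tuple; padding preserves size.
    kh, kw = (k, k) if isinstance(k, int) else k
    pad = (d * (kh - 1) // 2, d * (kw - 1) // 2)
    return nn.Sequential(
        nn.Conv2d(in_c, out_c, (kh, kw), padding=pad, dilation=d, bias=False),
        nn.BatchNorm2d(out_c))

class RFBBranch(nn.Module):
    """Four-branch RFB block of S301-S306 with 32 output channels."""
    def __init__(self, in_c, out_c=32):
        super().__init__()
        def branch(k, d):  # 1×1 → 1×k → k×1 → dilated 3×3
            return nn.Sequential(conv_bn(in_c, out_c, 1),
                                 conv_bn(out_c, out_c, (1, k)),
                                 conv_bn(out_c, out_c, (k, 1)),
                                 conv_bn(out_c, out_c, 3, d=d))
        self.b1 = conv_bn(in_c, out_c, 1)
        self.b2, self.b3, self.b4 = branch(3, 3), branch(5, 5), branch(7, 7)
        self.fuse = conv_bn(4 * out_c, out_c, 1)
        self.res = conv_bn(in_c, out_c, 1)   # channel-compressed residual path

    def forward(self, x):
        y = self.fuse(torch.cat([self.b1(x), self.b2(x),
                                 self.b3(x), self.b4(x)], dim=1))
        return F.relu(y + self.res(x))       # pixel-wise addition, then ReLU

def sobel_sharpen(x):
    """Per-channel Sobel gradient magnitude added back onto the features."""
    kx = x.new_tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    c = x.shape[1]
    wx = kx.expand(c, 1, 3, 3).contiguous()
    wy = kx.t().expand(c, 1, 3, 3).contiguous()
    gx = F.conv2d(x, wx, padding=1, groups=c)
    gy = F.conv2d(x, wy, padding=1, groups=c)
    return x + torch.sqrt(gx * gx + gy * gy + 1e-6)

class SobelEdgeDecoder(nn.Module):
    def __init__(self, chans=(64, 128, 320, 512), mid=32):
        super().__init__()
        self.rfbs = nn.ModuleList(RFBBranch(c, mid) for c in chans)
        self.head = nn.Conv2d(mid, 1, kernel_size=1)

    def forward(self, feats, out_size):
        base = feats[0].shape[2:]            # fuse at the stride-4 resolution
        fused = 0
        for f, rfb in zip(feats, self.rfbs):
            y = sobel_sharpen(rfb(f))        # S307: gradient sharpening
            fused = fused + F.interpolate(y, size=base, mode='bilinear',
                                          align_corners=False)
        pred = self.head(fused)              # 1×1 convolution to one channel
        return F.interpolate(pred, size=out_size, mode='bilinear',
                             align_corners=False)
```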
Step S4: perform multi-receptive-field feature extraction on the original feature maps with the multi-scale parallel dilated convolution attention module. The module of the present invention, shown in fig. 6, operates as follows:
S401: the original feature maps of the four layers of the PVT encoder are each sent into the multi-scale parallel dilated convolution attention module for further feature extraction. Channel compression is first performed with a 1×1 convolution so that the processed feature map has a reduced number of channels relative to the original feature map.
S402: the processed feature map is divided evenly into four groups by channel, and the four groups are sent into four branches for processing: features are extracted with 3×3 convolutions of dilation rates 1, 3, 5, and 7, respectively, and the results are then concatenated by channel.
S403: a 1×1 convolution is applied to the channel-concatenated feature map, followed in turn by batch normalization (BN) and the nonlinear ReLU activation.
S404: the feature map obtained in S403 is input to a CBAM module to further strengthen attention over the feature map. The CBAM module consists mainly of two parts: channel attention and spatial attention. The channel attention module assigns globally meaningful importance across channels, reducing redundant computation; the spatial attention module attends to spatially local information to different degrees, retaining the more effective local features. The feature map passed through CBAM is thereby refined into a more discriminative feature map. Finally, feature maps of four sizes (channels × rows × columns) are generated: 64×(H/4)×(W/4), 128×(H/8)×(W/8), 320×(H/16)×(W/16), and 512×(H/32)×(W/32).
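A minimal sketch of the module (the channel-compression ratio is unreadable in the source and is left configurable; the CBAM here is a compact re-implementation of channel + spatial attention rather than the exact module in the drawings):

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Compact CBAM: channel attention from max- and average-pooled
    descriptors through a shared MLP, then spatial attention from a 7×7
    convolution over pooled channel maps."""
    def __init__(self, c, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Conv2d(c, c // reduction, 1),
                                 nn.ReLU(inplace=True),
                                 nn.Conv2d(c // reduction, c, 1))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        ca = torch.sigmoid(self.mlp(x.amax(dim=(2, 3), keepdim=True)) +
                           self.mlp(x.mean(dim=(2, 3), keepdim=True)))
        x = x * ca                                   # channel re-weighting
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.amax(dim=1, keepdim=True), x.mean(dim=1, keepdim=True)], dim=1)))
        return x * sa                                # spatial re-weighting

class MultiScaleDilatedConvAttention(nn.Module):
    """S401-S404: 1×1 channel compression, four channel groups processed by
    parallel 3×3 convolutions with dilation rates 1/3/5/7, channel concat,
    1×1 convolution + BN + ReLU, then CBAM re-weighting."""
    def __init__(self, in_c, out_c):
        super().__init__()
        assert out_c % 4 == 0
        g = out_c // 4
        self.compress = nn.Conv2d(in_c, out_c, kernel_size=1)
        self.branches = nn.ModuleList(
            nn.Conv2d(g, g, 3, padding=d, dilation=d) for d in (1, 3, 5, 7))
        self.fuse = nn.Sequential(nn.Conv2d(out_c, out_c, 1),
                                  nn.BatchNorm2d(out_c),
                                  nn.ReLU(inplace=True))
        self.cbam = CBAM(out_c)

    def forward(self, x):
        groups = torch.chunk(self.compress(x), 4, dim=1)
        x = torch.cat([b(g) for b, g in zip(self.branches, groups)], dim=1)
        return self.cbam(self.fuse(x))
```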
Step S5: use the global prediction map to guide the stages one by one and progressively generate the multi-stage prediction maps. The operation is the same as in PraNet, as follows:
S501: spatially downsample the global prediction map so that its resolution matches that of the fourth-stage PVT feature map; send it into the RA module for the reverse-attention operation to generate an attention map; multiply this element by element with the fourth-stage PVT feature map; reduce the feature dimensionality with three 3×3 convolutions; then add the result pixel by pixel to the prediction map of the previous stage to generate the prediction map of the current stage.
S502: the prediction map of the current stage is sent into the next stage, and the same operation as in S501 is performed, guiding the generation of the final-stage feature map.
In more detail, the global prediction map generated by the parallel Sobel edge decoder guides the four feature maps generated in S4 step by step and progressively generates the prediction maps:
First, guided prediction is performed with the global prediction map from S3 and the 512×(H/32)×(W/32) feature map from S4: the global prediction map is spatially downsampled to the size of the S4 feature map and then fused with it to generate a prediction map of size 1×(H/32)×(W/32). This guided prediction combines global and local information to segment polyps in colonoscopy images more accurately.
Next, the resulting prediction map guides the 320×(H/16)×(W/16) feature map from S4: the prediction map is first expanded to the size of the S4 feature map by a spatial upsampling operation and then fused with it to generate a prediction map of size 1×(H/16)×(W/16). Guided prediction then proceeds to the following stages in order, successively generating two further polyp segmentation prediction maps of sizes 1×(H/8)×(W/8) and 1×(H/4)×(W/4).
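One reverse-attention stage might be sketched as follows (PraNet-style; the width of the three 3×3 convolutions is an assumption):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReverseAttentionStage(nn.Module):
    """Resize the incoming prediction to the feature resolution, build a
    reverse-attention map 1 - sigmoid(pred), weight the stage features with
    it, reduce with three 3×3 convolutions, and add back the prediction."""
    def __init__(self, in_c, mid_c=64):
        super().__init__()
        self.reduce = nn.Sequential(
            nn.Conv2d(in_c, mid_c, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid_c, mid_c, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid_c, 1, 3, padding=1))

    def forward(self, feat, prev_pred):
        pred = F.interpolate(prev_pred, size=feat.shape[2:], mode='bilinear',
                             align_corners=False)
        att = 1.0 - torch.sigmoid(pred)     # reverse-attention map (RA module)
        refined = self.reduce(feat * att)   # element-wise weighting + 3 convs
        return refined + pred               # pixel-wise addition (S501)

# Usage: starting from the global prediction map, refine through stages
# 4 → 1 to produce the multi-stage prediction maps described above.
```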
Step S6: compare the generated global prediction map and multi-stage prediction maps with the ground-truth map and calculate the prediction loss; the last-stage prediction map is the final polyp segmentation prediction map. The specific operation is as follows:
S601: apply a bilinear-interpolation spatial upsampling operation to the five prediction maps, resize them to the size of the ground-truth map corresponding to the input image, and calculate the mixed loss L_seg composed of weighted BCE and weighted IoU.
The weighted BCE loss is calculated as

$$L_{wBCE} = -\frac{\sum_{(x,y)}\omega(x,y)\left[G(x,y)\log P(x,y)+(1-G(x,y))\log\left(1-P(x,y)\right)\right]}{\sum_{(x,y)}\omega(x,y)},$$

where G denotes the ground-truth map, P the prediction map, and (x, y) any pixel position in the image.
The weighting coefficient ω(x, y) represents the importance of pixel (x, y) and is calculated as

$$\omega(x,y) = 1+\gamma\left|\frac{\sum_{(i,j)\in A_{(x,y)}}G(i,j)}{\left|A_{(x,y)}\right|}-G(x,y)\right|,$$

where A_{(x,y)} denotes a local neighborhood around pixel (x, y); in the concrete calculation, γ is set to 5.
The weighted IoU loss is calculated as

$$L_{wIoU} = 1-\frac{\sum_{(x,y)}\omega(x,y)\,G(x,y)\,P(x,y)}{\sum_{(x,y)}\omega(x,y)\left[G(x,y)+P(x,y)-G(x,y)P(x,y)\right]}.$$

The mixed loss function of the final prediction map P relative to the ground-truth map G is

$$L_{seg} = L_{wBCE}+L_{wIoU},$$

where L_{wIoU} and L_{wBCE} are the weighted IoU and weighted BCE losses above.
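The mixed loss can be implemented compactly; the sketch below is the weighted-BCE + weighted-IoU "structure loss" popularized by F3Net and used by PraNet, where pred holds logits and mask the binary ground truth. The 31×31 averaging window for ω comes from those works and is an assumption here.

```python
import torch
import torch.nn.functional as F

def structure_loss(pred: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # ω = 1 + γ · |local mean of G − G|, with γ = 5 as in the description.
    weit = 1 + 5 * torch.abs(
        F.avg_pool2d(mask, kernel_size=31, stride=1, padding=15) - mask)

    # Weighted BCE, normalized by the total pixel weight.
    wbce = F.binary_cross_entropy_with_logits(pred, mask, reduction='none')
    wbce = (weit * wbce).sum(dim=(2, 3)) / weit.sum(dim=(2, 3))

    # Weighted IoU on the sigmoid probabilities.
    prob = torch.sigmoid(pred)
    inter = (prob * mask * weit).sum(dim=(2, 3))
    union = ((prob + mask) * weit).sum(dim=(2, 3))
    wiou = 1 - (inter + 1) / (union - inter + 1)

    return (wbce + wiou).mean()   # L_seg = L_wBCE + L_wIoU
```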
To demonstrate the effectiveness of the method, it is trained with the Kvasir-SEG and CVC-ClinicDB datasets as the training set, then tested on the Kvasir-SEG, CVC-ClinicDB, CVC-ColonDB, ETIS, and CVC-T datasets, and the test results are compared with mainstream polyp segmentation algorithms in the prior art. In the experiments, the model is built on the PyTorch 1.8 deep-learning framework and trained on one NVIDIA RTX 2080Ti GPU with 11 GB of memory; the input image size is set to 352×352, the initial learning rate to 1e-4 with an AdamW optimizer, the batch size to 6, and the total number of training epochs to 100. The test results on the Kvasir-SEG and CVC-ClinicDB datasets are shown in Table 1:
Table 1: test results of the invention and 7 other polyp segmentation methods on the Kvasir-SEG and CVC-ClinicDB datasets (table body not reproduced in this text).
The experimental results on the CVC-ClinicDB, CVC-ColonDB, and ETIS datasets are shown in Table 2.
Table 2: test results of the invention and 7 other polyp segmentation methods on the CVC-ClinicDB, CVC-ColonDB, and ETIS datasets (table body not reproduced in this text).
Of the 7 evaluation metrics, the first 2 are metrics commonly used in semantic segmentation tasks; for the first 6 metrics, values closer to 1 indicate better segmentation, while the 7th metric is non-negative and better the closer it is to 0.
Fig. 7 is a visual comparison of the experimental results of the method of the invention against other methods; the results show that the polyp segmentation method of the invention obtains segmentation results with more accurate boundaries and more complete semantic structure.
Example 2
The invention also provides a PVT-based multi-scale polyp segmentation system, comprising a preprocessing module, a first feature extraction module, a fusion module, a second feature extraction module, a guiding module, and a prediction module;
the preprocessing module is used for acquiring a colonoscopy image to be detected and preprocessing it;
the first feature extraction module is used for performing multi-scale feature extraction on the preprocessed colonoscopy image with a PVTv2 backbone network to obtain original feature maps of different scales;
the fusion module is used for fusing the original feature maps step by step with a parallel Sobel edge decoder to obtain a global prediction map;
the second feature extraction module is used for performing multi-receptive-field feature extraction on the original feature maps with a multi-scale parallel dilated convolution attention module;
the guiding module is used for using the global prediction map to guide, stage by stage, the original feature maps after multi-receptive-field feature extraction, progressively generating multi-stage prediction maps;
the prediction module is used for comparing the global prediction map and the multi-stage prediction maps with the ground-truth map and calculating the loss; the resulting last-stage prediction map is the final polyp segmentation prediction map.
The above embodiments merely illustrate preferred embodiments of the present invention, and the scope of the present invention is not limited thereto; various modifications and improvements made by those skilled in the art without departing from the spirit of the present invention fall within the scope of the present invention as defined in the appended claims.
Claims (5)
1. A PVT-based multi-scale polyp segmentation method, characterized by comprising the following steps:
S1, acquiring a colonoscopy image to be detected and preprocessing it;
S2, performing multi-scale feature extraction on the preprocessed colonoscopy image with a PVTv2 backbone network to obtain original feature maps of different scales;
S3, fusing the original feature maps step by step with a parallel Sobel edge decoder to obtain a global prediction map;
wherein, in S3, the method for fusing the original feature maps step by step with the parallel Sobel edge decoder to obtain the global prediction map comprises:
S31: in the first branch, compressing the feature-map channels with a 1×1 convolution;
S32: in the second branch, first compressing the feature-map channels with a 1×1 convolution, then extracting features once each with 1×3 and 3×1 asymmetric convolutions and a 3×3 convolution with dilation rate 3;
S33: in the third branch, first compressing the feature-map channels with a 1×1 convolution, then extracting features once each with 1×5 and 5×1 asymmetric convolutions and a 3×3 convolution with dilation rate 5;
S34: in the fourth branch, first compressing the feature-map channels with a 1×1 convolution, then extracting features once each with 1×7 and 7×1 asymmetric convolutions and a 3×3 convolution with dilation rate 7;
S35: concatenating the compressed feature map of the first branch with the feature maps of the second, third, and fourth branches after feature extraction along the channel dimension, then compressing the channels with a 1×1 convolution;
S36: adding the compressed concatenated feature map pixel by pixel to the original feature map whose channels were compressed by convolution, passing the sum through a ReLU nonlinear activation function, and then feeding it into the Sobel operation;
S37: adding the feature maps gradient-sharpened by the Sobel operator pixel by pixel, and generating the initial global polyp segmentation prediction map through a 1×1 convolution and a bilinear-interpolation upsampling operation;
S4, performing multi-receptive-field feature extraction on the original feature maps with a multi-scale parallel dilated convolution attention module;
wherein, in S4, the method for performing multi-receptive-field feature extraction on the original feature maps with the multi-scale parallel dilated convolution attention module comprises:
S41: compressing the channels of the four layers of original feature maps from the PVT encoder with a 1×1 convolution to obtain multi-channel feature maps with a reduced number of channels relative to the original feature maps;
S42: dividing the channels of the compressed feature map evenly into groups and sending them into four branches for processing, namely extracting features in the respective branches with 3×3 convolutions of dilation rates 1, 3, 5, and 7, and then concatenating the processing results of the four branches;
S43: applying a 1×1 convolution to the channel-concatenated feature map, followed in turn by batch normalization (BN) and the nonlinear ReLU activation function, to obtain the processed feature map;
S44: sending the processed feature map into a CBAM module for further attention weighting, obtaining a more discriminative feature map;
S5, using the global prediction map to guide, stage by stage, the original feature maps after multi-receptive-field feature extraction, progressively generating multi-stage prediction maps;
wherein, in S5, the method for using the global prediction map to guide, stage by stage, the original feature maps after multi-receptive-field feature extraction and progressively generate the multi-stage prediction maps comprises:
S51: spatially downsampling the global prediction map so that its resolution matches that of the fourth-stage PVT feature map; sending it into the RA module for the reverse-attention operation to generate an attention map; multiplying this element by element with the fourth-stage PVT feature map; then, after feature dimensionality reduction by three 3×3 convolutions, adding the result pixel by pixel to the prediction map of the previous stage to generate the prediction map of the current stage;
S52: sending the prediction map of the current stage into the next stage and performing the same operation as in S51, guiding the generation of the final-stage feature map;
S6, comparing the global prediction map and the multi-stage prediction maps with the ground-truth map and calculating the loss, the resulting last-stage prediction map being the final polyp segmentation prediction map.
2. The PVT-based multi-scale polyp segmentation method according to claim 1, characterized in that, in S1, the method for preprocessing the colonoscopy image comprises:
enhancing the colonoscopy image data with random rotation, vertical flipping, horizontal flipping, and normalization, then uniformly cropping the image to 352×352 and scaling it with a {0.75, 1, 1.25} multi-scale strategy.
3. The PVT-based multi-scale polyp segmentation method according to claim 1, characterized in that, in S2, the method for performing multi-scale feature extraction on the preprocessed colonoscopy image with the PVTv2 backbone network comprises:
judging whether the preprocessed colonoscopy image input to the PVTv2 backbone network is a 3-channel image; if it is, sending it directly into the network for feature extraction; if it is not, adjusting the number of channels to 3 with a single 1×1 convolution;
performing four-stage feature extraction with a pre-trained PVTv2-B2 model.
4. The PVT-based multi-scale polyp segmentation method according to claim 1, characterized in that, in S6, the method for comparing the global prediction map and the multi-stage prediction maps with the ground-truth map and calculating the loss, the resulting last-stage prediction map being the final polyp segmentation prediction map, comprises:
applying a bilinear-interpolation spatial upsampling operation to the global prediction map and the multi-stage prediction maps, resizing all prediction maps to the size of the ground-truth map corresponding to the input image, and calculating the mixed loss of weighted BCE and weighted IoU;
the weighted BCE loss being defined as

$$L_{wBCE} = -\frac{\sum_{(x,y)}\omega(x,y)\left[G(x,y)\log P(x,y)+(1-G(x,y))\log\left(1-P(x,y)\right)\right]}{\sum_{(x,y)}\omega(x,y)},$$

where G denotes the ground-truth map, P denotes the prediction map, and (x, y) denotes any pixel position in the image; the corresponding weighting coefficient ω(x, y) represents the importance of pixel (x, y) and is defined as

$$\omega(x,y) = 1+\gamma\left|\frac{\sum_{(i,j)\in A_{(x,y)}}G(i,j)}{\left|A_{(x,y)}\right|}-G(x,y)\right|,$$

where A_{(x,y)} denotes a local neighborhood around pixel (x, y) and γ is set to 5;
the weighted IoU loss being defined as

$$L_{wIoU} = 1-\frac{\sum_{(x,y)}\omega(x,y)\,G(x,y)\,P(x,y)}{\sum_{(x,y)}\omega(x,y)\left[G(x,y)+P(x,y)-G(x,y)P(x,y)\right]};$$

and the mixed loss of the prediction map relative to the ground-truth map, combining the weighted BCE and weighted IoU losses, being

$$L_{seg} = L_{wBCE}+L_{wIoU}.$$
5. A PVT-based multi-scale polyp segmentation system, characterized by comprising a preprocessing module, a first feature extraction module, a fusion module, a second feature extraction module, a guiding module, and a prediction module;
the preprocessing module is used for acquiring a colonoscopy image to be detected and preprocessing it;
the first feature extraction module is used for performing multi-scale feature extraction on the preprocessed colonoscopy image with a PVTv2 backbone network to obtain original feature maps of different scales;
the fusion module is used for fusing the original feature maps step by step with a parallel Sobel edge decoder to obtain a global prediction map;
wherein the process of fusing the original feature maps step by step with the parallel Sobel edge decoder to obtain the global prediction map comprises:
compressing the feature-map channels with a 1×1 convolution in the first branch;
in the second branch, first compressing the feature-map channels with a 1×1 convolution, then extracting features once each with 1×3 and 3×1 asymmetric convolutions and a 3×3 convolution with dilation rate 3;
in the third branch, first compressing the feature-map channels with a 1×1 convolution, then extracting features once each with 1×5 and 5×1 asymmetric convolutions and a 3×3 convolution with dilation rate 5;
in the fourth branch, first compressing the feature-map channels with a 1×1 convolution, then extracting features once each with 1×7 and 7×1 asymmetric convolutions and a 3×3 convolution with dilation rate 7;
concatenating the compressed feature map of the first branch with the feature maps of the second, third, and fourth branches after feature extraction along the channel dimension, then compressing the channels with a 1×1 convolution;
adding the compressed concatenated feature map pixel by pixel to the original feature map whose channels were compressed by convolution, passing the sum through a ReLU nonlinear activation function, and then feeding it into the Sobel operation;
adding the feature maps gradient-sharpened by the Sobel operator pixel by pixel, and generating the initial global polyp segmentation prediction map through a 1×1 convolution and a bilinear-interpolation upsampling operation;
the second feature extraction module is used for performing multi-receptive-field feature extraction on the original feature maps with a multi-scale parallel dilated convolution attention module;
wherein the process of performing multi-receptive-field feature extraction on the original feature maps with the multi-scale parallel dilated convolution attention module comprises:
compressing the channels of the four layers of original feature maps from the PVT encoder with a 1×1 convolution to obtain multi-channel feature maps with a reduced number of channels relative to the original feature maps;
dividing the channels of the compressed feature map evenly into groups and sending them into four branches for processing, namely extracting features in the respective branches with 3×3 convolutions of dilation rates 1, 3, 5, and 7, and then concatenating the processing results of the four branches;
applying a 1×1 convolution to the channel-concatenated feature map, followed in turn by batch normalization (BN) and the nonlinear ReLU activation function, to obtain the processed feature map;
sending the processed feature map into a CBAM module for further attention weighting, obtaining a more discriminative feature map;
the guiding module is used for using the global prediction map to guide, stage by stage, the original feature maps after multi-receptive-field feature extraction, progressively generating multi-stage prediction maps;
wherein the process of using the global prediction map to guide, stage by stage, the original feature maps after multi-receptive-field feature extraction and progressively generate the multi-stage prediction maps comprises:
spatially downsampling the global prediction map so that its resolution matches that of the fourth-stage PVT feature map; sending it into the RA module for the reverse-attention operation to generate an attention map; multiplying this element by element with the fourth-stage PVT feature map; then, after feature dimensionality reduction by three 3×3 convolutions, adding the result pixel by pixel to the prediction map of the previous stage to generate the prediction map of the current stage;
sending the prediction map of the current stage into the next stage and performing the same operation as the previous stage, guiding the generation of the final-stage feature map;
the prediction module is used for comparing the global prediction map and the multi-stage prediction maps with the ground-truth map and calculating the loss; the resulting last-stage prediction map is the final polyp segmentation prediction map.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311097260.0A CN117132774B (en) | 2023-08-29 | 2023-08-29 | Multi-scale polyp segmentation method and system based on PVT |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311097260.0A CN117132774B (en) | 2023-08-29 | 2023-08-29 | Multi-scale polyp segmentation method and system based on PVT |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117132774A CN117132774A (en) | 2023-11-28 |
CN117132774B true CN117132774B (en) | 2024-03-01 |
Family
ID=88859454
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311097260.0A Active CN117132774B (en) | 2023-08-29 | 2023-08-29 | Multi-scale polyp segmentation method and system based on PVT |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117132774B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117338556B (en) * | 2023-12-06 | 2024-03-29 | 四川大学华西医院 | Gastrointestinal endoscopy pressing system |
CN117392157B (en) * | 2023-12-13 | 2024-03-19 | 长春理工大学 | Edge-aware protective cultivation straw coverage rate detection method |
CN117853432B (en) * | 2023-12-26 | 2024-08-16 | 北京长木谷医疗科技股份有限公司 | Hybrid model-based osteoarthropathy identification method and device |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220138548A1 (en) * | 2022-01-18 | 2022-05-05 | Intel Corporation | Analog hardware implementation of activation functions |
- 2023-08-29: application CN202311097260.0A filed in China; granted as CN117132774B (status: Active)
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114820635A (en) * | 2022-04-21 | 2022-07-29 | 重庆理工大学 | Polyp segmentation method combining attention U-shaped network and multi-scale feature fusion |
CN115331024A (en) * | 2022-08-22 | 2022-11-11 | 浙江工业大学 | Intestinal polyp detection method based on deep supervision and gradual learning |
CN115601330A (en) * | 2022-10-20 | 2023-01-13 | 湖北工业大学(Cn) | Colonic polyp segmentation method based on multi-scale space reverse attention mechanism |
CN115841495A (en) * | 2022-12-19 | 2023-03-24 | 安徽大学 | Polyp segmentation method and system based on double-boundary guiding attention exploration |
CN115965596A (en) * | 2022-12-26 | 2023-04-14 | 深圳英美达医疗技术有限公司 | Blood vessel identification method and device, electronic equipment and readable storage medium |
CN116630245A (en) * | 2023-05-05 | 2023-08-22 | 浙江工业大学 | Polyp segmentation method based on saliency map guidance and uncertainty semantic enhancement |
Non-Patent Citations (4)
Title |
---|
Object localization and edge refinement network for salient object detection; Zhaojian Yao et al.; Expert Systems with Applications; vol. 213; pp. 1-18 *
Polyp2Seg: Improved Polyp Segmentation with Vision Transformer; Vittorino Mandujano-Cornejo et al.; MIUA 2022: Medical Image Understanding and Analysis; vol. 3413; pp. 519-534 *
Research on pulmonary nodule detection in CT images based on convolutional neural networks; Fu Huanyu; China Masters' Theses Full-text Database (Medicine and Health Sciences), no. 06; E072-113 *
Research on medical auxiliary diagnosis methods based on retinal images; Lei Ying; China Masters' Theses Full-text Database (Medicine and Health Sciences), no. 11; E073-8 *
Also Published As
Publication number | Publication date |
---|---|
CN117132774A (en) | 2023-11-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||