CN116563252A - Esophageal early cancer lesion segmentation method based on attention double-branch feature fusion - Google Patents

Esophageal early cancer lesion segmentation method based on attention double-branch feature fusion

Info

Publication number
CN116563252A
Authority
CN
China
Prior art keywords
image
network
target
esophageal
stage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310532910.3A
Other languages
Chinese (zh)
Inventor
李小霞
孟延宗
周颖玥
刘晓蓉
张晓强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest University of Science and Technology
Original Assignee
Southwest University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest University of Science and Technology filed Critical Southwest University of Science and Technology
Publication of CN116563252A publication Critical patent/CN116563252A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038Image mosaicing, e.g. composing plane images from plane sub-images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30096Tumor; Lesion
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Quality & Reliability (AREA)
  • Radiology & Medical Imaging (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

Aiming at the problems of low contrast between the foreground and background of esophageal early cancer lesions and their irregular shapes, the invention provides an esophageal early cancer lesion segmentation method based on attention double-branch feature fusion. The network combines the feature fusion of an attention mechanism with a dual-branch upsampling network (AMFF-DUNet), uniting a channel-and-spatial dual-attention mechanism with multi-scale feature fusion and dual-branch upsampling. The method comprises the following steps: step 1, build the AMFF-DUNet network and add a pyramid-guided feature fusion module (PGFM) and a dual-branch upsampling module (DBUM) to the backbone network; step 2, read the endoscopic images and preprocess them by cropping and color-space conversion; step 3, perform accurate semantic segmentation of the esophageal endoscopic images with AMFF-DUNet; and step 4, compare and analyze the experimental results against current state-of-the-art esophageal early cancer lesion segmentation methods. The results show that the method improves the segmentation accuracy for esophageal early cancer lesions with indistinct edge features and varied morphologies.

Description

Esophageal early cancer lesion segmentation method based on attention double-branch feature fusion
The present application claims priority to Chinese patent application No. 202210610948.3, entitled "Esophagus early cancer focus segmentation method based on attention double-branch feature fusion", filed on June 1, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to the fields of machine vision and semantic segmentation, and more particularly to an esophageal early cancer lesion segmentation method based on attention double-branch feature fusion.
Background
The background of esophageal images is complex and the diseased areas vary greatly from patient to patient, so screening for esophageal lesions and early cancers is a very challenging task. The foreground and background of esophageal lesions have low contrast and the lesions take diverse, irregular shapes; in addition, esophageal lesion images are easily affected by endoscopic imaging noise, which makes it difficult for traditional segmentation algorithms to segment lesions in endoscopic images. Compared with traditional methods, segmentation methods based on deep learning can effectively capture both the low-level detail features and the high-level semantic features of an image, and therefore have clear advantages for segmenting esophageal images with complex backgrounds. In particular, U-Net, proposed by Ronneberger et al., and its successive variants have been widely used for medical image segmentation; they adopt a symmetric structure and skip connections, effectively fuse low-level and high-level image features, and address the inaccurate localization of common convolutional neural networks in medical image segmentation.
In recent years, artificial intelligence methods based on deep learning have made remarkable progress in many medical fields, particularly as medical image screening systems. These areas include radiation oncology diagnosis, skin cancer classification, diabetic retinopathy segmentation, histological classification of gastric biopsy specimens, and endoscopic characterization of large-intestine lesions. Deep learning is also a powerful support tool in the field of esophageal early cancer screening. Xue et al. developed a model in Caffe for early esophageal cancer detection by classifying microvascular morphology types, using convolutional neural networks (Convolutional Neural Networks, CNN) for feature extraction and support vector machines (Support Vector Machines, SVM) for classification, and opened the way for applying deep learning methods to early esophageal cancer screening. Hong et al. used a CNN to differentiate gastric intestinal metaplasia from gastric tumors; the architecture consists of four convolutional layers, two max-pooling layers and two fully connected layers (Fully Connected Layers, FC), with a classification accuracy of 80.77%. In 2019, Professor Xu Ruihua's team estimated the diagnostic accuracy for cancerous lesions using Clopper-Pearson confidence interval estimation, obtaining values ranging from 0.915 to 0.977 on five external validation sets. In 2021, Professor Hu Bing's team proposed an esophageal cancer diagnosis algorithm based on a deep learning model, which uses 6473 narrow-band imaging (Narrow Band Imaging, NBI) images of precancerous lesions and esophageal squamous cell carcinoma annotated by specialist physicians, extracts image features with the CNN model SegNet, and builds an early screening model with sensitivity (Se) and specificity (Sp) above 90%. In summary, in the field of esophageal early cancer segmentation, achieving high accuracy, and in particular high sensitivity and specificity, with deep learning methods remains the main research direction.
Disclosure of Invention
In view of the above, the invention provides an esophageal early cancer lesion segmentation method based on attention double-branch feature fusion.
The method for segmenting the esophageal early cancer lesion based on the attention double-branch feature fusion comprises the following steps: preprocessing an esophageal endoscope image to obtain a target endoscope image; processing the target endoscope image by using an encoder network included in the deep learning model to obtain a first characteristic image; processing the first characteristic image by using a cavity space pyramid pooling module included in the deep learning model to obtain a second characteristic image; for each pyramid guidance fusion module in the multiple pyramid guidance fusion modules included in the deep learning model, fusing output features of each of multiple first target stage networks in multiple stage networks included in the encoder network by using the pyramid guidance fusion module included in the deep learning model to obtain a third feature image; and processing the first feature image and the plurality of third feature images by using a decoder network included in the deep learning model to obtain a semantic segmentation image of the esophagoscope image.
According to an embodiment of the present invention, the fusing, by using the pyramid guidance fusing module included in the deep learning model, output features of each of a plurality of first target stage networks in a plurality of stage networks included in the encoder network to obtain a third feature image includes: determining a second target phase network from the plurality of first target phase networks; based on the size of the output characteristics of the second target stage network, sampling the output characteristics of each of the plurality of first target stage networks respectively to obtain a plurality of fifth coding characteristic images; splicing the plurality of fifth coding feature images to obtain a sixth coding feature image; processing the sixth coded feature image by using a plurality of hole convolution layers respectively to obtain a plurality of seventh coded feature images, wherein hole ratios among the plurality of hole convolution layers are different; and splicing the seventh coded characteristic images to obtain the third characteristic image.
According to an embodiment of the present invention, the method further includes: a plurality of third target stage networks are determined from a plurality of network stages comprised by the decoder network based on the size of the output features of each of the plurality of second target stage networks.
According to an embodiment of the present invention, the decoder network includes a dual-branch upsampling module; wherein the processing the first feature image and the plurality of third feature images by using a decoder network included in the deep learning model to obtain a semantic segmentation image of the esophagoscope image includes: for each of a plurality of phase networks included in the decoder network, if the phase network is the third target phase network, splicing an output feature of the phase network and a third feature image corresponding to the third target phase network to obtain a first decoded feature image, and using the first decoded feature image as an input feature of a next phase network of the phase network; under the condition that the output end of the phase network is connected with a double-branch sampling module, the output characteristic of the phase network is processed by the double-branch sampling module to obtain a second decoding characteristic image, and the second decoding characteristic image is used as the input characteristic of the next phase network of the phase network; and obtaining the semantic segmentation image based on the output characteristics of the stage network under the condition that the stage network is the last stage network.
According to an embodiment of the present invention, the processing the output features of the stage network by using the dual-branch upsampling module to obtain a second decoded feature image includes: performing bicubic interpolation processing on the output characteristics of the stage network to obtain a third decoding characteristic image; performing pixel reconstruction up-sampling processing on the output characteristics of the stage network to obtain a fourth decoding characteristic image; and splicing the third decoding characteristic image and the fourth decoding characteristic image to obtain the second decoding characteristic image.
According to an embodiment of the present invention, the preprocessing of an esophageal endoscope image to obtain a target endoscope image includes: performing color space transformation on the esophagoscope image to obtain an intermediate endoscope image; and performing data normalization on the intermediate endoscopic image by using standard normal distribution to obtain the target endoscopic image.
According to an embodiment of the present invention, the encoder network includes a plurality of phase networks including a start phase network, a plurality of intermediate phase networks, and an end phase network, and the encoder network includes a spatial channel dual-attention module; the processing the target endoscope image by using the encoder network included in the deep learning model to obtain a first characteristic image includes: inputting the target endoscope image into the initial stage network to obtain a first coding characteristic image; processing the first coding feature image by using a space channel dual-attention module to obtain a second coding feature image; processing the second coding feature image by using the plurality of intermediate-stage networks to obtain a third coding feature image; processing the third coding feature image by using the space channel dual-attention module to obtain a fourth coding feature image; and inputting the fourth encoded feature image into the ending stage network to obtain the first feature image.
According to the embodiment of the invention, the backbone network of the deep learning model is ResNet101.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent from the following description of embodiments of the present invention with reference to the accompanying drawings, in which:
fig. 1 schematically shows a network structure diagram of a deep learning model according to an embodiment of the present invention.
Fig. 2 schematically illustrates a schematic diagram of a pyramid-guided fusion module according to an embodiment of the present invention.
Fig. 3 schematically shows a schematic diagram of a dual-branch upsampling module according to an embodiment of the present invention.
Fig. 4 schematically shows a schematic diagram of a bicubic interpolation method according to an embodiment of the invention.
Fig. 5 schematically shows a heat-map comparison between the esophageal early cancer lesion segmentation method based on attention double-branch feature fusion and related-art methods according to an embodiment of the invention.
Fig. 6 schematically shows a comparison of segmentation maps between the esophageal early cancer lesion segmentation method based on attention double-branch feature fusion and related-art methods according to an embodiment of the invention.
Detailed Description
Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. It should be understood that the description is only illustrative and is not intended to limit the scope of the invention. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the invention. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the present invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where expressions such as "at least one of A, B and C" are used, they should generally be interpreted in accordance with the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B and C" shall include, but not be limited to, a system having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).
In screening for early esophageal cancer, the endoscopic appearance of early lesions is very subtle, and it is not easy for an endoscopist to accurately locate the focus area. The invention proposes a pyramid-guided feature fusion module built with dilated convolution and depth-separable convolution, which enhances the expression of effective information by guiding and fusing features of different levels, and a dual-branch upsampling module built with bicubic interpolation and pixel reconstruction, which upsamples in the spatial and channel dimensions simultaneously to reduce the loss of useful information during upsampling. The DeepLabV3+ network parameters are adjusted to fit the two-class semantic segmentation task, and the convolutional attention module, the pyramid-guided feature fusion module and the dual-branch upsampling module are combined to build a double-branch feature fusion network combined with an attention mechanism, thereby improving the segmentation accuracy of esophageal early cancer lesion areas.
In embodiments of the present invention, the data involved (e.g., including but not limited to user personal information) is collected, updated, analyzed, processed, used, transmitted, provided, disclosed, stored, etc., all in compliance with relevant legal regulations, used for legal purposes, and without violating the public welfare. In particular, necessary measures are taken for personal information of the user, illegal access to personal information data of the user is prevented, and personal information security, network security and national security of the user are maintained. In embodiments of the present invention, the user's authorization or consent is obtained before the user's personal information is obtained or collected.
According to an embodiment of the invention, the esophageal early cancer lesion segmentation method based on attention double-branch feature fusion can comprise the following operations:
preprocessing an esophageal endoscope image to obtain a target endoscope image; processing the target endoscope image by utilizing an encoder network included in the deep learning model to obtain a first characteristic image; processing the first characteristic image by using a cavity space pyramid pooling module included in the deep learning model to obtain a second characteristic image; for each pyramid guidance fusion module in the multiple pyramid guidance fusion modules included in the deep learning model, fusing the respective output characteristics of a plurality of first target stage networks in the multiple stage networks included in the encoder network by utilizing the pyramid guidance fusion module included in the deep learning model to obtain a third characteristic image; and processing the first characteristic image and the plurality of third characteristic images by utilizing a decoder network included in the deep learning model to obtain a semantic segmentation image of the esophageal endoscope image.
According to an embodiment of the present invention, fusing output features of each of a plurality of first target stage networks in a plurality of stage networks included in an encoder network by using a pyramid guidance fusion module included in a deep learning model, to obtain a third feature image may include the following operations:
determining a second target phase network from the plurality of first target phase networks; based on the size of the output characteristics of the second target stage network, sampling the respective output characteristics of the first target stage networks to obtain a plurality of fifth coding characteristic images; splicing the plurality of fifth coding feature images to obtain a sixth coding feature image; processing the sixth coding feature image by using a plurality of hole convolution layers respectively to obtain a plurality of seventh coding feature images, wherein the hole ratios among the plurality of hole convolution layers are different; and splicing the seventh coded feature images to obtain a third feature image.
According to an embodiment of the present invention, a plurality of third target stage networks may be determined from a plurality of network stages comprised by the decoder network based on the sizes of the output features of each of the plurality of second target stage networks.
According to an embodiment of the invention, the decoder network may comprise a dual-branch upsampling module.
According to an embodiment of the present invention, processing the first feature image and the plurality of third feature images using a decoder network included in the deep learning model to obtain a semantically segmented image of the esophageal endoscope image may include the operations of:
for each of a plurality of stage networks included in the decoder network, under the condition that the stage network is a third target stage network, splicing the output characteristic of the stage network and a third characteristic image corresponding to the third target stage network to obtain a first decoding characteristic image, and taking the first decoding characteristic image as the input characteristic of the next stage network of the stage network; under the condition that the output end of the phase network is connected with a double-branch sampling module, the output characteristic of the phase network is processed by the double-branch sampling module to obtain a second decoding characteristic image, and the second decoding characteristic image is used as the input characteristic of the next phase network of the phase network; and under the condition that the stage network is the last stage network, obtaining the semantic segmentation image based on the output characteristics of the stage network.
According to an embodiment of the present invention, the processing the output features of the stage network with the dual-branch upsampling module to obtain the second decoded feature image may comprise the following operations:
performing bicubic interpolation processing on the output characteristics of the stage network to obtain a third decoding characteristic image; performing pixel reconstruction up-sampling processing on the output characteristics of the stage network to obtain a fourth decoding characteristic image; and splicing the third decoding characteristic image and the fourth decoding characteristic image to obtain a second decoding characteristic image.
According to an embodiment of the present invention, preprocessing an esophageal endoscope image to obtain a target endoscope image may include the following operations:
performing color space transformation on the esophageal endoscope image to obtain an intermediate endoscope image; and performing data standardization on the intermediate endoscopic image by using standard normal distribution to obtain a target endoscopic image.
According to an embodiment of the invention, the encoder network comprises a plurality of phase networks including a start phase network, a plurality of intermediate phase networks, and an end phase network, the encoder network comprising a spatial channel dual attention module.
According to an embodiment of the present invention, processing the target endoscopic image with the encoder network included in the deep learning model to obtain the first feature image may include the following operations:
inputting the target endoscope image into a network at the beginning stage to obtain a first coding characteristic image; processing the first coding feature image by using a space channel dual-attention module to obtain a second coding feature image; processing the second coding feature image by using a plurality of intermediate-stage networks to obtain a third coding feature image; processing the third coding feature image by using a space channel dual-attention module to obtain a fourth coding feature image; and inputting the fourth coded feature image into the ending stage network to obtain the first feature image.
According to an embodiment of the invention, the backbone network of the deep learning model is ResNet101.
The method for segmenting the esophageal early cancer lesion based on the attention double-branch feature fusion of the invention is further described in detail below with reference to examples and drawings.
Step 1, build the deep learning network, specifically a feature fusion and dual-branch upsampling network combined with an attention mechanism (Attention Mechanism combined Feature Fusion and Dual-branch Upsampling Network, AMFF-DUNet). The network uses ResNet101 as the backbone, and the overall framework is an encoder-decoder structure. The encoder introduces spatial and channel dual attention to enhance the feature expression capability of non-obvious focus areas; the Pyramid-Guided Feature Fusion Module (Pyramid-Guided Feature Fusion Module, PGFM) proposed by the invention is used between the encoder and the decoder to guide and fuse features of different levels and thus enhance the expression of effective information; and the Dual-Branch Upsampling Module (Dual-Branch Upsampling Module, DBUM) proposed by the invention is used in the decoder stage to reduce information loss during upsampling.
Fig. 1 schematically shows a network structure diagram of a deep learning model according to an embodiment of the present invention.
As shown in Fig. 1, the legend of the related operations is listed on the right side of the figure. The network input size is 224×224×3, the network uses DeepLabV3+ as its basic framework, and the whole network is divided into five stages, stage1 to stage5. Convolutional block attention modules (Convolutional Block Attention Module, CBAM) are used at the beginning and end of the encoder to introduce channel and spatial dual attention and enhance the feature expression of non-obvious lesion areas. Between the encoder and the decoder, the pyramid-guided feature fusion module (PGFM) proposed by the invention and the network's own atrous spatial pyramid pooling (Atrous Spatial Pyramid Pooling, ASPP) capture context information at multiple scales and enhance feature expression. The dual-branch upsampling module (DBUM) proposed by the invention is used in the decoder stage; by fusing the spatial and channel information of the image it reduces the loss of detail information during upsampling and strengthens the segmentation capability of the network.
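By way of illustration, the following is a minimal PyTorch sketch of a convolutional block attention module of the kind referenced above (channel attention followed by spatial attention). CBAM is an existing module that the invention reuses; the class name, reduction ratio and spatial kernel size below are common defaults and are assumptions rather than values specified by the invention.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CBAMSketch(nn.Module):
    """Channel attention followed by spatial attention (CBAM-style), as an illustrative sketch."""
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        # Channel attention: shared MLP applied to global average- and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False),
        )
        # Spatial attention: convolution over the channel-wise average and max maps.
        self.spatial = nn.Conv2d(2, 1, kernel_size=spatial_kernel,
                                 padding=spatial_kernel // 2, bias=False)

    def forward(self, x):
        # Channel attention weights
        avg = F.adaptive_avg_pool2d(x, 1)
        mx = F.adaptive_max_pool2d(x, 1)
        x = x * torch.sigmoid(self.mlp(avg) + self.mlp(mx))
        # Spatial attention weights
        avg_map = x.mean(dim=1, keepdim=True)
        max_map, _ = x.max(dim=1, keepdim=True)
        x = x * torch.sigmoid(self.spatial(torch.cat([avg_map, max_map], dim=1)))
        return x

In the architecture of Fig. 1, one such module would be applied to the feature maps at the start of the encoder and another to those at the end of the encoder.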
The modules proposed by the invention are explained in detail as follows:
(1) Multi-scale feature extraction is performed using the pyramid-guided feature fusion module (PGFM). In order to extract global context information from feature maps of different levels and to prevent spatial information from being lost during upsampling, the pyramid-guided feature fusion module proposed in step 1 maps the features of different ResNet101 stages to the same channel space as a selected stage through regular 3×3 convolutions, upsamples the generated feature maps to the same size and concatenates them. It then superimposes dilated convolutions with different dilation rates to enlarge the receptive field and compensate for the loss of correlation; considering that this operation increases the model parameters and affects the computation speed of the network, a depth-separable convolution is applied to the concatenated feature maps before the dilated convolutions, and finally a common convolution produces the final feature map.
Fig. 2 schematically illustrates a schematic diagram of a pyramid-guided fusion module according to an embodiment of the present invention.
As shown in Fig. 2, PGFM maps the features of stage 3 and stage 4 to the same channel space as stage 2 by regular 3×3 convolution. The generated feature maps F3 and F4 are upsampled to the same size as F2 and concatenated. Then, in order to extract global context information from the feature maps of different levels and prevent spatial information from being lost during upsampling, dilated convolutions with dilation rates r=1, r=2 and r=4 are superimposed to enlarge the receptive field and compensate for the loss of correlation; considering that this operation increases the model parameters and affects the computation speed of the network, a depth-separable convolution is applied to the concatenated feature maps before the dilated convolutions. Finally, a common convolution yields the final feature map. The output of the PGFM is given by formula (1):
P_k = Conv_3×3( Cat( Conv_dc^(r=2^(i-k))( Conv_ds( Cat( Up_(2^(i-k))( Conv_3×3(F_i) ) ) ) ) ) )   (1)
where P_k represents the output of the PGFM inserted at the k-th stage, F_i represents the feature map of the i-th encoder stage, Up_(2^(i-k)) indicates upsampling with a rate of 2^(i-k), Conv_3×3 represents a 3×3 convolution, Conv_ds represents a depth-separable convolution, Conv_dc^(r=2^(i-k)) indicates a dilated convolution with a dilation rate of 2^(i-k), Cat represents the Concat operation over the stages involved, and m represents the number of stages involved in feature guidance.
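By way of illustration, the following is a minimal PyTorch sketch of a pyramid-guided feature fusion module following the flow described above: 3×3 projections of each stage, upsampling to the size of the reference stage, concatenation, a depth-separable convolution, parallel dilated convolutions with rates 1, 2 and 4, and a final common convolution. The channel widths, the module interface and the choice of reference stage are illustrative assumptions, not the exact configuration of the invention.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PGFMSketch(nn.Module):
    """Pyramid-guided feature fusion: fuse several encoder stages towards one reference stage."""
    def __init__(self, in_channels_list, mid_channels=256, dilations=(1, 2, 4)):
        super().__init__()
        # 3x3 convolutions projecting every stage to the same channel space
        self.projs = nn.ModuleList(
            nn.Conv2d(c, mid_channels, kernel_size=3, padding=1, bias=False)
            for c in in_channels_list
        )
        concat_ch = mid_channels * len(in_channels_list)
        # Depth-separable convolution applied to the concatenated maps before the dilated convolutions
        self.dw = nn.Sequential(
            nn.Conv2d(concat_ch, concat_ch, kernel_size=3, padding=1, groups=concat_ch, bias=False),
            nn.Conv2d(concat_ch, mid_channels, kernel_size=1, bias=False),
        )
        # Parallel dilated convolutions with different dilation rates (r = 1, 2, 4)
        self.dilated = nn.ModuleList(
            nn.Conv2d(mid_channels, mid_channels, kernel_size=3, padding=d, dilation=d, bias=False)
            for d in dilations
        )
        # Common convolution producing the final fused map
        self.out = nn.Conv2d(mid_channels * len(dilations), mid_channels, kernel_size=3, padding=1)

    def forward(self, feats):
        # feats: list of encoder feature maps; feats[0] defines the reference (target) size
        ref_size = feats[0].shape[-2:]
        projected = [
            F.interpolate(proj(f), size=ref_size, mode='bilinear', align_corners=False)
            for proj, f in zip(self.projs, feats)
        ]
        x = self.dw(torch.cat(projected, dim=1))
        x = torch.cat([conv(x) for conv in self.dilated], dim=1)
        return self.out(x)

For the configuration of Fig. 2, such a module would be called with the stage 2, stage 3 and stage 4 feature maps, stage 2 acting as the reference.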
(2) A dual-branch upsampling module (DBUM) is used to reduce the loss of detail information during upsampling. In order to reduce the loss of image detail information during upsampling, the dual-branch upsampling module (DBUM) proposed in step 1 uses bicubic interpolation (BiCubic interpolation, BiC) and pixel reconstruction (PixelShuffle, PS) for parallel upsampling in the upsampling stage of the decoder, capturing the features required by the network in both the spatial and channel dimensions, so that the network generates high-resolution feature maps rich in detail and semantic information.
Fig. 3 schematically shows a schematic diagram of a dual-branch upsampling module according to an embodiment of the present invention.
As shown in Fig. 3, bicubic interpolation (BiCubic interpolation, BiC) and pixel reconstruction (PixelShuffle, PS) are used for parallel upsampling. PS in Fig. 3 denotes the pixel-reconstruction upsampling method, which first obtains r^2 feature maps for each channel by convolution, where r is the image magnification factor. The r^2 channels of each pixel in the low-resolution image are then spread out into r×r blocks of pixels by periodic shuffling and recombined, so the number of channels is reduced from r^2·C to C and the image size H×W is expanded to rH×rW. In general terms, DBUM performs bicubic-interpolation upsampling on the input in the spatial dimension to obtain a feature map F_s, performs pixel-reconstruction upsampling in the channel dimension to obtain F_c, and then adds and fuses F_s and F_c to obtain the output. The output of the dual-branch upsampling module is given by formula (2):
Output=Bicubic(Input)+Conv(PS(Input)) (2)
where Conv denotes a convolution operation, Bicubic denotes bicubic interpolation, PS denotes pixel reconstruction (PixelShuffle), Input denotes the input image, and Output denotes the output image.
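By way of illustration, the following is a minimal PyTorch sketch of the dual-branch upsampling idea in formula (2): a bicubic-interpolation branch in the spatial dimension, a PixelShuffle pixel-reconstruction branch in the channel dimension, and additive fusion of the two results. The channel-expanding convolution placed before PixelShuffle and the channel widths are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DBUMSketch(nn.Module):
    """Dual-branch upsampling: Output = Bicubic(Input) + Conv(PixelShuffle(Input))."""
    def __init__(self, channels, scale=2):
        super().__init__()
        self.scale = scale
        # Expand channels to C*r^2 so PixelShuffle can trade channels for resolution
        self.expand = nn.Conv2d(channels, channels * scale * scale, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)
        # Convolution applied to the pixel-reconstruction branch before fusion
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        # Spatial branch: bicubic interpolation preserves smooth spatial detail
        f_s = F.interpolate(x, scale_factor=self.scale, mode='bicubic', align_corners=False)
        # Channel branch: pixel reconstruction rearranges channel information into space
        f_c = self.conv(self.shuffle(self.expand(x)))
        return f_s + f_c   # additive fusion of the two branches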
Compared with bilinear interpolation, bicubic interpolation not only considers the pixel values of the 4 directly adjacent points but also uses the pixel values of the 16 neighboring points around the point to be sampled.
Fig. 4 schematically shows a schematic diagram of a bicubic interpolation method according to an embodiment of the invention.
As shown in Fig. 4, point P is the source-image coordinate point corresponding to point B(x, y) of the enlarged target image. The pixel value of P is obtained by computing the coefficients of the 16 points around P. Taking point A in the upper-left corner as an example, its distance from P is (1+u, 1+v), which is substituted into the most commonly used bicubic interpolation basis function given in formula (3):
f(x) = (a+2)|x|^3 - (a+3)|x|^2 + 1, for |x| ≤ 1;
f(x) = a|x|^3 - 5a|x|^2 + 8a|x| - 4a, for 1 < |x| < 2;
f(x) = 0, otherwise,   (3)
where a is usually taken as -0.5. The corresponding coefficient of point A is k_00 = f(1+u)·f(1+v). The coefficients of the other 15 neighboring points are obtained in the same way, and the pixel values of the 16 neighboring points are multiplied by their corresponding coefficients and summed to give the pixel value of point P. Because bicubic interpolation takes into account the influence of the pixel-value changes of all neighboring points on the image, it yields high-resolution images with richer detail information.
Step 2, read the endoscopic images and preprocess them by cropping and color-space conversion. The dataset used herein is a self-built dataset consisting of white-light endoscopic images, Lugol's-solution-stained endoscopic images and NBI endoscopic images, 3503 images in total: 783 white-light images, 791 NBI images and 1929 iodine-stained images, provided by Sichuan Mianyang 404 Hospital, with all cases histologically confirmed. The data were acquired by physicians during routine gastroscopy and are therefore random, general and authentic. The lesion areas were determined by gastroenterology specialists of the hospital according to the endoscopy reports, fine annotation was performed with Labelme software, and the images were cropped to 224×224 before preprocessing. In order to reduce the influence of light reflection and low contrast on the model, speed up model training and enhance the generalization ability of the model, the following preprocessing is performed according to the characteristics of endoscopic images (a sketch of this pipeline follows the list):
(1) Random horizontal flipping and random cropping are used so that esophageal early cancer lesions appear at different positions, reducing the model's dependence on the lesion location;
(2) The RGB image is converted into an HSV image;
(3) To reduce the influence of reflections and strong light on the endoscopic image, the brightness and contrast of the image processed in step (2) are randomly adjusted between 0.8 and 1.2, reducing the model's sensitivity to high brightness and low contrast;
(4) Data standardization is performed using the standard normal distribution method to speed up model convergence.
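By way of illustration, the following OpenCV/NumPy sketch implements the preprocessing steps listed above: random horizontal flip and random crop, RGB-to-HSV conversion, random brightness and contrast in [0.8, 1.2], and standardization with the standard normal distribution. The flip probability, the application of brightness/contrast to the V channel and the per-image statistics used for standardization are illustrative assumptions rather than details fixed by the invention.

import cv2
import numpy as np

def preprocess(image_bgr, size=224, train=True, rng=None):
    """Illustrative preprocessing for an endoscopic image (BGR uint8 in, float32 CHW out)."""
    rng = np.random.default_rng() if rng is None else rng

    # (1) random horizontal flip and random crop so lesions appear at different positions
    if train and rng.random() < 0.5:
        image_bgr = cv2.flip(image_bgr, 1)
    h, w = image_bgr.shape[:2]
    if train and h > size and w > size:
        top, left = rng.integers(0, h - size), rng.integers(0, w - size)
        image_bgr = image_bgr[top:top + size, left:left + size]
    image_bgr = cv2.resize(image_bgr, (size, size))

    # (2) convert the color image to HSV
    image_hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)

    # (3) random brightness and contrast in [0.8, 1.2] applied to the V channel
    if train:
        brightness = rng.uniform(0.8, 1.2)
        contrast = rng.uniform(0.8, 1.2)
        v = image_hsv[..., 2]
        image_hsv[..., 2] = np.clip((v - v.mean()) * contrast + v.mean() * brightness, 0, 255)

    # (4) standardize each channel to zero mean and unit variance (standard normal)
    mean = image_hsv.reshape(-1, 3).mean(axis=0)
    std = image_hsv.reshape(-1, 3).std(axis=0) + 1e-6
    image_std = (image_hsv - mean) / std
    return np.transpose(image_std, (2, 0, 1))   # CHW layout for the network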
Step 3, use AMFF-DUNet to perform accurate semantic segmentation of esophageal endoscopic images. The experiments were built on the PyTorch platform, version 1.8.0, with Python version 3.6.5. The training strategy is as follows: the dataset is divided into training, validation and test sets at a ratio of 7:2:1; SGD is used as the optimizer during training, the initial learning rate is set to 0.5×10^(-3), and warm-up learning is used during the first 10 training rounds to speed up model convergence. One network model is saved per round, for 300 rounds in total, and the model with the best test result is kept. Because the distribution of positive and negative samples in the esophageal early cancer data is unbalanced, Focal loss is selected as the loss function; it increases the weight of hard positive samples in the loss function and improves the overall segmentation accuracy of positive samples. Focal loss is given by formula (4), where P_t represents the degree of closeness to the true value (the larger P_t is, the closer the prediction is to the true value, i.e. the more accurate the classification), γ is an adjustable factor with a value between 0 and 1, and L_fl represents the Focal loss.
L_fl = -(1-P_t)^γ·log(P_t)   (4)
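By way of illustration, the following is a minimal PyTorch sketch of the Focal loss in formula (4) for binary lesion/background segmentation. Deriving P_t from sigmoid outputs, averaging over pixels and the default γ value are standard choices assumed here rather than details fixed by the invention.

import torch

def focal_loss(logits, targets, gamma=0.5, eps=1e-6):
    """L_fl = -(1 - P_t)^gamma * log(P_t), averaged over all pixels.

    logits:  raw network outputs of shape (B, 1, H, W)
    targets: binary ground-truth masks of shape (B, 1, H, W) with values {0, 1}
    gamma:   adjustable focusing factor (the description takes it between 0 and 1)
    """
    probs = torch.sigmoid(logits)
    # P_t is the predicted probability of the true class at each pixel
    p_t = torch.where(targets > 0.5, probs, 1.0 - probs)
    loss = -((1.0 - p_t) ** gamma) * torch.log(p_t.clamp(min=eps))
    return loss.mean()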
Tables 1 and 2 show the ablation results on the self-built dataset for the CBAM/PGFM modules and for the DBUM upsampling modes, respectively, with DeepLabV3+ as the reference network. From the ablation experiments in Table 1 it can be seen that after adding two PGFM modules (PGFM_2 in the table), the mean intersection over union (Mean Intersection over Union, MIoU), Se and Sp on the self-built dataset rise from 79.10%, 85.40% and 89.53% to 79.86%, 87.89% and 90.26%, respectively. Based on the best model of Table 1, Table 2 shows how the accuracy of the base model changes when transposed convolution (Transposed Convolution, TC), bicubic interpolation (BiC) and pixel reconstruction (PS) are combined. It can be seen from Table 2 that transposed convolution produces a "checkerboard effect" during upsampling due to its zero-padding operation, and that tuning its parameters requires a large number of attempts, so the best-performing combination of bicubic interpolation and pixel reconstruction in experiment 7 is used, raising the MIoU, Se and Sp to 80.25%, 88.95% and 92.02%, respectively. The ablation experiments show that the proposed modules improve the segmentation accuracy for esophageal early cancer, and that their combined use works even better.
Table 1 results of ablation experiments for CBAM and PGFM modules
Table 2 ablation experiment results for different types of upsampling modes in DBUM module
Step 4, compare and analyze the experimental results against current state-of-the-art esophageal early cancer lesion segmentation methods.
According to the embodiment of the invention, the pyramid-guided feature fusion module (PGFM), the dual-branch upsampling module (DBUM) and the spatial-channel dual-attention module are combined to form the feature fusion and dual-branch upsampling network combined with an attention mechanism (AMFF-DUNet), which improves the segmentation accuracy of lesion regions in esophageal early cancer endoscopy.
Experiments were performed on the self-built dataset using mainstream medical image segmentation methods published in recent years. Table 3 shows the experimental results of the different segmentation methods; the method of the present invention performs best on three indices: MIoU, Se and F-Score.
Table 3 experimental results of different segmentation methods in the self-built dataset
In addition to the quantitative experimental data, Grad-CAM visualization results were used for qualitative analysis.
Fig. 5 schematically shows a heat-map comparison between the esophageal early cancer lesion segmentation method based on attention double-branch feature fusion and related-art methods according to an embodiment of the invention.
As clearly shown in Fig. 5, the focus areas attended to by the AMFF-DUNet method of the invention cover the target lesions better than those of the other methods, indicating that the proposed model can better accomplish the lesion-area segmentation task.
Fig. 6 schematically shows a comparison of segmentation maps between the esophageal early cancer lesion segmentation method based on attention double-branch feature fusion and related-art methods according to an embodiment of the invention.
Fig. 6 shows the comparison between the segmentation model and the "deep learning assisted early esophageal cancer diagnosis model sharing platform" of the Huaxi (West China) Hospital; in the five randomly selected pictures, the segmentation results of the method of the invention are closer to the ground truth (Ground Truth).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. Those skilled in the art will appreciate that the features recited in the various embodiments of the invention and/or in the claims may be combined in various combinations and/or combinations even if such combinations or combinations are not explicitly recited in the invention. In particular, the features recited in the various embodiments of the invention and/or in the claims can be combined in various combinations and/or combinations without departing from the spirit and teachings of the invention. All such combinations and/or combinations fall within the scope of the invention.
The embodiments of the present invention are described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the invention is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the invention, and such alternatives and modifications are intended to fall within the scope of the invention.

Claims (8)

1. An esophageal early cancer lesion segmentation method based on attention double-branch feature fusion comprises the following steps:
preprocessing an esophageal endoscope image to obtain a target endoscope image;
processing the target endoscope image by utilizing an encoder network included in the deep learning model to obtain a first characteristic image;
processing the first characteristic image by using a cavity space pyramid pooling module included in the deep learning model to obtain a second characteristic image;
for each pyramid guidance fusion module in a plurality of pyramid guidance fusion modules included in the deep learning model, fusing respective output features of a plurality of first target stage networks in a plurality of stage networks included in the encoder network by utilizing the pyramid guidance fusion module included in the deep learning model to obtain a third feature image; and
and processing the first characteristic image and the plurality of third characteristic images by using a decoder network included in the deep learning model to obtain a semantic segmentation image of the esophagoscope image.
2. The method of claim 1, wherein the fusing output features of each of a plurality of first target stage networks of a plurality of stage networks included in the encoder network with the pyramid guidance fusing module included in the deep learning model to obtain a third feature image includes:
determining a second target phase network from the plurality of first target phase networks;
based on the size of the output characteristics of the second target stage network, sampling the respective output characteristics of the first target stage networks to obtain a plurality of fifth coding characteristic images;
splicing the plurality of fifth coding feature images to obtain a sixth coding feature image;
processing the sixth coding feature image by using a plurality of hole convolution layers respectively to obtain a plurality of seventh coding feature images, wherein hole rates among the plurality of hole convolution layers are different; and
and splicing the seventh coding feature images to obtain the third feature image.
3. The method of claim 2, further comprising:
a plurality of third target stage networks are determined from a plurality of network stages comprised by the decoder network based on the sizes of the output features of each of the plurality of second target stage networks.
4. A method according to claim 3, wherein the decoder network comprises a dual-branch upsampling module;
the processing the first feature image and the plurality of third feature images by using a decoder network included in the deep learning model to obtain a semantic segmentation image of the esophageal endoscope image includes:
for each of a plurality of phase networks included in the decoder network, if the phase network is the third target phase network, splicing an output characteristic of the phase network and a third characteristic image corresponding to the third target phase network to obtain a first decoding characteristic image, and taking the first decoding characteristic image as an input characteristic of a next phase network of the phase network;
under the condition that the output end of the phase network is connected with a double-branch sampling module, the output characteristic of the phase network is processed by the double-branch sampling module to obtain a second decoding characteristic image, and the second decoding characteristic image is used as the input characteristic of the next phase network of the phase network; and
and under the condition that the stage network is the last stage network, obtaining the semantic segmentation image based on the output characteristics of the stage network.
5. The method of claim 4, wherein said processing the output features of the phase network with the dual-branch upsampling module to obtain a second decoded feature image comprises:
performing bicubic interpolation processing on the output characteristics of the stage network to obtain a third decoding characteristic image;
performing pixel reconstruction up-sampling processing on the output characteristics of the stage network to obtain a fourth decoding characteristic image; and
and splicing the third decoding characteristic image and the fourth decoding characteristic image to obtain the second decoding characteristic image.
6. The method of claim 1, wherein the preprocessing the esophageal endoscope image to obtain a target endoscope image comprises:
performing color space transformation on the esophageal endoscope image to obtain an intermediate endoscope image; and
and carrying out data standardization on the intermediate endoscope image by using standard normal distribution to obtain the target endoscope image.
7. The method of claim 1, wherein the encoder network comprises a plurality of phase networks including a start phase network, a plurality of intermediate phase networks, and an end phase network, the encoder network comprising a spatial channel dual-attention module;
the method for processing the target endoscope image by utilizing the encoder network included in the deep learning model to obtain a first characteristic image comprises the following steps:
inputting the target endoscope image into the initial stage network to obtain a first coding characteristic image;
processing the first coding feature image by using a space channel dual-attention module to obtain a second coding feature image;
processing the second coding feature image by using the plurality of intermediate-stage networks to obtain a third coding feature image;
processing the third coding feature image by using the space channel dual-attention module to obtain a fourth coding feature image; and
and inputting the fourth coding characteristic image into the ending stage network to obtain the first characteristic image.
8. The method of claim 1, wherein the backbone network of the deep learning model is ResNet101.
CN202310532910.3A 2022-06-01 2023-05-11 Esophageal early cancer lesion segmentation method based on attention double-branch feature fusion Pending CN116563252A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2022106109483 2022-06-01
CN202210610948.3A CN114897094A (en) 2022-06-01 2022-06-01 Esophagus early cancer focus segmentation method based on attention double-branch feature fusion

Publications (1)

Publication Number Publication Date
CN116563252A true CN116563252A (en) 2023-08-08

Family

ID=82725173

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202210610948.3A Pending CN114897094A (en) 2022-06-01 2022-06-01 Esophagus early cancer focus segmentation method based on attention double-branch feature fusion
CN202310532910.3A Pending CN116563252A (en) 2022-06-01 2023-05-11 Esophageal early cancer lesion segmentation method based on attention double-branch feature fusion

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202210610948.3A Pending CN114897094A (en) 2022-06-01 2022-06-01 Esophagus early cancer focus segmentation method based on attention double-branch feature fusion

Country Status (1)

Country Link
CN (2) CN114897094A (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115131684A (en) * 2022-08-25 2022-09-30 成都国星宇航科技股份有限公司 Landslide identification method and device based on satellite data UNet network model
CN115546766B (en) * 2022-11-30 2023-04-07 广汽埃安新能源汽车股份有限公司 Lane line generation method, lane line generation device, electronic device, and computer-readable medium
CN116503428B (en) * 2023-06-27 2023-09-08 吉林大学 Image feature extraction method and segmentation method based on refined global attention mechanism
CN116703798B (en) * 2023-08-08 2023-10-13 西南科技大学 Esophagus multi-mode endoscope image enhancement fusion method based on self-adaptive interference suppression
CN117745745B (en) * 2024-02-18 2024-05-10 湖南大学 CT image segmentation method based on context fusion perception

Also Published As

Publication number Publication date
CN114897094A (en) 2022-08-12

Similar Documents

Publication Publication Date Title
CN116563252A (en) Esophageal early cancer lesion segmentation method based on attention double-branch feature fusion
Cai et al. Using a deep learning system in endoscopy for screening of early esophageal squamous cell carcinoma (with video)
Pogorelov et al. Deep learning and hand-crafted feature based approaches for polyp detection in medical videos
CN112150428B (en) Medical image segmentation method based on deep learning
CN112785617A (en) Automatic segmentation method for residual UNet rectal cancer tumor magnetic resonance image
CN113256641B (en) Skin lesion image segmentation method based on deep learning
US20220245919A1 (en) Feature quantity extracting device, feature quantity extracting method, identification device, identification method, and program
US20230368379A1 (en) Image processing method and apparatus
JP7499364B2 (en) Multi-scale based whole slide pathological feature fusion extraction method, system, electronic device and storage medium
CN113436173A (en) Abdomen multi-organ segmentation modeling and segmentation method and system based on edge perception
CN116579982A (en) Pneumonia CT image segmentation method, device and equipment
CN114372951A (en) Nasopharyngeal carcinoma positioning and segmenting method and system based on image segmentation convolutional neural network
CN115471470A (en) Esophageal cancer CT image segmentation method
Yue et al. Benchmarking polyp segmentation methods in narrow-band imaging colonoscopy images
CN112489062A (en) Medical image segmentation method and system based on boundary and neighborhood guidance
CN114998644B (en) Tumor diagnosis system, construction method thereof, terminal device and storage medium
CN116168052A (en) Gastric cancer pathological image segmentation method combining self-adaptive attention and feature pyramid
CN115994999A (en) Goblet cell semantic segmentation method and system based on boundary gradient attention network
CN113205454A (en) Segmentation model establishing and segmenting method and device based on multi-scale feature extraction
Pozdeev et al. Anatomical landmarks detection for laparoscopic surgery based on deep learning technology
CN114049934A (en) Auxiliary diagnosis method, device, system, equipment and medium
KR102591587B1 (en) Medical image segmentation apparatus and method for segmentating medical image
Wei et al. Application of U-net with variable fractional order gradient descent method in rectal tumor segmentation
CN117611806B (en) Prostate cancer operation incisal margin positive prediction system based on images and clinical characteristics
Chen et al. Single-Modality Endoscopic Polyp Segmentation via Random Color Reversal Synthesis and Two-Branched Learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination