CN116957968A - Method, system, equipment and medium for enhancing digestive tract endoscope image - Google Patents
- Publication number: CN116957968A
- Application number: CN202310894600.6A
- Authority
- CN
- China
- Prior art keywords
- branch
- image
- feature
- digestive tract
- processing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06V10/42 — Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
- G06N3/045 — Combinations of networks
- G06N3/0455 — Auto-encoder networks; Encoder-decoder networks
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/08 — Learning methods
- G06V10/44 — Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/806 — Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
- G06V10/82 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
- G06T2207/10068 — Endoscopic image
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/30004 — Biomedical image processing
Abstract
The application belongs to the field of computer-aided disease diagnosis, and discloses a method, a system, equipment and a medium for enhancing a digestive tract endoscope image, comprising the following steps: acquiring an initial image of a digestive tract endoscope; inputting the initial image of the digestive tract endoscope into an image enhancement model for feature enhancement to obtain a feature-enhanced image of the digestive tract endoscope. The image enhancement model comprises a main branch, a first auxiliary branch and a second auxiliary branch; each branch adopts an encoder-decoder structure, and each branch comprises a plurality of scale-space feature extraction blocks connected through a skip-connection strategy and a cross-branch compensation strategy; each scale-space feature extraction block comprises a context feature extraction module and a spatial residual attention module connected in sequence. The technical scheme of the application improves the adaptability of the network under uneven illumination and gradually recovers high-quality endoscope images by fusing global and local features at different scales.
Description
Technical Field
The application belongs to the field of computer-aided disease diagnosis, and particularly relates to a method, a system, equipment and a medium for enhancing an image of an alimentary canal endoscope.
Background
Colorectal cancer is one of the diseases threatening human life today: among all cancers it ranks second in mortality and third in morbidity. Clinically, endoscopy is an effective method for screening colorectal diseases and preventing early colorectal cancer. During endoscopy, endoscopic images play a critical role in diagnosis and therapy, providing physicians with visual information about biological tissue. In practice, various factors (e.g., the confined space, reflections, and the intestinal wall) affect imaging quality and can introduce complex distortions into the captured endoscopic image, including abnormal exposure, low contrast, blurring, ghosting, and mixtures of these. Among them, low contrast typically presents as a low-light appearance (with uneven illumination and noise) and is attracting increasing attention from researchers. In addition, severe degradation of endoscopic images harms subsequent computer-aided diagnosis tasks, such as classification and segmentation of colorectal diseases; conversely, high-quality endoscopic images make it easier for a physician to identify tissue details. It is therefore important to improve the quality of endoscopic images with low-light image enhancement (LIE) techniques, so as to improve the accuracy of diagnostic results and provide a reliable basis for subsequent diagnostic work. Existing endoscopic image enhancement methods, however, have limitations. On the one hand, most methods analyze images in a single-scale manner, and thus have a limited, incomplete understanding of non-uniform illumination across scale changes. On the other hand, most methods do not make full use of context information during feature extraction, which hinders the semantic understanding needed for noise suppression.
Over the past few years, many image enhancement methods have been proposed from different perspectives. In the early stages, researchers invested great effort in designing histogram-based methods, which mainly stretch the dynamic range of the input image to obtain a more uniform pixel intensity distribution. However, most histogram-based methods do not adjust the illumination well and may produce over- or under-enhanced results; they also typically ignore noise in dark areas. Another popular solution to the LIE task is to decompose the image into illumination and reflectance based on Retinex theory. Existing Retinex-based methods mainly either remove the illumination and take the reflectance as the enhanced result, or combine the adjusted illumination with the reflectance. However, ambiguity in the decomposition result tends to produce unnatural output. With the success of deep learning (DL) in various visual tasks, the image enhancement problem has been recast as an image translation problem that does not rely on physical assumptions. However, existing work has focused mainly on low-light natural images, and little research addresses low-light endoscopic image enhancement (LEIE).
Disclosure of Invention
The application aims to provide a method, a system, equipment and a medium for enhancing an image of an alimentary canal endoscope, so as to solve the problems existing in the prior art.
In order to achieve the above object, the present application provides a method for enhancing an image of a digestive tract endoscope, comprising:
acquiring an initial image of a digestive tract endoscope;
inputting the initial image of the digestive tract endoscope into an image enhancement model for feature enhancement to obtain a feature enhancement image of the digestive tract endoscope;
the image enhancement model comprises a main branch, a first auxiliary branch and a second auxiliary branch; each branch adopts an encoder-decoder structure, and each branch comprises a plurality of scale-space feature extraction blocks connected through a skip-connection strategy and a cross-branch compensation strategy; each scale-space feature extraction block comprises a context feature extraction module and a spatial residual attention module connected in sequence.
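The composition described above — each scale-space feature extraction block chaining a context feature extraction module into a spatial residual attention module, and each branch chaining several such blocks — can be wired up as a structural sketch. This is only a wiring illustration: the modules here are string-tagged stand-ins, not the convolutional operators of the actual model, and the SFEB counts (30/10/2) are taken from the embodiment described later.

```python
# Structural sketch of the model wiring: an SFEB is a CFEM followed by
# an SRAM; a branch chains several SFEBs. Modules are stand-in
# callables that record their order of application.

def make_sfeb(tag):
    def cfem(x):   # context feature extraction module (stand-in)
        return x + [f"CFEM{tag}"]
    def sram(x):   # spatial residual attention module (stand-in)
        return x + [f"SRAM{tag}"]
    def sfeb(x):   # CFEM and SRAM connected in sequence
        return sram(cfem(x))
    return sfeb

def make_branch(n_sfebs):
    blocks = [make_sfeb(i) for i in range(n_sfebs)]
    def branch(x):
        for block in blocks:
            x = block(x)
        return x
    return branch

# Main branch plus two auxiliary branches (SFEB counts 30/10/2,
# per the embodiment).
model = [make_branch(30), make_branch(10), make_branch(2)]
```

Running one SFEB on an empty trace yields `["CFEM0", "SRAM0"]`, confirming the CFEM-then-SRAM ordering of the block.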
Optionally, the training method of the image enhancement model specifically includes:
acquiring training data; the training data comprises a digestive tract endoscope training image and a corresponding enhanced image;
and respectively inputting the training data into the main branch, the first auxiliary branch and the second auxiliary branch for image enhancement, carrying out feature fusion on the enhanced images of the three branches, and training with the goal of minimizing the structural similarity loss function between the fused initial training result and the reference enhanced image corresponding to the digestive tract endoscope training image, to obtain the image enhancement model.
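The training objective above is a structural similarity loss between the fused output and the reference image. A minimal sketch of such an objective is shown below in pure Python, assuming images normalized to [0, 1] and the standard SSIM stabilizing constants; the SSIM is computed globally over the whole image rather than with the sliding window used in practice.

```python
# Simplified global SSIM between two grayscale images in [0, 1].
# Constants c1, c2 follow the standard SSIM definition with L = 1
# (an assumption of this sketch); real implementations use a
# sliding Gaussian window.

def mean(xs):
    return sum(xs) / len(xs)

def ssim(img_a, img_b, c1=0.01 ** 2, c2=0.03 ** 2):
    a = [p for row in img_a for p in row]
    b = [p for row in img_b for p in row]
    mu_a, mu_b = mean(a), mean(b)
    var_a = mean([(p - mu_a) ** 2 for p in a])
    var_b = mean([(p - mu_b) ** 2 for p in b])
    cov = mean([(p - mu_a) * (q - mu_b) for p, q in zip(a, b)])
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def ssim_loss(pred, target):
    # Training minimizes 1 - SSIM(prediction, reference).
    return 1.0 - ssim(pred, target)
```

A perfect prediction gives SSIM of 1 and a loss of 0, so minimizing this loss pushes the fused result toward the reference enhanced image.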
Optionally, the processing procedure of the scale-space feature extraction block includes:
inputting the initial image of the digestive tract endoscope into the context feature extraction module for context information extraction and isolated noise denoising processing to obtain denoised image data;
and taking the denoising image data as the input of the spatial residual attention module, and performing adaptive focusing processing to obtain the scale-space feature image data.
Optionally, inputting the initial image of the digestive tract endoscope into the context feature extraction module for extracting context information and filtering isolated noise to obtain denoising image data, which specifically includes:
the context feature extraction module comprises an upper branch structure and a lower branch structure with identical structures, and the initial image of the digestive tract endoscope serves as the input feature F_in; a 3×3 convolution is applied to the input feature in the upper branch structure and the lower branch structure respectively, obtaining a first upper-branch feature map F_u^{5,1} and a first lower-branch feature map F_l^{5,1}; in the upper branch, the first upper-branch feature map F_u^{5,1}, the input feature F_in and the first lower-branch feature map F_l^{5,1} are concatenated in sequence along the channel dimension to obtain a second upper-branch feature map F_u^{5,2}; in the lower branch, the first upper-branch feature map F_u^{5,1}, the input feature F_in and the first lower-branch feature map F_l^{5,1} are concatenated in sequence along the channel dimension to obtain a second lower-branch feature map F_l^{5,2}; a 3×3 convolution is applied to the second upper-branch feature map F_u^{5,2} to obtain a third upper-branch feature map F_u^{5,3}; the third upper-branch feature map F_u^{5,3} is concatenated with the second lower-branch feature map F_l^{5,2}, and the result is forward-connected with the input feature F_in to obtain a fourth upper-branch feature map F_u^{5,4}; a 3×3 convolution is applied to the fourth upper-branch feature map F_u^{5,4}, its result is concatenated with the input feature F_in, the first upper-branch feature map F_u^{5,1} and the third upper-branch feature map F_u^{5,3}, and a 1×1 convolution is applied to the concatenation to obtain the output feature F_u^{out} of the upper branch.
Information interaction is carried out in the lower branch using the feature maps of the upper branch, in the same manner, to obtain the output feature F_l^{out} of the lower branch.
The output feature F_u^{out} of the upper branch and the output feature F_l^{out} of the lower branch are concatenated and processed sequentially by a 1×1 convolution and a 3×3 convolution, and the processing result is combined with the input feature F_in to obtain the output F_CFEM of the context feature extraction module CFEM, namely the denoising image data;
wherein the calculation formula for the output F_CFEM of the context feature extraction module CFEM is:

F_CFEM = F_in + Conv_3(Conv_1([F_u^{out}, F_l^{out}]))

where Conv_1 and Conv_3 represent 1×1 and 3×3 convolution operations, respectively, and [·,·] represents the channel-wise concatenation operation.
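The CFEM's final fusion step — concatenate the two branch outputs along channels, apply a 1×1 and then a 3×3 convolution, and add the input feature back — can be sketched with toy pure-Python convolutions. The weights and kernels here are illustrative placeholders, not trained parameters, and the 3×3 convolution is simplified to one shared kernel per channel.

```python
# Toy sketch of the CFEM fusion: residual add of
# Conv3(Conv1(concat(upper, lower))) onto the input feature.
# Feature maps are channel-first nested lists [C][H][W].

def conv1x1(feat, weights):
    # weights: [out_channels][in_channels]; a 1x1 conv is a
    # per-pixel weighted sum over input channels.
    h, w = len(feat[0]), len(feat[0][0])
    return [[[sum(wc * feat[ci][y][x] for ci, wc in enumerate(wrow))
              for x in range(w)] for y in range(h)] for wrow in weights]

def conv3x3(feat, kernel):
    # Depth-preserving 3x3 convolution with zero padding; one shared
    # 3x3 kernel applied to every channel (simplification).
    h, w = len(feat[0]), len(feat[0][0])
    out = []
    for ch in feat:
        grid = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                s = 0.0
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        yy, xx = y + dy, x + dx
                        if 0 <= yy < h and 0 <= xx < w:
                            s += kernel[dy + 1][dx + 1] * ch[yy][xx]
                grid[y][x] = s
        out.append(grid)
    return out

def cfem_fuse(f_in, f_up, f_low, w1, k3):
    concat = f_up + f_low                     # channel-wise concatenation
    fused = conv3x3(conv1x1(concat, w1), k3)  # Conv3(Conv1([...]))
    return [[[fused[c][y][x] + f_in[c][y][x]  # residual: + input feature
              for x in range(len(f_in[0][0]))]
             for y in range(len(f_in[0]))]
            for c in range(len(f_in))]
```

With an identity 3×3 kernel and a 1×1 weight that selects the upper-branch channel, the output reduces to the upper-branch map added to the input, which makes the residual structure easy to verify by hand.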
Optionally, taking the denoising image data as the input of the spatial residual attention module and performing adaptive focusing processing to obtain the scale-space feature image data specifically includes:
the spatial residual attention module comprises a left branch and a right branch with identical structures; the denoising image data is subjected to maximum pooling in the left branch, followed by spatial residual attention processing, to obtain the left-branch processing result;
the denoising image data is subjected to average pooling in the right branch, followed by spatial residual attention processing, to obtain the right-branch processing result;
the scale-space feature image data is obtained based on the left-branch processing result and the right-branch processing result;
the calculation formula for acquiring the scale space characteristic image data is as follows:
in the method, in the process of the application,for the output of the context feature extraction module CFEM,/i>For the feature map after the maximum pooling treatment, < > is given>Note the output of the left branch in the module for spatial residual, < >>Is +.>Gamma is learning parameter, < >>Note the output of the right branch in the module for spatial residual, < >>Is +.> Sigma is a Sigmoid function, < >>The final output of the module, i.e. the scale-space feature image data, is noted for the spatial residual.
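The dual-pooling attention described above can be sketched in pure Python. Several details are assumptions of this sketch: pooling is taken channel-wise (producing one spatial map per pooling type, as in common spatial-attention designs), the convolution that would normally precede the sigmoid gate is folded into the scalar γ, and the two branch outputs are fused by summation.

```python
import math

# Sketch of spatial residual attention: the left branch gates the
# input with a sigmoid of the channel-wise max map, the right branch
# with the channel-wise average map; each is added back residually
# with a learnable scalar gamma, and the two results are summed.

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def channel_pool(feat, mode):
    # feat: [C][H][W] -> one [H][W] spatial map pooled across channels
    h, w = len(feat[0]), len(feat[0][0])
    op = max if mode == "max" else lambda vs: sum(vs) / len(vs)
    return [[op([feat[c][y][x] for c in range(len(feat))])
             for x in range(w)] for y in range(h)]

def sram(feat, gamma=0.5):
    att_max = channel_pool(feat, "max")
    att_avg = channel_pool(feat, "avg")
    h, w = len(feat[0]), len(feat[0][0])
    out = []
    for ch in feat:
        grid = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                left = ch[y][x] + gamma * sigmoid(att_max[y][x]) * ch[y][x]
                right = ch[y][x] + gamma * sigmoid(att_avg[y][x]) * ch[y][x]
                grid[y][x] = left + right
            out_row = grid[y]
        out.append(grid)
    return out
```

Because the sigmoid gate multiplies the feature itself, zero activations pass through unchanged while nonzero activations are rescaled, which is the adaptive-focusing behavior the module is meant to provide.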
A low-light endoscopic image enhancement system, comprising:
the data acquisition module is used for acquiring an initial image of the digestive tract endoscope;
the image enhancement module is used for inputting the initial image of the digestive tract endoscope into an image enhancement model for feature enhancement to obtain a feature enhancement image of the digestive tract endoscope; the image enhancement model comprises a main branch, a first auxiliary branch and a second auxiliary branch; each branch adopts an encoder-decoder structure, and each branch comprises a plurality of scale space feature extraction blocks connected through a jump connection strategy and a cross-branch compensation strategy; each scale space feature extraction block comprises a context feature extraction module and a space residual attention module which are connected in sequence.
An electronic device, comprising a memory for storing a computer program and a processor that runs the computer program to cause the electronic device to perform the above method for enhancing a digestive tract endoscope image.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the above method for enhancing a digestive tract endoscope image.
The application has the technical effects that:
the application provides a method, a system, equipment and a medium for enhancing an image of an alimentary canal endoscope, wherein the method comprises the following steps: acquiring an initial image of an digestive tract endoscope; inputting the initial image of the digestive tract endoscope into an image enhancement model for feature enhancement to obtain a feature enhancement image of the digestive tract endoscope; the image enhancement model comprises a main branch, a first auxiliary branch and a second auxiliary branch; each branch adopts an encoder-decoder structure, and each branch comprises a plurality of scale space feature extraction blocks connected through a jump connection strategy and a cross-branch compensation strategy; each scale space feature extraction block comprises a context feature extraction module and a space residual attention module which are connected in sequence.
The present application proposes a novel deep pyramid enhancement network (DPENet) for enhancing endoscopic images in low light. Specifically, DPENet has an image pyramid structure consisting of three parallel branches that extract global and local features at different scales. A skip-connection strategy and a cross-branch compensation strategy are used in each branch, realizing intra-layer and inter-layer fusion and making full use of multi-scale features. Such a structure helps the network understand uneven illumination in the image. To suppress isolated noise, DPENet places multiple scale-space feature extraction blocks (SFEBs) in each branch. An SFEB is composed of a context feature extraction module (CFEM) and a spatial residual attention module (SRAM). The CFEM mines semantic information by extracting context information to filter isolated noise. The SRAM utilizes a spatial attention mechanism to help the network adaptively focus on dark areas. DPENet gradually restores high-quality endoscopic images by fusing global and local features of different scales.
The present application uses residual connection to connect the input image with the cascading features of the three branches to generate an enhanced image. Residual connection helps to mitigate excessive dependence of the model on noise or low frequency signals, thereby generating a more realistic image.
In each branch, the present application utilizes skip connections that transfer features from the encoder to the decoder within the same branch, reducing the problems of gradient vanishing and network degradation. Furthermore, units of the same size in different branches are connected by compensation connections to integrate global and local information. The pyramid structure helps the network extract and aggregate global and local features at different scales, and improves the adaptability of the network under uneven illumination conditions.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application. In the drawings:
FIG. 1 is a block diagram of a network in accordance with an embodiment of the present application;
FIG. 2 is a block diagram of a context feature extraction module CFEM network in an embodiment of the present application;
fig. 3 is a block diagram of a spatial residual attention module SRAM network in an embodiment of the present application.
Detailed Description
Various exemplary embodiments of the application will now be described in detail, which should not be considered as limiting the application, but rather as more detailed descriptions of certain aspects, features and embodiments of the application.
It is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. In addition, for numerical ranges in this disclosure, it is understood that each intermediate value between the upper and lower limits of the ranges is also specifically disclosed. Every smaller range between any stated value or stated range, and any other stated value or intermediate value within the stated range, is also encompassed within the application. The upper and lower limits of these smaller ranges may independently be included or excluded in the range.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. Although the application has been described with reference to a preferred method, any method similar or equivalent to those described herein can be used in the practice or testing of the present application. All documents mentioned in this specification are incorporated by reference for the purpose of disclosing and describing the methodologies associated with the documents. In case of conflict with any incorporated document, the present specification will control.
It will be apparent to those skilled in the art that various modifications and variations can be made in the specific embodiments of the application described herein without departing from the scope or spirit of the application. Other embodiments will be apparent to those skilled in the art from consideration of the specification of the present application. The specification and examples of the present application are exemplary only.
As used herein, the terms "comprising," "including," "having," "containing," and the like are intended to be inclusive and mean an inclusion, but not limited to.
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
Example 1
As shown in Figs. 1 to 3, the present embodiment provides a method, a system, an apparatus and a medium for enhancing a digestive tract endoscope image, wherein the method includes: acquiring an initial image of a digestive tract endoscope; inputting the initial image of the digestive tract endoscope into an image enhancement model for feature enhancement to obtain a feature-enhanced image of the digestive tract endoscope; the image enhancement model comprises a main branch, a first auxiliary branch and a second auxiliary branch; each branch adopts an encoder-decoder structure, and each branch comprises a plurality of scale-space feature extraction blocks connected through a skip-connection strategy and a cross-branch compensation strategy; each scale-space feature extraction block comprises a context feature extraction module and a spatial residual attention module connected in sequence.
A low-light endoscopic image enhancement system, comprising: a data acquisition module for acquiring an initial image of a digestive tract endoscope; and an image enhancement module for inputting the initial image of the digestive tract endoscope into an image enhancement model for feature enhancement to obtain a feature-enhanced image of the digestive tract endoscope; the image enhancement model comprises a main branch, a first auxiliary branch and a second auxiliary branch; each branch adopts an encoder-decoder structure, and each branch comprises a plurality of scale-space feature extraction blocks connected through a skip-connection strategy and a cross-branch compensation strategy; each scale-space feature extraction block comprises a context feature extraction module and a spatial residual attention module connected in sequence.
An electronic device, comprising a memory for storing a computer program and a processor that runs the computer program to cause the electronic device to perform the above method for enhancing a digestive tract endoscope image.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the above method for enhancing a digestive tract endoscope image.
Aiming at the problem of low-illumination enhancement of endoscopic images with uneven brightness, the present embodiment proposes a high-performance endoscopic image enhancement method, intended to help endoscopists complete endoscopy and diagnosis operations accurately and rapidly. The system flow is as follows: this embodiment proposes a new network, named DPENet, for enhancing low-light endoscopic images. Specifically, DPENet has an image pyramid structure consisting of three parallel branches that extract global and local features at different scales. A skip-connection strategy and a cross-branch compensation strategy are used in each branch, realizing intra-layer and inter-layer fusion and making full use of multi-scale features. Such a structure helps the network understand uneven illumination in the image. To suppress isolated noise, DPENet places multiple scale-space feature extraction blocks (SFEBs) in each branch. An SFEB is composed of a context feature extraction module (CFEM) and a spatial residual attention module (SRAM). The CFEM mines semantic information by extracting context information to filter isolated noise. The SRAM utilizes a spatial attention mechanism to help the network adaptively focus on dark areas. By fusing global and local features of different scales, the DPENet of the present embodiment gradually restores high-quality endoscopic images.
The structure of DPENet is shown in Fig. 1. DPENet has a pyramid structure comprising one main branch and two auxiliary branches. Each branch follows an encoder-decoder structure, in which the present embodiment arranges a plurality of units. Each unit includes several densely connected SFEB modules, and the SFEBs are aggregated by a multi-scale fusion operation for an enhanced feature representation. In total, the main branch holds 30 SFEBs, the first auxiliary branch holds 10 SFEBs, and the second auxiliary branch holds 2 SFEBs. The SFEB aims to suppress noise and comprises two key modules: the CFEM and the SRAM. In view of the positive role of global and local features in understanding uneven illumination, the present embodiment downsamples the input image I_in by factors of 2 and 4, obtaining images I_in↓2 and I_in↓4 as the inputs of the first and second auxiliary branches, respectively. This operation helps the network extract global and local features at different scales. After processing the images through the three branches, the present embodiment uses a residual connection to connect the input image I_in with the cascaded features of the three branches and generate the enhanced image. The residual connection helps mitigate the model's excessive dependence on noise or low-frequency signals, thereby generating a more realistic image.
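The three-level input pyramid described above can be sketched in pure Python. Average pooling is assumed as the downsampling operator (the embodiment does not name one), and even image dimensions are assumed at every level.

```python
# Building the three-level input pyramid: the full-resolution image
# feeds the main branch, while 2x and 4x downsampled copies feed the
# first and second auxiliary branches. Downsampling here averages
# each 2x2 block (an assumed operator); H and W must be even.

def downsample2x(img):
    # img: [H][W] grayscale grid -> [H/2][W/2] grid
    h, w = len(img), len(img[0])
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1] +
              img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w // 2)] for y in range(h // 2)]

def input_pyramid(img):
    half = downsample2x(img)      # input to the first auxiliary branch
    quarter = downsample2x(half)  # input to the second auxiliary branch
    return img, half, quarter
```

Applying the 2× downsampler twice yields the 4× level, so the two auxiliary inputs see progressively more global, less local views of the same scene.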
In each branch, the present embodiment uses skip connections from the encoder to the decoder within the same branch to alleviate gradient vanishing and network degradation. In addition, units of the same size in different branches are connected through cross-branch connections to integrate global and local information. Such a pyramid structure helps the network extract and aggregate global and local features at different scales, improving its adaptability under uneven lighting conditions.
The details of the network operation are as follows:
1. Context Feature Extraction Module (CFEM):
Fig. 2 shows the architecture of the CFEM proposed in this embodiment. As shown, the CFEM has a dual-branch structure. The upper branch first processes the input features F_in with a 3×3 convolution to obtain F_u^{3,1}. Then F_u^{3,1}, F_in, and the feature map F_l^{5,1} from the lower branch are sequentially concatenated along the channel dimension to obtain the feature map F_u^{c,1}; such operations facilitate the interaction of contextual features at different scales. Next, F_u^{c,1} is processed by a 3×3 convolution (giving F_u^{3,2}), concatenated with the feature map F_l^{5,2} from the lower branch, and forward-connected with the input features F_in. Finally, the obtained feature map is processed by a 3×3 convolution (giving F_u^{3,3}) and concatenated with F_in and F_u^{3,1}; the concatenated feature map is processed by a 1×1 convolution to produce the output F_u^{out}. Here, the forward connections are used to reuse the features of different stages, enhancing the representational capacity of the network. The above operations can be expressed as:
F_u^{3,1} = Conv_3(F_in), F_u^{3,2} = Conv_3(C(F_u^{3,1}, F_in, F_l^{5,1})), F_u^{3,3} = Conv_3(C(F_u^{3,2}, F_l^{5,2}, F_in))    (1)

F_u^{out} = Conv_1(C(F_u^{3,3}, F_in, F_u^{3,1}))    (2)
In equations 1 and 2, conv 1 And Conv 3 Representing 1 x 1 and 3 x3 convolution operations respectively,representing the connection operation, the lower branch and the upper branch have the same structure, but the characteristic diagram of the upper branch is used for information interaction, as shown in fig. 2;
After obtaining the outputs F_u^{out} and F_l^{out} from the upper and lower branches, the present embodiment concatenates them and sequentially processes the concatenated feature map with a 1×1 convolution and a 3×3 convolution; the CFEM output F_CFEM is then obtained by adding the result to the input features F_in:

F_CFEM = Conv_3(Conv_1(C(F_u^{out}, F_l^{out}))) + F_in    (3)
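Equations 1 and 2 can be sketched as a PyTorch module. The lower branch is assumed to mirror this structure (per the text, it differs only in which cross-branch maps it receives); passing the lower-branch maps F_l^{5,1} and F_l^{5,2} in as arguments, the class name, and the channel counts are illustrative assumptions.

```python
import torch
import torch.nn as nn


class CFEMUpperBranch(nn.Module):
    """Sketch of the CFEM upper branch (Equations 1 and 2 above)."""

    def __init__(self, c=16):
        super().__init__()
        self.conv1 = nn.Conv2d(c, c, 3, padding=1)       # produces F_u31
        self.conv2 = nn.Conv2d(3 * c, c, 3, padding=1)   # produces F_u32
        self.conv3 = nn.Conv2d(3 * c, c, 3, padding=1)   # produces F_u33
        self.out = nn.Conv2d(3 * c, c, 1)                # final 1x1 fusion

    def forward(self, f_in, f_l51, f_l52):
        f_u31 = self.conv1(f_in)
        # Cross-scale concatenation: upper map, input, lower-branch map.
        f_u32 = self.conv2(torch.cat([f_u31, f_in, f_l51], dim=1))
        # Concatenate with the second lower-branch map and forward-connect F_in.
        f_u33 = self.conv3(torch.cat([f_u32, f_l52, f_in], dim=1))
        # Forward (feature-reuse) connections before the 1x1 fusion.
        return self.out(torch.cat([f_u33, f_in, f_u31], dim=1))
```

All feature maps share one spatial size here; in the real network the lower-branch maps would be produced by the mirrored branch rather than supplied externally.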
2. Spatial Residual Attention Module (SRAM)
Inspired by the visual attention mechanism, the present embodiment proposes the SRAM to help the network focus on dark areas. Fig. 3 shows the structure of the proposed SRAM, which has a dual-branch structure. In the left branch of the SRAM, the input features F_CFEM (i.e., the output features of the CFEM) first go through max pooling MaxPool(·) to give F_p, and are then processed by a Spatial Residual Attention (SRA) block. In the SRA block, the feature map F_p passes through a 3×3 convolution, a ReLU function, and another 3×3 convolution; at the same time, F_p also passes through a 1×1 convolution, and the two results are added to obtain F_m. Then, the learnable parameter γ is multiplied with F_m to obtain F_m^o; γ adjusts the weight of F_m. The above operations can be summarized as follows:
F_m = Conv_3(ReLU(Conv_3(F_p))) + Conv_1(F_p),  F_m^o = γ ⊗ F_m    (4)

In Equation 4, ⊗ represents element-wise multiplication.
The right branch of the SRAM has a structure similar to the left branch, with two differences. First, the right branch uses average pooling instead of max pooling to process the input features; the motivation is that the two pooling operations compensate each other, and using both helps extract richer features than using only one. Second, the weight of the output F_a of the SRA block in the right branch is set to 1−γ.
After processing the input features using two branches, the final output features of the SRAM can be obtained by:
F_SRAM = σ(F_m^o + F_a^o) ⊗ F_CFEM    (5)

In the formula, F_CFEM is the output of the context feature extraction module CFEM, F_p is the feature map after max pooling, F_m is the output of the left branch of the spatial residual attention module, F_m^o = γF_m is F_m after weight adjustment with the learnable parameter γ, F_a is the output of the right branch, F_a^o = (1−γ)F_a is F_a after weight adjustment, σ is the Sigmoid function, and F_SRAM is the final output of the spatial residual attention module.
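The SRAM described above can be sketched in PyTorch. The γ/(1−γ) weighting of the max- and average-pooled SRA paths follows the symbol definitions; applying the sigmoid attention map multiplicatively to the CFEM output is our reading of those definitions, not a verbatim reproduction of the patent figure, and the class name and channel counts are illustrative.

```python
import torch
import torch.nn as nn


class SRAM(nn.Module):
    """Sketch of the spatial residual attention module (Equations 4 and 5)."""

    def __init__(self, c=16):
        super().__init__()
        self.gamma = nn.Parameter(torch.tensor(0.5))  # learnable branch weight
        self.sra_max = self._sra(c)
        self.sra_avg = self._sra(c)
        self.skip_max = nn.Conv2d(c, c, 1)            # 1x1 residual path of the SRA
        self.skip_avg = nn.Conv2d(c, c, 1)
        self.maxpool = nn.MaxPool2d(2)
        self.avgpool = nn.AvgPool2d(2)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)

    @staticmethod
    def _sra(c):
        # 3x3 conv -> ReLU -> 3x3 conv, as described for the SRA block.
        return nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(), nn.Conv2d(c, c, 3, padding=1)
        )

    def forward(self, f_cfem):
        f_p = self.maxpool(f_cfem)                    # left branch: max pooling
        f_m = self.sra_max(f_p) + self.skip_max(f_p)
        f_a_in = self.avgpool(f_cfem)                 # right branch: average pooling
        f_a = self.sra_avg(f_a_in) + self.skip_avg(f_a_in)
        # gamma / (1 - gamma) weighting, sigmoid attention, apply to CFEM output.
        attn = torch.sigmoid(self.gamma * f_m + (1 - self.gamma) * f_a)
        return f_cfem * self.up(attn)
```

The pooling halves the spatial size, so the attention map is upsampled before it multiplies the full-resolution CFEM features.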
In the training stage, parameters such as the loss function, batch size, number of epochs, and dataset split ratio are set. To better improve network performance, this embodiment trains the model in a deeply supervised manner. Although the L2 loss is easy to optimize during training, it can cause excessive smoothing of the background and ghosting artifacts; in this context, the present embodiment adopts the Structural Similarity Index (SSIM) loss as an alternative. The SSIM loss L_S measures the similarity between the reconstructed image and the original image by considering structural similarity. The loss function is calculated as:

L_S = 1 − (1/M) Σ_{m=1}^{M} S(I_out(m), I_gt(m))
where m indexes the m-th pixel and M is the number of pixels in the image. I_out and I_gt represent the reconstructed image and the real image, respectively. S(·,·) is the SSIM index, defined as:
S(I_out(m), I_gt(m)) = [(2μ_p(m)μ_g(m) + c_1)(2σ_{p,g}(m) + c_2)] / [(μ_p(m)² + μ_g(m)² + c_1)(σ_p²(m) + σ_g²(m) + c_2)]

where μ_p(m) and μ_g(m) are the means of the blocks around the m-th pixel in the reconstructed image and the real image, respectively; σ_p²(m) and σ_g²(m) represent the variances of the two blocks, and σ_{p,g}(m) represents their covariance. The constants c_1 and c_2 prevent division by zero.
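The SSIM loss in the two formulas above can be sketched as follows. The window size of 7 and the constants c_1, c_2 are assumed values (the patent does not state them); a uniform average-pooling window stands in for the per-pixel blocks.

```python
import torch
import torch.nn.functional as F


def ssim_loss(img_out, img_gt, window=7, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM loss: L_S = 1 - mean over pixels of S(I_out(m), I_gt(m))."""
    pad = window // 2
    # Local means mu_p(m), mu_g(m) over the block around each pixel.
    mu_p = F.avg_pool2d(img_out, window, stride=1, padding=pad)
    mu_g = F.avg_pool2d(img_gt, window, stride=1, padding=pad)
    # Local variances and covariance: E[x^2] - E[x]^2, E[xy] - E[x]E[y].
    var_p = F.avg_pool2d(img_out ** 2, window, stride=1, padding=pad) - mu_p ** 2
    var_g = F.avg_pool2d(img_gt ** 2, window, stride=1, padding=pad) - mu_g ** 2
    cov = F.avg_pool2d(img_out * img_gt, window, stride=1, padding=pad) - mu_p * mu_g
    # Per-pixel SSIM index S, then the loss 1 - mean(S).
    s = ((2 * mu_p * mu_g + c1) * (2 * cov + c2)) / (
        (mu_p ** 2 + mu_g ** 2 + c1) * (var_p + var_g + c2)
    )
    return 1.0 - s.mean()
```

For identical images the local covariance equals both variances, so S is 1 everywhere and the loss vanishes, matching the formula.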
The present embodiment implements the network with the PyTorch framework and trains it for 350 epochs using the ADAM optimizer. The learning rate is set to 0.005 and reduced 10-fold at the 210th and 280th epochs; the input size and batch size are set to 128×128 and 12, respectively. All experiments were performed on a server equipped with two NVIDIA GTX 3090 GPUs and two Intel Xeon Silver 4214R CPUs @ 2.40 GHz. In the test stage, the average of the output results of 10 network models is taken as the final evaluation value. Four widely used evaluation indexes are selected in this example: PSNR, SSIM, LPIPS, and VIF.
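The training schedule above (ADAM, initial learning rate 0.005, 10-fold reduction at epochs 210 and 280, 350 epochs total) maps directly onto PyTorch's MultiStepLR; the single-layer model below is only a stand-in so the optimizer/scheduler wiring is visible.

```python
import torch

# Stand-in model; in the real setup this would be DPENet.
model = torch.nn.Conv2d(3, 3, 3, padding=1)
optimizer = torch.optim.Adam(model.parameters(), lr=0.005)
# Reduce the learning rate 10-fold (gamma=0.1) at epochs 210 and 280.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[210, 280], gamma=0.1
)

for epoch in range(350):
    # ... one pass over the training set would go here ...
    scheduler.step()
```

After both milestones the learning rate is 0.005 × 0.1 × 0.1 = 5e-5.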
Two datasets were used for the experiments: the endoscope dataset collected in this embodiment, which contains 2,000 images of size 512×512, and the Endo4IE dataset, of which 1,056 images of size 512×512 were used.
The network proposed in this embodiment is compared with classical image enhancement networks on the dataset proposed in this embodiment and on the Endo4IE dataset. As can be seen from Table 1, which lists the experimental results of this embodiment, the network of this embodiment performs best on every index.
TABLE 1
The present application is not limited to the above embodiments; any changes or substitutions that can readily be conceived by those skilled in the art within the technical scope of the present application are intended to fall within its scope. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (8)
1. A method of enhancing an image of an alimentary canal endoscope, comprising:
acquiring an initial image of a digestive tract endoscope;
inputting the initial image of the digestive tract endoscope into an image enhancement model for feature enhancement to obtain a feature enhancement image of the digestive tract endoscope;
the image enhancement model comprises a main branch, a first auxiliary branch and a second auxiliary branch; each branch adopts an encoder-decoder structure, and each branch comprises a plurality of scale space feature extraction blocks connected through a jump connection strategy and a cross-branch compensation strategy; each scale space feature extraction block comprises a context feature extraction module and a space residual attention module which are connected in sequence.
2. The method for enhancing an image of an alimentary canal endoscope according to claim 1, wherein training of the image enhancement model comprises:
acquiring training data; the training data comprises a digestive tract endoscope training image and a corresponding enhanced image;
and respectively inputting the training data into the main branch, the first auxiliary branch and the second auxiliary branch for image enhancement, performing feature fusion on the enhanced images of the three branches, and training with the goal of minimizing the structural similarity loss function between the fused initial training result and the reference enhanced image corresponding to the digestive tract endoscope training image, to obtain the image enhancement model.
3. The method of claim 1, wherein the processing of the scale-space feature extraction block comprises:
inputting the initial image of the digestive tract endoscope into the context feature extraction module for context information extraction and isolated noise denoising, to obtain denoised image data;
and taking the denoising image data as the input of the spatial residual error attention module, and performing self-adaptive focusing processing to obtain the scale space characteristic image data.
4. The method for enhancing an image of an alimentary canal endoscope according to claim 3, wherein inputting the initial image of the alimentary canal endoscope into the context feature extraction module for context information extraction and isolated noise filtering to obtain denoised image data specifically comprises:
the context feature extraction module comprises a lower branch and an upper branch with identical structures; the initial image of the digestive tract endoscope is taken as the input feature F_in, and a 3×3 convolution is applied to the input feature in the lower branch and the upper branch respectively, to obtain a first upper branch feature map F_u^{3,1} and a first lower branch feature map F_l^{5,1}; in the upper branch, the first upper branch feature map F_u^{3,1}, the input feature F_in and the first lower branch feature map F_l^{5,1} are sequentially concatenated along the channel dimension to obtain a second upper branch feature map F_u^{c,1}; in the lower branch, the first upper branch feature map F_u^{3,1}, the input feature F_in and the first lower branch feature map F_l^{5,1} are sequentially concatenated along the channel dimension to obtain a second lower branch feature map F_l^{5,2}; the second upper branch feature map F_u^{c,1} is processed by a 3×3 convolution to obtain a third upper branch feature map F_u^{3,2}; the third upper branch feature map F_u^{3,2} and the second lower branch feature map F_l^{5,2} are concatenated, and the result is forward-connected with the input feature F_in to obtain a fourth upper branch feature map F_u^{c,2}; the fourth upper branch feature map F_u^{c,2} is processed by a 3×3 convolution, the result is concatenated with the input feature F_in, the first upper branch feature map F_u^{3,1} and the third upper branch feature map F_u^{3,2}, and the concatenated result is processed by a 1×1 convolution to obtain the output feature F_u^{out} of the upper branch;
information interaction is performed in the lower branch using the feature maps of the upper branch, to obtain the output feature F_l^{out} of the lower branch;
the output feature F_u^{out} of the upper branch and the output feature F_l^{out} of the lower branch are concatenated and sequentially processed by a 1×1 convolution and a 3×3 convolution, and the processing result is combined with the input feature F_in to obtain the output F_CFEM of the context feature extraction module CFEM, namely the denoised image data;
wherein the output F_CFEM of the context feature extraction module CFEM is calculated as:

F_CFEM = Conv_3(Conv_1(C(F_u^{out}, F_l^{out}))) + F_in
where Conv_1 and Conv_3 represent 1×1 and 3×3 convolution operations, respectively, and C(·) represents the concatenation operation.
5. The method of claim 3, wherein taking the denoised image data as the input of the spatial residual attention module and performing adaptive focusing processing to obtain scale space feature image data specifically comprises:
the spatial residual attention module comprises a left branch and a right branch with the same structure; the denoised image data is subjected to max pooling in the left branch, followed by spatial residual attention processing, to obtain a left branch processing result;

the denoised image data is subjected to average pooling in the right branch, followed by spatial residual attention processing, to obtain a right branch processing result;
acquiring scale space feature image data based on the left branch processing result and the right branch processing result;
the calculation formula for acquiring the scale space characteristic image data is as follows:
F_SRAM = σ(F_m^o + F_a^o) ⊗ F_CFEM

where F_CFEM is the output of the context feature extraction module CFEM, F_p is the feature map after max pooling, F_m is the output of the left branch of the spatial residual attention module, F_m^o = γF_m is F_m after weight adjustment, γ is a learnable parameter for adjusting the weight, F_a is the output of the right branch of the spatial residual attention module, F_a^o = (1−γ)F_a is F_a after weight adjustment, σ is the Sigmoid function, and F_SRAM is the final output of the spatial residual attention module, namely the scale space feature image data.
6. A low-light endoscopic image enhancement system, comprising:
the data acquisition module is used for acquiring an initial image of the digestive tract endoscope;
the image enhancement module is used for inputting the initial image of the digestive tract endoscope into an image enhancement model for feature enhancement to obtain a feature enhancement image of the digestive tract endoscope; the image enhancement model comprises a main branch, a first auxiliary branch and a second auxiliary branch; each branch adopts an encoder-decoder structure, and each branch comprises a plurality of scale space feature extraction blocks connected through a jump connection strategy and a cross-branch compensation strategy; each scale space feature extraction block comprises a context feature extraction module and a space residual attention module which are connected in sequence.
7. An electronic device comprising a memory for storing a computer program and a processor that runs the computer program to cause the electronic device to perform the method for enhancing an image of a digestive tract endoscope according to any one of claims 1-5.
8. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the method for enhancing an image of a digestive tract endoscope according to any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310894600.6A CN116957968B (en) | 2023-07-20 | 2023-07-20 | Method, system, equipment and medium for enhancing digestive tract endoscope image |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116957968A true CN116957968A (en) | 2023-10-27 |
CN116957968B CN116957968B (en) | 2024-04-05 |
Family
ID=88445628
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310894600.6A Active CN116957968B (en) | 2023-07-20 | 2023-07-20 | Method, system, equipment and medium for enhancing digestive tract endoscope image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116957968B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017183307A1 (en) * | 2016-04-20 | 2017-10-26 | 富士フイルム株式会社 | Endoscope system, image processing device, and image processing device operation method |
WO2021067591A2 (en) * | 2019-10-04 | 2021-04-08 | Covidien Lp | Systems and methods for use of stereoscopy and color change magnification to enable machine learning for minimally invasive robotic surgery |
GB202104506D0 (en) * | 2021-03-30 | 2021-05-12 | Ucl Business Plc | Medical Image Analysis Using Neural Networks |
CN113658201A (en) * | 2021-08-02 | 2021-11-16 | 天津大学 | Deep learning colorectal cancer polyp segmentation device based on enhanced multi-scale features |
CN114742848A (en) * | 2022-05-20 | 2022-07-12 | 深圳大学 | Method, device, equipment and medium for segmenting polyp image based on residual double attention |
WO2022246677A1 (en) * | 2021-05-26 | 2022-12-01 | 深圳高性能医疗器械国家研究院有限公司 | Method for reconstructing enhanced ct image |
CN116188340A (en) * | 2022-12-21 | 2023-05-30 | 上海大学 | Intestinal endoscope image enhancement method based on image fusion |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017183307A1 (en) * | 2016-04-20 | 2017-10-26 | 富士フイルム株式会社 | Endoscope system, image processing device, and image processing device operation method |
WO2021067591A2 (en) * | 2019-10-04 | 2021-04-08 | Covidien Lp | Systems and methods for use of stereoscopy and color change magnification to enable machine learning for minimally invasive robotic surgery |
GB202104506D0 (en) * | 2021-03-30 | 2021-05-12 | Ucl Business Plc | Medical Image Analysis Using Neural Networks |
WO2022208060A2 (en) * | 2021-03-30 | 2022-10-06 | Ucl Business Ltd | Medical image analysis using neural networks |
WO2022246677A1 (en) * | 2021-05-26 | 2022-12-01 | 深圳高性能医疗器械国家研究院有限公司 | Method for reconstructing enhanced ct image |
CN113658201A (en) * | 2021-08-02 | 2021-11-16 | 天津大学 | Deep learning colorectal cancer polyp segmentation device based on enhanced multi-scale features |
CN114742848A (en) * | 2022-05-20 | 2022-07-12 | 深圳大学 | Method, device, equipment and medium for segmenting polyp image based on residual double attention |
CN116188340A (en) * | 2022-12-21 | 2023-05-30 | 上海大学 | Intestinal endoscope image enhancement method based on image fusion |
Non-Patent Citations (2)
Title |
---|
ZIHENG AN 等: "EIEN: Endoscopic Image Enhancement Network Based on Retinex Theory", 《SENSORS》, 31 December 2022 (2022-12-31), pages 9 - 10 * |
LIAN Weiwen: "Research on Single-Image Super-Resolution Reconstruction Methods Based on Deep Learning", China Master's Theses Full-text Database, Information Science and Technology, 15 January 2023 (2023-01-15) *
Also Published As
Publication number | Publication date |
---|---|
CN116957968B (en) | 2024-04-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wang et al. | Medical image fusion based on convolutional neural networks and non-subsampled contourlet transform | |
WO2021077997A1 (en) | Multi-generator generative adversarial network learning method for image denoising | |
CN110675462B (en) | Gray image colorization method based on convolutional neural network | |
CN110232653A (en) | The quick light-duty intensive residual error network of super-resolution rebuilding | |
CN113379661B (en) | Double-branch convolution neural network device for fusing infrared and visible light images | |
CN112837244B (en) | Low-dose CT image denoising and artifact removing method based on progressive generation confrontation network | |
CN112669248B (en) | Hyperspectral and panchromatic image fusion method based on CNN and Laplacian pyramid | |
Chen et al. | Blood vessel enhancement via multi-dictionary and sparse coding: Application to retinal vessel enhancing | |
CN110503614A (en) | A kind of Magnetic Resonance Image Denoising based on sparse dictionary study | |
CN114187214A (en) | Infrared and visible light image fusion system and method | |
Li et al. | Single image dehazing with an independent detail-recovery network | |
CN113012163A (en) | Retina blood vessel segmentation method, equipment and storage medium based on multi-scale attention network | |
CN115170410A (en) | Image enhancement method and device integrating wavelet transformation and attention mechanism | |
CN116739899A (en) | Image super-resolution reconstruction method based on SAUGAN network | |
Li et al. | An adaptive self-guided wavelet convolutional neural network with compound loss for low-dose CT denoising | |
Li et al. | Adaptive weighted multiscale retinex for underwater image enhancement | |
CN113781489A (en) | Polyp image semantic segmentation method and device | |
Zhou et al. | Physical-priors-guided DehazeFormer | |
Qayyum et al. | Single-shot retinal image enhancement using deep image priors | |
CN116957968B (en) | Method, system, equipment and medium for enhancing digestive tract endoscope image | |
Liu et al. | Non-homogeneous haze data synthesis based real-world image dehazing with enhancement-and-restoration fused CNNs | |
Yue et al. | Deep Pyramid Network for Low-light Endoscopic Image Enhancement | |
CN113808057A (en) | Endoscope image enhancement method based on unsupervised learning | |
CN111462004A (en) | Image enhancement method and device, computer equipment and storage medium | |
Shuang et al. | Algorithms for improving the quality of underwater optical images: A comprehensive review |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||