CN116957968A - Method, system, equipment and medium for enhancing digestive tract endoscope image - Google Patents

Method, system, equipment and medium for enhancing digestive tract endoscope image

Info

Publication number
CN116957968A
Authority
CN
China
Prior art keywords
branch
image
feature
digestive tract
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310894600.6A
Other languages
Chinese (zh)
Other versions
CN116957968B (en)
Inventor
岳广辉
高杰
武泓吕
周天薇
汪天富
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University filed Critical Shenzhen University
Priority to CN202310894600.6A priority Critical patent/CN116957968B/en
Publication of CN116957968A publication Critical patent/CN116957968A/en
Application granted granted Critical
Publication of CN116957968B publication Critical patent/CN116957968B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/42 Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10068 Endoscopic image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing

Abstract

The application belongs to the field of computer-aided disease diagnosis, and discloses a method, a system, equipment and a medium for enhancing a digestive tract endoscope image, comprising the following steps: acquiring an initial image of a digestive tract endoscope; inputting the initial image of the digestive tract endoscope into an image enhancement model for feature enhancement to obtain a feature enhancement image of the digestive tract endoscope. The image enhancement model comprises a main branch, a first auxiliary branch and a second auxiliary branch; each branch adopts an encoder-decoder structure, and each branch comprises a plurality of scale space feature extraction blocks connected through a skip connection strategy and a cross-branch compensation strategy; each scale space feature extraction block comprises a context feature extraction module and a spatial residual attention module which are connected in sequence. The technical scheme of the application improves the adaptability of the network under uneven illumination and gradually restores high-quality endoscope images by fusing global and local features of different scales.

Description

Method, system, equipment and medium for enhancing digestive tract endoscope image
Technical Field
The application belongs to the field of computer-aided disease diagnosis, and particularly relates to a method, a system, equipment and a medium for enhancing an image of an alimentary canal endoscope.
Background
Colorectal cancer is one of the most life-threatening diseases for humans today, ranking second in mortality and third in morbidity among all cancers. Clinically, endoscopy is an effective method for screening colorectal diseases and preventing early colorectal cancer. During endoscopy, endoscopic images play a critical role in diagnosis and therapy, providing physicians with visual information about biological tissue. In general, various factors affect imaging quality (e.g., the confined space, reflections, the intestinal wall), which can introduce complex distortions into the captured endoscopic image. These distortions include abnormal exposure, low contrast, blurring, ghosting, and mixed effects. Among them, low contrast typically presents as a low-light appearance (with uneven illumination and noise) and is attracting increasing attention from researchers. In addition, severe degradation of the endoscopic image also affects subsequent computer-aided diagnosis tasks, such as classification and segmentation of colorectal diseases. Conversely, high-quality endoscopic images make it easier for physicians to identify tissue details. Therefore, it is important to improve the quality of endoscopic images through low-light image enhancement (LIE) techniques, so as to improve the accuracy of diagnostic results and provide a reliable basis for subsequent diagnostic work. However, existing endoscopic image enhancement methods have some limitations. On the one hand, most methods analyze images mainly in a single-scale manner and thus lack a comprehensive understanding of scale variations, which limits their ability to handle non-uniform illumination. On the other hand, most methods do not make full use of context information in the feature extraction process, which is detrimental to understanding the semantic information needed for noise suppression.
Over the past few years, many image enhancement methods have been proposed from different perspectives. In the early stages, researchers invested a great deal of effort in designing histogram-based methods. These methods mainly extend the dynamic range of the input image to obtain a more uniform pixel intensity distribution. However, most histogram-based methods do not adjust the illumination well and may produce over- or under-enhanced results. Furthermore, these methods typically ignore noise in dark areas. Another popular solution to the LIE task is to decompose the image into illumination and reflectance based on Retinex theory. Existing Retinex-based methods mainly either remove the illumination and take the reflectance as the enhancement result, or combine the adjusted illumination with the reflectance. However, ambiguity in the decomposition result tends to produce unnatural output. With the success of deep learning (DL) in various visual tasks, the image enhancement problem has been reformulated as an image translation problem that does not rely on physical assumptions. However, existing work has focused mainly on low-light natural images, and little research has been done on low-light endoscopic image enhancement (LEIE).
Disclosure of Invention
The application aims to provide a method, a system, equipment and a medium for enhancing an image of an alimentary canal endoscope, so as to solve the problems existing in the prior art.
In order to achieve the above object, the present application provides a method for enhancing a digestive tract endoscope image, comprising:
acquiring an initial image of a digestive tract endoscope;
inputting the initial image of the digestive tract endoscope into an image enhancement model for feature enhancement to obtain a feature enhancement image of the digestive tract endoscope;
the image enhancement model comprises a main branch, a first auxiliary branch and a second auxiliary branch; each branch adopts an encoder-decoder structure, and each branch comprises a plurality of scale space feature extraction blocks connected through a skip connection strategy and a cross-branch compensation strategy; each scale space feature extraction block comprises a context feature extraction module and a spatial residual attention module which are connected in sequence.
Optionally, the training method of the image enhancement model specifically includes:
acquiring training data; the training data comprises a digestive tract endoscope training image and a corresponding enhanced image;
and respectively inputting the training data into the main branch, the first auxiliary branch and the second auxiliary branch for image enhancement, performing feature fusion on the enhanced images of the three branches, and training with the objective of minimizing the structural similarity loss function between the fused initial training result and the reference enhanced image corresponding to the digestive tract endoscope training image, so as to obtain the image enhancement model.
Optionally, the processing procedure of the scale-space feature extraction block includes:
inputting the initial image of the digestive tract endoscope into the context feature extraction module for context information extraction and isolated noise filtering to obtain denoised image data;
taking the denoised image data as the input of the spatial residual attention module and performing adaptive focusing processing to obtain the scale space feature image data.
Optionally, inputting the initial image of the digestive tract endoscope into the context feature extraction module for context information extraction and isolated noise filtering to obtain denoised image data specifically includes:
the context feature extraction module comprises an upper branch and a lower branch with identical structures; the initial digestive tract endoscope image serves as the input feature F_in; a 3×3 convolution is applied to the input feature in the upper branch and in the lower branch respectively, yielding a first upper-branch feature map F_u^1 and a first lower-branch feature map F_l^1; in the upper branch, the first upper-branch feature map F_u^1, the input feature F_in and the first lower-branch feature map F_l^1 are concatenated in sequence along the channel dimension to obtain a second upper-branch feature map F_u^2; in the lower branch, the first upper-branch feature map F_u^1, the input feature F_in and the first lower-branch feature map F_l^1 are concatenated in sequence along the channel dimension to obtain a second lower-branch feature map F_l^2; a 3×3 convolution is applied to the second upper-branch feature map F_u^2 to obtain a third upper-branch feature map F_u^3; the third upper-branch feature map F_u^3 and the second lower-branch feature map F_l^2 are concatenated, and the result is forward-connected with the input feature F_in to obtain a fourth upper-branch feature map F_u^4; a 3×3 convolution is applied to the fourth upper-branch feature map F_u^4, the result is concatenated with the input feature F_in, the first upper-branch feature map F_u^1 and the third upper-branch feature map F_u^3, and a 1×1 convolution is applied to the concatenation to obtain the upper-branch output feature F_u^out;
information interaction is carried out in the lower branch in the same way, using the feature maps of the upper branch, to obtain the lower-branch output feature F_l^out;
the upper-branch output feature F_u^out and the lower-branch output feature F_l^out are concatenated, a 1×1 convolution and a 3×3 convolution are applied in sequence, and the result is added to the input feature F_in to obtain the output F_CFEM of the context feature extraction module CFEM, namely the denoised image data;
wherein the output F_CFEM of the context feature extraction module CFEM is computed as:
F_CFEM = F_in + Conv_3(Conv_1(C(F_u^out, F_l^out)))
where Conv_1 and Conv_3 denote 1×1 and 3×3 convolution operations, respectively, and C(·, ·) denotes the concatenation operation.
Optionally, taking the denoised image data as the input of the spatial residual attention module and performing adaptive focusing processing to obtain the scale space feature image data specifically includes:
the spatial residual attention module comprises a left branch and a right branch with the same structure; in the left branch, the denoised image data is subjected to max pooling and then to spatial residual attention processing to obtain a left-branch processing result;
in the right branch, the denoised image data is subjected to average pooling and then to spatial residual attention processing to obtain a right-branch processing result;
acquiring scale space feature image data based on the left branch processing result and the right branch processing result;
the calculation formula for acquiring the scale space characteristic image data is as follows:
P_m = MaxPool(F_CFEM), F_m = SRA(P_m), F_m^o = γ ⊗ F_m
P_a = AvgPool(F_CFEM), F_a = SRA(P_a), F_a^o = (1−γ) ⊗ F_a
F_SRAM = F_CFEM ⊗ σ(F_m^o + F_a^o)
where F_CFEM is the output of the context feature extraction module CFEM; P_m is the feature map after max pooling; F_m is the output of the left branch of the spatial residual attention module; F_m^o is F_m after weight adjustment; γ is a learnable parameter; P_a is the feature map after average pooling; F_a is the output of the right branch of the spatial residual attention module; F_a^o is F_a after weight adjustment; σ is the Sigmoid function; ⊗ denotes element-wise multiplication; and F_SRAM is the final output of the spatial residual attention module, namely the scale space feature image data.
A low-light endoscopic image enhancement system, comprising:
the data acquisition module is used for acquiring an initial image of the digestive tract endoscope;
the image enhancement module is used for inputting the initial image of the digestive tract endoscope into an image enhancement model for feature enhancement to obtain a feature enhancement image of the digestive tract endoscope; the image enhancement model comprises a main branch, a first auxiliary branch and a second auxiliary branch; each branch adopts an encoder-decoder structure, and each branch comprises a plurality of scale space feature extraction blocks connected through a skip connection strategy and a cross-branch compensation strategy; each scale space feature extraction block comprises a context feature extraction module and a spatial residual attention module which are connected in sequence.
An electronic device comprising a memory for storing a computer program and a processor that runs the computer program to cause the electronic device to perform an image enhancement method of an endoscopic image of the alimentary canal.
A computer readable storage medium storing a computer program which, when executed by a processor, implements an image enhancement method of an endoscopic image of the alimentary canal.
The application has the technical effects that:
the application provides a method, a system, equipment and a medium for enhancing an image of an alimentary canal endoscope, wherein the method comprises the following steps: acquiring an initial image of an digestive tract endoscope; inputting the initial image of the digestive tract endoscope into an image enhancement model for feature enhancement to obtain a feature enhancement image of the digestive tract endoscope; the image enhancement model comprises a main branch, a first auxiliary branch and a second auxiliary branch; each branch adopts an encoder-decoder structure, and each branch comprises a plurality of scale space feature extraction blocks connected through a jump connection strategy and a cross-branch compensation strategy; each scale space feature extraction block comprises a context feature extraction module and a space residual attention module which are connected in sequence.
The present application proposes a novel deep pyramid enhancement network (DPENet) for enhancing endoscopic images under low light. Specifically, DPENet has an image pyramid structure consisting of three parallel branches that extract global and local features on different scales. A skip connection strategy and a cross-branch compensation strategy are used in each branch to realize intra-layer and inter-layer fusion and make full use of multi-scale features. Such a structure helps the network understand uneven illumination in the image. To suppress isolated noise, DPENet sets multiple scale space feature extraction blocks (SFEBs) in each branch. The SFEB is composed of a context feature extraction module (CFEM) and a spatial residual attention module (SRAM). The CFEM mines semantic information by extracting context information to filter isolated noise. The SRAM utilizes a spatial attention mechanism to help the network adaptively focus on dark areas. DPENet gradually restores high-quality endoscopic images by fusing global and local features of different scales.
The present application uses a residual connection to combine the input image with the cascaded features of the three branches to generate an enhanced image. The residual connection helps mitigate the model's excessive dependence on noise or low-frequency signals, thereby generating a more realistic image.
In each branch, the present application utilizes skip connections to transfer features from the encoder to the decoder within the same branch, reducing the problems of gradient vanishing and network degradation. Furthermore, same-sized units of different branches are connected through compensation connections to integrate global and local information. The pyramid structure helps the network extract and aggregate global and local features on different scales, and improves the adaptability of the network under uneven illumination conditions.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application. In the drawings:
FIG. 1 is a block diagram of a network in accordance with an embodiment of the present application;
FIG. 2 is a block diagram of a context feature extraction module CFEM network in an embodiment of the present application;
fig. 3 is a block diagram of a spatial residual attention module SRAM network in an embodiment of the present application.
Detailed Description
Various exemplary embodiments of the application will now be described in detail, which should not be considered as limiting the application, but rather as more detailed descriptions of certain aspects, features and embodiments of the application.
It is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. In addition, for numerical ranges in this disclosure, it is understood that each intermediate value between the upper and lower limits of the ranges is also specifically disclosed. Every smaller range between any stated value or stated range, and any other stated value or intermediate value within the stated range, is also encompassed within the application. The upper and lower limits of these smaller ranges may independently be included or excluded in the range.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. Although the application has been described with reference to a preferred method, any method similar or equivalent to those described herein can be used in the practice or testing of the present application. All documents mentioned in this specification are incorporated by reference for the purpose of disclosing and describing the methodologies associated with the documents. In case of conflict with any incorporated document, the present specification will control.
It will be apparent to those skilled in the art that various modifications and variations can be made in the specific embodiments of the application described herein without departing from the scope or spirit of the application. Other embodiments will be apparent to those skilled in the art from consideration of the specification of the present application. The specification and examples of the present application are exemplary only.
As used herein, the terms "comprising," "including," "having," "containing," and the like are intended to be inclusive and mean an inclusion, but not limited to.
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
Example 1
As shown in fig. 1 to 3, the present embodiment provides a method, a system, equipment and a medium for enhancing a digestive tract endoscope image, wherein the method comprises: acquiring an initial image of a digestive tract endoscope; inputting the initial image of the digestive tract endoscope into an image enhancement model for feature enhancement to obtain a feature enhancement image of the digestive tract endoscope; the image enhancement model comprises a main branch, a first auxiliary branch and a second auxiliary branch; each branch adopts an encoder-decoder structure, and each branch comprises a plurality of scale space feature extraction blocks connected through a skip connection strategy and a cross-branch compensation strategy; each scale space feature extraction block comprises a context feature extraction module and a spatial residual attention module which are connected in sequence.
A low-light endoscopic image enhancement system, comprising: a data acquisition module for acquiring an initial image of a digestive tract endoscope; an image enhancement module for inputting the initial image of the digestive tract endoscope into an image enhancement model for feature enhancement to obtain a feature enhancement image of the digestive tract endoscope; the image enhancement model comprises a main branch, a first auxiliary branch and a second auxiliary branch; each branch adopts an encoder-decoder structure, and each branch comprises a plurality of scale space feature extraction blocks connected through a skip connection strategy and a cross-branch compensation strategy; each scale space feature extraction block comprises a context feature extraction module and a spatial residual attention module which are connected in sequence.
An electronic device comprising a memory for storing a computer program and a processor that runs the computer program to cause the electronic device to perform an image enhancement method of an endoscopic image of the alimentary canal.
A computer readable storage medium storing a computer program which, when executed by a processor, implements an image enhancement method of an endoscopic image of the alimentary canal.
Aiming at the problem of low-illumination enhancement of endoscopic images, the present embodiment proposes a high-performance endoscopic image enhancement method that addresses the uneven brightness of endoscopic images. It aims to assist endoscopists in completing endoscopy and diagnosis operations accurately and rapidly through an efficient endoscopic image enhancement method. The system flow is as follows: this embodiment proposes a new network, named DPENet, for enhancing low-light endoscopic images. Specifically, DPENet has an image pyramid structure consisting of three parallel branches that extract global and local features on different scales. A skip connection strategy and a cross-branch compensation strategy are used in each branch to realize intra-layer and inter-layer fusion and make full use of multi-scale features. Such a structure helps the network understand uneven illumination in the image. To suppress isolated noise, DPENet sets multiple scale space feature extraction blocks (SFEBs) in each branch. The SFEB is composed of a context feature extraction module (CFEM) and a spatial residual attention module (SRAM). The CFEM mines semantic information by extracting context information to filter isolated noise. The SRAM utilizes a spatial attention mechanism to help the network adaptively focus on dark areas. By fusing global and local features of different scales, the DPENet of this embodiment gradually restores high-quality endoscopic images.
The structure of DPENet is shown in fig. 1. DPENet has a pyramid structure comprising one main branch and two auxiliary branches. Each branch follows an encoder-decoder structure, in which this embodiment provides a plurality of units. Each unit includes several densely connected SFEB modules, and the SFEBs are aggregated by a multi-scale fusion operation to enhance the feature representation. In total, the main branch contains 30 SFEBs, the first auxiliary branch contains 10 SFEBs, and the second auxiliary branch contains 2 SFEBs. The SFEB aims to suppress noise and comprises two key modules: the CFEM and the SRAM. In view of the positive role of global and local features in understanding uneven illumination, this embodiment downsamples the input image I_in by factors of 2 and 4, and the resulting images serve as the inputs of the first and second auxiliary branches, respectively. This operation helps the network extract global and local features on different scales. After processing the image through the three branches, this embodiment uses a residual connection to combine the input image I_in with the cascaded features of the three branches to generate the enhanced image. The residual connection helps mitigate the model's excessive dependence on noise or low-frequency signals, thereby generating a more realistic image.
In each branch, this embodiment utilizes skip connections from the encoder to the decoder within the same branch to reduce the problems of gradient vanishing and network degradation. In addition, this embodiment connects same-sized units of different branches through compensation connections to integrate global and local information. Such a pyramid structure helps the network extract and aggregate global and local features on different scales, improving its adaptability under uneven illumination conditions.
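As a rough illustration of this data flow, the following PyTorch sketch shows the pyramid decomposition, the three branches and the final residual fusion. The Branch placeholder stands in for a full encoder-decoder branch of SFEB units, and all names, channel counts and the 1×1 fusion layer are illustrative assumptions, not the patent's reference implementation.
```python
# Minimal sketch of the DPENet pyramid data flow (assumed structure).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Branch(nn.Module):
    """Placeholder for one encoder-decoder branch built from SFEB units."""
    def __init__(self, ch=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1),
        )

    def forward(self, x):
        return self.body(x)

class DPENet(nn.Module):
    def __init__(self):
        super().__init__()
        self.main = Branch()   # full resolution, most SFEBs
        self.aux1 = Branch()   # 1/2 resolution
        self.aux2 = Branch()   # 1/4 resolution
        self.fuse = nn.Conv2d(9, 3, 1)  # 1x1 conv over the concatenated branch outputs

    def forward(self, x):
        h, w = x.shape[-2:]
        x2 = F.interpolate(x, scale_factor=0.5, mode="bilinear", align_corners=False)
        x4 = F.interpolate(x, scale_factor=0.25, mode="bilinear", align_corners=False)
        f0 = self.main(x)
        f1 = F.interpolate(self.aux1(x2), size=(h, w), mode="bilinear", align_corners=False)
        f2 = F.interpolate(self.aux2(x4), size=(h, w), mode="bilinear", align_corners=False)
        # Residual connection: input image plus the fused cascaded features.
        return x + self.fuse(torch.cat([f0, f1, f2], dim=1))
```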
The details of the network operation are as follows:
1. Context feature extraction module CFEM:
Fig. 2 shows the architecture of the CFEM proposed in this embodiment. As shown, the CFEM has a dual-branch structure. The upper branch first processes the input feature F_in with a 3×3 convolution to obtain F_u^1. Then, F_u^1, the input feature F_in and the feature map F_l^1 from the lower branch are concatenated in sequence along the channel dimension to obtain the feature map F_u^2. Such operations facilitate interactions of contextual features on different scales. Next, F_u^2 is processed by a 3×3 convolution (giving F_u^3) and concatenated with the feature map F_l^2 from the lower branch, and the result is forward-connected with F_in to obtain F_u^4. Finally, F_u^4 is processed by a 3×3 convolution, and the result is concatenated with F_in, F_u^1 and F_u^3. Processing the concatenated feature map with a 1×1 convolution produces the upper-branch output F_u^out. Here, the forward connections aim at reusing the features of different stages, enhancing the representational capacity of the network. The above operations can be expressed as:
F_u^1 = Conv_3(F_in), F_u^2 = C(F_u^1, F_in, F_l^1), F_u^3 = Conv_3(F_u^2), F_u^4 = C(F_u^3, F_l^2, F_in)    (1)
F_u^out = Conv_1(C(Conv_3(F_u^4), F_in, F_u^1, F_u^3))    (2)
In equations (1) and (2), Conv_1 and Conv_3 denote 1×1 and 3×3 convolution operations, respectively, and C(·) denotes the concatenation operation. The lower branch has the same structure as the upper branch but uses the feature maps of the upper branch for information interaction, as shown in fig. 2.
After obtaining the outputs F_u^out and F_l^out from the upper and lower branches, this embodiment further concatenates them and processes the concatenated feature map sequentially with a 1×1 convolution and a 3×3 convolution; the CFEM output F_CFEM is then obtained by adding the result to the input feature F_in:
F_CFEM = F_in + Conv_3(Conv_1(C(F_u^out, F_l^out)))    (3)
2. Spatial residual attention module SRAM
Inspired by the visual attention mechanism, this embodiment proposes the SRAM to help the network focus on dark areas. Fig. 3 shows the structure of the proposed SRAM, which has a dual-branch structure. In the left branch of the SRAM, the input feature (i.e., the output feature F_CFEM of the CFEM) first goes through max pooling MaxPool(·) to give the feature map P_m, which is then processed by a spatial residual attention (SRA) block. In the SRA block, P_m passes through a 3×3 convolution, a ReLU function and another 3×3 convolution; at the same time, P_m also passes through a 1×1 convolution. The two results are added to obtain the left-branch output F_m. Then, F_m is multiplied by the learnable parameter γ to obtain F_m^o, where γ adjusts the weight of F_m. The above operations can be summarized as:
F_m = Conv_3(ReLU(Conv_3(P_m))) + Conv_1(P_m), F_m^o = γ ⊗ F_m    (4)
In equation (4), P_m = MaxPool(F_CFEM) and ⊗ denotes element-wise multiplication.
The right branch of the SRAM has a structure similar to that of the left branch, with two differences. One difference is that the right branch uses average pooling instead of max pooling to process the input feature; the motivation is that the two pooling operations complement each other, and using both helps extract richer features than using only one. The other difference is that the weight of the SRA output in the right branch is set to 1−γ.
After processing the input feature with the two branches, the final output feature of the SRAM is obtained by:
F_SRAM = F_CFEM ⊗ σ(F_m^o + F_a^o)    (5)
where F_CFEM is the output of the context feature extraction module CFEM; F_m^o = γ ⊗ F_m is the weight-adjusted output of the left branch; F_a^o = (1−γ) ⊗ F_a is the weight-adjusted output of the right branch; σ is the Sigmoid function; and F_SRAM is the final output of the spatial residual attention module.
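A hedged PyTorch sketch of the SRAM following equations (4)-(5) is given below. The pooling window and the bilinear upsampling of the pooled attention map back to the input resolution (needed for the element-wise product) are assumptions the patent text does not specify.
```python
# Hedged sketch of the SRAM (eqs. (4)-(5)); pooling/upsampling details assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SRABlock(nn.Module):
    """Spatial residual attention block: Conv3-ReLU-Conv3 plus a 1x1 shortcut."""
    def __init__(self, ch):
        super().__init__()
        self.main = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                                  nn.ReLU(inplace=True),
                                  nn.Conv2d(ch, ch, 3, padding=1))
        self.skip = nn.Conv2d(ch, ch, 1)

    def forward(self, p):
        return self.main(p) + self.skip(p)

class SRAM(nn.Module):
    def __init__(self, ch=32, pool=2):
        super().__init__()
        self.pool = pool
        self.sra_max = SRABlock(ch)                   # left branch (max pooling)
        self.sra_avg = SRABlock(ch)                   # right branch (average pooling)
        self.gamma = nn.Parameter(torch.tensor(0.5))  # learnable weight

    def forward(self, x):                             # x: CFEM output F_CFEM
        f_m = self.sra_max(F.max_pool2d(x, self.pool))
        f_a = self.sra_avg(F.avg_pool2d(x, self.pool))
        att = self.gamma * f_m + (1.0 - self.gamma) * f_a          # eq. (4) weighting
        att = F.interpolate(torch.sigmoid(att), size=x.shape[-2:],
                            mode="bilinear", align_corners=False)
        return x * att                                             # eq. (5)
```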
In the training stage, parameters such as the loss function, batch size, epoch count and dataset division ratio are set. To better improve the performance of the network, this embodiment trains the model in a deeply supervised manner. Although the L2 loss is easy to optimize during training, it can cause excessive smoothing of the background and ghosting artifacts. Therefore, this embodiment adopts the Structural Similarity Index (SSIM) loss as an alternative. The SSIM loss L_S measures the similarity between the reconstructed image and the reference image by considering their structural similarity. The loss function is computed as:
L_S = 1 − (1/M) Σ_{m=1}^{M} S(I_out(m), I_gt(m))    (6)
where m indexes the m-th pixel and M is the number of pixels in the image; I_out and I_gt denote the reconstructed image and the ground-truth image, respectively; and S(·, ·) is the SSIM index, defined as:
S(I_out(m), I_gt(m)) = [(2μ_p(m)μ_g(m) + c_1)(2σ_{p,g}(m) + c_2)] / [(μ_p^2(m) + μ_g^2(m) + c_1)(σ_p^2(m) + σ_g^2(m) + c_2)]    (7)
where μ_p(m) and μ_g(m) are the means of the blocks around the m-th pixel in the reconstructed image and the ground-truth image, respectively; σ_p^2(m) and σ_g^2(m) are the variances of the two blocks, and σ_{p,g}(m) is their covariance; the constants c_1 and c_2 prevent division by zero.
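The SSIM loss of equations (6)-(7) can be sketched as follows. The 11×11 averaging window and the constants c1, c2 follow common SSIM practice and are assumptions, not values given in the patent.
```python
# Hedged sketch of the SSIM loss (eqs. (6)-(7)) with a uniform local window.
import torch
import torch.nn.functional as F

def ssim_loss(i_out, i_gt, window=11, c1=0.01 ** 2, c2=0.03 ** 2):
    pad = window // 2
    mu_p = F.avg_pool2d(i_out, window, stride=1, padding=pad)              # mu_p(m)
    mu_g = F.avg_pool2d(i_gt, window, stride=1, padding=pad)               # mu_g(m)
    var_p = F.avg_pool2d(i_out * i_out, window, stride=1, padding=pad) - mu_p ** 2
    var_g = F.avg_pool2d(i_gt * i_gt, window, stride=1, padding=pad) - mu_g ** 2
    cov = F.avg_pool2d(i_out * i_gt, window, stride=1, padding=pad) - mu_p * mu_g
    s = ((2 * mu_p * mu_g + c1) * (2 * cov + c2)) / \
        ((mu_p ** 2 + mu_g ** 2 + c1) * (var_p + var_g + c2))              # eq. (7)
    return 1.0 - s.mean()                                                  # eq. (6)
```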
This embodiment implements the network using the PyTorch framework and trains it for 350 epochs with the ADAM optimizer. The learning rate is set to 0.005 and reduced 10-fold at the 210th and 280th epochs; the input size and batch size are set to 128×128 and 12, respectively. All experiments were performed on a server equipped with two NVIDIA GTX 3090 GPUs and two Intel Xeon Silver 4214R CPUs @ 2.40 GHz. In the test stage, the average of the output results of 10 network models is taken as the final evaluation value. Four widely used evaluation indexes are selected in this embodiment: PSNR, SSIM, LPIPS and VIF.
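For illustration, the training schedule above maps onto the following sketch, which reuses the DPENet and ssim_loss sketches from earlier. The dummy dataloader is a stand-in for real (low-light, reference) image pairs and is not the patent's pipeline.
```python
# Hedged sketch of the training schedule: ADAM, lr 0.005, 10x decay at
# epochs 210 and 280, 350 epochs, batch size 12, 128x128 inputs.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = DPENet().to(device)                 # DPENet sketch from above
opt = torch.optim.Adam(model.parameters(), lr=0.005)
sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[210, 280], gamma=0.1)

# Dummy (low-light, reference) batch standing in for the real dataloader.
train_loader = [(torch.rand(12, 3, 128, 128), torch.rand(12, 3, 128, 128))]

for epoch in range(350):
    for low, gt in train_loader:
        low, gt = low.to(device), gt.to(device)
        loss = ssim_loss(model(low), gt)    # SSIM loss sketch from above
        opt.zero_grad()
        loss.backward()
        opt.step()
    sched.step()
```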
Two datasets were used in this embodiment: a self-collected endoscope dataset containing 2000 images of size 512×512, and the Endo4IE dataset containing 1056 images of size 512×512.
The network proposed in this embodiment was compared with classical image enhancement networks on the self-collected dataset and the Endo4IE dataset. As can be seen from Table 1, the network in this embodiment performs best on every index; Table 1 presents the experimental results of this embodiment.
TABLE 1
The present application is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present application are intended to be included in the scope of the present application. Therefore, the protection scope of the present application should be subject to the protection scope of the claims.

Claims (8)

1. A method of enhancing an image of an alimentary canal endoscope, comprising:
acquiring an initial image of a digestive tract endoscope;
inputting the initial image of the digestive tract endoscope into an image enhancement model for feature enhancement to obtain a feature enhancement image of the digestive tract endoscope;
the image enhancement model comprises a main branch, a first auxiliary branch and a second auxiliary branch; each branch adopts an encoder-decoder structure, and each branch comprises a plurality of scale space feature extraction blocks connected through a skip connection strategy and a cross-branch compensation strategy; each scale space feature extraction block comprises a context feature extraction module and a spatial residual attention module which are connected in sequence.
2. The method for enhancing an image of an alimentary canal endoscope according to claim 1, characterized in that the training method of the image enhancement model specifically comprises:
acquiring training data; the training data comprises a digestive tract endoscope training image and a corresponding enhanced image;
and respectively inputting the training data into the main branch, the first auxiliary branch and the second auxiliary branch for image enhancement, performing feature fusion on the enhanced images of the three branches, and training with the objective of minimizing the structural similarity loss function between the fused initial training result and the reference enhanced image corresponding to the digestive tract endoscope training image, so as to obtain the image enhancement model.
3. The method of claim 1, wherein the processing of the scale-space feature extraction block comprises:
inputting the initial image of the digestive tract endoscope into the context feature extraction module for context information extraction and isolated noise filtering to obtain denoised image data;
taking the denoised image data as the input of the spatial residual attention module and performing adaptive focusing processing to obtain the scale space feature image data.
4. The method for enhancing an image of an alimentary canal endoscope according to claim 3, wherein inputting the initial image of the alimentary canal endoscope into the contextual feature extraction module performs contextual information extraction and isolated noise filtering processing to obtain de-noised image data, specifically comprising:
the context feature extraction module comprises an upper branch and a lower branch with identical structures; the initial digestive tract endoscope image serves as the input feature F_in; a 3×3 convolution is applied to the input feature in the upper branch and in the lower branch respectively, yielding a first upper-branch feature map F_u^1 and a first lower-branch feature map F_l^1; in the upper branch, the first upper-branch feature map F_u^1, the input feature F_in and the first lower-branch feature map F_l^1 are concatenated in sequence along the channel dimension to obtain a second upper-branch feature map F_u^2; in the lower branch, the first upper-branch feature map F_u^1, the input feature F_in and the first lower-branch feature map F_l^1 are concatenated in sequence along the channel dimension to obtain a second lower-branch feature map F_l^2; a 3×3 convolution is applied to the second upper-branch feature map F_u^2 to obtain a third upper-branch feature map F_u^3; the third upper-branch feature map F_u^3 and the second lower-branch feature map F_l^2 are concatenated, and the result is forward-connected with the input feature F_in to obtain a fourth upper-branch feature map F_u^4; a 3×3 convolution is applied to the fourth upper-branch feature map F_u^4, the result is concatenated with the input feature F_in, the first upper-branch feature map F_u^1 and the third upper-branch feature map F_u^3, and a 1×1 convolution is applied to the concatenation to obtain the upper-branch output feature F_u^out;
information interaction is carried out in the lower branch in the same way, using the feature maps of the upper branch, to obtain the lower-branch output feature F_l^out;
the upper-branch output feature F_u^out and the lower-branch output feature F_l^out are concatenated, a 1×1 convolution and a 3×3 convolution are applied in sequence, and the result is added to the input feature F_in to obtain the output F_CFEM of the context feature extraction module CFEM, namely the denoised image data;
wherein the output F_CFEM of the context feature extraction module CFEM is computed as:
F_CFEM = F_in + Conv_3(Conv_1(C(F_u^out, F_l^out)))
where Conv_1 and Conv_3 denote 1×1 and 3×3 convolution operations, respectively, and C(·, ·) denotes the concatenation operation.
5. The method of claim 3, wherein the step of performing adaptive focusing processing on the de-noised image data as an input to the spatial residual attention module to obtain scale-space feature image data comprises:
the spatial residual attention module comprises a left branch and a right branch with the same structure; in the left branch, the denoised image data is subjected to max pooling and then to spatial residual attention processing to obtain a left-branch processing result;
in the right branch, the denoised image data is subjected to average pooling and then to spatial residual attention processing to obtain a right-branch processing result;
acquiring scale space feature image data based on the left branch processing result and the right branch processing result;
the calculation formula for acquiring the scale space characteristic image data is as follows:
P_m = MaxPool(F_CFEM), F_m = SRA(P_m), F_m^o = γ ⊗ F_m
P_a = AvgPool(F_CFEM), F_a = SRA(P_a), F_a^o = (1−γ) ⊗ F_a
F_SRAM = F_CFEM ⊗ σ(F_m^o + F_a^o)
where F_CFEM is the output of the context feature extraction module CFEM; P_m is the feature map after max pooling; F_m is the output of the left branch of the spatial residual attention module; F_m^o is F_m after weight adjustment; γ is a learnable parameter for adjusting the weight; P_a is the feature map after average pooling; F_a is the output of the right branch of the spatial residual attention module; F_a^o = (1−γ)F_a is F_a after weight adjustment; σ is the Sigmoid function; ⊗ denotes element-wise multiplication; and F_SRAM is the final output of the spatial residual attention module, namely the scale space feature image data.
6. A low-light endoscopic image enhancement system, comprising:
the data acquisition module is used for acquiring an initial image of the digestive tract endoscope;
the image enhancement module is used for inputting the initial image of the digestive tract endoscope into an image enhancement model for feature enhancement to obtain a feature enhancement image of the digestive tract endoscope; the image enhancement model comprises a main branch, a first auxiliary branch and a second auxiliary branch; each branch adopts an encoder-decoder structure, and each branch comprises a plurality of scale space feature extraction blocks connected through a skip connection strategy and a cross-branch compensation strategy; each scale space feature extraction block comprises a context feature extraction module and a spatial residual attention module which are connected in sequence.
7. An electronic device comprising a memory for storing a computer program and a processor that runs the computer program to cause the electronic device to perform the image enhancement method of an endoscopic image of the alimentary canal as set forth in any one of claims 1 to 5.
8. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the image enhancement method of an endoscopic image of the alimentary canal as set forth in any one of claims 1 to 5.
CN202310894600.6A 2023-07-20 2023-07-20 Method, system, equipment and medium for enhancing digestive tract endoscope image Active CN116957968B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310894600.6A CN116957968B (en) 2023-07-20 2023-07-20 Method, system, equipment and medium for enhancing digestive tract endoscope image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310894600.6A CN116957968B (en) 2023-07-20 2023-07-20 Method, system, equipment and medium for enhancing digestive tract endoscope image

Publications (2)

Publication Number Publication Date
CN116957968A true CN116957968A (en) 2023-10-27
CN116957968B CN116957968B (en) 2024-04-05

Family

ID=88445628

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310894600.6A Active CN116957968B (en) 2023-07-20 2023-07-20 Method, system, equipment and medium for enhancing digestive tract endoscope image

Country Status (1)

Country Link
CN (1) CN116957968B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017183307A1 (en) * 2016-04-20 2017-10-26 富士フイルム株式会社 Endoscope system, image processing device, and image processing device operation method
WO2021067591A2 (en) * 2019-10-04 2021-04-08 Covidien Lp Systems and methods for use of stereoscopy and color change magnification to enable machine learning for minimally invasive robotic surgery
GB202104506D0 (en) * 2021-03-30 2021-05-12 Ucl Business Plc Medical Image Analysis Using Neural Networks
CN113658201A (en) * 2021-08-02 2021-11-16 天津大学 Deep learning colorectal cancer polyp segmentation device based on enhanced multi-scale features
CN114742848A (en) * 2022-05-20 2022-07-12 深圳大学 Method, device, equipment and medium for segmenting polyp image based on residual double attention
WO2022246677A1 (en) * 2021-05-26 2022-12-01 深圳高性能医疗器械国家研究院有限公司 Method for reconstructing enhanced ct image
CN116188340A (en) * 2022-12-21 2023-05-30 上海大学 Intestinal endoscope image enhancement method based on image fusion

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017183307A1 (en) * 2016-04-20 2017-10-26 富士フイルム株式会社 Endoscope system, image processing device, and image processing device operation method
WO2021067591A2 (en) * 2019-10-04 2021-04-08 Covidien Lp Systems and methods for use of stereoscopy and color change magnification to enable machine learning for minimally invasive robotic surgery
GB202104506D0 (en) * 2021-03-30 2021-05-12 Ucl Business Plc Medical Image Analysis Using Neural Networks
WO2022208060A2 (en) * 2021-03-30 2022-10-06 Ucl Business Ltd Medical image analysis using neural networks
WO2022246677A1 (en) * 2021-05-26 2022-12-01 深圳高性能医疗器械国家研究院有限公司 Method for reconstructing enhanced ct image
CN113658201A (en) * 2021-08-02 2021-11-16 天津大学 Deep learning colorectal cancer polyp segmentation device based on enhanced multi-scale features
CN114742848A (en) * 2022-05-20 2022-07-12 深圳大学 Method, device, equipment and medium for segmenting polyp image based on residual double attention
CN116188340A (en) * 2022-12-21 2023-05-30 上海大学 Intestinal endoscope image enhancement method based on image fusion

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZIHENG AN et al.: "EIEN: Endoscopic Image Enhancement Network Based on Retinex Theory", SENSORS, 31 December 2022 (2022-12-31), pages 9-10 *
廉炜雯: "Research on Single Image Super-Resolution Reconstruction Methods Based on Deep Learning", China Master's Theses Full-text Database, Information Science and Technology, 15 January 2023 (2023-01-15) *

Also Published As

Publication number Publication date
CN116957968B (en) 2024-04-05

Similar Documents

Publication Publication Date Title
Wang et al. Medical image fusion based on convolutional neural networks and non-subsampled contourlet transform
WO2021077997A1 (en) Multi-generator generative adversarial network learning method for image denoising
CN110675462B (en) Gray image colorization method based on convolutional neural network
CN110232653A (en) The quick light-duty intensive residual error network of super-resolution rebuilding
CN113379661B (en) Double-branch convolution neural network device for fusing infrared and visible light images
CN112837244B (en) Low-dose CT image denoising and artifact removing method based on progressive generation confrontation network
CN112669248B (en) Hyperspectral and panchromatic image fusion method based on CNN and Laplacian pyramid
Chen et al. Blood vessel enhancement via multi-dictionary and sparse coding: Application to retinal vessel enhancing
CN110503614A (en) A kind of Magnetic Resonance Image Denoising based on sparse dictionary study
CN114187214A (en) Infrared and visible light image fusion system and method
Li et al. Single image dehazing with an independent detail-recovery network
CN113012163A (en) Retina blood vessel segmentation method, equipment and storage medium based on multi-scale attention network
CN115170410A (en) Image enhancement method and device integrating wavelet transformation and attention mechanism
CN116739899A (en) Image super-resolution reconstruction method based on SAUGAN network
Li et al. An adaptive self-guided wavelet convolutional neural network with compound loss for low-dose CT denoising
Li et al. Adaptive weighted multiscale retinex for underwater image enhancement
CN113781489A (en) Polyp image semantic segmentation method and device
Zhou et al. Physical-priors-guided DehazeFormer
Qayyum et al. Single-shot retinal image enhancement using deep image priors
CN116957968B (en) Method, system, equipment and medium for enhancing digestive tract endoscope image
Liu et al. Non-homogeneous haze data synthesis based real-world image dehazing with enhancement-and-restoration fused CNNs
Yue et al. Deep Pyramid Network for Low-light Endoscopic Image Enhancement
CN113808057A (en) Endoscope image enhancement method based on unsupervised learning
CN111462004A (en) Image enhancement method and device, computer equipment and storage medium
Shuang et al. Algorithms for improving the quality of underwater optical images: A comprehensive review

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant