CN116957968A - Method, system, equipment and medium for enhancing digestive tract endoscope image - Google Patents
- Publication number: CN116957968A
- Application number: CN202310894600.6A
- Authority
- CN
- China
- Prior art keywords
- branch
- image
- feature
- digestive tract
- processing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06V10/42 — Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
- G06N3/045 — Combinations of networks
- G06N3/0455 — Auto-encoder networks; Encoder-decoder networks
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/08 — Learning methods
- G06V10/44 — Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/806 — Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
- G06V10/82 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
- G06T2207/10068 — Endoscopic image
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/30004 — Biomedical image processing
Abstract
The application belongs to the field of computer-aided disease diagnosis, and discloses a method, a system, equipment and a medium for enhancing a digestive tract endoscope image, comprising the following steps: acquiring an initial image of a digestive tract endoscope; inputting the initial image of the digestive tract endoscope into an image enhancement model for feature enhancement to obtain a feature-enhanced image of the digestive tract endoscope. The image enhancement model comprises a main branch, a first auxiliary branch and a second auxiliary branch; each branch adopts an encoder-decoder structure, and each branch comprises a plurality of scale-space feature extraction blocks connected through a skip-connection strategy and a cross-branch compensation strategy; each scale-space feature extraction block comprises a context feature extraction module and a spatial residual attention module connected in sequence. The technical scheme of the application improves the adaptability of the network under uneven illumination and gradually recovers high-quality endoscope images by fusing global and local features at different scales.
Description
Technical Field
The application belongs to the field of computer-aided disease diagnosis, and particularly relates to a method, a system, equipment and a medium for enhancing an image of an alimentary canal endoscope.
Background
Colorectal cancer is one of the diseases threatening human life today: among all cancers it ranks second in mortality and third in morbidity. Clinically, endoscopy is an effective method for screening colorectal diseases and preventing early colorectal cancer. During endoscopy, endoscopic images play a critical role in diagnosis and therapy, providing physicians with visual information about biological tissue. In practice, various factors (e.g., the confined space, reflections, and the intestinal wall) affect imaging quality and can introduce complex distortions into the captured endoscopic image, including abnormal exposure, low contrast, blurring, ghosting, and mixtures of these. Among them, low contrast typically presents as a low-light appearance (with uneven illumination and noise) and is attracting increasing attention from researchers. In addition, severe degradation of endoscopic images harms subsequent computer-aided diagnosis tasks, such as classification and segmentation of colorectal diseases; conversely, high-quality endoscopic images make it easier for a physician to identify tissue details. It is therefore important to improve the quality of endoscopic images with low-light image enhancement (LIE) techniques, so as to improve the accuracy of diagnostic results and provide a reliable basis for subsequent diagnostic work. Existing endoscopic image enhancement methods, however, have limitations. On the one hand, most methods analyze images in a single-scale manner, and thus have a limited, incomplete understanding of non-uniform illumination across scale changes. On the other hand, most methods do not make full use of context information during feature extraction, which hinders the semantic understanding needed for noise suppression.
Over the past few years, many image enhancement methods have been proposed from different perspectives. In the early stages, researchers invested great effort in designing histogram-based methods, which mainly stretch the dynamic range of the input image to obtain a more uniform pixel intensity distribution. However, most histogram-based methods do not adjust the illumination well and may produce over- or under-enhanced results; they also typically ignore noise in dark areas. Another popular solution to the LIE task is to decompose the image into illumination and reflectance based on Retinex theory. Existing Retinex-based methods mainly either remove the illumination and take the reflectance as the enhanced result, or combine the adjusted illumination with the reflectance. However, ambiguity in the decomposition result tends to produce unnatural output. With the success of deep learning (DL) in various visual tasks, the image enhancement problem has been recast as an image translation problem that does not rely on physical assumptions. However, existing work has focused mainly on low-light natural images, and little research addresses low-light endoscopic image enhancement (LEIE).
Disclosure of Invention
The application aims to provide a method, a system, equipment and a medium for enhancing an image of an alimentary canal endoscope, so as to solve the problems existing in the prior art.
In order to achieve the above object, the present application provides a method for enhancing an image of a digestive tract endoscope, comprising:
acquiring an initial image of a digestive tract endoscope;
inputting the initial image of the digestive tract endoscope into an image enhancement model for feature enhancement to obtain a feature enhancement image of the digestive tract endoscope;
the image enhancement model comprises a main branch, a first auxiliary branch and a second auxiliary branch; each branch adopts an encoder-decoder structure, and each branch comprises a plurality of scale-space feature extraction blocks connected through a skip-connection strategy and a cross-branch compensation strategy; each scale-space feature extraction block comprises a context feature extraction module and a spatial residual attention module connected in sequence.
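The composition described above — each scale-space feature extraction block chaining a context feature extraction module into a spatial residual attention module, and each branch chaining several such blocks — can be wired up as a structural sketch. This is only a wiring illustration: the modules here are string-tagged stand-ins, not the convolutional operators of the actual model, and the SFEB counts (30/10/2) are taken from the embodiment described later.

```python
# Structural sketch of the model wiring: an SFEB is a CFEM followed by
# an SRAM; a branch chains several SFEBs. Modules are stand-in
# callables that record their order of application.

def make_sfeb(tag):
    def cfem(x):   # context feature extraction module (stand-in)
        return x + [f"CFEM{tag}"]
    def sram(x):   # spatial residual attention module (stand-in)
        return x + [f"SRAM{tag}"]
    def sfeb(x):   # CFEM and SRAM connected in sequence
        return sram(cfem(x))
    return sfeb

def make_branch(n_sfebs):
    blocks = [make_sfeb(i) for i in range(n_sfebs)]
    def branch(x):
        for block in blocks:
            x = block(x)
        return x
    return branch

# Main branch plus two auxiliary branches (SFEB counts 30/10/2,
# per the embodiment).
model = [make_branch(30), make_branch(10), make_branch(2)]
```

Running one SFEB on an empty trace yields `["CFEM0", "SRAM0"]`, confirming the CFEM-then-SRAM ordering of the block.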
Optionally, the training method of the image enhancement model specifically includes:
acquiring training data; the training data comprises a digestive tract endoscope training image and a corresponding enhanced image;
and respectively inputting the training data into the main branch, the first auxiliary branch and the second auxiliary branch for image enhancement, carrying out feature fusion on the enhanced images of the three branches, and training with the goal of minimizing the structural similarity loss function between the fused initial training result and the reference enhanced image corresponding to the digestive tract endoscope training image, to obtain the image enhancement model.
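The training objective above is a structural similarity loss between the fused output and the reference image. A minimal sketch of such an objective is shown below in pure Python, assuming images normalized to [0, 1] and the standard SSIM stabilizing constants; the SSIM is computed globally over the whole image rather than with the sliding window used in practice.

```python
# Simplified global SSIM between two grayscale images in [0, 1].
# Constants c1, c2 follow the standard SSIM definition with L = 1
# (an assumption of this sketch); real implementations use a
# sliding Gaussian window.

def mean(xs):
    return sum(xs) / len(xs)

def ssim(img_a, img_b, c1=0.01 ** 2, c2=0.03 ** 2):
    a = [p for row in img_a for p in row]
    b = [p for row in img_b for p in row]
    mu_a, mu_b = mean(a), mean(b)
    var_a = mean([(p - mu_a) ** 2 for p in a])
    var_b = mean([(p - mu_b) ** 2 for p in b])
    cov = mean([(p - mu_a) * (q - mu_b) for p, q in zip(a, b)])
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def ssim_loss(pred, target):
    # Training minimizes 1 - SSIM(prediction, reference).
    return 1.0 - ssim(pred, target)
```

A perfect prediction gives SSIM of 1 and a loss of 0, so minimizing this loss pushes the fused result toward the reference enhanced image.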
Optionally, the processing procedure of the scale-space feature extraction block includes:
inputting the initial image of the digestive tract endoscope into the context feature extraction module for context information extraction and isolated noise denoising processing to obtain denoised image data;
and taking the denoising image data as the input of the spatial residual attention module, and performing adaptive focusing processing to obtain the scale-space feature image data.
Optionally, inputting the initial image of the digestive tract endoscope into the context feature extraction module for extracting context information and filtering isolated noise to obtain denoising image data, which specifically includes:
the context feature extraction module comprises an upper branch structure and a lower branch structure with identical structures, and the initial image of the digestive tract endoscope serves as the input feature F_in; a 3×3 convolution is applied to the input feature in the upper branch structure and the lower branch structure respectively, obtaining a first upper-branch feature map F_u^{5,1} and a first lower-branch feature map F_l^{5,1}; in the upper branch, the first upper-branch feature map F_u^{5,1}, the input feature F_in and the first lower-branch feature map F_l^{5,1} are concatenated in sequence along the channel dimension to obtain a second upper-branch feature map F_u^{5,2}; in the lower branch, the first upper-branch feature map F_u^{5,1}, the input feature F_in and the first lower-branch feature map F_l^{5,1} are concatenated in sequence along the channel dimension to obtain a second lower-branch feature map F_l^{5,2}; a 3×3 convolution is applied to the second upper-branch feature map F_u^{5,2} to obtain a third upper-branch feature map F_u^{5,3}; the third upper-branch feature map F_u^{5,3} is concatenated with the second lower-branch feature map F_l^{5,2}, and the result is forward-connected with the input feature F_in to obtain a fourth upper-branch feature map F_u^{5,4}; a 3×3 convolution is applied to the fourth upper-branch feature map F_u^{5,4}, its result is concatenated with the input feature F_in, the first upper-branch feature map F_u^{5,1} and the third upper-branch feature map F_u^{5,3}, and a 1×1 convolution is applied to the concatenation to obtain the output feature F_u^{out} of the upper branch.
Information interaction is carried out in the lower branch using the feature maps of the upper branch, in the same manner, to obtain the output feature F_l^{out} of the lower branch.
The output feature F_u^{out} of the upper branch and the output feature F_l^{out} of the lower branch are concatenated and processed sequentially by a 1×1 convolution and a 3×3 convolution, and the processing result is combined with the input feature F_in to obtain the output F_CFEM of the context feature extraction module CFEM, namely the denoising image data;
wherein the calculation formula for the output F_CFEM of the context feature extraction module CFEM is:

F_CFEM = F_in + Conv_3(Conv_1([F_u^{out}, F_l^{out}]))

where Conv_1 and Conv_3 represent 1×1 and 3×3 convolution operations, respectively, and [·,·] represents the channel-wise concatenation operation.
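The CFEM's final fusion step — concatenate the two branch outputs along channels, apply a 1×1 and then a 3×3 convolution, and add the input feature back — can be sketched with toy pure-Python convolutions. The weights and kernels here are illustrative placeholders, not trained parameters, and the 3×3 convolution is simplified to one shared kernel per channel.

```python
# Toy sketch of the CFEM fusion: residual add of
# Conv3(Conv1(concat(upper, lower))) onto the input feature.
# Feature maps are channel-first nested lists [C][H][W].

def conv1x1(feat, weights):
    # weights: [out_channels][in_channels]; a 1x1 conv is a
    # per-pixel weighted sum over input channels.
    h, w = len(feat[0]), len(feat[0][0])
    return [[[sum(wc * feat[ci][y][x] for ci, wc in enumerate(wrow))
              for x in range(w)] for y in range(h)] for wrow in weights]

def conv3x3(feat, kernel):
    # Depth-preserving 3x3 convolution with zero padding; one shared
    # 3x3 kernel applied to every channel (simplification).
    h, w = len(feat[0]), len(feat[0][0])
    out = []
    for ch in feat:
        grid = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                s = 0.0
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        yy, xx = y + dy, x + dx
                        if 0 <= yy < h and 0 <= xx < w:
                            s += kernel[dy + 1][dx + 1] * ch[yy][xx]
                grid[y][x] = s
        out.append(grid)
    return out

def cfem_fuse(f_in, f_up, f_low, w1, k3):
    concat = f_up + f_low                     # channel-wise concatenation
    fused = conv3x3(conv1x1(concat, w1), k3)  # Conv3(Conv1([...]))
    return [[[fused[c][y][x] + f_in[c][y][x]  # residual: + input feature
              for x in range(len(f_in[0][0]))]
             for y in range(len(f_in[0]))]
            for c in range(len(f_in))]
```

With an identity 3×3 kernel and a 1×1 weight that selects the upper-branch channel, the output reduces to the upper-branch map added to the input, which makes the residual structure easy to verify by hand.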
Optionally, taking the denoising image data as the input of the spatial residual attention module and performing adaptive focusing processing to obtain the scale-space feature image data specifically includes:
the spatial residual attention module comprises a left branch and a right branch with identical structures; the denoising image data is subjected to maximum pooling in the left branch, followed by spatial residual attention processing, to obtain the left-branch processing result;
the denoising image data is subjected to average pooling in the right branch, followed by spatial residual attention processing, to obtain the right-branch processing result;
the scale-space feature image data is obtained based on the left-branch processing result and the right-branch processing result;
the calculation formula for acquiring the scale space characteristic image data is as follows:
in the method, in the process of the application,for the output of the context feature extraction module CFEM,/i>For the feature map after the maximum pooling treatment, < > is given>Note the output of the left branch in the module for spatial residual, < >>Is +.>Gamma is learning parameter, < >>Note the output of the right branch in the module for spatial residual, < >>Is +.> Sigma is a Sigmoid function, < >>The final output of the module, i.e. the scale-space feature image data, is noted for the spatial residual.
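The dual-pooling attention described above can be sketched in pure Python. Several details are assumptions of this sketch: pooling is taken channel-wise (producing one spatial map per pooling type, as in common spatial-attention designs), the convolution that would normally precede the sigmoid gate is folded into the scalar γ, and the two branch outputs are fused by summation.

```python
import math

# Sketch of spatial residual attention: the left branch gates the
# input with a sigmoid of the channel-wise max map, the right branch
# with the channel-wise average map; each is added back residually
# with a learnable scalar gamma, and the two results are summed.

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def channel_pool(feat, mode):
    # feat: [C][H][W] -> one [H][W] spatial map pooled across channels
    h, w = len(feat[0]), len(feat[0][0])
    op = max if mode == "max" else lambda vs: sum(vs) / len(vs)
    return [[op([feat[c][y][x] for c in range(len(feat))])
             for x in range(w)] for y in range(h)]

def sram(feat, gamma=0.5):
    att_max = channel_pool(feat, "max")
    att_avg = channel_pool(feat, "avg")
    h, w = len(feat[0]), len(feat[0][0])
    out = []
    for ch in feat:
        grid = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                left = ch[y][x] + gamma * sigmoid(att_max[y][x]) * ch[y][x]
                right = ch[y][x] + gamma * sigmoid(att_avg[y][x]) * ch[y][x]
                grid[y][x] = left + right
            out_row = grid[y]
        out.append(grid)
    return out
```

Because the sigmoid gate multiplies the feature itself, zero activations pass through unchanged while nonzero activations are rescaled, which is the adaptive-focusing behavior the module is meant to provide.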
A low-light endoscopic image enhancement system, comprising:
the data acquisition module is used for acquiring an initial image of the digestive tract endoscope;
the image enhancement module is used for inputting the initial image of the digestive tract endoscope into an image enhancement model for feature enhancement to obtain a feature enhancement image of the digestive tract endoscope; the image enhancement model comprises a main branch, a first auxiliary branch and a second auxiliary branch; each branch adopts an encoder-decoder structure, and each branch comprises a plurality of scale space feature extraction blocks connected through a jump connection strategy and a cross-branch compensation strategy; each scale space feature extraction block comprises a context feature extraction module and a space residual attention module which are connected in sequence.
An electronic device, comprising a memory for storing a computer program and a processor that runs the computer program to cause the electronic device to perform the above method for enhancing a digestive tract endoscope image.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the above method for enhancing a digestive tract endoscope image.
The application has the technical effects that:
the application provides a method, a system, equipment and a medium for enhancing an image of an alimentary canal endoscope, wherein the method comprises the following steps: acquiring an initial image of an digestive tract endoscope; inputting the initial image of the digestive tract endoscope into an image enhancement model for feature enhancement to obtain a feature enhancement image of the digestive tract endoscope; the image enhancement model comprises a main branch, a first auxiliary branch and a second auxiliary branch; each branch adopts an encoder-decoder structure, and each branch comprises a plurality of scale space feature extraction blocks connected through a jump connection strategy and a cross-branch compensation strategy; each scale space feature extraction block comprises a context feature extraction module and a space residual attention module which are connected in sequence.
The present application proposes a novel deep pyramid enhancement network (DPENet) for enhancing endoscopic images in low light. Specifically, DPENet has an image pyramid structure consisting of three parallel branches that extract global and local features at different scales. A skip-connection strategy and a cross-branch compensation strategy are used in each branch, realizing intra-layer and inter-layer fusion and making full use of multi-scale features. Such a structure helps the network understand uneven illumination in the image. To suppress isolated noise, DPENet places multiple scale-space feature extraction blocks (SFEBs) in each branch. An SFEB is composed of a context feature extraction module (CFEM) and a spatial residual attention module (SRAM). The CFEM mines semantic information by extracting context information to filter isolated noise. The SRAM utilizes a spatial attention mechanism to help the network adaptively focus on dark areas. DPENet gradually restores high-quality endoscopic images by fusing global and local features of different scales.
The present application uses residual connection to connect the input image with the cascading features of the three branches to generate an enhanced image. Residual connection helps to mitigate excessive dependence of the model on noise or low frequency signals, thereby generating a more realistic image.
In each branch, the present application utilizes skip connections that transfer features from the encoder to the decoder within the same branch, reducing the problems of gradient vanishing and network degradation. Furthermore, units of the same size in different branches are connected by compensation connections to integrate global and local information. The pyramid structure helps the network extract and aggregate global and local features at different scales, and improves the adaptability of the network under uneven illumination conditions.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application. In the drawings:
FIG. 1 is a block diagram of a network in accordance with an embodiment of the present application;
FIG. 2 is a block diagram of a context feature extraction module CFEM network in an embodiment of the present application;
fig. 3 is a block diagram of a spatial residual attention module SRAM network in an embodiment of the present application.
Detailed Description
Various exemplary embodiments of the application will now be described in detail, which should not be considered as limiting the application, but rather as more detailed descriptions of certain aspects, features and embodiments of the application.
It is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. In addition, for numerical ranges in this disclosure, it is understood that each intermediate value between the upper and lower limits of the ranges is also specifically disclosed. Every smaller range between any stated value or stated range, and any other stated value or intermediate value within the stated range, is also encompassed within the application. The upper and lower limits of these smaller ranges may independently be included or excluded in the range.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. Although the application has been described with reference to a preferred method, any method similar or equivalent to those described herein can be used in the practice or testing of the present application. All documents mentioned in this specification are incorporated by reference for the purpose of disclosing and describing the methodologies associated with the documents. In case of conflict with any incorporated document, the present specification will control.
It will be apparent to those skilled in the art that various modifications and variations can be made in the specific embodiments of the application described herein without departing from the scope or spirit of the application. Other embodiments will be apparent to those skilled in the art from consideration of the specification of the present application. The specification and examples of the present application are exemplary only.
As used herein, the terms "comprising," "including," "having," "containing," and the like are intended to be inclusive and mean an inclusion, but not limited to.
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
Example 1
As shown in Figs. 1 to 3, the present embodiment provides a method, a system, an apparatus and a medium for enhancing a digestive tract endoscope image, wherein the method includes: acquiring an initial image of a digestive tract endoscope; inputting the initial image of the digestive tract endoscope into an image enhancement model for feature enhancement to obtain a feature-enhanced image of the digestive tract endoscope; the image enhancement model comprises a main branch, a first auxiliary branch and a second auxiliary branch; each branch adopts an encoder-decoder structure, and each branch comprises a plurality of scale-space feature extraction blocks connected through a skip-connection strategy and a cross-branch compensation strategy; each scale-space feature extraction block comprises a context feature extraction module and a spatial residual attention module connected in sequence.
A low-light endoscopic image enhancement system, comprising: a data acquisition module for acquiring an initial image of a digestive tract endoscope; and an image enhancement module for inputting the initial image of the digestive tract endoscope into an image enhancement model for feature enhancement to obtain a feature-enhanced image of the digestive tract endoscope; the image enhancement model comprises a main branch, a first auxiliary branch and a second auxiliary branch; each branch adopts an encoder-decoder structure, and each branch comprises a plurality of scale-space feature extraction blocks connected through a skip-connection strategy and a cross-branch compensation strategy; each scale-space feature extraction block comprises a context feature extraction module and a spatial residual attention module connected in sequence.
An electronic device, comprising a memory for storing a computer program and a processor that runs the computer program to cause the electronic device to perform the above method for enhancing a digestive tract endoscope image.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the above method for enhancing a digestive tract endoscope image.
Aiming at the problem of low-illumination enhancement of endoscopic images with uneven brightness, the present embodiment proposes a high-performance endoscopic image enhancement method, intended to help endoscopists complete endoscopy and diagnosis operations accurately and rapidly. The system flow is as follows: this embodiment proposes a new network, named DPENet, for enhancing low-light endoscopic images. Specifically, DPENet has an image pyramid structure consisting of three parallel branches that extract global and local features at different scales. A skip-connection strategy and a cross-branch compensation strategy are used in each branch, realizing intra-layer and inter-layer fusion and making full use of multi-scale features. Such a structure helps the network understand uneven illumination in the image. To suppress isolated noise, DPENet places multiple scale-space feature extraction blocks (SFEBs) in each branch. An SFEB is composed of a context feature extraction module (CFEM) and a spatial residual attention module (SRAM). The CFEM mines semantic information by extracting context information to filter isolated noise. The SRAM utilizes a spatial attention mechanism to help the network adaptively focus on dark areas. By fusing global and local features of different scales, the DPENet of the present embodiment gradually restores high-quality endoscopic images.
The structure of DPENet is shown in Fig. 1. DPENet has a pyramid structure comprising one main branch and two auxiliary branches. Each branch follows an encoder-decoder structure, in which the present embodiment arranges a plurality of units. Each unit includes several densely connected SFEB modules, and the SFEBs are aggregated by a multi-scale fusion operation for an enhanced feature representation. In total, the main branch holds 30 SFEBs, the first auxiliary branch holds 10 SFEBs, and the second auxiliary branch holds 2 SFEBs. The SFEB aims to suppress noise and comprises two key modules: the CFEM and the SRAM. In view of the positive role of global and local features in understanding uneven illumination, the present embodiment downsamples the input image I_in by factors of 2 and 4, obtaining images I_in↓2 and I_in↓4 as the inputs of the first and second auxiliary branches, respectively. This operation helps the network extract global and local features at different scales. After processing the images through the three branches, the present embodiment uses a residual connection to connect the input image I_in with the cascaded features of the three branches and generate the enhanced image. The residual connection helps mitigate the model's excessive dependence on noise or low-frequency signals, thereby generating a more realistic image.
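The three-level input pyramid described above can be sketched in pure Python. Average pooling is assumed as the downsampling operator (the embodiment does not name one), and even image dimensions are assumed at every level.

```python
# Building the three-level input pyramid: the full-resolution image
# feeds the main branch, while 2x and 4x downsampled copies feed the
# first and second auxiliary branches. Downsampling here averages
# each 2x2 block (an assumed operator); H and W must be even.

def downsample2x(img):
    # img: [H][W] grayscale grid -> [H/2][W/2] grid
    h, w = len(img), len(img[0])
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1] +
              img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w // 2)] for y in range(h // 2)]

def input_pyramid(img):
    half = downsample2x(img)      # input to the first auxiliary branch
    quarter = downsample2x(half)  # input to the second auxiliary branch
    return img, half, quarter
```

Applying the 2× downsampler twice yields the 4× level, so the two auxiliary inputs see progressively more global, less local views of the same scene.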
In each branch, the present embodiment uses skip connections from the encoder to the decoder within the same branch to alleviate gradient vanishing and network degradation. In addition, units of the same size in different branches are connected through cross-branch connections to integrate global and local information. Such a pyramid structure helps the network extract and aggregate global and local features at different scales, improving its adaptability under uneven lighting conditions.
The details of the network operation are as follows:
1. Context Feature Extraction Module (CFEM):
Fig. 2 shows the architecture of the CFEM proposed in this embodiment. As shown, the CFEM has a dual-branch structure. The upper branch first processes the input features F_in with a 3×3 convolution to obtain F_u^{3,1}. Then F_u^{3,1}, F_in, and the feature map F_l^{5,1} from the lower branch are sequentially concatenated along the channel dimension to obtain the feature map F_u^{c,1}; such operations facilitate the interaction of contextual features at different scales. Next, F_u^{c,1} is processed by a 3×3 convolution (giving F_u^{3,2}), concatenated with the feature map F_l^{5,2} from the lower branch, and forward-connected with the input features F_in. Finally, the obtained feature map is processed by a 3×3 convolution (giving F_u^{3,3}) and concatenated with F_in and F_u^{3,1}; the concatenated feature map is processed by a 1×1 convolution to produce the output F_u^{out}. Here, the forward connections are used to reuse the features of different stages, enhancing the representational capacity of the network. The above operations can be expressed as:
F_u^{3,1} = Conv_3(F_in), F_u^{3,2} = Conv_3(C(F_u^{3,1}, F_in, F_l^{5,1})), F_u^{3,3} = Conv_3(C(F_u^{3,2}, F_l^{5,2}, F_in))    (1)

F_u^{out} = Conv_1(C(F_u^{3,3}, F_in, F_u^{3,1}))    (2)
In equations 1 and 2, conv 1 And Conv 3 Representing 1 x 1 and 3 x3 convolution operations respectively,representing the connection operation, the lower branch and the upper branch have the same structure, but the characteristic diagram of the upper branch is used for information interaction, as shown in fig. 2;
After obtaining the outputs F_u^{out} and F_l^{out} from the upper and lower branches, the present embodiment concatenates them and sequentially processes the concatenated feature map with a 1×1 convolution and a 3×3 convolution; the CFEM output F_CFEM is then obtained by adding the result to the input features F_in:

F_CFEM = Conv_3(Conv_1(C(F_u^{out}, F_l^{out}))) + F_in    (3)
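Equations 1 and 2 can be sketched as a PyTorch module. The lower branch is assumed to mirror this structure (per the text, it differs only in which cross-branch maps it receives); passing the lower-branch maps F_l^{5,1} and F_l^{5,2} in as arguments, the class name, and the channel counts are illustrative assumptions.

```python
import torch
import torch.nn as nn


class CFEMUpperBranch(nn.Module):
    """Sketch of the CFEM upper branch (Equations 1 and 2 above)."""

    def __init__(self, c=16):
        super().__init__()
        self.conv1 = nn.Conv2d(c, c, 3, padding=1)       # produces F_u31
        self.conv2 = nn.Conv2d(3 * c, c, 3, padding=1)   # produces F_u32
        self.conv3 = nn.Conv2d(3 * c, c, 3, padding=1)   # produces F_u33
        self.out = nn.Conv2d(3 * c, c, 1)                # final 1x1 fusion

    def forward(self, f_in, f_l51, f_l52):
        f_u31 = self.conv1(f_in)
        # Cross-scale concatenation: upper map, input, lower-branch map.
        f_u32 = self.conv2(torch.cat([f_u31, f_in, f_l51], dim=1))
        # Concatenate with the second lower-branch map and forward-connect F_in.
        f_u33 = self.conv3(torch.cat([f_u32, f_l52, f_in], dim=1))
        # Forward (feature-reuse) connections before the 1x1 fusion.
        return self.out(torch.cat([f_u33, f_in, f_u31], dim=1))
```

All feature maps share one spatial size here; in the real network the lower-branch maps would be produced by the mirrored branch rather than supplied externally.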
2. Spatial Residual Attention Module (SRAM)
Inspired by the visual attention mechanism, the present embodiment proposes the SRAM to help the network focus on dark areas. Fig. 3 shows the structure of the proposed SRAM, which has a dual-branch structure. In the left branch of the SRAM, the input features F_CFEM (i.e., the output features of the CFEM) first go through max pooling MaxPool(·) to give F_p, and are then processed by a Spatial Residual Attention (SRA) block. In the SRA block, the feature map F_p passes through a 3×3 convolution, a ReLU function, and another 3×3 convolution; at the same time, F_p also passes through a 1×1 convolution, and the two results are added to obtain F_m. Then, the learnable parameter γ is multiplied with F_m to obtain F_m^o; γ adjusts the weight of F_m. The above operations can be summarized as follows:
F_m = Conv_3(ReLU(Conv_3(F_p))) + Conv_1(F_p),  F_m^o = γ ⊗ F_m    (4)

In Equation 4, ⊗ represents element-wise multiplication.
The right branch of the SRAM has a structure similar to the left branch, with two differences. First, the right branch uses average pooling instead of max pooling to process the input features; the motivation is that the two pooling operations compensate each other, and using both helps extract richer features than using only one. Second, the weight of the output F_a of the SRA block in the right branch is set to 1−γ.
After processing the input features using two branches, the final output features of the SRAM can be obtained by:
F_SRAM = σ(F_m^o + F_a^o) ⊗ F_CFEM    (5)

In the formula, F_CFEM is the output of the context feature extraction module CFEM, F_p is the feature map after max pooling, F_m is the output of the left branch of the spatial residual attention module, F_m^o = γF_m is F_m after weight adjustment with the learnable parameter γ, F_a is the output of the right branch, F_a^o = (1−γ)F_a is F_a after weight adjustment, σ is the Sigmoid function, and F_SRAM is the final output of the spatial residual attention module.
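The SRAM described above can be sketched in PyTorch. The γ/(1−γ) weighting of the max- and average-pooled SRA paths follows the symbol definitions; applying the sigmoid attention map multiplicatively to the CFEM output is our reading of those definitions, not a verbatim reproduction of the patent figure, and the class name and channel counts are illustrative.

```python
import torch
import torch.nn as nn


class SRAM(nn.Module):
    """Sketch of the spatial residual attention module (Equations 4 and 5)."""

    def __init__(self, c=16):
        super().__init__()
        self.gamma = nn.Parameter(torch.tensor(0.5))  # learnable branch weight
        self.sra_max = self._sra(c)
        self.sra_avg = self._sra(c)
        self.skip_max = nn.Conv2d(c, c, 1)            # 1x1 residual path of the SRA
        self.skip_avg = nn.Conv2d(c, c, 1)
        self.maxpool = nn.MaxPool2d(2)
        self.avgpool = nn.AvgPool2d(2)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)

    @staticmethod
    def _sra(c):
        # 3x3 conv -> ReLU -> 3x3 conv, as described for the SRA block.
        return nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(), nn.Conv2d(c, c, 3, padding=1)
        )

    def forward(self, f_cfem):
        f_p = self.maxpool(f_cfem)                    # left branch: max pooling
        f_m = self.sra_max(f_p) + self.skip_max(f_p)
        f_a_in = self.avgpool(f_cfem)                 # right branch: average pooling
        f_a = self.sra_avg(f_a_in) + self.skip_avg(f_a_in)
        # gamma / (1 - gamma) weighting, sigmoid attention, apply to CFEM output.
        attn = torch.sigmoid(self.gamma * f_m + (1 - self.gamma) * f_a)
        return f_cfem * self.up(attn)
```

The pooling halves the spatial size, so the attention map is upsampled before it multiplies the full-resolution CFEM features.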
In the training stage, parameters such as the loss function, batch size, number of epochs, and dataset split ratio are set. To better improve network performance, this embodiment trains the model in a deeply supervised manner. Although the L2 loss is easy to optimize during training, it can cause excessive smoothing of the background and ghosting artifacts; in this context, the present embodiment adopts the Structural Similarity Index (SSIM) loss as an alternative. The SSIM loss L_S measures the similarity between the reconstructed image and the original image by considering structural similarity. The loss function is calculated as:

L_S = 1 − (1/M) Σ_{m=1}^{M} S(I_out(m), I_gt(m))
where m indexes the m-th pixel and M is the number of pixels in the image. I_out and I_gt represent the reconstructed image and the real image, respectively. S(·,·) is the SSIM index, defined as:
S(I_out(m), I_gt(m)) = [(2μ_p(m)μ_g(m) + c_1)(2σ_{p,g}(m) + c_2)] / [(μ_p(m)² + μ_g(m)² + c_1)(σ_p²(m) + σ_g²(m) + c_2)]

where μ_p(m) and μ_g(m) are the means of the blocks around the m-th pixel in the reconstructed image and the real image, respectively; σ_p²(m) and σ_g²(m) represent the variances of the two blocks, and σ_{p,g}(m) represents their covariance. The constants c_1 and c_2 prevent division by zero.
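The SSIM loss in the two formulas above can be sketched as follows. The window size of 7 and the constants c_1, c_2 are assumed values (the patent does not state them); a uniform average-pooling window stands in for the per-pixel blocks.

```python
import torch
import torch.nn.functional as F


def ssim_loss(img_out, img_gt, window=7, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM loss: L_S = 1 - mean over pixels of S(I_out(m), I_gt(m))."""
    pad = window // 2
    # Local means mu_p(m), mu_g(m) over the block around each pixel.
    mu_p = F.avg_pool2d(img_out, window, stride=1, padding=pad)
    mu_g = F.avg_pool2d(img_gt, window, stride=1, padding=pad)
    # Local variances and covariance: E[x^2] - E[x]^2, E[xy] - E[x]E[y].
    var_p = F.avg_pool2d(img_out ** 2, window, stride=1, padding=pad) - mu_p ** 2
    var_g = F.avg_pool2d(img_gt ** 2, window, stride=1, padding=pad) - mu_g ** 2
    cov = F.avg_pool2d(img_out * img_gt, window, stride=1, padding=pad) - mu_p * mu_g
    # Per-pixel SSIM index S, then the loss 1 - mean(S).
    s = ((2 * mu_p * mu_g + c1) * (2 * cov + c2)) / (
        (mu_p ** 2 + mu_g ** 2 + c1) * (var_p + var_g + c2)
    )
    return 1.0 - s.mean()
```

For identical images the local covariance equals both variances, so S is 1 everywhere and the loss vanishes, matching the formula.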
The present embodiment implements the network with the PyTorch framework and trains it for 350 epochs using the ADAM optimizer. The learning rate is set to 0.005 and reduced 10-fold at the 210th and 280th epochs; the input size and batch size are set to 128×128 and 12, respectively. All experiments were performed on a server equipped with two NVIDIA GTX 3090 GPUs and two Intel Xeon Silver 4214R CPUs @ 2.40 GHz. In the test stage, the average of the output results of 10 network models is taken as the final evaluation value. Four widely used evaluation indexes are selected in this example: PSNR, SSIM, LPIPS, and VIF.
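The training schedule above (ADAM, initial learning rate 0.005, 10-fold reduction at epochs 210 and 280, 350 epochs total) maps directly onto PyTorch's MultiStepLR; the single-layer model below is only a stand-in so the optimizer/scheduler wiring is visible.

```python
import torch

# Stand-in model; in the real setup this would be DPENet.
model = torch.nn.Conv2d(3, 3, 3, padding=1)
optimizer = torch.optim.Adam(model.parameters(), lr=0.005)
# Reduce the learning rate 10-fold (gamma=0.1) at epochs 210 and 280.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[210, 280], gamma=0.1
)

for epoch in range(350):
    # ... one pass over the training set would go here ...
    scheduler.step()
```

After both milestones the learning rate is 0.005 × 0.1 × 0.1 = 5e-5.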
Two datasets were used for the experiments: the endoscope dataset collected in this embodiment, which contains 2,000 images of size 512×512, and the Endo4IE dataset, of which 1,056 images of size 512×512 were used.
The network proposed in this embodiment is compared with classical image enhancement networks on the dataset proposed in this embodiment and on the Endo4IE dataset. As can be seen from Table 1, which lists the experimental results of this embodiment, the network of this embodiment performs best on every index.
TABLE 1
The present application is not limited to the above embodiments; any changes or substitutions that can readily be conceived by those skilled in the art within the technical scope of the present application are intended to fall within its scope. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (8)
1. A method of enhancing an image of an alimentary canal endoscope, comprising:
acquiring an initial image of a digestive tract endoscope;
inputting the initial image of the digestive tract endoscope into an image enhancement model for feature enhancement to obtain a feature enhancement image of the digestive tract endoscope;
the image enhancement model comprises a main branch, a first auxiliary branch and a second auxiliary branch; each branch adopts an encoder-decoder structure, and each branch comprises a plurality of scale space feature extraction blocks connected through a jump connection strategy and a cross-branch compensation strategy; each scale space feature extraction block comprises a context feature extraction module and a space residual attention module which are connected in sequence.
2. The method for enhancing an image of an alimentary canal endoscope according to claim 1, wherein training of the image enhancement model comprises:
acquiring training data; the training data comprises a digestive tract endoscope training image and a corresponding enhanced image;
and respectively inputting the training data into the main branch, the first auxiliary branch and the second auxiliary branch for image enhancement, performing feature fusion on the enhanced images of the three branches, and training with the goal of minimizing the structural similarity loss function between the fused initial training result and the reference enhanced image corresponding to the digestive tract endoscope training image, to obtain the image enhancement model.
3. The method of claim 1, wherein the processing of the scale-space feature extraction block comprises:
inputting the initial image of the digestive tract endoscope into the context feature extraction module for context information extraction and isolated noise denoising, to obtain denoised image data;
and taking the denoising image data as the input of the spatial residual error attention module, and performing self-adaptive focusing processing to obtain the scale space characteristic image data.
4. The method for enhancing an image of an alimentary canal endoscope according to claim 3, wherein inputting the initial image of the alimentary canal endoscope into the context feature extraction module for context information extraction and isolated noise filtering to obtain denoised image data specifically comprises:
the context feature extraction module comprises a lower branch and an upper branch with identical structures; the initial image of the digestive tract endoscope is taken as the input feature F_in, and a 3×3 convolution is applied to the input feature in the lower branch and the upper branch respectively, to obtain a first upper branch feature map F_u^{3,1} and a first lower branch feature map F_l^{5,1}; in the upper branch, the first upper branch feature map F_u^{3,1}, the input feature F_in and the first lower branch feature map F_l^{5,1} are sequentially concatenated along the channel dimension to obtain a second upper branch feature map F_u^{c,1}; in the lower branch, the first upper branch feature map F_u^{3,1}, the input feature F_in and the first lower branch feature map F_l^{5,1} are sequentially concatenated along the channel dimension to obtain a second lower branch feature map F_l^{5,2}; the second upper branch feature map F_u^{c,1} is processed by a 3×3 convolution to obtain a third upper branch feature map F_u^{3,2}; the third upper branch feature map F_u^{3,2} and the second lower branch feature map F_l^{5,2} are concatenated, and the result is forward-connected with the input feature F_in to obtain a fourth upper branch feature map F_u^{c,2}; the fourth upper branch feature map F_u^{c,2} is processed by a 3×3 convolution, the result is concatenated with the input feature F_in, the first upper branch feature map F_u^{3,1} and the third upper branch feature map F_u^{3,2}, and the concatenated result is processed by a 1×1 convolution to obtain the output feature F_u^{out} of the upper branch;
information interaction is performed in the lower branch using the feature maps of the upper branch, to obtain the output feature F_l^{out} of the lower branch;
the output feature F_u^{out} of the upper branch and the output feature F_l^{out} of the lower branch are concatenated and sequentially processed by a 1×1 convolution and a 3×3 convolution, and the processing result is combined with the input feature F_in to obtain the output F_CFEM of the context feature extraction module CFEM, namely the denoised image data;
wherein the output F_CFEM of the context feature extraction module CFEM is calculated as:

F_CFEM = Conv_3(Conv_1(C(F_u^{out}, F_l^{out}))) + F_in
where Conv_1 and Conv_3 represent 1×1 and 3×3 convolution operations, respectively, and C(·) represents the concatenation operation.
5. The method of claim 3, wherein taking the denoised image data as the input of the spatial residual attention module and performing adaptive focusing processing to obtain scale space feature image data specifically comprises:
the spatial residual attention module comprises a left branch and a right branch with the same structure; the denoised image data is subjected to max pooling in the left branch, followed by spatial residual attention processing, to obtain a left branch processing result;

the denoised image data is subjected to average pooling in the right branch, followed by spatial residual attention processing, to obtain a right branch processing result;
acquiring scale space feature image data based on the left branch processing result and the right branch processing result;
the calculation formula for acquiring the scale space characteristic image data is as follows:
F_SRAM = σ(F_m^o + F_a^o) ⊗ F_CFEM

where F_CFEM is the output of the context feature extraction module CFEM, F_p is the feature map after max pooling, F_m is the output of the left branch of the spatial residual attention module, F_m^o = γF_m is F_m after weight adjustment, γ is a learnable parameter for adjusting the weight, F_a is the output of the right branch of the spatial residual attention module, F_a^o = (1−γ)F_a is F_a after weight adjustment, σ is the Sigmoid function, and F_SRAM is the final output of the spatial residual attention module, namely the scale space feature image data.
6. A low-light endoscopic image enhancement system, comprising:
the data acquisition module is used for acquiring an initial image of the digestive tract endoscope;
the image enhancement module is used for inputting the initial image of the digestive tract endoscope into an image enhancement model for feature enhancement to obtain a feature enhancement image of the digestive tract endoscope; the image enhancement model comprises a main branch, a first auxiliary branch and a second auxiliary branch; each branch adopts an encoder-decoder structure, and each branch comprises a plurality of scale space feature extraction blocks connected through a jump connection strategy and a cross-branch compensation strategy; each scale space feature extraction block comprises a context feature extraction module and a space residual attention module which are connected in sequence.
7. An electronic device comprising a memory for storing a computer program and a processor that runs the computer program to cause the electronic device to perform the method for enhancing an image of a digestive tract endoscope according to any one of claims 1-5.
8. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the method for enhancing an image of a digestive tract endoscope according to any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310894600.6A CN116957968B (en) | 2023-07-20 | 2023-07-20 | Method, system, equipment and medium for enhancing digestive tract endoscope image |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116957968A true CN116957968A (en) | 2023-10-27 |
CN116957968B CN116957968B (en) | 2024-04-05 |
Family
ID=88445628
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310894600.6A Active CN116957968B (en) | 2023-07-20 | 2023-07-20 | Method, system, equipment and medium for enhancing digestive tract endoscope image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116957968B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017183307A1 (en) * | 2016-04-20 | 2017-10-26 | 富士フイルム株式会社 | Endoscope system, image processing device, and image processing device operation method |
WO2021067591A2 (en) * | 2019-10-04 | 2021-04-08 | Covidien Lp | Systems and methods for use of stereoscopy and color change magnification to enable machine learning for minimally invasive robotic surgery |
GB202104506D0 (en) * | 2021-03-30 | 2021-05-12 | Ucl Business Plc | Medical Image Analysis Using Neural Networks |
CN113658201A (en) * | 2021-08-02 | 2021-11-16 | 天津大学 | Deep learning colorectal cancer polyp segmentation device based on enhanced multi-scale features |
CN114742848A (en) * | 2022-05-20 | 2022-07-12 | 深圳大学 | Method, device, equipment and medium for segmenting polyp image based on residual double attention |
WO2022246677A1 (en) * | 2021-05-26 | 2022-12-01 | 深圳高性能医疗器械国家研究院有限公司 | Method for reconstructing enhanced ct image |
CN116188340A (en) * | 2022-12-21 | 2023-05-30 | 上海大学 | Intestinal endoscope image enhancement method based on image fusion |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017183307A1 (en) * | 2016-04-20 | 2017-10-26 | 富士フイルム株式会社 | Endoscope system, image processing device, and image processing device operation method |
WO2021067591A2 (en) * | 2019-10-04 | 2021-04-08 | Covidien Lp | Systems and methods for use of stereoscopy and color change magnification to enable machine learning for minimally invasive robotic surgery |
GB202104506D0 (en) * | 2021-03-30 | 2021-05-12 | Ucl Business Plc | Medical Image Analysis Using Neural Networks |
WO2022208060A2 (en) * | 2021-03-30 | 2022-10-06 | Ucl Business Ltd | Medical image analysis using neural networks |
WO2022246677A1 (en) * | 2021-05-26 | 2022-12-01 | 深圳高性能医疗器械国家研究院有限公司 | Method for reconstructing enhanced ct image |
CN113658201A (en) * | 2021-08-02 | 2021-11-16 | 天津大学 | Deep learning colorectal cancer polyp segmentation device based on enhanced multi-scale features |
CN114742848A (en) * | 2022-05-20 | 2022-07-12 | 深圳大学 | Method, device, equipment and medium for segmenting polyp image based on residual double attention |
CN116188340A (en) * | 2022-12-21 | 2023-05-30 | 上海大学 | Intestinal endoscope image enhancement method based on image fusion |
Non-Patent Citations (2)
Title |
---|
ZIHENG AN 等: "EIEN: Endoscopic Image Enhancement Network Based on Retinex Theory", 《SENSORS》, 31 December 2022 (2022-12-31), pages 9 - 10 * |
LIAN Weiwen: "Research on Single-Image Super-Resolution Reconstruction Methods Based on Deep Learning", China Master's Theses Full-text Database, Information Science and Technology, 15 January 2023 (2023-01-15) *
Also Published As
Publication number | Publication date |
---|---|
CN116957968B (en) | 2024-04-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wang et al. | Medical image fusion based on convolutional neural networks and non-subsampled contourlet transform | |
WO2021077997A1 (en) | Multi-generator generative adversarial network learning method for image denoising | |
CN110675462B (en) | Gray image colorization method based on convolutional neural network | |
CN110232653A (en) | The quick light-duty intensive residual error network of super-resolution rebuilding | |
CN113379661B (en) | Double-branch convolution neural network device for fusing infrared and visible light images | |
CN112837244B (en) | Low-dose CT image denoising and artifact removing method based on progressive generation confrontation network | |
CN112669248B (en) | Hyperspectral and panchromatic image fusion method based on CNN and Laplacian pyramid | |
Chen et al. | Blood vessel enhancement via multi-dictionary and sparse coding: Application to retinal vessel enhancing | |
CN110503614A (en) | A kind of Magnetic Resonance Image Denoising based on sparse dictionary study | |
CN114187214A (en) | Infrared and visible light image fusion system and method | |
Li et al. | Single image dehazing with an independent detail-recovery network | |
CN113012163A (en) | Retina blood vessel segmentation method, equipment and storage medium based on multi-scale attention network | |
CN115170410A (en) | Image enhancement method and device integrating wavelet transformation and attention mechanism | |
CN116739899A (en) | Image super-resolution reconstruction method based on SAUGAN network | |
Li et al. | An adaptive self-guided wavelet convolutional neural network with compound loss for low-dose CT denoising | |
Li et al. | Adaptive weighted multiscale retinex for underwater image enhancement | |
CN113781489A (en) | Polyp image semantic segmentation method and device | |
Zhou et al. | Physical-priors-guided DehazeFormer | |
Qayyum et al. | Single-shot retinal image enhancement using deep image priors | |
CN116957968B (en) | Method, system, equipment and medium for enhancing digestive tract endoscope image | |
Liu et al. | Non-homogeneous haze data synthesis based real-world image dehazing with enhancement-and-restoration fused CNNs | |
Yue et al. | Deep Pyramid Network for Low-light Endoscopic Image Enhancement | |
CN113808057A (en) | Endoscope image enhancement method based on unsupervised learning | |
CN111462004A (en) | Image enhancement method and device, computer equipment and storage medium | |
Shuang et al. | Algorithms for improving the quality of underwater optical images: A comprehensive review |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||