CN115018767A - Cross-modal endoscope image conversion and lesion segmentation method based on eigen expression learning - Google Patents

Cross-modal endoscope image conversion and lesion segmentation method based on eigen expression learning

Info

Publication number
CN115018767A
CN115018767A (application CN202210477177.5A)
Authority
CN
China
Prior art keywords
image
white light image
WLI
narrow band
Prior art date
Legal status
Pending
Application number
CN202210477177.5A
Other languages
Chinese (zh)
Inventor
颜波
钟芸诗
谭伟敏
蔡世伦
林青
Current Assignee
Fudan University
Original Assignee
Fudan University
Priority date
Filing date
Publication date
Application filed by Fudan University
Priority to CN202210477177.5A
Publication of CN115018767A
Legal status: Pending (Current)

Classifications

    • G06T 7/0012 — Biomedical image inspection (G06T 7/00 Image analysis; G06T 7/0002 Inspection of images, e.g. flaw detection)
    • G06N 3/088 — Non-supervised learning, e.g. competitive learning (G06N 3/02 Neural networks; G06N 3/08 Learning methods)
    • G06T 7/11 — Region-based segmentation (G06T 7/10 Segmentation; Edge detection)
    • G06V 10/26 — Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/764 — Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V 10/774 — Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06T 2207/20081 — Training; Learning
    • G06T 2207/20084 — Artificial neural networks [ANN]
    • G06T 2207/30096 — Tumor; Lesion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of medical image processing and specifically relates to a cross-modal endoscope image conversion and lesion-region segmentation method. The invention converts white light images of the digestive-tract endoscope into high-quality narrow-band images through a constructed neural network based on intrinsic-representation learning; an intrinsic feature extractor trained without supervision provides the essential features of the white light image, and an atrous spatial pyramid pooling (ASPP) network predicts the lesion region from them to obtain the segmentation result. At test time, the white light image under examination needs only a single forward pass together with one auxiliary narrow-band image to obtain its corresponding narrow-band image. The method adopts unsupervised learning, generalizes well, and performs well on different endoscope devices. The invention can provide additional narrow-band imaging for white-light endoscope equipment, giving physicians a better reference for diagnosis, and the narrow-band-image-assisted lesion segmentation can automatically localize the lesion region, greatly improving diagnostic efficiency and reducing morbidity and mortality.

Description

Cross-modal endoscope image conversion and lesion segmentation method based on intrinsic-representation learning
Technical Field
The invention belongs to the technical field of medical image processing and particularly relates to a cross-modal endoscope image conversion and lesion-region segmentation method.
Background
With the rapid development of medical technology, endoscopes have been widely used in clinical diagnosis, treatment, and surgery. An endoscope can be inserted directly into a body cavity for observation and can display the tissue morphology of internal organs [1]. Colon cancer and esophageal cancer are two diseases with high late-stage mortality worldwide. Fortunately, they can be diagnosed at an early stage by colonoscopy and gastroscopy, which can effectively improve survival [2,3].
Among endoscopic imaging techniques, white light imaging (WLI) is the most widely used, giving good visualization of vascular structures and the surrounding mucosa. However, early-cancer lesions are generally confined to the mucosal and submucosal layers, and the diagnostic efficacy of WLI is relatively limited, with low sensitivity and specificity. Narrow-band imaging (NBI) is therefore becoming a new trend for early cancer diagnosis. NBI can reveal the boundary of the lesion margin and clearly show the morphology and distribution of the microvasculature. Under NBI, superficial blood vessels appear brown and deeper vessels appear blue-green, which enhances the visualization of capillary and mucosal morphology. NBI can significantly increase the contrast between a lesion and the surrounding mucosa, thereby improving the accuracy of lesion-region judgment [4].
WLI helps physicians locate lesions, while NBI shows the boundaries and extent of lesions more clearly, so viewing both WLI and NBI simultaneously would combine the advantages of the two modes. In practice, however, most endoscopic devices cannot display WLI and NBI at the same time, since NBI requires much of the light to be filtered out. Converting WLI into NBI with deep learning therefore has great social value.
In computer vision, the image-to-image translation task aims to learn mappings between different domains so as to generate images resembling the target domain. Deep learning methods have made great progress here: Pix2Pix [5] proposed a paired image-translation method based on datasets with pixel-level correspondence, and, because paired datasets are difficult to obtain, CycleGAN [6] proposed an unpaired image-translation method based on cycle consistency. Existing image-translation methods, however, only consider style conversion; they ignore the association between the two modes and cannot learn intrinsic representations related to the medical information, so they cannot serve downstream medical tasks such as anomaly detection and lesion-region segmentation.
The invention provides a novel method for converting WLI images into NBI images based on unsupervised intrinsic-representation learning. It fully learns the intrinsic representation shared by the two modes, can convert a WLI image into a high-quality NBI image, provides an effective basis for physicians' diagnosis, and can be used for lesion-region segmentation to improve the detection rate of digestive-tract diseases.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a cross-modal endoscope image conversion and lesion-region segmentation method based on unsupervised intrinsic-representation learning, so as to eliminate the influence of human factors, realize conversion from WLI endoscope images to NBI endoscope images, and simultaneously segment the lesion region.
The cross-modal endoscope image conversion and lesion-region segmentation method based on unsupervised intrinsic-representation learning provided by the invention comprises the following specific steps:
(I) converting a white light image (WLI) of the digestive-tract endoscope into a high-quality narrow-band image (NBI) through a constructed neural network based on intrinsic-representation learning;
(II) acquiring the essential features of the white light image with the intrinsic feature extractor obtained in step (I), and predicting the lesion region of the white light image through an atrous spatial pyramid pooling (ASPP) network to obtain the segmentation result of the lesion region.
In step (I), for a given white light image (WLI), the goal is to generate the corresponding narrow-band image (NBI). Previous image-translation methods generally treat cross-modal image conversion as a style-transfer task. Although this produces realistic-looking results, it may destroy the original high-level information and basic features of the image, which is unacceptable for medical images: it may not only lose useful information but also affect downstream tasks or the physician's judgment. Based on an illumination model, the invention assumes that endoscopic images can be decoupled into optical information and essential features. The neural network then recombines the optical information of the other mode with the essential features of the current mode to obtain the corresponding cross-modal image.
The neural network has a symmetric structure. A white light image I_WLI is passed through the modality-specific feature encoder E_MS, which extracts optical information, to obtain the white-light optical features; at the same time, I_WLI is passed through the modality-invariant feature encoder E_MI to obtain the essential features of the white light image. Similarly, a narrow-band image I_NBI is passed through E_MS to obtain the NBI optical features and through E_MI to obtain the essential features of the NBI image. The white-light optical features and the NBI essential features are combined and fed into the white light image generator G_WLI to generate a white light image; the NBI optical features and the white-light essential features are combined and fed into the narrow-band image generator G_NBI to generate a narrow-band image. The two sets of essential features are each fed into the intrinsic generator G_Eigen, whose weights are shared between the two branches, and either can produce an intrinsic representation I_Eigen. The generated white light and narrow-band images are sent to a discriminator D_Gen, which distinguishes generated images from real ones and outputs classification results; through this adversarial learning the network is driven to generate realistic medical images.
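The wiring just described can be sketched in a few lines of PyTorch. Everything below except the wiring itself — the plain convolutional encoder/decoder blocks, the channel counts, the 256×256 input size — is an illustrative assumption rather than the architecture disclosed by the patent; only the data flow (shared encoders E_MS and E_MI, generators G_WLI and G_NBI that consume a recombination of optical and essential features) follows the description.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    """3x3 conv + instance norm + ReLU; a placeholder building block."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1),
        nn.InstanceNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

class Encoder(nn.Module):
    """Stand-in for E_MS (optical features) and E_MI (essential features)."""
    def __init__(self, c_feat=64):
        super().__init__()
        self.net = nn.Sequential(conv_block(3, 32), conv_block(32, c_feat))
    def forward(self, x):
        return self.net(x)

class Generator(nn.Module):
    """Stand-in for G_WLI / G_NBI: decodes a concatenation of optical and essential features."""
    def __init__(self, c_feat=64):
        super().__init__()
        self.net = nn.Sequential(
            conv_block(2 * c_feat, c_feat),
            nn.Conv2d(c_feat, 3, 3, padding=1),
            nn.Tanh(),
        )
    def forward(self, f_optical, f_essential):
        return self.net(torch.cat([f_optical, f_essential], dim=1))

E_MS, E_MI = Encoder(), Encoder()          # modality-specific / modality-invariant encoders
G_WLI, G_NBI = Generator(), Generator()    # white light / narrow-band generators

I_WLI = torch.randn(1, 3, 256, 256)        # stand-ins for a WLI/NBI training pair
I_NBI = torch.randn(1, 3, 256, 256)

f_ms_wli, f_mi_wli = E_MS(I_WLI), E_MI(I_WLI)   # optical / essential features of the WLI image
f_ms_nbi, f_mi_nbi = E_MS(I_NBI), E_MI(I_NBI)   # optical / essential features of the NBI image

# Cross-modal recombination: each generator receives its own modality's optics
# together with the other modality's essential features.
fake_WLI = G_WLI(f_ms_wli, f_mi_nbi)
fake_NBI = G_NBI(f_ms_nbi, f_mi_wli)
```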
In order to increase sample diversity and generalize to different devices and modes, the neural network adopted by the invention has a cycle structure, i.e., it is a cyclic network, and it does not constrain the translated image directly with a pixel-level loss. Specifically, the cycle is as follows: the generated white light image is passed through the modality-specific feature encoder E_MS to obtain new white-light optical features, and through the modality-invariant feature encoder E_MI to obtain new white-light essential features; similarly, new optical features and essential features are obtained from the generated NBI image. The new optical features and essential features are then recombined: one combination is fed into the white light image generator G_WLI to regenerate a white light image, and the other combination is fed into the narrow-band image generator G_NBI to regenerate a narrow-band image. The white light image and narrow-band image obtained after this cycle should be consistent with the originally input white light image I_WLI and narrow-band image I_NBI, so a pixel-level loss can be used to constrain them.
Further, since intrinsic features cannot be constrained directly, the invention adopts an intrinsic adversarial learning strategy to generate reasonable intrinsic images so that meaningful medical intrinsic features are acquired. Because intrinsic images have no ground truth, a cyclic scheme is used to make the network learn an efficient and reasonable representation. First, to ensure that E_MI extracts a meaningful intrinsic representation, an intrinsic image generator G_Eigen is introduced. E_MI is expected to generalize well and to capture the essence of endoscopic images, so that cross-modal and cross-device images remain consistent. Taking WLI as an example, it is assumed that the essential features can effectively express the tissue in the WLI image, from which an intrinsic map with the same deep characteristics as the input WLI can be obtained. Likewise, feeding the intrinsic map back into E_MI should encode intermediate features that have the same essential features as the input WLI. The intrinsic map is generated only from essential features and should not contain any optical information; therefore, feeding the intrinsic map into E_MS should yield an all-zero vector.
Further, although the cyclic network can learn a reasonable representation, deep neural networks tend to become lazy during training, which may make the generated intrinsic images unrealistic. Moreover, owing to lighting conditions and the limitations of display devices, a true intrinsic image cannot be obtained. The invention therefore constructs an intrinsic discriminator D_Eigen to classify the generated intrinsic images. Since intrinsic images reflect the true common features of WLI and NBI images, the intrinsic distribution is expected to be close to both WLI and NBI. D_Eigen distinguishes the generated intrinsic images from real WLI and NBI images, while G_Eigen tries to generate realistic intrinsic images to confuse the discriminator. Through this adversarial learning, not only are more realistic intrinsic images obtained, but E_MI is also encouraged to better capture the essential features of WLI and NBI for downstream medical tasks.
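The intrinsic adversarial game can be sketched as below, again with placeholder module shapes: D_Eigen scores real WLI and NBI images as real and the generated intrinsic image as fake, while G_Eigen (not shown) is trained to fool it. The patch-discriminator architecture and the binary-cross-entropy form of the objective are illustrative assumptions.

```python
class Discriminator(nn.Module):
    """Placeholder patch discriminator, used here to illustrate D_Eigen."""
    def __init__(self, c=64):
        super().__init__()
        self.net = nn.Sequential(conv_block(3, c), conv_block(c, c), nn.Conv2d(c, 1, 1))
    def forward(self, x):
        return self.net(x)

D_Eigen = Discriminator()

def d_eigen_loss(i_wli, i_nbi, i_eigen_fake):
    """Discriminator side: real WLI/NBI images vs. the generated intrinsic image."""
    real = torch.cat([D_Eigen(i_wli), D_Eigen(i_nbi)], dim=0)
    fake = D_Eigen(i_eigen_fake.detach())
    return (F.binary_cross_entropy_with_logits(real, torch.ones_like(real))
            + F.binary_cross_entropy_with_logits(fake, torch.zeros_like(fake)))

# Usage with a random tensor standing in for the output of G_Eigen.
loss_d_eigen = d_eigen_loss(I_WLI, I_NBI, torch.randn(1, 3, 256, 256))
```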
In the invention, during testing of the neural network, an endoscopic white light image I_WLI is input and its essential features are extracted by the intrinsic feature extractor E_MI; an additional endoscopic narrow-band image is input and its narrow-band optical features are extracted by the optical feature extractor. The two sets of features are combined and fed into the narrow-band image generator G_NBI, and the narrow-band image is obtained with a single forward pass.
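The single-forward-pass inference step can be written directly with the modules sketched earlier (all shapes remain illustrative assumptions): the essential features of the test WLI image are combined with the optical features of one auxiliary NBI image and decoded by G_NBI.

```python
@torch.no_grad()
def wli_to_nbi(i_wli_test, i_nbi_aux):
    """Translate a white light image to a narrow-band image in one forward pass."""
    f_essential = E_MI(i_wli_test)   # intrinsic features of the test WLI image
    f_optical = E_MS(i_nbi_aux)      # optical features of the auxiliary NBI image
    return G_NBI(f_optical, f_essential)

fake_nbi = wli_to_nbi(torch.randn(1, 3, 256, 256), torch.randn(1, 3, 256, 256))
```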
In the invention, the losses used to train the neural network are designed as follows:

The two essential features (from the WLI and NBI branches) are fed into the intrinsic generator G_Eigen to produce the white-light intrinsic representation I_Eigen^WLI and the narrow-band intrinsic representation I_Eigen^NBI, respectively. These two intrinsic representations are constrained with the LPIPS perceptual loss:

L_perceptual = LPIPS(I_Eigen^WLI, I_Eigen^NBI),  (1)
In the narrow-band imaging branch, the optical features and the narrow-band essential features are combined and fed into the white light image generator G_WLI to generate a white light image, which is passed through the discriminator D_Gen to compute the GAN loss L_GAN. The white-light intrinsic representation I_Eigen^WLI is fed back into the intrinsic feature extractor E_MI to re-extract an intrinsic representation, which is combined with the optical features of the original white light image and fed into G_WLI to generate a reconstructed white light image; the cycle loss L_cycle is computed between this reconstruction and the original white light image I_WLI. The generated white-light intrinsic representation I_Eigen^WLI and narrow-band intrinsic representation I_Eigen^NBI are fed into the intrinsic discriminator D_Eigen, and the modality-invariance loss is computed as:

L_Eigen = E[log D_Eigen(I_WLI, I_NBI)] + E[log(1 − D_Eigen(I_Eigen))],  (2)
Ideal essential features should not contain optical features; therefore the generated white-light intrinsic representation I_Eigen^WLI is fed back into the optical feature extractor E_MS to extract the optical features of the intrinsic representation, and the feature loss constrains these optical features toward the zero vector:

L_feature = ‖E_MS(I_Eigen^WLI)‖,  (3)
Therefore, when training the neural network, the final loss function is:

L = λ_1 L_perceptual + λ_2 L_Eigen + λ_3 L_feature + λ_4 L_cycle + λ_5 L_GAN,  (4)

where λ_i (i = 1, 2, …, 5) are the weights used to balance the individual loss terms; in the invention they are set empirically to λ_1 = 10, λ_2 = 1, λ_3 = 1, λ_4 = 10, λ_5 = 1.
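The five terms of Eq. (4) can be assembled as below. The LPIPS term uses the public `lpips` package, and the adversarial terms are written in a non-saturating binary-cross-entropy form; these concrete choices, like the argument tensors themselves, are assumptions layered on the hypothetical modules above — the patent fixes only the loss terms and the weights λ_1–λ_5.

```python
import torch
import torch.nn.functional as F
import lpips  # perceptual-similarity package providing the LPIPS metric

lpips_fn = lpips.LPIPS(net="alex")  # expects image-like tensors scaled to [-1, 1]
lam = dict(perceptual=10.0, eigen=1.0, feature=1.0, cycle=10.0, gan=1.0)

def total_loss(eigen_wli, eigen_nbi, d_gen_fake, d_eigen_fake, e_ms_of_eigen, rec_wli, i_wli):
    """Assemble Eq. (4) from tensors produced by the (hypothetical) modules above."""
    L_perceptual = lpips_fn(eigen_wli, eigen_nbi).mean()                  # Eq. (1)
    L_eigen = F.binary_cross_entropy_with_logits(                         # generator side of Eq. (2)
        d_eigen_fake, torch.ones_like(d_eigen_fake))
    L_feature = e_ms_of_eigen.abs().mean()                                # Eq. (3): optics of the eigen image -> 0
    L_cycle = F.l1_loss(rec_wli, i_wli)
    L_gan = F.binary_cross_entropy_with_logits(d_gen_fake, torch.ones_like(d_gen_fake))
    return (lam["perceptual"] * L_perceptual + lam["eigen"] * L_eigen
            + lam["feature"] * L_feature + lam["cycle"] * L_cycle + lam["gan"] * L_gan)
```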
In step (II), the essential features of the white light image are obtained with the intrinsic feature extractor trained in step (I), and the digestive-tract lesion region is segmented by an atrous spatial pyramid pooling (ASPP) network [7]. The specific process is as follows:

The ASPP network is attached after the frozen intrinsic feature extractor E_MI. ASPP applies parallel atrous (dilated) convolutions with different sampling rates to the given input. The endoscopic white light image is passed through E_MI to provide the essential features, which are fed into the ASPP network to directly output the lesion-region segmentation result. Unlike conventional semantic segmentation networks, first, the invention does not take an image as the input of the segmentation network but instead uses the intermediate features extracted by the intrinsic feature extractor E_MI. Second, E_MI is trained in an unsupervised way and is kept frozen when training the semantic segmentation task, which greatly reduces the number of parameters and the training time. Finally, owing to the generalization brought by unsupervised training, the proposed semantic segmentation network can be applied directly to other datasets without retraining or fine-tuning, achieving better cross-device performance.
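A compact sketch of this segmentation head: the intrinsic feature extractor is frozen and an ASPP block — parallel dilated convolutions at several rates — predicts the lesion mask from its output. The dilation rates, channel counts, and single-class output are illustrative assumptions on top of the modules defined earlier.

```python
class ASPP(nn.Module):
    """Atrous spatial pyramid pooling over the intrinsic features."""
    def __init__(self, c_in=64, c_mid=64, rates=(1, 6, 12, 18), n_classes=1):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(c_in, c_mid, 3, padding=r, dilation=r) for r in rates)
        self.head = nn.Conv2d(c_mid * len(rates), n_classes, 1)
    def forward(self, f):
        return self.head(torch.cat([F.relu(b(f)) for b in self.branches], dim=1))

aspp = ASPP()
for p in E_MI.parameters():          # the intrinsic feature extractor stays frozen
    p.requires_grad_(False)

mask_logits = aspp(E_MI(torch.randn(1, 3, 256, 256)))  # lesion prediction from intrinsic features
```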
In the invention, the intrinsic feature extractor maps images of different modalities into the same feature space, so similar intrinsic features are produced for images from different devices, and the lesion detection therefore transfers well across devices.
In the invention, the converted narrow-band image can be obtained with the white light image conversion network, and the generated narrow-band image and the original white light image are used together as the input of the segmentation network to output an accurate lesion segmentation result for the white light image. U-Net is used as the basic framework: one encoder extracts features from the white light image, another encoder extracts features from the narrow-band image converted from it, and the two sets of features are fed into a decoder together to predict the lesion-region segmentation result.
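The two-encoder variant can be sketched as a simplified U-Net-style network (skip connections omitted, depths and widths assumed): one encoder takes the white light image, the other takes the translated narrow-band image, and a shared decoder predicts the lesion mask from the fused features.

```python
class TwoStreamSegNet(nn.Module):
    """Two encoders (WLI and translated NBI) feeding one decoder, reusing conv_block from above."""
    def __init__(self, c=32, n_classes=1):
        super().__init__()
        def encoder():
            return nn.Sequential(conv_block(3, c), nn.MaxPool2d(2), conv_block(c, 2 * c))
        self.enc_wli, self.enc_nbi = encoder(), encoder()
        self.decoder = nn.Sequential(
            conv_block(4 * c, 2 * c),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(2 * c, n_classes, 1),
        )
    def forward(self, wli, nbi_generated):
        fused = torch.cat([self.enc_wli(wli), self.enc_nbi(nbi_generated)], dim=1)
        return self.decoder(fused)

seg = TwoStreamSegNet()
mask = seg(I_WLI, fake_NBI.detach())   # translated NBI assists WLI lesion segmentation
```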
The invention also provides the first multi-modal (WLI and NBI) esophageal endoscopic video dataset and a pixel-level-aligned paired esophageal endoscopic image dataset. The dataset was collected with Pentax equipment and includes 34 videos and 8700 pairs of WLI and NBI images; the patients in the test set do not overlap with those in the training set. The construction of the dataset is as follows:
(1) Selection of data objects. The lack of large-scale paired datasets is one of the key challenges hindering the endoscopic image conversion task. Owing to its special structural design, the Pentax endoscope can display WLI and NBI almost synchronously in real time. We therefore collected an esophageal endoscopy video dataset containing 34 videos, including 29 videos of normal esophagus and 5 videos of abnormal esophagus, from an affiliated hospital of a university in Shanghai, spanning April to May 2021. Each video is about 1 to 5 minutes long, with abnormal-esophagus videos generally being longer; the total duration of the normal videos is 39 minutes 57 seconds and that of the abnormal videos is 12 minutes 52 seconds;
(2) Image acquisition by endoscopy. All patients underwent gastroscopy after intravenous anesthesia. The imaging device was a Pentax EPK-i7000, and the endoscopist selected Pentax's dual mode to display WLI and NBI on a dual screen. Esophageal endoscopy was performed and recorded during withdrawal: the endoscopist started the examination from the cardia (40 cm from the incisors) and then slowly withdrew the scope to the beginning of the esophagus (15 cm from the incisors). For suspected lesions, the endoscopist could repeat the observation and recording and shoot from multiple angles. Each set of videos was used as one case and included in the analysis. To ensure the quality of the recorded videos, the endoscopist tried to avoid defocus during shooting and to avoid interference caused by breathing, heartbeat, mucus, air bubbles, blood, and so on; if a video was blurred, it was recorded again;
(3) Construction of the paired multi-modal dataset. The Pentax WLI image is read out from three color signals, while the NBI image is obtained through digital image processing, which takes some time. When the scope moves slowly this processing delay is negligible, so nearly aligned cross-modal images can be obtained; however, when the scope moves quickly, or when the patient's breathing and heartbeat cause severe esophageal jitter, the images of the two modes shown in a video frame exhibit obvious distortion and blur. We therefore pre-process the captured video frames.
First, we manually removed clearly blurred frames, as well as the beginning and ending frames of each video, which typically show abnormal images caused by adjusting the equipment. Second, according to the imaging principle and motion pattern of the video frames, WLI appears before NBI at the same location; three adjacent WLI and NBI frames are therefore compared, and the image residuals are used to find corresponding WLI and NBI images. Based on the video dataset, 11 normal and 5 abnormal videos were used to balance the samples, and a paired multi-modal endoscopy dataset containing 8700 pairs of WLI and NBI images was constructed, in which the WLI and NBI images achieve nearly pixel-level alignment. To our knowledge, this is the first multi-modal (WLI and NBI) esophageal endoscopic video dataset and pixel-level-aligned paired esophageal endoscopic image dataset.
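The frame-pairing rule described above (WLI appears slightly before NBI at the same location, and the matching frame is selected by the smallest image residual among adjacent frames) can be sketched as follows; the three-frame window follows the text, while the grayscale mean-absolute-residual metric is an illustrative assumption.

```python
import numpy as np

def match_nbi_frame(wli_frame, nbi_candidates):
    """Pick, among adjacent NBI frames, the one with the smallest residual to a WLI frame.

    wli_frame: HxW grayscale array; nbi_candidates: list of HxW grayscale arrays
    (e.g. the three NBI frames adjacent to the WLI frame).
    """
    residuals = [np.abs(wli_frame.astype(np.float32) - c.astype(np.float32)).mean()
                 for c in nbi_candidates]
    best = int(np.argmin(residuals))
    return best, residuals[best]

# Usage with dummy arrays standing in for decoded video frames.
wli = np.random.rand(480, 640).astype(np.float32)
candidates = [np.random.rand(480, 640).astype(np.float32) for _ in range(3)]
idx, res = match_nbi_frame(wli, candidates)
```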
The invention has the following beneficial effects: an image conversion network based on intrinsic-representation learning is designed, which can convert a digestive-tract endoscope WLI image into a high-quality NBI image, and the NBI image can assist the WLI image in segmenting the lesion region. The WLI image under test needs only a single forward pass together with one auxiliary NBI image to obtain its corresponding NBI image. The method adopts unsupervised learning, generalizes well, and performs well on different endoscope devices. The invention can provide additional NBI imaging for WLI endoscope equipment, give physicians a better reference for diagnosis, and automatically localize the lesion region through NBI-image-assisted lesion segmentation, greatly improving diagnostic efficiency and reducing morbidity and mortality.
Drawings
FIG. 1 is a network framework diagram of the present invention.
Fig. 2 shows samples from the paired WLI and NBI dataset.
Fig. 3 shows the WLI and NBI image conversion results within the same dataset.
Fig. 4 shows the WLI and NBI image conversion results across datasets.
Fig. 5 shows the results of using the converted NBI images to assist WLI lesion segmentation.
Fig. 6 shows the results of using the intrinsic features for lesion segmentation.
Detailed Description
The embodiments of the present invention are described in detail below, but the scope of the present invention is not limited to the examples.
Adopting the network structure in FIG. 1, the image conversion network was trained on 6947 pairs of WLI and NBI images, yielding a trained image conversion network and intrinsic feature extractor.
The method comprises the following specific steps:
(1) During testing, an endoscopic white light image I_WLI is input and its essential features are extracted by the intrinsic feature extractor E_MI; an additional endoscopic narrow-band image is input and its narrow-band optical features are extracted by the optical feature extractor. The two sets of features are combined and fed into the narrow-band image generator G_NBI, and the narrow-band image is obtained with a single forward pass;
(2) The intrinsic feature extractor E_MI is frozen and an ASPP network is attached for semantic segmentation. The endoscopic white light image is passed through E_MI to provide essential features, which are fed into the ASPP network to directly output the lesion-region segmentation result;
(3) The converted narrow-band image is obtained with the white light image conversion network; the generated narrow-band image and the original white light image are used together as the input of the segmentation network to output an accurate lesion segmentation result for the white light image. U-Net is used as the basic framework: one encoder extracts features from the white light image, another encoder extracts features from the narrow-band image converted from it, and the two sets of features are fed into a decoder together to predict the lesion-region segmentation result.
Figs. 3 and 4 show the image conversion results within the same dataset and across datasets for WLI and NBI. Since Pix2Pix and CycleGAN treat the task directly as style conversion without considering the essential features of medical images, their results resemble a hue conversion with some artifacts and generate no useful information, so their cross-dataset results are poor. The invention performs stably both in within-dataset and in cross-dataset experiments.
Fig. 5 shows the results of using the converted NBI images to assist WLI lesion segmentation. The invention generates more realistic NBI images and accurate lesion segmentation results, indicating that the generated NBI images are closest to real NBI images and that the feature extraction focuses on the important information in the endoscopic images, thereby benefiting downstream tasks.
Fig. 6 shows the results of using the intrinsic features for lesion segmentation. Even supervised end-to-end segmentation methods have difficulty distinguishing certain flat lesions. In the invention, the intrinsic representation of the endoscopic image is obtained by unsupervised training, which ignores the variations that different devices introduce into medical images and maps all images into the same feature domain; the intrinsic features therefore yield good lesion predictions even without fine-tuning.
References
[1] Mamonov A V, Figueiredo I N, Figueiredo P N, et al. Automated polyp detection in colon capsule endoscopy[J]. IEEE Transactions on Medical Imaging, 2014, 33(7): 1488-1502.
[2] Ghatwary N, Zolgharni M, Ye X. Early esophageal adenocarcinoma detection using deep learning methods[J]. International Journal of Computer Assisted Radiology and Surgery, 2019, 14(4): 611-621.
[3] Mesejo P, Pizarro D, Abergel A, et al. Computer-aided classification of gastrointestinal lesions in regular colonoscopy[J]. IEEE Transactions on Medical Imaging, 2016, 35(9): 2051.
[4] Gai W, Jin X F, Du R, et al. Efficacy of narrow-band imaging in detecting early esophageal cancer and risk factors for its occurrence[J]. Indian Journal of Gastroenterology, 2018.
[5] Isola P, Zhu J, Zhou T, Efros A A. Image-to-image translation with conditional adversarial networks[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 5967-5976, doi: 10.1109/CVPR.2017.632.
[6] Zhu J, Park T, Isola P, Efros A A. Unpaired image-to-image translation using cycle-consistent adversarial networks[C]. 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2242-2251, doi: 10.1109/ICCV.2017.244.
[7] He K, Zhang X, Ren S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015.

Claims (6)

1. A cross-mode endoscope image conversion and lesion segmentation method based on intrinsic representation learning is characterized by comprising the following specific steps:
converting a White Light Image (WLI) of a digestive tract endoscope into a high-quality narrow-band image (NBI) through a constructed neural network based on eigen-representation learning;
secondly, acquiring the essential features of the white light image by using the essential feature extractor obtained in the step one through a cavity space convolution pooling pyramid network (ASPP), and predicting the focus area of the white light image to obtain the segmentation result of the focus area;
in step (one), for a given White Light Image (WLI), the goal is to generate a corresponding narrowband image (NBI); according to the light model, it is assumed that the endoscopic image can be decoupled into optical information and essential features; then, recombining the optical information of the other mode with the essential characteristics of the mode by adopting a neural network to obtain a corresponding cross-mode image;
the neural network has a symmetric structure; a white light image I_WLI is passed through the modality-specific feature encoder E_MS, which extracts optical information, to obtain the optical features of I_WLI, and through the modality-invariant feature encoder E_MI to obtain the essential features of I_WLI; likewise, a narrow-band image I_NBI is passed through E_MS to obtain the optical features of I_NBI and through E_MI to obtain the essential features of I_NBI; the white-light optical features and the narrow-band essential features are combined and fed into the white light image generator G_WLI to generate a white light image, and the narrow-band optical features and the white-light essential features are combined and fed into the narrow-band image generator G_NBI to generate a narrow-band image; the two sets of essential features are each fed into the intrinsic generator G_Eigen, whose weights are shared between the two branches, and either can produce an intrinsic representation I_Eigen; the generated white light image and the narrow-band image I_NBI are fed into a discriminator D_Gen that distinguishes generated images from real images to obtain classification results, and realistic medical images are generated through adversarial learning;
the neural network has a cycle structure, i.e., it is a cyclic network, and the cycle proceeds as follows: the generated white light image is passed through the modality-specific feature encoder E_MS to obtain new white-light optical features and through the modality-invariant feature encoder E_MI to obtain new white-light essential features; similarly, new optical features and essential features of the narrow-band image are obtained; the new optical features and essential features are recombined and fed into the white light image generator G_WLI to obtain a white light image and into the narrow-band image generator G_NBI to obtain a narrow-band image; the white light image and narrow-band image obtained after the cycle should be consistent with the originally input white light image I_WLI and narrow-band image I_NBI, i.e., a pixel-level loss can be used to constrain them.
2. The method according to claim 1, characterized in that, in order to better obtain essential medical features, an intrinsic adversarial learning strategy is used to generate reasonable intrinsic images, and a cyclic scheme is used to make the network learn an efficient and reasonable representation, as follows: first, to ensure that E_MI extracts a meaningful intrinsic representation, an intrinsic image generator G_Eigen is used, and E_MI is expected to generalize well and capture the essence of the endoscopic image, so that cross-modal and cross-device images remain consistent; for a white light image (WLI), it is assumed that the essential features can effectively express the tissue in the white light image, from which an intrinsic map with the same deep characteristics as the input white light image (WLI) can be obtained; likewise, feeding the intrinsic map into E_MI encodes intermediate features with the same essential features as the input white light image (WLI); the intrinsic map is generated only from essential features and contains no optical information, so feeding the intrinsic map into E_MS generates an all-zero vector.
3. The method according to claim 2, characterized in that a feature discriminator D_Eigen classifies the generated intrinsic images; since the intrinsic image reflects the true common features of the white light image (WLI) and the narrow-band image (NBI), the intrinsic distribution is expected to be close to both; the feature discriminator D_Eigen distinguishes the generated intrinsic image from real white light images (WLI) and narrow-band images (NBI), while the intrinsic generator G_Eigen tries to generate realistic intrinsic images to confuse the discriminator; through adversarial learning, more realistic intrinsic images are obtained, and E_MI is encouraged to better capture the essential features of WLI and NBI for downstream medical tasks.
4. The method according to claim 3, characterized in that, during testing of the neural network, an endoscopic white light image I_WLI is input and its essential features are extracted by the intrinsic feature extractor E_MI; an additional endoscopic narrow-band image is input and its narrow-band optical features are extracted by the optical feature extractor; the two sets of features are combined and fed into the narrow-band image generator G_NBI, and the narrow-band image is obtained with a single forward pass.
5. The method according to claim 4, characterized in that the losses used in training the neural network are as follows:
the two essential features are fed into the intrinsic generator G_Eigen to respectively generate the white-light intrinsic representation I_Eigen^WLI and the narrow-band intrinsic representation I_Eigen^NBI, which are constrained with the LPIPS loss:
L_perceptual = LPIPS(I_Eigen^WLI, I_Eigen^NBI),  (1)
in the narrow-band imaging branch, the optical features and the narrow-band essential features are combined and fed into the white light image generator G_WLI to generate a white light image, which passes through the discriminator D_Gen to compute the GAN loss L_GAN; the white-light intrinsic representation I_Eigen^WLI is fed back into the intrinsic feature extractor E_MI to re-extract an intrinsic representation, which is combined with the optical features of the original white light image and fed into G_WLI to generate a reconstructed white light image, and the cycle loss L_cycle is computed between the reconstruction and the original white light image I_WLI; the generated white-light intrinsic representation I_Eigen^WLI and narrow-band intrinsic representation I_Eigen^NBI are fed into the intrinsic discriminator D_Eigen, and the modality-invariance loss is computed as:
L_Eigen = E[log D_Eigen(I_WLI, I_NBI)] + E[log(1 − D_Eigen(I_Eigen))],  (2)
ideal essential features should not contain optical features, so the generated white-light intrinsic representation I_Eigen^WLI is fed back into the optical feature extractor E_MS to extract its optical features, and the feature loss constrains them toward the zero vector:
L_feature = ‖E_MS(I_Eigen^WLI)‖,  (3)
therefore, when training the neural network, the final loss function is:
L = λ_1 L_perceptual + λ_2 L_Eigen + λ_3 L_feature + λ_4 L_cycle + λ_5 L_GAN,  (4)
where λ_i is the weight used to balance the individual loss terms, i = 1, 2, …, 5.
6. The method according to claim 5, characterized in that, in step (II), the intrinsic feature extractor trained in step (I) is used to obtain the essential features of the white light image, and the digestive-tract lesion region is segmented by an atrous spatial pyramid pooling (ASPP) network; the specific process is as follows: the ASPP network is attached after the intrinsic feature extractor E_MI; the ASPP applies parallel atrous convolutions with different sampling rates to the given input; the endoscopic white light image is passed through E_MI to provide the essential features, which are fed into the ASPP network to directly output the lesion-region segmentation result.
CN202210477177.5A 2022-05-03 2022-05-03 Cross-modal endoscope image conversion and lesion segmentation method based on eigen expression learning Pending CN115018767A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210477177.5A CN115018767A (en) 2022-05-03 2022-05-03 Cross-modal endoscope image conversion and lesion segmentation method based on eigen expression learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210477177.5A CN115018767A (en) 2022-05-03 2022-05-03 Cross-modal endoscope image conversion and lesion segmentation method based on eigen expression learning

Publications (1)

Publication Number Publication Date
CN115018767A true CN115018767A (en) 2022-09-06

Family

ID=83067163

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210477177.5A Pending CN115018767A (en) 2022-05-03 2022-05-03 Cross-modal endoscope image conversion and lesion segmentation method based on eigen expression learning

Country Status (1)

Country Link
CN (1) CN115018767A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117398042A (en) * 2023-12-14 2024-01-16 深圳市博盛医疗科技有限公司 AI-assisted detection 3D endoscope system and imaging method
CN117398042B (en) * 2023-12-14 2024-03-19 深圳市博盛医疗科技有限公司 AI-assisted detection 3D endoscope system and imaging method
CN117726642A (en) * 2024-02-07 2024-03-19 中国科学院宁波材料技术与工程研究所 High reflection focus segmentation method and device for optical coherence tomography image
CN117726642B (en) * 2024-02-07 2024-05-31 中国科学院宁波材料技术与工程研究所 High reflection focus segmentation method and device for optical coherence tomography image
CN117789185A (en) * 2024-02-28 2024-03-29 浙江驿公里智能科技有限公司 Automobile oil hole gesture recognition system and method based on deep learning
CN117789185B (en) * 2024-02-28 2024-05-10 浙江驿公里智能科技有限公司 Automobile oil hole gesture recognition system and method based on deep learning
CN118199941A (en) * 2024-03-04 2024-06-14 北京中科网芯科技有限公司 Network visualization method

Similar Documents

Publication Publication Date Title
JP6657480B2 (en) Image diagnosis support apparatus, operation method of image diagnosis support apparatus, and image diagnosis support program
Ohmori et al. Endoscopic detection and differentiation of esophageal lesions using a deep neural network
CN115018767A (en) Cross-modal endoscope image conversion and lesion segmentation method based on eigen expression learning
Nakagawa et al. Classification for invasion depth of esophageal squamous cell carcinoma using a deep neural network compared with experienced endoscopists
Pogorelov et al. Deep learning and hand-crafted feature based approaches for polyp detection in medical videos
US9514556B2 (en) System and method for displaying motility events in an in vivo image stream
US20220296081A1 (en) Method for real-time detection of objects, structures or patterns in a video, an associated system and an associated computer readable medium
WO2006100808A1 (en) Capsule endoscope image display controller
JP7550409B2 (en) Image diagnosis device, image diagnosis method, and image diagnosis program
CN107705852A (en) Real-time the lesion intelligent identification Method and device of a kind of medical electronic endoscope
Wang et al. Learning two-stream CNN for multi-modal age-related macular degeneration categorization
Masmoudi et al. Optimal feature extraction and ulcer classification from WCE image data using deep learning
Mackiewicz Capsule endoscopy-state of the technology and computer vision tools after the first decade
Pornvoraphat et al. Real-time gastric intestinal metaplasia diagnosis tailored for bias and noisy-labeled data with multiple endoscopic imaging
Lin et al. Lesion-decoupling-based segmentation with large-scale colon and esophageal datasets for early cancer diagnosis
JP7533881B2 (en) Image Classification Method Based on Semantic Segmentation
Bernal et al. Building up the future of colonoscopy–a synergy between clinicians and computer scientists
Phillips et al. Video capsule endoscopy: pushing the boundaries with software technology
WO2022049577A1 (en) Systems and methods for comparing images of event indicators
CN114581408A (en) Gastroscope polyp detection method based on YOLOV5
Kanakatte et al. Precise bleeding and red lesions localization from capsule endoscopy using compact u-net
Auzine et al. Classification of artefacts in endoscopic images using deep neural network
US20230162356A1 (en) Diagnostic imaging device, diagnostic imaging method, diagnostic imaging program, and learned model
Katayama et al. Development of Computer-Aided Diagnosis System Using Single FCN Capable for Indicating Detailed Inference Results in Colon NBI Endoscopy
Nguyen et al. Automatic classification of upper gastrointestinal tract diseases from endoscopic images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination