CN115018767A - Cross-modal endoscope image conversion and lesion segmentation method based on eigen expression learning - Google Patents
- Publication number
- CN115018767A (application CN202210477177.5A)
- Authority
- CN
- China
- Prior art keywords
- image
- white light
- light image
- wli
- narrow
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0012—Biomedical image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30096—Tumor; Lesion
Abstract
The invention belongs to the technical field of medical image processing, and specifically relates to a cross-modal endoscope image conversion and lesion region segmentation method. The invention converts white light images of the digestive-tract endoscope into high-quality narrow-band images through a constructed neural network based on intrinsic representation learning; the essential features of the white light image are obtained with an unsupervisedly trained essential feature extractor, and the lesion region is predicted through an atrous spatial pyramid pooling network to obtain the lesion segmentation result. At test time, the white light image under test needs only a single forward pass together with one auxiliary narrow-band image to obtain its corresponding narrow-band image. The method adopts unsupervised learning, generalizes well, and performs excellently across different endoscope devices. The invention can provide additional narrow-band imaging for white light endoscope equipment, offering a better reference for physician diagnosis, and can automatically locate the lesion region via narrow-band-image-assisted lesion segmentation, thereby greatly improving diagnostic efficiency and reducing morbidity and mortality.
Description
Technical Field
The invention belongs to the technical field of medical image processing, and specifically relates to a cross-modal endoscope image conversion and lesion region segmentation method.
Background
With the rapid development of medical technology, endoscopes have been widely used in clinical diagnosis, treatment, and surgery. An endoscope can be inserted directly into a body cavity for observation and can display the tissue morphology of internal organs [1]. Colon cancer and esophageal cancer are two diseases with high late-stage mortality worldwide. Fortunately, they can be diagnosed at an early stage by colonoscopy and enteroscopy, which can effectively improve survival [2,3].
Among endoscopic imaging techniques, white light imaging (WLI) is the most widely used, providing good visualization of vascular structures and the surrounding mucosa. However, early-cancer lesions are generally confined to the mucosal and submucosal layers, where WLI has relatively limited diagnostic efficacy, exhibiting low sensitivity and specificity. Narrow band imaging (NBI) is therefore becoming a new trend for early cancer diagnosis. NBI can delineate lesion margins and clearly reveal the morphology and distribution of the microvasculature. Under NBI, superficial blood vessels appear brown and vessels in deeper layers appear blue-green, which enhances visualization of capillary and mucosal morphology. It can significantly increase the contrast between the lesion and the surrounding mucosa, thereby improving the accuracy of lesion-region judgment [4].
WLI helps physicians locate lesions, while NBI better reveals lesion boundaries and extent; if WLI and NBI could be viewed simultaneously, the advantages of both modes could be exploited. In practice, most endoscopic devices cannot display WLI and NBI at the same time, since NBI requires much of the light to be filtered out. Converting WLI into NBI through deep learning therefore has great social value.
In computer vision, the image-to-image translation task aims at learning mappings between different domains so as to generate images similar to a target. Deep learning methods have made great progress here: Pix2Pix [5] proposed paired image translation based on datasets with pixel-level correspondence; because paired datasets are difficult to obtain, CycleGAN [6] proposed an unpaired image translation method based on cycle consistency. However, existing image translation methods consider only style transfer, pay no attention to the association between the two modalities, and cannot learn an intrinsic representation related to the medical information. They therefore cannot serve downstream medical tasks such as anomaly detection and lesion-region segmentation.
The invention provides a novel method for converting WLI images into NBI images based on unsupervised intrinsic representation learning. It fully learns the intrinsic representation shared between the two modalities, can convert a WLI image into a high-quality NBI image, provides an effective basis for physician diagnosis, and can be used for lesion-region segmentation to improve the detection rate of digestive-tract diseases.
Disclosure of Invention
To overcome the defects of the prior art, the invention aims to provide a cross-modal endoscope image conversion and lesion-region segmentation method based on unsupervised intrinsic representation learning, so as to eliminate the influence of human factors and realize conversion from WLI to NBI endoscope images while simultaneously segmenting the lesion region.
The invention provides a cross-modal endoscope image conversion and lesion-region segmentation method based on unsupervised intrinsic representation learning, comprising the following specific steps:
(1) converting a white light image (WLI) of a digestive-tract endoscope into a high-quality narrow-band image (NBI) through a constructed neural network based on intrinsic representation learning;
(2) acquiring the essential features of the white light image with the essential feature extractor obtained in step (1), and predicting the lesion region of the white light image through an atrous spatial pyramid pooling (ASPP) network to obtain the lesion segmentation result.
In step (1), for a given white light image (WLI), the goal is to generate the corresponding narrow-band image (NBI). Previous image translation methods generally treat cross-modality conversion as a style-transfer task. Although this produces realistic-looking results, it may destroy the original high-level information and essential features of the image, which is unacceptable for medical images: it can not only lose useful information but also affect downstream tasks or the physician's judgment. Based on a lighting model, the invention assumes that endoscopic images can be decoupled into optical information and essential features. The neural network then recombines the optical information of one modality with the essential features of the other to obtain the corresponding cross-modal image.
The neural network has a symmetric structure. A white light image I_WLI is passed through a modality-specific encoder E_MS, which extracts optical information, to obtain the white-light optical features F_WLI^MS; at the same time, I_WLI is passed through a modality-invariant encoder E_MI to obtain its essential features F_WLI^MI. Likewise, a narrow-band image I_NBI is passed through E_MS to obtain the NBI optical features F_NBI^MS and through E_MI to obtain the NBI essential features F_NBI^MI. F_WLI^MS and F_NBI^MI are combined and input to the white-light generator G_WLI to generate a white light image Î_WLI; F_NBI^MS and F_WLI^MI are combined and input to the narrow-band generator G_NBI to generate a narrow-band image Î_NBI. The two essential features F_WLI^MI and F_NBI^MI are each input separately to an eigen generator G_Eigen, and either one can produce an intrinsic representation I_Eigen. The encoders applied to the two modalities share weights. The generated images Î_WLI and Î_NBI, together with the real images, are sent to a discriminator D_Gen that distinguishes generated images from real images; the classification results are used for adversarial learning so that realistic medical images are generated.
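As a minimal sketch of this decoupling-and-recombination data flow (the encoder and generator bodies below are toy stand-ins operating on vectors, not the convolutional networks of the invention; only the names E_MS, E_MI, G_WLI, G_NBI follow the notation above):

```python
import numpy as np

# Toy stand-ins for the four sub-networks; real versions are CNNs.
# Here each "image" is an 8-vector so the data flow can be traced:
# the first half plays the role of optical information, the second
# half the role of essential features.
def E_MS(img):   # modality-specific encoder -> optical features
    return img[:4]

def E_MI(img):   # modality-invariant encoder -> essential features
    return img[4:]

def G_WLI(optical, essential):   # white-light generator
    return np.concatenate([optical, essential])

def G_NBI(optical, essential):   # narrow-band generator
    return np.concatenate([optical, essential])

I_WLI = np.arange(8, dtype=float)        # toy white-light "image"
I_NBI = np.arange(8, 16, dtype=float)    # toy narrow-band "image"

# Decouple both modalities into optical + essential features.
f_wli_ms, f_wli_mi = E_MS(I_WLI), E_MI(I_WLI)
f_nbi_ms, f_nbi_mi = E_MS(I_NBI), E_MI(I_NBI)

# Cross-recombination: NBI optics + WLI essence -> generated NBI,
# WLI optics + NBI essence -> generated WLI.
I_NBI_gen = G_NBI(f_nbi_ms, f_wli_mi)
I_WLI_gen = G_WLI(f_wli_ms, f_nbi_mi)
```

In the real network the adversarial discriminator D_Gen then scores Î_WLI and Î_NBI against real images; that stage is omitted from this sketch.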
To increase sample diversity so that the method generalizes to different devices and modalities, the network has a cyclic structure (i.e. it is a cycle network) and does not use a pixel-level loss to constrain the translated image directly. Specifically, the cycle works as follows: the generated white light image Î_WLI is passed through E_MS to obtain new white-light optical features and through E_MI to obtain new white-light essential features; likewise, new optical and essential features are obtained from the generated narrow-band image Î_NBI. The new features are recombined and fed into G_WLI to produce the cycled white light image and into G_NBI to produce the cycled narrow-band image. The cycled white light and narrow-band images should be consistent with the originally input I_WLI and I_NBI, so a pixel-level loss can be used to constrain them.
Further, since the essential features cannot be constrained directly, the invention adopts an eigen adversarial learning strategy to generate plausible intrinsic images in order to better capture the medical essence. Because no ground truth exists for the intrinsic image, a cyclic scheme is used to make the network learn an efficient and reasonable representation. First, to ensure that E_MI extracts a meaningful essential representation, an eigen image generator G_Eigen is introduced. E_MI is expected to generalize well and capture the essence of endoscopic images, so that cross-modal and cross-device images remain consistent. Taking WLI as an example, assuming the essential features F_WLI^MI effectively express the tissue in the WLI image, an eigen map I_Eigen can be obtained that carries the same deep features as the input WLI. Likewise, feeding the eigen map back into E_MI should encode intermediate features identical to the essential features of the input WLI. Since the eigen map is generated from essential features only, it should contain no optical information; therefore feeding it into E_MS should produce an all-zero vector.
Further, although the cyclic network can learn a reasonable representation, deep neural networks tend to become lazy during training, making the generated eigen images unrealistic. Moreover, owing to lighting conditions and the limitations of display devices, a true eigen image cannot be obtained. The invention therefore constructs a feature discriminator D_Eigen to classify the generated eigen images. Since the eigen image should exhibit the true common features of both WLI and NBI images, its distribution is expected to be close to both. D_Eigen distinguishes the generated eigen images from real WLI and NBI images, while G_Eigen tries to generate realistic eigen images to fool the discriminator. Through adversarial learning, not only are more realistic eigen images obtained, but E_MI is also encouraged to better capture the essential features of WLI and NBI for downstream medical tasks.
At test time, an endoscopic white light image I_WLI is input and its essential features are extracted by the intrinsic feature extractor E_MI; an additional narrow-band endoscopic image is input and its narrow-band optical features are extracted by the optical feature extractor E_MS. The two features are combined and fed into the narrow-band generator G_NBI, and the narrow-band image Î_NBI is obtained in a single forward pass.
In the invention, the losses used to train the neural network are designed as follows:
two essential characteristicsAndinput eigen generator G Eigen Respectively generating a white light image eigenrepresentationAnd narrowband image intrinsic representationThese two eigenrepresentations are constrained using the LPIPS penalty as follows:
in a narrow-band imaging branch, optical featuresAnd narrow band natureCombined input white light image generator G WLI In generating white light imagesPass through discriminator D gen Calculating the GAN loss L GAN . White light image intrinsic representationRe-input intrinsic feature extractor E MI Extracting again the eigenrepresentationOptical characteristics of the original white light imageCombined input white light image generator G WLI To generate a reconstructed white light imageIn thatAnd the original white light image I WLI Calculates the cyclic loss L between cycle . Generated white light image intrinsic representationAnd narrowband image intrinsic representationInput eigen discriminator D Eigen In (1), calculating a modal invariance loss as follows:
L_Eigen = E[log D_Eigen(I_WLI, I_NBI)] + E[log(1 − D_Eigen(I_Eigen))], (2)
the ideal essential features should not contain optical features and therefore the white light image to be generated is intrinsically representedRe-input into the optical feature extractor E MS Extracting the optical feature of the intrinsic featureThe feature loss was calculated as follows:
therefore, when the neural network is trained, the final loss function is:
L=λ 1 L perceptual +λ 2 L Eigen +λ 3 L feature +λ 4 L cycle +λ 5 L GAN , (4)
wherein λ is i Is the weight used to balance the individual loss functions, i ═ 1,2, …, 5; in the present invention, based on experience, λ is taken 1 =10,λ 2 =1,λ 3 =1,λ 4 =10,λ 5 =1。
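The weighted combination in Eq. (4) can be computed directly; the component loss values below are arbitrary placeholders standing in for the five losses computed during training:

```python
# Weights from Eq. (4): lambda_1..lambda_5 for the five loss terms.
weights = {"perceptual": 10.0, "eigen": 1.0, "feature": 1.0,
           "cycle": 10.0, "gan": 1.0}

def total_loss(losses, weights=weights):
    """L = sum_i lambda_i * L_i over the five terms of Eq. (4)."""
    return sum(weights[k] * losses[k] for k in weights)

# Placeholder per-term values (not real measurements).
losses = {"perceptual": 0.2, "eigen": 0.5, "feature": 0.1,
          "cycle": 0.3, "gan": 0.4}
L = total_loss(losses)   # 10*0.2 + 1*0.5 + 1*0.1 + 10*0.3 + 1*0.4 = 6.0
```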
In step (2), the essential features of the white light image are obtained with the essential feature extractor from step (1), and an atrous spatial pyramid pooling (ASPP) network [7] is used to segment lesion regions of the digestive tract. The specific process is as follows:
the void space convolution pooling pyramid (ASPP) network is connected with a fixed essential feature extractor E MI A rear side; ASPP convolves parallel samples for a given input with holes of different sampling rates. Endoscopic white light image pass E MI And (4) providing essential characteristics, inputting the essential characteristics into an ASPP network, and directly outputting a segmentation result of the lesion area. Unlike conventional semantic segmentation networks, first, the present invention does not take an image as an input to the segmentation network, but uses an essential feature extractor E MI The extracted intermediate features. Second, an intrinsic feature extractor E MI Is trained in an unsupervised way, and when training a semantic segmentation task, the intrinsic feature extractor E MI This greatly reduces the amount of parameters and time required for training. Finally, due to the advantage of unsupervised training in generalization capability, the semantic segmentation network provided by the invention can be directly applied to other data sets without any retraining or fine tuning, so that a better cross-device effect is realized.
In the invention, the intrinsic feature extractor maps images of different modalities into the same feature space, so similar intrinsic features are produced for cross-device images, giving the lesion detection a better cross-device effect.
In the invention, the converted narrow-band image can be obtained with the white-light image conversion network; the network-generated narrow-band image and the original white light image are then used together as input to the segmentation network, which outputs an accurate lesion segmentation result for the white light image. With UNet as the basic framework, one encoder extracts the features of the white light image, another encoder extracts the features of the narrow-band image converted from it, and both feature sets are fed into a decoder to predict the lesion segmentation result.
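The dual-input segmentation design above can be sketched with toy stand-ins; the encoder and decoder bodies here are arbitrary placeholder arithmetic, not the UNet layers of the invention, and serve only to show how the two feature streams are fused:

```python
import numpy as np

# Toy stand-ins for the two encoders and the shared decoder.
def encode_wli(img):
    return img * 2.0          # "features" of the white-light image

def encode_nbi(img):
    return img + 1.0          # "features" of the converted NBI image

def decode(f_wli, f_nbi):
    # The decoder consumes both feature maps jointly: here a per-pixel
    # fusion (mean) followed by a threshold stands in for the
    # segmentation head producing a binary lesion mask.
    fused = np.stack([f_wli, f_nbi], axis=-1).mean(axis=-1)
    return (fused > 1.5).astype(int)

wli = np.array([[0.1, 0.9], [0.8, 0.2]])       # toy 2x2 WLI "image"
nbi_gen = np.array([[0.2, 0.95], [0.85, 0.1]]) # toy converted NBI
mask = decode(encode_wli(wli), encode_nbi(nbi_gen))
```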
The present invention also provides the first multi-modal (WLI and NBI) esophageal endoscopic video dataset and pixel-level-aligned paired esophageal endoscopic image dataset. The data were collected with a Pentax device and comprise 34 videos and 8700 pairs of WLI and NBI images; the images in the test set come from patients who do not appear in the training set. The construction of the dataset specifically comprises:
(1) Selection of data objects. The lack of large-scale paired datasets is one of the key challenges hindering the endoscopic image conversion task. Owing to its special structural design, the Pentax endoscope can display WLI and NBI almost synchronously in real time. We therefore collected an esophageal endoscopy video dataset containing 34 videos (29 of normal esophagus and 5 of abnormal esophagus) from an affiliated hospital of a university in Shanghai, spanning April to May 2021. Each video is about 1 to 5 minutes long, with abnormal-esophagus videos generally longer. The total duration of the normal videos is 39 minutes 57 seconds, and of the abnormal videos 12 minutes 52 seconds;
(2) Image acquisition by endoscopy. All patients underwent gastroscopy after intravenous anesthesia. The camera was a Pentax EPK-i7000, with the endoscopist selecting Pentax's dual mode to obtain a WLI and NBI dual-screen display. Esophageal endoscopy was performed and recorded during scope withdrawal: the endoscopist starts the examination from the cardia (40 cm from the incisors) and slowly withdraws the scope to the beginning of the esophagus (15 cm from the incisors). For suspected lesions, the endoscopist may repeat the viewing and filming, taking multiple shots at different angles. Each set of videos was treated as one case and included in the analysis. To ensure the quality of the recorded video, the endoscopist avoided out-of-focus shots and interference from breathing, heartbeat, mucus, air bubbles, blood, etc.; blurred videos were re-recorded;
(3) Construction of the paired multimodal dataset. The Pentax WLI image is formed directly from three color signals, while the NBI image is produced by digital image processing, which takes some time. When the lens moves slowly this processing delay is negligible, so nearly aligned cross-modal images can be obtained. However, when the lens moves rapidly, or the patient's breathing and heartbeat cause severe esophageal motion, the two modes displayed in a video frame become noticeably distorted or blurred. We therefore pre-process the captured video frames.
First, we manually remove obviously blurred frames, as well as the beginning and ending frames of each video, which typically show anomalous images caused by equipment adjustment. Second, according to the imaging principle and motion law of the video frames, WLI appears before NBI at the same location; therefore three adjacent frames of WLI and NBI are compared, and the image residuals are used to find the corresponding WLI and NBI images. From the video dataset, 11 normal and 5 abnormal videos were used to balance the samples, and a paired multimodal endoscopy dataset containing 8700 pairs of WLI and NBI images was constructed, with the WLI and NBI images almost achieving pixel-level alignment. To our knowledge, this is the first multi-modal (WLI and NBI) esophageal endoscopic video dataset and pixel-level-aligned paired esophageal endoscopic image dataset.
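The residual-based frame pairing described above can be sketched as follows; the three-candidate window and the plain argmin selection are illustrative assumptions, and real frames would be cleaned (blur removal, cropping) beforehand:

```python
import numpy as np

def match_nbi_frame(wli_frame, nbi_candidates):
    """Among adjacent candidate NBI frames, pick the one with the
    smallest mean absolute residual to the given WLI frame.
    Toy sketch of residual-based pairing; returns (index, residual)."""
    residuals = [np.abs(wli_frame - c).mean() for c in nbi_candidates]
    best = int(np.argmin(residuals))
    return best, residuals[best]

# Toy 4x4 "frames": the middle candidate is nearly aligned.
wli = np.full((4, 4), 0.5)
candidates = [np.full((4, 4), 0.9),   # frame t-1: large residual
              np.full((4, 4), 0.55),  # frame t:   nearly aligned
              np.full((4, 4), 0.8)]   # frame t+1
idx, res = match_nbi_frame(wli, candidates)
```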
The beneficial effects of the invention are as follows: the invention designs an image conversion network based on intrinsic representation learning that can convert a digestive-tract endoscope WLI image into a high-quality NBI image, while the NBI image assists the WLI image in segmenting the lesion region. The WLI image under test needs only a single forward pass together with one auxiliary NBI image to obtain its corresponding NBI image. The method adopts unsupervised learning, generalizes well, and performs excellently across different endoscope devices. The invention can provide additional NBI imaging for WLI endoscope equipment, offering a better reference for physician diagnosis, and can automatically locate the lesion region via NBI-image-assisted lesion segmentation, thereby greatly improving diagnostic efficiency and reducing morbidity and mortality.
Drawings
FIG. 1 is a network framework diagram of the present invention.
Fig. 2 is a paired WLI and NBI data set presentation.
Fig. 3 shows the result of image transformation for the same data sets WLI and NBI.
Fig. 4 is the image transformation result across datasets WLI and NBI.
Fig. 5 is the result of transforming the generated NBI map to assist WLI lesion segmentation.
Fig. 6 is the result of intrinsic feature used for lesion segmentation.
Detailed Description
The embodiments of the present invention are described in detail below, but the scope of the present invention is not limited to the examples.
Using the network structure in FIG. 1, 6947 pairs of WLI and NBI images are used to train the image transformation network, yielding a trained image transformation network and an essential feature extractor.
The method comprises the following specific steps:
(1) During testing, an endoscopic white light image I_WLI is input and its essential features are extracted by the intrinsic feature extractor E_MI; an additional narrow-band endoscopic image is input and its narrow-band optical features are extracted by the optical feature extractor. The two features are combined and fed into the narrow-band generator G_NBI, and the narrow-band image is obtained in a single forward pass;
(2) The essential feature extractor E_MI is frozen, and an ASPP network is appended for semantic segmentation. The endoscopic white light image is passed through E_MI to provide essential features, which are input to the ASPP network, directly outputting the lesion segmentation result;
(3) The converted narrow-band image is obtained with the white-light image conversion network; the network-generated narrow-band image and the original white light image are used together as input to the segmentation network, which outputs an accurate lesion segmentation result for the white light image. With UNet as the basic framework, one encoder extracts the features of the white light image, another encoder extracts the features of the narrow-band image converted from it, and both feature sets are fed into a decoder to predict the lesion segmentation result.
Fig. 3 and 4 show the image transformation results on the same dataset and across datasets for WLI and NBI. Because the Pix2Pix and CycleGAN methods treat the task directly as style conversion without considering the essential features of medical images, their results resemble a hue conversion with artifacts and generate no useful information; hence their cross-dataset results are poor. The proposed method performs stably both in within-dataset and in cross-dataset experiments.
Fig. 5 shows the result of using the generated NBI images to assist WLI lesion segmentation. The invention produces more realistic NBI images and accurate lesion segmentation results. This indicates that the generated NBI images are closest to real NBI images and that the feature extraction focuses on the important information in the endoscopic images, thus facilitating downstream tasks.
Fig. 6 shows intrinsic features used for lesion segmentation. Even supervised end-to-end segmentation methods have difficulty distinguishing certain flat lesions. In the invention, the feature representation of the endoscopic image is obtained by unsupervised training, which ignores the variations that different devices introduce into medical images and maps all images into the same feature domain. Thus, the intrinsic features yield good lesion predictions even without fine-tuning.
Reference to the literature
[1] Mamonov A V, Figueiredo I N, Figueiredo P N, et al. Automated polyp detection in colon capsule endoscopy [J]. IEEE Transactions on Medical Imaging, 2014, 33(7): 1488-1502.
[2] Ghatwary N, Zolgharni M, Ye X. Early esophageal adenocarcinoma detection using deep learning methods [J]. International Journal of Computer Assisted Radiology and Surgery, 2019, 14(4): 611-621.
[3] Mesejo P, Pizarro D, Abergel A, et al. Computer-aided classification of gastrointestinal lesions in regular colonoscopy [J]. IEEE Transactions on Medical Imaging, 2016, 35(9): 2051.
[4] Gai W, Jin X F, Du R, et al. Efficacy of narrow-band imaging in detecting early esophageal cancer and risk factors for its occurrence [J]. Indian Journal of Gastroenterology, 2018.
[5] P. Isola, J. Zhu, T. Zhou and A. A. Efros. Image-to-image translation with conditional adversarial networks [C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 5967-5976, doi: 10.1109/CVPR.2017.632.
[6] J. Zhu, T. Park, P. Isola and A. A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks [C]. 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2242-2251, doi: 10.1109/ICCV.2017.244.
[7] He K, Zhang X, Ren S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015.
Claims (6)
1. A cross-modal endoscope image conversion and lesion segmentation method based on intrinsic representation learning, characterized by comprising the following specific steps:
(1) converting a white light image (WLI) of a digestive-tract endoscope into a high-quality narrow-band image (NBI) through a constructed neural network based on intrinsic representation learning;
(2) acquiring the essential features of the white light image with the essential feature extractor obtained in step (1), and predicting the lesion region of the white light image through an atrous spatial pyramid pooling (ASPP) network to obtain the lesion segmentation result;
in step (1), for a given white light image (WLI), the goal is to generate the corresponding narrow-band image (NBI); based on a lighting model, it is assumed that the endoscopic image can be decoupled into optical information and essential features; a neural network then recombines the optical information of one modality with the essential features of the other to obtain the corresponding cross-modal image;
the neural network has a symmetric structure: a white light image I_WLI is passed through a modality-specific encoder E_MS, which extracts optical information, to obtain its optical features F_WLI^MS; at the same time, I_WLI is passed through a modality-invariant encoder E_MI to obtain its essential features F_WLI^MI; likewise, a narrow-band image I_NBI is passed through E_MS to obtain its optical features F_NBI^MS and through E_MI to obtain its essential features F_NBI^MI; F_WLI^MS and F_NBI^MI are combined and input to a white-light generator G_WLI to generate a white light image Î_WLI, and F_NBI^MS and F_WLI^MI are combined and input to a narrow-band generator G_NBI to generate a narrow-band image Î_NBI; the two essential features F_WLI^MI and F_NBI^MI are each input separately to an eigen generator G_Eigen, either of which can produce an intrinsic representation I_Eigen; the encoders applied to the two modalities share weights; the generated images Î_WLI and Î_NBI, together with the real images, are fed into a discriminator D_Gen that distinguishes generated images from real images, and the classification results are used for adversarial learning to generate realistic medical images;
the neural network also has a cyclic structure, the cycle proceeding as follows: the generated white light image Î_WLI is passed through the modality-specific encoder E_MS to obtain a new white-light optical feature F'_WLI^MS, and through the modality-invariant encoder E_MI to obtain a new essential feature F'_WLI^MI; similarly, new optical and essential features F'_NBI^MS and F'_NBI^MI are obtained from the generated narrow-band image Î_NBI; F'_WLI^MS and F'_NBI^MI are combined and input into G_WLI to obtain a cycled white light image, and F'_NBI^MS and F'_WLI^MI are combined and input into G_NBI to obtain a cycled narrow-band image; the cycled white light image and narrow-band image should be consistent with the originally input white light image I_WLI and narrow-band image I_NBI, i.e., a pixel-level loss can be used as the constraint.
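The disentangle-recombine-cycle scheme can be illustrated with a deliberately simplified sketch in which the encoders and generators are mere slicing and concatenation rather than trained networks (all names are illustrative stand-ins, not the patent's implementation):

```python
import numpy as np

# Toy stand-ins: an "image" is a vector whose first half carries optical
# (modality-specific) information and whose second half carries the
# modality-invariant essential features.
D = 4  # half-dimension of the toy representation

def E_MS(img):            # modality-specific (optical) encoder
    return img[:D]

def E_MI(img):            # modality-invariant (essential) encoder
    return img[D:]

def G(optical, essence):  # both G_WLI and G_NBI reduce to recombination here
    return np.concatenate([optical, essence])

rng = np.random.default_rng(0)
essence = rng.normal(size=D)   # shared tissue structure
opt_wli = rng.normal(size=D)   # white-light illumination
opt_nbi = rng.normal(size=D)   # narrow-band illumination

I_WLI = G(opt_wli, essence)
I_NBI = G(opt_nbi, essence)

# Cross-modal recombination, as in the claim:
I_WLI_hat = G(E_MS(I_WLI), E_MI(I_NBI))  # WLI optics + NBI essence
I_NBI_hat = G(E_MS(I_NBI), E_MI(I_WLI))  # NBI optics + WLI essence

# Cycle: re-encode the generated pair and recombine to rebuild the original.
I_WLI_cyc = G(E_MS(I_WLI_hat), E_MI(I_NBI_hat))

cycle_loss = np.abs(I_WLI_cyc - I_WLI).mean()  # pixel-level L1 constraint
print(f"cycle loss: {cycle_loss:.6f}")
```

In this idealized toy, disentanglement is perfect, so the cycle reconstructs the input exactly and the pixel-level loss is zero; in the patented method the same consistency is only encouraged, not guaranteed, by the training losses.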
2. The method of claim 1, wherein, to better obtain essential medical features, an eigen adversarial learning strategy is used to generate plausible eigen images, and the cyclic scheme forces the network to learn an efficient and reasonable representation; the process is as follows: first, to ensure that E_MI extracts a meaningful essential representation, the eigen generator G_Eigen is used so that E_MI has good generality, the essence of the endoscopic image can be observed, and cross-modality, cross-device images remain well consistent; for a white light image (WLI), the essential feature F_WLI^MI is assumed to effectively express the tissue in the image, and the eigen map I_Eigen^WLI obtained from it shares the same deep characteristics as the input WLI; likewise, feeding the eigen map I_Eigen^WLI into E_MI should encode intermediate features identical to the essential features of the input WLI; since I_Eigen^WLI is generated only from essential features and contains no optical information, feeding it into E_MS should produce an all-zero vector.
3. The method of claim 2, wherein the eigen discriminator D_Eigen classifies the generated eigen images; since the eigen image should exhibit the true common characteristics of the white light image (WLI) and the narrow-band image (NBI), its distribution is expected to approach both; D_Eigen distinguishes generated eigen images from real WLI and NBI images, while the eigen generator G_Eigen tries to produce realistic eigen images that confuse the discriminator; through this adversarial learning, more realistic eigen images are obtained, and E_MI is encouraged to better capture the essential features shared by WLI and NBI for downstream medical tasks.
4. The method of claim 3, wherein at test time the input endoscopic white light image I_WLI is passed through the intrinsic feature extractor E_MI to extract its essential features; an additional endoscopic narrow-band image is input and its narrow-band optical features are extracted by the optical feature extractor E_MS; the two features are combined and fed into the narrow-band generator G_NBI, so that a narrow-band image Î_NBI is obtained in a single forward pass.
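The single-forward-pass inference of this claim can be sketched as one small function; the encoders and generator here are hypothetical callables (in the patent they are the trained networks), and the split/concatenate stand-ins below are illustrative only:

```python
import numpy as np

def convert_wli_to_nbi(wli, nbi_reference, E_MI, E_MS, G_NBI):
    """Test-time pipeline of the claim: essential features of the input WLI
    plus optical features of a reference NBI image, recombined by the
    narrow-band generator in a single forward pass."""
    essence = E_MI(wli)             # intrinsic feature extractor
    optics = E_MS(nbi_reference)    # optical feature extractor
    return G_NBI(optics, essence)

# Toy stand-ins (illustrative, not the trained networks):
D = 3
E_MI = lambda img: img[D:]
E_MS = lambda img: img[:D]
G_NBI = lambda o, e: np.concatenate([o, e])

wli = np.array([1., 1., 1., 5., 6., 7.])      # [WLI optics | tissue essence]
nbi_ref = np.array([2., 2., 2., 0., 0., 0.])  # reference narrow-band image

nbi_hat = convert_wli_to_nbi(wli, nbi_ref, E_MI, E_MS, G_NBI)
print(nbi_hat)  # -> [2. 2. 2. 5. 6. 7.]
```

The output keeps the reference image's narrow-band optics while preserving the tissue content of the white-light input, which is exactly the recombination the claim describes.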
5. The method of claim 4, wherein the losses used in training the neural network are as follows:
two essential characteristicsAndinput eigen generator G Eigen Respectively generating a white light image eigenrepresentationAnd narrowband image intrinsic representationThese two eigenrepresentations are constrained using LPIPS penalty, as follows:
In the cross-modal generation branch, the white-light optical feature F_WLI^MS and the narrow-band essential feature F_NBI^MI are combined and input into the white-light generator G_WLI to generate a white light image Î_WLI, which is passed through the discriminator D_Gen to compute the GAN loss L_GAN; the white-light eigen representation I_Eigen^WLI is fed back into the intrinsic feature extractor E_MI to re-extract its essential representation, which is combined with the optical feature F_WLI^MS of the original white light image and input into G_WLI to generate a reconstructed white light image; the cycle loss L_cycle is computed between this reconstruction and the original white light image I_WLI; the generated white-light eigen representation I_Eigen^WLI and narrow-band eigen representation I_Eigen^NBI are input into the eigen discriminator D_Eigen, and the modality-invariance loss is computed as follows:
L_Eigen = E[log D_Eigen(I_WLI, I_NBI)] + E[log(1 − D_Eigen(I_Eigen))], (2)
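With the expectations of Eq. (2) replaced by batch means over discriminator probabilities, the loss can be computed as below (the function name and the treatment of the real term as separate scores on WLI and NBI images are assumptions for illustration):

```python
import numpy as np

def eigen_adv_loss(d_real_wli, d_real_nbi, d_fake_eigen):
    """Eq. (2) with expectations as batch means: D_Eigen scores real
    WLI/NBI images (first term) against generated eigen images (second)."""
    real = np.log(np.concatenate([d_real_wli, d_real_nbi]))
    fake = np.log(1.0 - d_fake_eigen)
    return real.mean() + fake.mean()

# A perfectly confused discriminator (all scores 0.5) gives 2*log(0.5).
half = np.full(4, 0.5)
print(eigen_adv_loss(half, half, half))  # ≈ -1.3863
```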
The ideal essential features should contain no optical features; therefore the generated white-light eigen representation I_Eigen^WLI is fed back into the optical feature extractor E_MS to extract its optical feature F_Eigen^MS, and the feature loss penalizes any remaining optical content:

L_feature = ||F_Eigen^MS||₁ , (3)
Therefore, when training the neural network, the final loss function is:

L = λ₁·L_perceptual + λ₂·L_Eigen + λ₃·L_feature + λ₄·L_cycle + λ₅·L_GAN, (4)

where λ_i (i = 1, 2, …, 5) are the weights used to balance the individual loss terms.
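Assembling Eq. (4) is a straightforward weighted sum; the weight values below are placeholders (the patent does not disclose its λ settings):

```python
# Per-term loss values as they might come out of one training step
# (example numbers, not from the patent).
losses = {"perceptual": 0.8, "eigen": 0.3, "feature": 0.1,
          "cycle": 1.2, "gan": 0.6}
# Hypothetical balancing weights λ1..λ5; cycle losses are often weighted
# more heavily in image-translation training, hence the 10.0 here.
lambdas = {"perceptual": 1.0, "eigen": 1.0, "feature": 1.0,
           "cycle": 10.0, "gan": 1.0}

total = sum(lambdas[k] * losses[k] for k in losses)
print(total)
```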
6. The method according to claim 5, wherein in step (2) the intrinsic feature extractor trained in step (1) is used to obtain the essential features of the white light image, and the digestive-tract lesion region is segmented by an atrous spatial pyramid pooling (ASPP) network; the specific process is as follows: the ASPP module is connected after the intrinsic feature extractor E_MI; the ASPP applies parallel atrous convolutions at different sampling rates to the given input; the endoscopic white light image is passed through E_MI to extract essential features, which are input into the ASPP network, and the segmentation result of the lesion area is output directly.
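The parallel multi-rate sampling that ASPP performs can be shown with a minimal single-channel sketch (rates follow the common 1/6/12/18 choice; the image-pooling branch and 1×1 projections of full ASPP are omitted, and all names are illustrative):

```python
import numpy as np

def dilated_conv2d(x, k, rate):
    """'Same'-padded 2D convolution of a single-channel map x with a 3x3
    kernel k whose taps are spread apart by `rate` (atrous convolution)."""
    pad = rate
    xp = np.pad(x, pad)
    out = np.zeros_like(x, dtype=float)
    for i in range(3):
        for j in range(3):
            di, dj = i * rate, j * rate  # dilated tap offsets
            out += k[i, j] * xp[di:di + x.shape[0], dj:dj + x.shape[1]]
    return out

def aspp(x, kernels, rates=(1, 6, 12, 18)):
    """Minimal single-channel ASPP: parallel atrous branches at several
    sampling rates, stacked along a new channel axis."""
    return np.stack([dilated_conv2d(x, k, r) for k, r in zip(kernels, rates)])

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 32))                    # stand-in feature map from E_MI
kernels = [rng.normal(size=(3, 3)) for _ in range(4)]
feats = aspp(x, kernels)
print(feats.shape)  # (4, 32, 32)
```

Because each branch keeps the spatial size while enlarging the receptive field, the stacked output can feed a per-pixel classifier that emits the lesion segmentation map directly, as the claim describes.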
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210477177.5A CN115018767A (en) | 2022-05-03 | 2022-05-03 | Cross-modal endoscope image conversion and lesion segmentation method based on eigen expression learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210477177.5A CN115018767A (en) | 2022-05-03 | 2022-05-03 | Cross-modal endoscope image conversion and lesion segmentation method based on eigen expression learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115018767A true CN115018767A (en) | 2022-09-06 |
Family
ID=83067163
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210477177.5A Pending CN115018767A (en) | 2022-05-03 | 2022-05-03 | Cross-modal endoscope image conversion and lesion segmentation method based on eigen expression learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115018767A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117398042A (en) * | 2023-12-14 | 2024-01-16 | 深圳市博盛医疗科技有限公司 | AI-assisted detection 3D endoscope system and imaging method |
CN117398042B (en) * | 2023-12-14 | 2024-03-19 | 深圳市博盛医疗科技有限公司 | AI-assisted detection 3D endoscope system and imaging method |
CN117726642A (en) * | 2024-02-07 | 2024-03-19 | 中国科学院宁波材料技术与工程研究所 | High reflection focus segmentation method and device for optical coherence tomography image |
CN117726642B (en) * | 2024-02-07 | 2024-05-31 | 中国科学院宁波材料技术与工程研究所 | High reflection focus segmentation method and device for optical coherence tomography image |
CN117789185A (en) * | 2024-02-28 | 2024-03-29 | 浙江驿公里智能科技有限公司 | Automobile oil hole gesture recognition system and method based on deep learning |
CN117789185B (en) * | 2024-02-28 | 2024-05-10 | 浙江驿公里智能科技有限公司 | Automobile oil hole gesture recognition system and method based on deep learning |
CN118199941A (en) * | 2024-03-04 | 2024-06-14 | 北京中科网芯科技有限公司 | Network visualization method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6657480B2 (en) | Image diagnosis support apparatus, operation method of image diagnosis support apparatus, and image diagnosis support program | |
Ohmori et al. | Endoscopic detection and differentiation of esophageal lesions using a deep neural network | |
CN115018767A (en) | Cross-modal endoscope image conversion and lesion segmentation method based on eigen expression learning | |
Nakagawa et al. | Classification for invasion depth of esophageal squamous cell carcinoma using a deep neural network compared with experienced endoscopists | |
Pogorelov et al. | Deep learning and hand-crafted feature based approaches for polyp detection in medical videos | |
US9514556B2 (en) | System and method for displaying motility events in an in vivo image stream | |
US20220296081A1 (en) | Method for real-time detection of objects, structures or patterns in a video, an associated system and an associated computer readable medium | |
WO2006100808A1 (en) | Capsule endoscope image display controller | |
JP7550409B2 (en) | Image diagnosis device, image diagnosis method, and image diagnosis program | |
CN107705852A (en) | Real-time the lesion intelligent identification Method and device of a kind of medical electronic endoscope | |
Wang et al. | Learning two-stream CNN for multi-modal age-related macular degeneration categorization | |
Masmoudi et al. | Optimal feature extraction and ulcer classification from WCE image data using deep learning | |
Mackiewicz | Capsule endoscopy-state of the technology and computer vision tools after the first decade | |
Pornvoraphat et al. | Real-time gastric intestinal metaplasia diagnosis tailored for bias and noisy-labeled data with multiple endoscopic imaging | |
Lin et al. | Lesion-decoupling-based segmentation with large-scale colon and esophageal datasets for early cancer diagnosis | |
JP7533881B2 (en) | Image Classification Method Based on Semantic Segmentation | |
Bernal et al. | Building up the future of colonoscopy–a synergy between clinicians and computer scientists | |
Phillips et al. | Video capsule endoscopy: pushing the boundaries with software technology | |
WO2022049577A1 (en) | Systems and methods for comparing images of event indicators | |
CN114581408A (en) | Gastroscope polyp detection method based on YOLOV5 | |
Kanakatte et al. | Precise bleeding and red lesions localization from capsule endoscopy using compact u-net | |
Auzine et al. | Classification of artefacts in endoscopic images using deep neural network | |
US20230162356A1 (en) | Diagnostic imaging device, diagnostic imaging method, diagnostic imaging program, and learned model | |
Katayama et al. | Development of Computer-Aided Diagnosis System Using Single FCN Capable for Indicating Detailed Inference Results in Colon NBI Endoscopy | |
Nguyen et al. | Automatic classification of upper gastrointestinal tract diseases from endoscopic images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||