CN116091446A - Method, system, medium and equipment for detecting abnormality of esophageal endoscope image - Google Patents
- Publication number
- CN116091446A (application CN202310016873.0A)
- Authority
- CN
- China
- Prior art keywords
- image
- esophageal
- memory
- esophageal endoscope
- images
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0012—Biomedical image inspection
- G06T7/0014—Biomedical image inspection using an image reference approach
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/762—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
- G06V10/763—Non-hierarchical techniques, e.g. based on statistics of modelling distributions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/7715—Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10068—Endoscopic image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30204—Marker
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention relates to an anomaly detection method, system, medium and equipment for esophageal endoscope images, comprising the following steps: acquiring and preprocessing esophageal endoscope images; reconstructing the preprocessed images with a trained image reconstruction model, and obtaining the anomaly score of each esophageal endoscope image from the reconstruction error between the reconstructed image and the original image; and judging whether an esophageal endoscope image is abnormal according to a set threshold and the anomaly score. The image reconstruction model is a cluster memory variation self-encoder with multi-scale feature fusion, and its training is completed using only the healthy images among the esophageal endoscope images as the training set. The construction and training of the model can be completed with normal samples only, and the accuracy of anomaly detection is greatly improved through multi-scale feature fusion, the memory module, clustering, and related techniques.
Description
Technical Field
The invention relates to the technical field of computer-aided diagnosis, and in particular to an anomaly detection method, system, medium, and equipment for esophageal endoscope images.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Esophageal endoscopy is an important means of examining and locating diseases such as esophageal tumors: through esophageal endoscope images, a doctor can visually examine the location, extent, and form of lesions on the digestive tract mucosa and make an accurate judgment. In large-scale screening, for diseases such as early esophageal cancer that lack obvious clinical symptoms, computer-aided diagnosis technology is generally relied on, in which a computer identifies details in the esophageal endoscope image and assists the doctor's diagnosis, thereby reducing the doctor's workload.
Computer-aided diagnosis technology processes raw medical image data with a computer and identifies and outputs possible results. Taking the common deep neural network as an example, training generally requires a large, labeled, class-balanced dataset as support; otherwise overfitting and similar phenomena easily occur, and collecting such medical images is costly or outright difficult. For example, in screening such as large-scale physical examinations, a large number of healthy esophageal endoscope images are collected but few diseased ones, and images of diseases with a low incidence may not be collected at all, which means a traditional deep learning model can hardly learn the characteristics of those diseases and may later misjudge such diseased images as normal. At the same time, every image must be labeled by a professional doctor before it can be used for training, which is also a costly task.
Disclosure of Invention
In order to solve the technical problems in the background art, the invention provides a method, system, medium, and equipment for detecting abnormalities in esophageal endoscope images; the model can be constructed and trained with only normal samples, and the accuracy of anomaly detection is greatly improved through multi-scale feature fusion, the memory module, clustering, and related techniques.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
a first aspect of the present invention provides a method for detecting abnormalities in an esophageal endoscope image, comprising the steps of:
acquiring and preprocessing an esophageal endoscope image, reconstructing the preprocessed esophageal endoscope image by using a trained image reconstruction model, and obtaining the abnormal score of each esophageal endoscope image according to the reconstruction error between the reconstructed image and the original image;
judging whether the esophageal endoscope image is abnormal or not according to the set threshold value and the abnormal score;
the image reconstruction model is a cluster memory variation self-encoder with multi-scale feature fusion, and healthy images in esophageal endoscope images are used as training sets to complete training.
A cluster memory variational self-encoder for multi-scale feature fusion, comprising:
the multi-scale encoder module is used for extracting the characteristics in the esophageal endoscope images under different resolutions and is provided with a plurality of encoders, and each encoder outputs the variance sigma and the mean mu of Gaussian distribution respectively;
the clustering memory module comprises a plurality of memory vectors of the same dimension, the dimension of the memory vectors being the same as that of the encoded features; its inputs are the variance sigma and mean mu output by each encoder, and its output is a weighted sum of the memory vectors;
the multi-scale feature fusion module fuses the output of the clustering memory module to obtain fused variance sigma and mean mu, and the fused variance sigma and mean mu are used as sampling distribution of the decoder module;
and the decoder module is used for inputting vectors which are randomly sampled in the Gaussian distribution obtained from the multi-scale feature fusion module, and obtaining an image with the same resolution as the original image through multi-layer neural network decoding.
The clustering memory module is a two-dimensional matrix and is provided with a plurality of memory vectors with the same dimension, the memory vectors only memorize the characteristics of normal samples, and the output is only the weighted sum of the characteristics of the normal samples.
The clustering memory module is provided with a clustering algorithm, and the optimization of the memory vector distribution in the feature space is completed through a scattering matrix.
Preprocessing includes dividing a training set from a test set, where the training set contains only healthy images.
The image reconstruction model is trained using a training set and a loss function, the loss function comprising:
the reconstruction error loss is used for ensuring the similarity between the original image and the reconstructed image;
regularization term, which is KL divergence between Gaussian distribution obtained by coding and standard normal distribution;
and the cluster loss function is used for optimizing the distribution of the memory vectors in the cluster memory module.
Obtaining the abnormal score of each esophageal endoscope image according to the reconstruction error between the reconstructed image and the original image, wherein the abnormal score is specifically as follows:
adjusting the resolution of an esophageal endoscope image to be detected, and respectively inputting the resolution into encoders of the trained image reconstruction model;
obtaining the image reconstructed by the model, and calculating the reconstruction error with respect to the original image, namely the reconstruction error loss term;
normalizing the obtained reconstruction errors to obtain the anomaly score corresponding to each image to be detected, wherein the calculation formula is $s_i = \frac{e_i - e_{\min}}{e_{\max} - e_{\min}}$, where $e_i$, $e_{\min}$ and $e_{\max}$ are the reconstruction error of the sample, the minimum reconstruction error over all samples, and the maximum reconstruction error over all samples, respectively.
A second aspect of the present invention provides a system for implementing the above method, comprising:
an anomaly score module configured to: acquiring and preprocessing an esophageal endoscope image, reconstructing the preprocessed esophageal endoscope image by using a trained image reconstruction model, and obtaining the abnormal score of each esophageal endoscope image according to the reconstruction error between the reconstructed image and the original image;
an image judgment module configured to: judging whether the esophageal endoscope image is abnormal or not according to the set threshold value and the abnormal score;
the image reconstruction model is a cluster memory variation self-encoder with multi-scale feature fusion, and healthy images in esophageal endoscope images are used as training sets to complete training.
A third aspect of the present invention provides a computer-readable storage medium.
A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps in the method of detecting an abnormality of an esophageal endoscope image as described above.
A fourth aspect of the invention provides a computer device.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps in the method for detecting abnormalities in esophageal endoscope images as described above when the program is executed.
Compared with the prior art, the above technical scheme has the following beneficial effects:
1. The method is an unsupervised algorithm: only the healthy images among the esophageal endoscope images are needed to complete model training, and no images with lesions are required, which reduces both the difficulty of data collection and the labeling cost. All images whose characteristics differ from those of healthy images can be detected, i.e., the method has a good detection effect on all abnormal esophageal states, effectively solving the problem that traditional classification models struggle to cover all esophageal image states.
2. The multi-scale feature fusion technology is adopted during encoding, and features of the healthy esophagus image under each scale are extracted by changing the size of the input image, so that more feature information can be obtained, and a better abnormality detection effect is realized.
3. A cluster memory module is introduced after the encoded features, so that they are not decoded directly. The input of the decoder is always a weighted sum of the memory vectors, and only features of healthy images exist in the memory module, which reduces the model's ability to reconstruct abnormal images and suppresses its generalization capability.
4. The clustering algorithm in the clustering memory module can optimize the distribution of the memory vectors in the feature space, so that the features of the healthy images can be better remembered, and the abnormality detection effect is improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
FIG. 1 is a schematic flow diagram of an anomaly detection process for an endoscopic image of an esophagus provided by one or more embodiments of the present invention;
FIG. 2 is a schematic diagram of a cluster memory variational self-encoder network for multi-scale feature fusion used in anomaly detection of esophageal endoscopic images provided by one or more embodiments of the invention.
Detailed Description
The invention will be further described with reference to the drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
As described in the background art, when a deep learning model is used for computer-aided diagnosis, a large amount of labeled raw image data is needed; on the one hand doctors are required to label the data, and on the other hand images of uncommon diseases are difficult to obtain, so the cost of aided diagnosis is high and its accuracy is low.
Anomaly detection is the process of detecting abnormal samples among a large number of normal samples under the condition of unbalanced positive and negative samples: the training set contains only normal samples and no labels, i.e., a classification model is constructed from a single class of data. An anomaly detection model can therefore solve the problems of difficult data collection and high labeling cost faced by esophageal endoscope image classification tasks, since only healthy images are needed to construct a binary classifier; it can be applied to the primary screening of large-scale endoscope images and assist doctors in completing diagnosis and further data labeling.
At present, anomaly detection models fall mainly into four categories: distribution-based, reconstruction-based, pseudo-anomaly-augmentation-based, and distillation-based learning. Experiments show that healthy esophageal endoscope images are reconstructed better than abnormal ones, so the following embodiments adopt a reconstruction-based method.
Reconstruction-based anomaly detection methods mainly use a self-encoder or a variation self-encoder as the basic structure and train on normal data only, so that the model learns only the characteristics of normal data; as a result, the reconstruction error of normal data is lower than that of abnormal data, and abnormal data are detected according to the reconstruction error.
However, much research has shown that deep neural networks have extremely strong generalization capability: even data that never appears in training can be reconstructed well once the network has learned from similar data, and some healthy and diseased esophageal endoscope images are very similar. The most advanced current solution is therefore to add a memory module between the encoder and the decoder to suppress the model's generalization capability, but optimizing the distribution of the memory vectors in the memory module remains difficult: if the optimization strategy is not ideal, the model either fails to fully memorize the characteristics of normal samples, making their reconstruction error too large, or it learns abnormal characteristics, making the reconstruction error of abnormal samples too small.
Therefore, the following embodiments provide an anomaly detection method, system, medium and device for esophageal endoscope images, which can complete the construction and training of a model under the condition of only normal samples, and greatly improve the accuracy of anomaly detection through the technologies of multi-scale feature fusion, memory module, clustering and the like.
Embodiment one:
as shown in fig. 1 to 2, the method for detecting abnormality of an esophageal endoscope image includes the steps of:
collecting an esophageal endoscope image and preprocessing;
obtaining an abnormal score of each esophageal endoscope image;
setting a proper threshold value, and judging whether the esophageal endoscope image is abnormal according to the threshold value and the abnormal score of the image;
the abnormal score of each esophageal endoscope image is obtained specifically as follows: reconstructing the esophageal endoscope images with the trained cluster memory variation self-encoder with multi-scale feature fusion, and obtaining the abnormal score of each esophageal endoscope image according to the reconstruction error.
Specifically:
s1: acquiring an esophageal endoscope image, and preprocessing the endoscope image to obtain a training set and a testing set;
s2: initializing a neural network framework for training;
s3: inputting the training set image obtained in the step S1 into a neural network framework, and completing training of the neural network by using a loss function;
s4: training the obtained neural network by using the step S3 to calculate and obtain an abnormal score of the test set or each esophageal endoscope image to be detected;
s5: and (3) setting a proper threshold according to the proportion or actual requirement of the abnormal images, classifying each image according to the threshold and the abnormal score obtained in the step (S4), and detecting the abnormal images.
Step S1 specifically includes:
S11: collecting esophageal endoscope images, which may be gathered during large-scale physical-examination screening or taken directly from a public dataset;
S12: preprocessing the collected esophageal endoscope images, including removing blurred images, adjusting all images to a proper size, and the like;
S13: dividing the data into a training set and a test set, wherein the training set contains only healthy images;
S14: if the training set contains too few images, applying data augmentation, i.e., rotating, mirroring, and similar operations, to increase the number of images.
Step S2, specifically comprising:
s21: and constructing a neural network framework, namely constructing a cluster memory variation self-encoder for multi-scale feature fusion. The constructed cluster memory variation self-encoder for multi-scale feature fusion comprises a multi-scale encoder module, a cluster memory module, a multi-scale feature fusion module and a decoder module, and the specific structure is shown in figure 2.
The multi-scale encoder module has a plurality of encoders of different sizes for extracting features from input images at different resolutions, and each encoder independently outputs the variance sigma and mean mu of a Gaussian distribution. Compared with an ordinary variation self-encoder, multi-scale encoding can extract the characteristics of input images at several resolutions, i.e., it completes multi-scale feature extraction and obtains more complete feature information about the original image.
The clustering memory module is a two-dimensional matrix, i.e., a set of memory vectors of the same dimension, and the dimension of the memory vectors is the same as that of the encoded features. Two clustering memory modules with the same structure are placed after each encoder to map the variance sigma and mean mu obtained by encoding. The output of the clustering memory module is a weighted sum of the memory vectors; because the training set contains only normal samples, the memory vectors can only memorize the characteristics of normal samples and the output can only be a weighted sum of normal-sample features, which suppresses the generalization capability of the model and reduces its ability to reconstruct abnormal samples. The clustering memory module also introduces a clustering algorithm that optimizes the distribution of the memory vectors in the feature space through scatter matrices, so that the features stored by the memory module are broader and more accurate, further improving the performance of the model.
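A minimal PyTorch sketch of such a clustering memory module is given below; the inner-product similarity, the number of memory vectors, and the feature dimension are illustrative assumptions rather than values prescribed by this embodiment.

```python
# Minimal clustering memory module: K learnable memory vectors of the same
# dimension as the encoded feature, addressed by softmax weights and read out
# as a weighted sum. Similarity measure and sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClusterMemory(nn.Module):
    def __init__(self, num_vectors: int = 50, feat_dim: int = 128):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(num_vectors, feat_dim))  # K x D matrix

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, D) encoded feature (a sigma or mu vector from one encoder)
        weights = F.softmax(z @ self.memory.t(), dim=-1)   # (batch, K) addressing weights
        return weights @ self.memory                       # weighted sum of memory vectors
```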
The multi-scale feature fusion module uses a concatenation operation to fuse the output features of all the clustering memory modules, obtaining a fused Gaussian variance sigma and mean mu that serve as the sampling distribution of the decoder.
The input of the decoder is a vector randomly sampled from the Gaussian distribution obtained by the multi-scale feature fusion module; through multi-layer neural network decoding, an image with the same resolution as the original image is obtained. The network structure of the decoder decodes the fused features so that the resulting image has the same dimensions as the original image.
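The sketch below shows one way the modules described above could be wired together, reusing the ClusterMemory class sketched earlier; the fully connected encoders and decoder, the three input scales, and all layer sizes are simplifying assumptions (the embodiment describes multi-layer convolutional networks).

```python
# One possible wiring of the modules above (reuses the ClusterMemory sketch).
import torch
import torch.nn as nn

class MultiScaleClusterMemoryVAE(nn.Module):
    def __init__(self, scales=(64, 128, 256), feat_dim=128, num_mem=50):
        super().__init__()
        self.scales = scales
        self.out_res = max(scales)
        self.encoders = nn.ModuleList(
            [nn.Sequential(nn.Flatten(), nn.Linear(3 * s * s, 2 * feat_dim)) for s in scales]
        )
        self.mem_mu = nn.ModuleList([ClusterMemory(num_mem, feat_dim) for _ in scales])
        self.mem_sigma = nn.ModuleList([ClusterMemory(num_mem, feat_dim) for _ in scales])
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim * len(scales), 3 * self.out_res * self.out_res), nn.Sigmoid()
        )

    def forward(self, images):   # images: one tensor per scale, smallest to largest
        mus, log_sigmas = [], []
        for img, enc, m_mu, m_sig in zip(images, self.encoders, self.mem_mu, self.mem_sigma):
            mu, log_sigma = enc(img).chunk(2, dim=-1)
            mus.append(m_mu(mu))                    # map mean through the clustering memory
            log_sigmas.append(m_sig(log_sigma))     # map (log) variance through its own memory
        mu = torch.cat(mus, dim=-1)                 # multi-scale feature fusion by concatenation
        log_sigma = torch.cat(log_sigmas, dim=-1)
        z = mu + torch.exp(0.5 * log_sigma) * torch.randn_like(mu)   # reparameterized sampling
        recon = self.decoder(z).view(-1, 3, self.out_res, self.out_res)
        return recon, mus, log_sigmas
```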
S22: constructing and initializing the cluster memory variation self-encoder with multi-scale feature fusion: selecting appropriate parameters, including the network dimensions, number of layers, convolution kernel size, stride, activation function, hidden-variable feature dimension, and number of memory vectors, and completing the construction and random initialization of the neural network model.
Step S3, specifically comprising:
s31: the method comprises the steps of performing resolution adjustment on an input original esophageal endoscope image to obtain a plurality of images with different resolutions, wherein the images are adaptive to the dimensions of a multi-scale encoder, and the images are respectively input into encoders with different scales;
s32: the encoder neural network carries out calculation transfer on an input picture, the encoder consists of a plurality of layers of neural networks, each layer of neural network consists of a plurality of neurons, and each neuron comprises a parameter weight omega, a bias b and an activation function f;
the calculation formula of the neuron is y=f (Σωx) i (+) where x i For the output of each neuron of the upper layer, namely the input of the neuron, y is the output of the neuron, and the output is transmitted to the neural network of the lower layer to be used as the input of each neuron of the lower layer;
s33: each encoder will eventually output two eigenvectors: sigma and mu represent the variance and mean, respectively, of a gaussian distribution. The feature vector enters a clustering memory module, and the input feature vector is assumed to be z;
subsequently, its weight for each memory vector is obtained by Softmax operation:
finally, the output of the memory module can be obtained through weighted sum, namely, the feature vector after memory mapping:
s33: the mapped feature vectors enter a feature fusion module, a plurality of feature vectors with different scales are fused by adopting splicing operation, so that a variance sigma and a mean mu of Gaussian distribution are finally obtained, and random sampling is carried out in the Gaussian distribution, so that vectors to be decoded are obtained;
s34: and (3) inputting the vector to be decoded obtained in the step (S33) into a decoder, and obtaining the reconstructed esophageal endoscope image through decoding. The decoder is also composed of a multi-layer neural network, and it should be noted that the parameters such as the dimension of the neural network and the convolution kernel of the decoder are set so that the resolution of the reconstructed image obtained by decoding is consistent with that of the original image;
s35: calculating a loss function, wherein the loss function of the model consists of three parts;
the first part is reconstruction error loss for ensuring that the original image is as similar as possible to the reconstructed image, the reconstruction error loss assuming that the resolution of the image is mxnThe calculation formula of the loss function isWherein x is ij And y ij The pixels of the original image and the reconstructed image, respectively.
The second part of the loss function is a regularization term, the KL divergence between the Gaussian distribution obtained by encoding and the standard normal distribution: because the variation self-encoder assumes a standard normal prior for the hidden variable, the encoded distribution should be as close as possible to the standard normal distribution. The loss is calculated as $L_{KL} = \frac{1}{2}\sum_{d}\left(\mu_d^{2} + \sigma_d^{2} - \log \sigma_d^{2} - 1\right)$,
where σ and μ are the variance and mean, respectively, of the Gaussian distribution obtained by encoding.
The third term of the loss function is the clustering loss, which is used to optimize the distribution of the memory vectors in the clustering memory module. The purpose of the clustering operation is to spread the memory vectors out in the feature space while keeping them close to the original feature vectors; the memory vectors are taken as cluster centers, the original feature vectors as the samples to be clustered, and scatter matrices are used to measure the clustering result.
Assume that the cluster centers of two classes are $m_i$ and $m_j$; the scatter matrix between the two classes is $S_b^{(i,j)} = (m_i - m_j)(m_i - m_j)^{\mathrm{T}}$, and the total between-class scatter matrix is $S_B = \frac{1}{N}\sum_{j=1}^{K} n_j\,(m_j - \bar{m})(m_j - \bar{m})^{\mathrm{T}}$,
where N is the number of all vectors; the center point of all classes is $\bar{m} = \frac{1}{K}\sum_{j=1}^{K} m_j$, where K is the number of memory vectors. For a single class, let the vector to be clustered (i.e., the feature vector input to the memory module) be $z$; the within-class scatter matrix is then $S_w^{(j)} = \frac{1}{n_j}\sum_{z \in C_j}\left(z - m_j\right)\left(z - m_j\right)^{\mathrm{T}}$, where $n_j$ is the number of feature vectors assigned to that class, and the total within-class scatter matrix over all classes is $S_W = \sum_{j=1}^{K} S_w^{(j)}$. With the help of these scatter matrices, the clustering loss function can be written as $L_{clu} = \frac{\operatorname{tr}(S_W)}{\operatorname{tr}(S_B)}$, which decreases as features cluster tightly around well-separated memory vectors.
The total loss function of the model is a weighted sum of the three loss functions; note that because the multi-scale feature extraction method is adopted, Gaussian distributions are obtained for several groups of features and the KL divergence with the standard normal distribution is calculated for each of them, so the second loss term contains several items.
The total loss is calculated as $L = \lambda_1 L_{rec} + \lambda_2 \sum_{t=1}^{T} L_{KL}^{(t)} + \lambda_3 L_{clu}$, where T is the number of multi-scale encoders and $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the weight parameters balancing the three loss functions.
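The sketch below illustrates the three loss terms and their weighted sum in PyTorch; log_sigma denotes log(sigma squared), and the hard nearest-memory assignment and trace-based clustering loss are simplifications of the scatter-matrix formulation above, not the exact form used in the embodiment.

```python
# Sketch of the three loss terms and their weighted total.
import torch
import torch.nn.functional as F

def reconstruction_loss(x, y):
    """Mean squared pixel error between the original x and reconstruction y."""
    return F.mse_loss(y, x)

def kl_loss(mu, log_sigma):
    """KL divergence between N(mu, sigma^2) and the standard normal (log_sigma = log sigma^2)."""
    return 0.5 * torch.mean(torch.sum(mu ** 2 + log_sigma.exp() - log_sigma - 1, dim=-1))

def cluster_loss(features, memory):
    """Simplified trace(S_W)/trace(S_B): each feature assigned to its most similar memory vector."""
    assign = torch.argmax(features @ memory.t(), dim=-1)    # hard assignment by inner product
    centers = memory[assign]
    s_w = torch.mean(torch.sum((features - centers) ** 2, dim=-1))                       # within-class
    s_b = torch.mean(torch.sum((memory - memory.mean(0, keepdim=True)) ** 2, dim=-1))    # between-class
    return s_w / (s_b + 1e-8)

def total_loss(x, recon, mus, log_sigmas, memory, lam1=1.0, lam2=0.1, lam3=0.01):
    kl = sum(kl_loss(m, s) for m, s in zip(mus, log_sigmas))   # one KL term per encoder scale
    return lam1 * reconstruction_loss(x, recon) + lam2 * kl + lam3 * cluster_loss(torch.cat(mus), memory)
```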
S37: training the neural network using the loss function obtained in S36. Steps S31 to S36 are repeated with the data in the training set, and each parameter of the neural network is continuously optimized along the direction of gradient descent of the loss function, so that the value of the loss function keeps decreasing until it converges to a certain value and then remains essentially unchanged.
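A minimal training-loop sketch is given below, assuming the model and total_loss sketches above; the Adam optimizer, learning rate, and epoch count are illustrative choices rather than values prescribed by the embodiment.

```python
# Minimal training loop; each batch from the loader is assumed to be a list of
# per-scale image tensors, largest scale last.
import torch

def train(model, loader, epochs=100, lr=1e-4, device="cuda"):
    model.to(device).train()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):
        running = 0.0
        for images in loader:
            images = [img.to(device) for img in images]
            recon, mus, log_sigmas = model(images)
            # Using only the first mu-memory matrix in the cluster loss is a simplification.
            loss = total_loss(images[-1], recon, mus, log_sigmas, model.mem_mu[0].memory)
            opt.zero_grad()
            loss.backward()          # descend along the gradient of the total loss
            opt.step()
            running += loss.item()
        print(f"epoch {epoch}: mean loss {running / max(len(loader), 1):.4f}")
```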
Step S4, specifically comprising:
s41: carrying out resolution adjustment on an esophageal endoscope image to be detected to obtain a plurality of images with different resolutions, which are adaptive to the dimensions of the multi-scale encoder, and respectively inputting the images into different scale encoders of the trained model;
s42: obtaining an image after model reconstruction, and calculating a reconstruction error with an original image, namely a first term in a loss function;
s43: after the reconstruction errors of all the samples to be detected are obtained, carrying out normalization processing on the data to obtain the abnormal score corresponding to each image to be detected, wherein the calculation formula is as followsWherein e i 、e min And e max The reconstruction error of the sample, the minimum reconstruction error of all samples and the maximum reconstruction error of all samples are respectively.
Step S5, specifically comprising:
s51: the threshold is set, and the setting of the threshold can be set according to the proportion of known anomalies in the detection sample or the requirement of an actual task. For example, if the proportion of the esophageal endoscope image health is known to be 70%, the 0.7 quantile of the anomaly score may be set as the threshold; if the task objective is to assist the doctor in detecting the patient with the disease, a smaller threshold value is set, and more abnormal samples are screened as much as possible; if the task objective is to screen normal pictures for expansion and further training of the data set, a larger threshold should be set, and each picture is fully utilized.
S52: classifying the detected pictures according to the anomaly scores obtained in S43 and the threshold set in S51: samples with anomaly scores above the threshold are judged abnormal, samples with scores below the threshold are judged normal, and the anomaly detection task is completed.
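A short, self-contained sketch of this threshold selection and classification step; the 0.7 quantile corresponds to the 70%-healthy example above, and all names are illustrative:

```python
# Quantile-based threshold and binary classification of anomaly scores.
import numpy as np

def classify(errors, healthy_fraction=0.7):
    e = np.asarray(errors, dtype=float)
    scores = (e - e.min()) / (e.max() - e.min() + 1e-12)   # anomaly scores as in S43
    threshold = np.quantile(scores, healthy_fraction)      # e.g. 0.7 quantile if ~70% healthy
    return ["abnormal" if s > threshold else "normal" for s in scores]
```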
The process is an unsupervised algorithm: training of the model can be completed using only healthy esophageal endoscope images, which greatly reduces the difficulty of collecting data and the labeling cost. All images whose characteristics differ from those of healthy images can be detected, i.e., the method has a good detection effect on all esophageal diseases and effectively solves the problem that traditional classification models struggle to cover all esophageal disease states.
The multi-scale feature fusion technology is adopted during encoding, and features of the healthy esophagus image under each scale are extracted by changing the size of the input image, so that more feature information can be obtained, and a better abnormality detection effect is realized.
A clustering memory module is introduced after the encoded features, so that they are not decoded directly. The input of the decoder is always a weighted sum of the memory vectors, and only features of healthy images exist in the memory module, which reduces the model's ability to reconstruct abnormal images and suppresses its generalization capability. In addition, the clustering algorithm in the clustering memory module can optimize the distribution of the memory vectors in the feature space, so that the features of healthy images are memorized better and the anomaly detection effect is improved.
Embodiment two:
This embodiment provides a system for implementing the above method, namely an abnormality detection system for esophageal endoscope images, comprising:
an anomaly score module configured to: acquiring and preprocessing an esophageal endoscope image, reconstructing the preprocessed esophageal endoscope image by using a trained image reconstruction model, and obtaining the abnormal score of each esophageal endoscope image according to the reconstruction error between the reconstructed image and the original image;
an image judgment module configured to: judging whether the esophageal endoscope image is abnormal or not according to the set threshold value and the abnormal score;
the image reconstruction model is a cluster memory variation self-encoder with multi-scale feature fusion, and healthy images in esophageal endoscope images are used as training sets to complete training.
The system belongs to an unsupervised algorithm, and can complete training of the model only by using healthy esophageal endoscope images, so that the difficulty in collecting data and the labeling cost are greatly reduced; and all images with different characteristics from the healthy images can be detected, namely, the method has good detection effect on all esophageal diseases, and the problem that the traditional classification model is difficult to cover all esophageal diseases is effectively solved.
Embodiment III:
the present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps in the abnormality detection method for an esophageal endoscope image as described in the above embodiment.
The method belongs to an unsupervised algorithm, and the training of the model can be completed only by using healthy esophageal endoscope images, so that the difficulty in collecting data and the labeling cost are greatly reduced; and all images with different characteristics from the healthy images can be detected, namely, the method has good detection effect on all esophageal diseases, and the problem that the traditional classification model is difficult to cover all esophageal diseases is effectively solved.
Embodiment four:
the present embodiment provides a computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps in the method for detecting an abnormality of an esophageal endoscope image according to the above embodiment when executing the program.
The method belongs to an unsupervised algorithm, and the training of the model can be completed only by using healthy esophageal endoscope images, so that the difficulty in collecting data and the labeling cost are greatly reduced; and all images with different characteristics from the healthy images can be detected, namely, the method has good detection effect on all esophageal diseases, and the problem that the traditional classification model is difficult to cover all esophageal diseases is effectively solved.
The steps or modules in the second to fourth embodiments correspond to the first embodiment, and the detailed description of the first embodiment may be referred to in the related description section of the first embodiment. The term "computer-readable storage medium" should be taken to include a single medium or multiple media including one or more sets of instructions; it should also be understood to include any medium capable of storing, encoding or carrying a set of instructions for execution by a processor and that cause the processor to perform any one of the methods of the present invention.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. An abnormality detection method for an esophageal endoscope image, comprising the steps of:
acquiring and preprocessing an esophageal endoscope image, reconstructing the preprocessed esophageal endoscope image by using a trained image reconstruction model, and obtaining the abnormal score of each esophageal endoscope image according to the reconstruction error between the reconstructed image and the original image;
judging whether the esophageal endoscope image is abnormal or not according to the set threshold value and the abnormal score;
the image reconstruction model is a cluster memory variation self-encoder with multi-scale feature fusion, and healthy images in esophageal endoscope images are used as training sets to complete training.
2. The method for anomaly detection of esophageal endoscope images of claim 1, wherein the cluster memory variational self-encoder for multi-scale feature fusion comprises:
the multi-scale encoder module is used for extracting the characteristics in the esophageal endoscope images under different resolutions and is provided with a plurality of encoders, and each encoder outputs the variance sigma and the mean mu of Gaussian distribution respectively;
the clustering memory module comprises a plurality of memory vectors of the same dimension, the dimension of the memory vectors being the same as that of the encoded features; its inputs are the variance sigma and mean mu output by each encoder, and its output is a weighted sum of the memory vectors;
the multi-scale feature fusion module fuses the output of the clustering memory module to obtain fused variance sigma and mean mu, and the fused variance sigma and mean mu are used as sampling distribution of the decoder module;
and the decoder module is used for inputting vectors which are randomly sampled in the Gaussian distribution obtained from the multi-scale feature fusion module, and obtaining an image with the same resolution as the original image through multi-layer neural network decoding.
3. The method for detecting abnormalities in an esophageal endoscope image according to claim 2, wherein the cluster memory module is a two-dimensional matrix having a plurality of memory vectors of the same dimension, the memory vectors memorize only features of normal samples, and the output is a weighted sum of features of only normal samples.
4. The method for detecting the abnormality of the esophageal endoscope image according to claim 2, wherein the clustering memory module is provided with a clustering algorithm, and the optimization of the distribution of the memory vectors in the feature space is completed through a scattering matrix.
5. The method of anomaly detection for an esophageal endoscope image of claim 1, wherein the preprocessing comprises partitioning a training set from a test set, wherein the training set comprises only healthy images.
6. The method for anomaly detection of esophageal endoscope images of claim 2, wherein the image reconstruction model is trained using a training set and a loss function, the loss function comprising:
the reconstruction error loss is used for ensuring the similarity between the original image and the reconstructed image;
regularization term, which is KL divergence between Gaussian distribution obtained by coding and standard normal distribution;
and the cluster loss function is used for optimizing the distribution of the memory vectors in the cluster memory module.
7. The abnormality detection method for an esophageal endoscope image according to claim 1, wherein the abnormality score of each esophageal endoscope image is obtained from a reconstruction error between a reconstructed image and an original image, specifically:
adjusting the resolution of an esophageal endoscope image to be detected, and respectively inputting the resolution into encoders of the trained image reconstruction model;
obtaining the image reconstructed by the model, and calculating the reconstruction error with respect to the original image, namely the reconstruction error loss term;
normalizing the obtained reconstruction errors to obtain the anomaly score corresponding to each image to be detected, wherein the formula is $s_i = \frac{e_i - e_{\min}}{e_{\max} - e_{\min}}$, where $e_i$, $e_{\min}$ and $e_{\max}$ are the reconstruction error of the sample, the minimum reconstruction error over all samples, and the maximum reconstruction error over all samples, respectively.
8. An abnormality detection system for an esophageal endoscope image, comprising:
an anomaly score module configured to: acquiring and preprocessing an esophageal endoscope image, reconstructing the preprocessed esophageal endoscope image by using a trained image reconstruction model, and obtaining the abnormal score of each esophageal endoscope image according to the reconstruction error between the reconstructed image and the original image;
an image judgment module configured to: judging whether the esophageal endoscope image is abnormal or not according to the set threshold value and the abnormal score;
the image reconstruction model is a cluster memory variation self-encoder with multi-scale feature fusion, and healthy images in esophageal endoscope images are used as training sets to complete training.
9. A computer-readable storage medium, characterized in that a computer program is stored thereon, which program, when being executed by a processor, realizes the steps in the abnormality detection method of an esophageal endoscope image as set forth in any one of the preceding claims 1-7.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps in the method for detecting abnormalities in esophageal endoscope images as claimed in any one of claims 1 to 7 when the program is executed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310016873.0A CN116091446A (en) | 2023-01-06 | 2023-01-06 | Method, system, medium and equipment for detecting abnormality of esophageal endoscope image |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310016873.0A CN116091446A (en) | 2023-01-06 | 2023-01-06 | Method, system, medium and equipment for detecting abnormality of esophageal endoscope image |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116091446A true CN116091446A (en) | 2023-05-09 |
Family
ID=86209829
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310016873.0A Pending CN116091446A (en) | 2023-01-06 | 2023-01-06 | Method, system, medium and equipment for detecting abnormality of esophageal endoscope image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116091446A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116523907A (en) * | 2023-06-28 | 2023-08-01 | 浙江华诺康科技有限公司 | Endoscope imaging quality detection method, device, equipment and storage medium |
CN116523907B (en) * | 2023-06-28 | 2023-10-31 | 浙江华诺康科技有限公司 | Endoscope imaging quality detection method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |