CN116091446A

CN116091446A - Method, system, medium and equipment for detecting abnormality of esophageal endoscope image

Info

Publication number: CN116091446A
Application number: CN202310016873.0A
Authority: CN
Inventors: 赵子健; 于秀梅; 吴雁冰
Original assignee: Shandong University
Current assignee: Shandong University
Priority date: 2023-01-06
Filing date: 2023-01-06
Publication date: 2023-05-09

Abstract

The invention relates to an anomaly detection method, a system, a medium and equipment for an esophageal endoscope image, comprising the following steps: acquiring and preprocessing an esophageal endoscope image, reconstructing the preprocessed esophageal endoscope image by using a trained image reconstruction model, and obtaining the abnormal score of each esophageal endoscope image according to the reconstruction error between the reconstructed image and the original image; judging whether the esophageal endoscope image is abnormal or not according to the set threshold value and the abnormal score; the image reconstruction model is a cluster memory variation self-encoder with multi-scale feature fusion, and healthy images in esophageal endoscope images are used as training sets to complete training. The construction and training of the model can be completed under the condition of only normal samples, and the accuracy of anomaly detection is greatly improved through the technologies of multi-scale feature fusion, memory modules, clustering and the like.

Description

Method, system, medium and equipment for detecting abnormality of esophageal endoscope image

Technical Field

The invention relates to the technical field of computer-aided diagnosis, in particular to an anomaly detection method, an anomaly detection system, an anomaly detection medium and anomaly detection equipment for an esophageal endoscope image.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

Esophageal endoscopy is an important means for examining and locating diseases such as esophageal tumor, and a doctor can visually examine lesion parts, ranges and forms of digestive tract mucous membrane through an esophageal endoscope image, so that accurate judgment can be made. In large-scale screening, for diseases such as early esophageal cancer and the like which lack obvious clinical symptoms, a computer-aided diagnosis technology is generally relied on, details in an esophageal endoscope image are identified through a computer, and diagnosis of a doctor is aided, so that the workload of the doctor is reduced.

The computer aided diagnosis technology processes the original medical image data through a computer, and identifies and outputs possible results. Taking a common deep neural network as an example, training generally requires a large number of labeled and class-balanced data sets as support, otherwise, the phenomena of over fitting and the like are very easy to occur, and the cost is huge or the medical image is difficult to acquire. For example, in screening such as a large-scale physical examination, a large number of healthy esophageal endoscope images are often collected, but few diseased images are collected, and a disease image with low partial disease incidence may not be collected at all, which means that the characteristics of the disease are difficult to learn by a traditional deep learning model, so that the disease image may be misjudged as a normal image in the future. At the same time, all images need to be marked by a professional doctor before they can be used for training, which is also a costly task.

Disclosure of Invention

In order to solve the technical problems in the background art, the invention provides the method, the system, the medium and the equipment for detecting the abnormality of the esophageal endoscope image, the construction and the training of a model can be completed under the condition of only a normal sample, and the accuracy rate of abnormality detection is greatly improved through the technologies of multi-scale feature fusion, memory modules, clustering and the like.

In order to achieve the above purpose, the present invention adopts the following technical scheme:

a first aspect of the present invention provides a method for detecting abnormalities in an esophageal endoscope image, comprising the steps of:

acquiring and preprocessing an esophageal endoscope image, reconstructing the preprocessed esophageal endoscope image by using a trained image reconstruction model, and obtaining the abnormal score of each esophageal endoscope image according to the reconstruction error between the reconstructed image and the original image;

judging whether the esophageal endoscope image is abnormal or not according to the set threshold value and the abnormal score;

the image reconstruction model is a cluster memory variation self-encoder with multi-scale feature fusion, and healthy images in esophageal endoscope images are used as training sets to complete training.

A cluster memory variational self-encoder for multi-scale feature fusion, comprising:

the multi-scale encoder module is used for extracting the characteristics in the esophageal endoscope images under different resolutions and is provided with a plurality of encoders, and each encoder outputs the variance sigma and the mean mu of Gaussian distribution respectively;

the clustering memory module comprises a plurality of memory vectors with the same dimension, wherein the dimension of the memory vectors is the same as the dimension of the coded characteristic, the memory vectors are input into the memory module as variance sigma and mean mu output by each encoder, and the memory vectors are output into a weighted sum;

the multi-scale feature fusion module fuses the output of the clustering memory module to obtain fused variance sigma and mean mu, and the fused variance sigma and mean mu are used as sampling distribution of the decoder module;

and the decoder module is used for inputting vectors which are randomly sampled in the Gaussian distribution obtained from the multi-scale feature fusion module, and obtaining an image with the same resolution as the original image through multi-layer neural network decoding.

The clustering memory module is a two-dimensional matrix and is provided with a plurality of memory vectors with the same dimension, the memory vectors only memorize the characteristics of normal samples, and the output is only the weighted sum of the characteristics of the normal samples.

The clustering memory module is provided with a clustering algorithm, and the optimization of the memory vector distribution in the feature space is completed through a scattering matrix.

Preprocessing includes dividing a training set from a test set, where the training set contains only healthy images.

The image reconstruction model is trained using a training set and a loss function, the loss function comprising:

the reconstruction error loss is used for ensuring the similarity between the original image and the reconstructed image;

regularization term, which is KL divergence between Gaussian distribution obtained by coding and standard normal distribution;

and the cluster loss function is used for optimizing the distribution of the memory vectors in the cluster memory module.

Obtaining the abnormal score of each esophageal endoscope image according to the reconstruction error between the reconstructed image and the original image, wherein the abnormal score is specifically as follows:

adjusting the resolution of an esophageal endoscope image to be detected, and respectively inputting the resolution into encoders of the trained image reconstruction model;

obtaining an image after model reconstruction, and calculating a reconstruction error with an original image, namely a clustering loss function;

normalizing the obtained reconstruction errors to obtain the anomaly score corresponding to each image to be detected, wherein the calculation formula is as follows

Wherein e _i 、e _min And e _max The reconstruction error of the sample, the minimum reconstruction error of all samples and the maximum reconstruction error of all samples are respectively.

A second aspect of the present invention provides a system for implementing the above method, comprising:

an anomaly score module configured to: acquiring and preprocessing an esophageal endoscope image, reconstructing the preprocessed esophageal endoscope image by using a trained image reconstruction model, and obtaining the abnormal score of each esophageal endoscope image according to the reconstruction error between the reconstructed image and the original image;

an image judgment module configured to: judging whether the esophageal endoscope image is abnormal or not according to the set threshold value and the abnormal score;

A third aspect of the present invention provides a computer-readable storage medium.

A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps in the method of detecting an abnormality of an esophageal endoscope image as described above.

A fourth aspect of the invention provides a computer device.

A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps in the method of anomaly detection of an oesophageal endoscopic image as described above when the program is executed.

Compared with the prior art, the above technical scheme has the following beneficial effects:

1. the model training method belongs to an unsupervised algorithm, and only healthy images in esophageal endoscope images are needed to complete model training, and images with lesion parts are not needed, so that the difficulty in data collection and the labeling cost are reduced, all images with different characteristics from those of the healthy images can be detected, namely, good detection effects on all abnormal esophageal states are achieved, and the problem that the traditional classification model is difficult to cover all esophageal image states is effectively solved.

2. The multi-scale feature fusion technology is adopted during encoding, and features of the healthy esophagus image under each scale are extracted by changing the size of the input image, so that more feature information can be obtained, and a better abnormality detection effect is realized.

3. And a cluster memory module is introduced into the coded features, so that the coded features are not directly decoded. The input of the decoder is always the weighted sum of the memory vectors, and only the characteristics of the healthy images exist in the memory module, so that the reconstruction effect of the model on the abnormal images is reduced again, and the generalization capability of the model is inhibited.

4. The clustering algorithm in the clustering memory module can optimize the distribution of the memory vectors in the feature space, so that the features of the healthy images can be better remembered, and the abnormality detection effect is improved.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.

FIG. 1 is a schematic flow diagram of an anomaly detection process for an endoscopic image of an esophagus provided by one or more embodiments of the present invention;

FIG. 2 is a schematic diagram of a cluster memory variational self-encoder network for multi-scale feature fusion used in anomaly detection of esophageal endoscopic images provided by one or more embodiments of the invention.

Detailed Description

The invention will be further described with reference to the drawings and examples.

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.

As described in the background art, when a deep learning model is used as a computer-aided diagnosis, a large amount of marked original image data is needed, on the one hand, a doctor is required to mark, and on the other hand, a part of images of unusual diseases are difficult to obtain, so that the cost of aided diagnosis is high and the accuracy is low.

The abnormal detection is a process of detecting abnormal samples in a large number of normal samples under the condition of unbalanced positive and negative samples, only the normal samples are in a training set, and no labels exist any more, namely, a classification model is constructed by using one type of data. The anomaly detection model can perfectly solve the problems of difficult collection of a data set and high labeling cost faced in the esophageal endoscope image classification task, and only a healthy image is needed to construct a two-classification model, so that the anomaly detection model can be applied to primary screening of large-scale endoscope images, and assist doctors in completing diagnosis and further labeling of data.

At present, the abnormality detection model mainly comprises four types of distribution-based, reconstruction-based, pseudo-abnormality enhancement-based and distillation-based learning, and experiments show that the esophageal endoscope image has the characteristic that the health image reconstruction effect is better than that of the abnormality image, so that the following embodiment adopts a reconstruction-based method.

The reconstruction-based anomaly detection method mainly uses a self-encoder or a variable self-encoder as a basic structure, and only uses normal data to train so that the normal data can only learn the characteristics of the normal data, thereby the reconstruction error of the normal data is lower than that of the anomaly data, and the anomaly data is detected according to the reconstruction error.

However, many researches show that the deep neural network has extremely strong generalization capability, even though data which does not appear in training can learn the characteristics of the deep neural network through similar data, and partial health and diseased esophageal endoscope images are very similar, so that the most advanced solution to the problem at present is to add a memory module between an encoder and a decoder to inhibit the generalization capability of a model, but how to optimize the memory vector distribution in the memory module faces difficulties; if the optimization strategy is not ideal, the model can not completely memorize the characteristics of the normal sample, so that the reconstruction error of the normal sample is overlarge, or the abnormal characteristics are learned, so that the review error of the abnormal sample is overlarge.

Therefore, the following embodiments provide an anomaly detection method, system, medium and device for esophageal endoscope images, which can complete the construction and training of a model under the condition of only normal samples, and greatly improve the accuracy of anomaly detection through the technologies of multi-scale feature fusion, memory module, clustering and the like.

Embodiment one:

as shown in fig. 1 to 2, the method for detecting abnormality of an esophageal endoscope image includes the steps of:

collecting an esophageal endoscope image and preprocessing;

obtaining an abnormal score of each esophageal endoscope image;

setting a proper threshold value, and judging whether the esophageal endoscope image is abnormal according to the threshold value and the abnormal score of the image;

the abnormal score of each esophageal endoscope image is obtained, specifically: reconstructing the esophageal endoscope images by using a cluster memory variation self-encoder for training multi-scale feature fusion, and obtaining the abnormal score of each esophageal endoscope image according to the reconstruction error.

Specific:

s1: acquiring an esophageal endoscope image, and preprocessing the endoscope image to obtain a training set and a testing set;

s2: initializing a neural network framework for training;

s3: inputting the training set image obtained in the step S1 into a neural network framework, and completing training of the neural network by using a loss function;

s4: training the obtained neural network by using the step S3 to calculate and obtain an abnormal score of the test set or each esophageal endoscope image to be detected;

s5: and (3) setting a proper threshold according to the proportion or actual requirement of the abnormal images, classifying each image according to the threshold and the abnormal score obtained in the step (S4), and detecting the abnormal images.

Step S1, specifically, includes:

s11: the esophageal endoscope image is collected, and the public data set can be collected in a large-scale physical examination screening or can be directly used;

s12: preprocessing the collected esophageal endoscope image, including removing blurred images, adjusting all images to proper sizes and the like;

s13: dividing a training set and a testing set, wherein the training set only comprises healthy images;

s14: if the training set has insufficient images, data enhancement processing can be performed, namely, the images are rotated, mirrored and the like to increase the number of the images.

Step S2, specifically comprising:

s21: and constructing a neural network framework, namely constructing a cluster memory variation self-encoder for multi-scale feature fusion. The constructed cluster memory variation self-encoder for multi-scale feature fusion comprises a multi-scale encoder module, a cluster memory module, a multi-scale feature fusion module and a decoder module, and the specific structure is shown in figure 2.

The multi-scale encoder module has a plurality of encoders of different sizes for feature extraction of input images of different resolutions, each encoder outputting independently a variance σ and a mean μ of a gaussian distribution. Compared with a common variation self-encoder, the multi-scale encoding can extract the characteristics of input images with different resolutions, namely, multi-scale characteristic extraction is completed, and more complete characteristic information of an original image is obtained.

The clustering memory module is a two-dimensional matrix, namely, a plurality of memory vectors with the same dimension, and the dimension of the memory vectors is the same as the dimension of the coded features. Two cluster memory modules with the same structure are arranged behind each encoder to finish mapping the variance sigma and the mean mu obtained by encoding. The output of the clustering memory module is the weighted sum of the memory vectors, and the memory vectors only can memorize the characteristics of the normal samples because of only the normal samples in the training set, and the output can only be the weighted sum of the characteristics of the normal samples, so that the generalization capability of the model is inhibited, and the reconstruction effect of the model on the abnormal samples is reduced. The clustering memory module also introduces a clustering algorithm, and optimizes the distribution of the memory vectors in the feature space through the scattering matrix, so that the features stored by the memory module are wider and more accurate, and the performance of the model is further improved.

The multi-scale feature fusion module adopts a splicing operation to fuse the output features of all the clustering memory modules to obtain a fused Gaussian distribution variance sigma and a fused mean mu, and the fused Gaussian distribution variance sigma and the mean mu are used as sampling distribution of a decoder.

The input of the decoder is a vector randomly sampled in Gaussian distribution obtained by the multi-scale feature fusion module, and an image with the same resolution as the original image is obtained through multi-layer neural network decoding. The network structure of the decoder decodes the fused features so that the obtained image has the same dimension as the original image.

S22: and (5) constructing and initializing a cluster memory variation self-encoder for completing multi-scale feature fusion. And selecting proper parameters including network dimension, network layer number, convolution kernel size, step length, activation function, hidden variable feature dimension and memory vector number, and completing the establishment and random initialization of the neural network model.

Step S3, specifically comprising:

s31: the method comprises the steps of performing resolution adjustment on an input original esophageal endoscope image to obtain a plurality of images with different resolutions, wherein the images are adaptive to the dimensions of a multi-scale encoder, and the images are respectively input into encoders with different scales;

s32: the encoder neural network carries out calculation transfer on an input picture, the encoder consists of a plurality of layers of neural networks, each layer of neural network consists of a plurality of neurons, and each neuron comprises a parameter weight omega, a bias b and an activation function f;

the calculation formula of the neuron is y=f (Σωx) _i (+) where x _i For the output of each neuron of the upper layer, namely the input of the neuron, y is the output of the neuron, and the output is transmitted to the neural network of the lower layer to be used as the input of each neuron of the lower layer;

s33: each encoder will eventually output two eigenvectors: sigma and mu represent the variance and mean, respectively, of a gaussian distribution. The feature vector enters a clustering memory module, and the input feature vector is assumed to be z;

first, the cosine similarity to each memory vector is calculated:

wherein m is _j Is a memory vector;

subsequently, its weight for each memory vector is obtained by Softmax operation:

finally, the output of the memory module can be obtained through weighted sum, namely, the feature vector after memory mapping:

s33: the mapped feature vectors enter a feature fusion module, a plurality of feature vectors with different scales are fused by adopting splicing operation, so that a variance sigma and a mean mu of Gaussian distribution are finally obtained, and random sampling is carried out in the Gaussian distribution, so that vectors to be decoded are obtained;

s34: and (3) inputting the vector to be decoded obtained in the step (S33) into a decoder, and obtaining the reconstructed esophageal endoscope image through decoding. The decoder is also composed of a multi-layer neural network, and it should be noted that the parameters such as the dimension of the neural network and the convolution kernel of the decoder are set so that the resolution of the reconstructed image obtained by decoding is consistent with that of the original image;

s35: calculating a loss function, wherein the loss function of the model consists of three parts;

the first part is reconstruction error loss for ensuring that the original image is as similar as possible to the reconstructed image, the reconstruction error loss assuming that the resolution of the image is mxnThe calculation formula of the loss function is

Wherein x is _ij And y _ij The pixels of the original image and the reconstructed image, respectively.

The second part of the loss function is a regularization term, which is the KL divergence between the Gaussian distribution obtained by encoding and the standard normal distribution, because the prior distribution of the hidden variable is assumed to be the standard normal distribution in the variable self-encoder, the distribution obtained by encoding is similar to the standard normal distribution as much as possible, and the calculation formula of the loss function is as follows:

where σ and μ are the variance and mean, respectively, of the resulting gaussian distribution of the code.

The third term of the loss function is a clustering loss function, and is used for optimizing the distribution of memory vectors in a clustering memory module, the purpose of clustering operation is to improve the sparsity of the memory vectors in space and approach an original feature vector, the memory vector is taken as a clustering center, the original feature vector is taken as a sample to be clustered, and a scattering matrix is used for measuring a clustering result.

Assume that the clustering centers of the two classes are m respectively _i And m _j The scattering matrix between the two classes is

The total inter-class scattering matrix is:

wherein N is the number of all vectors; the center point of the whole class is

Wherein K is the number of memory vectors; for a single class, assume that the vector to be clustered (i.e., inputEigenvector of the memory module) is z, then the intra-class scattering matrix is

Wherein n is _j The number of the feature vectors is the number of the feature vectors gathered to the category; the total intra-class scattering matrix of all classes is +.>

With the help of the scattering matrix, the calculation formula of the clustering loss function can be obtained as follows

The model total loss function is a weighted sum of the three loss functions, and it should be noted that, because the multi-scale feature extraction method is adopted, gaussian distribution of a plurality of features is obtained, KL divergence is calculated with standard normal distribution respectively, so that the second loss function has a plurality of items.

The calculation formula of the total loss function is

Wherein T is the number of multi-scale encoders, lambda ₁ 、λ ₂ And lambda (lambda) ₃ The weight parameters that balance the three loss functions are respectively.

S36: the neural network is trained using the loss function obtained in S35. And repeating the steps S31 to S35 by using the data in the training set, and continuously optimizing each parameter in the neural network along the gradient descending direction of the loss function, so that the value of the loss function is continuously reduced until the value is converged to a certain value and then tends to be unchanged.

Step S4, specifically comprising:

s41: carrying out resolution adjustment on an esophageal endoscope image to be detected to obtain a plurality of images with different resolutions, which are adaptive to the dimensions of the multi-scale encoder, and respectively inputting the images into different scale encoders of the trained model;

s42: obtaining an image after model reconstruction, and calculating a reconstruction error with an original image, namely a first term in a loss function;

s43: after the reconstruction errors of all the samples to be detected are obtained, carrying out normalization processing on the data to obtain the abnormal score corresponding to each image to be detected, wherein the calculation formula is as follows

Step S5, specifically comprising:

s51: the threshold is set, and the setting of the threshold can be set according to the proportion of known anomalies in the detection sample or the requirement of an actual task. For example, if the proportion of the esophageal endoscope image health is known to be 70%, the 0.7 quantile of the anomaly score may be set as the threshold; if the task objective is to assist the doctor in detecting the patient with the disease, a smaller threshold value is set, and more abnormal samples are screened as much as possible; if the task objective is to screen normal pictures for expansion and further training of the data set, a larger threshold should be set, and each picture is fully utilized.

S52: classifying the detected pictures according to the anomaly scores obtained in the step S43 and the threshold value set in the step S51, judging samples with anomaly scores higher than the threshold value as anomalies, judging samples with anomaly scores lower than the threshold value as normal, and completing the anomaly detection task.

The process belongs to an unsupervised algorithm, and training of the model can be completed only by using healthy esophageal endoscope images, so that the difficulty in collecting data and the labeling cost are greatly reduced; all images with different characteristics from the healthy images can be detected, namely, the method has good detection effect on all esophageal diseases, and the problem that the traditional classification model is difficult to cover all esophageal diseases is effectively solved.

The multi-scale feature fusion technology is adopted during encoding, and features of the healthy esophagus image under each scale are extracted by changing the size of the input image, so that more feature information can be obtained, and a better abnormality detection effect is realized.

And a cluster memory module is introduced into the coded features, so that the coded features are not directly decoded. The input of the decoder is always the weighted sum of the memory vectors, and only the characteristics of the healthy images exist in the memory module, so that the reconstruction effect of the model on the abnormal images is reduced again, and the generalization capability of the model is inhibited. In addition, the clustering algorithm in the clustering memory module can optimize the distribution of the memory vectors in the feature space, so that the features of the healthy images can be better remembered, and the abnormality detection effect is improved.

Embodiment two:

the system for realizing the method comprises the following steps:

11. an abnormality detection system for a tube endoscope image, comprising:

The system belongs to an unsupervised algorithm, and can complete training of the model only by using healthy esophageal endoscope images, so that the difficulty in collecting data and the labeling cost are greatly reduced; and all images with different characteristics from the healthy images can be detected, namely, the method has good detection effect on all esophageal diseases, and the problem that the traditional classification model is difficult to cover all esophageal diseases is effectively solved.

Embodiment III:

the present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps in the abnormality detection method for an esophageal endoscope image as described in the above embodiment.

The method belongs to an unsupervised algorithm, and the training of the model can be completed only by using healthy esophageal endoscope images, so that the difficulty in collecting data and the labeling cost are greatly reduced; and all images with different characteristics from the healthy images can be detected, namely, the method has good detection effect on all esophageal diseases, and the problem that the traditional classification model is difficult to cover all esophageal diseases is effectively solved.

Embodiment four:

the present embodiment provides a computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps in the method for detecting an abnormality of an esophageal endoscope image according to the above embodiment when executing the program.

The steps or modules in the second to fourth embodiments correspond to the first embodiment, and the detailed description of the first embodiment may be referred to in the related description section of the first embodiment. The term "computer-readable storage medium" should be taken to include a single medium or multiple media including one or more sets of instructions; it should also be understood to include any medium capable of storing, encoding or carrying a set of instructions for execution by a processor and that cause the processor to perform any one of the methods of the present invention.

The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. An abnormality detection method for an esophageal endoscope image, comprising the steps of:

2. The method for anomaly detection of esophageal endoscope images of claim 1, wherein the cluster memory variational self-encoder for multi-scale feature fusion comprises:

3. The method for detecting abnormalities in an esophageal endoscope image according to claim 2, wherein the cluster memory module is a two-dimensional matrix having a plurality of memory vectors of the same dimension, the memory vectors memorize only features of normal samples, and the output is a weighted sum of features of only normal samples.

4. The method for detecting the abnormality of the esophageal endoscope image according to claim 2, wherein the clustering memory module is provided with a clustering algorithm, and the optimization of the distribution of the memory vectors in the feature space is completed through a scattering matrix.

5. The method of anomaly detection for an esophageal endoscope image of claim 1, wherein the preprocessing comprises partitioning a training set from a test set, wherein the training set comprises only healthy images.

6. The method for anomaly detection of esophageal endoscope images of claim 2, wherein the image reconstruction model is trained using a training set and a loss function, the loss function comprising:

7. The abnormality detection method for an esophageal endoscope image according to claim 1, wherein the abnormality score of each esophageal endoscope image is obtained from a reconstruction error between a reconstructed image and an original image, specifically:

normalizing the obtained reconstruction errors to obtain the anomaly scores corresponding to each image to be detected, wherein the formula is as follows:

8. An abnormality detection system for an esophageal endoscope image, comprising:

9. A computer-readable storage medium, characterized in that a computer program is stored thereon, which program, when being executed by a processor, realizes the steps in the abnormality detection method of an esophageal endoscope image as set forth in any one of the preceding claims 1-7.

10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps in the method of anomaly detection of an oesophageal endoscopic image as claimed in any one of claims 1 to 7 when the program is executed.