CN113496489B - Training method of endoscope image classification model, image classification method and device - Google Patents

Training method of endoscope image classification model, image classification method and device

Info

Publication number
CN113496489B
Authority
CN
China
Prior art keywords
image
images
batch
modality
endoscope
Prior art date
Legal status
Active
Application number
CN202111039387.8A
Other languages
Chinese (zh)
Other versions
CN113496489A (en)
Inventor
边成
李永会
杨延展
Current Assignee
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN202111039387.8A
Publication of CN113496489A
Application granted
Publication of CN113496489B
Priority to PCT/CN2022/117048 (WO2023030521A1)
Legal status: Active

Classifications

    • G06T 7/0012: Biomedical image inspection (G06T 7/00 Image analysis; G06T 7/0002 Inspection of images, e.g. flaw detection)
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/22: Matching criteria, e.g. proximity measures
    • G06F 18/24: Classification techniques
    • G06N 3/04: Neural networks; Architecture, e.g. interconnection topology
    • G06N 3/08: Neural networks; Learning methods
    • G06T 2207/10068: Endoscopic image (image acquisition modality)
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]


Abstract

A training method of an endoscope image classification model, an image classification method and an image classification device are provided. The method comprises the following steps: acquiring a first image set, which is a set of first-modality images of one or more objects acquired by an endoscope operating in a first modality; acquiring a second image set, which is a set of second-modality images of the one or more objects acquired by an endoscope operating in a second modality different from the first modality, the second-modality images corresponding one-to-one with the first-modality images; and inputting the first image set and the second image set into the endoscope image classification model as training data sets, and training the endoscope image classification model to obtain a trained endoscope image classification model.

Description

Training method of endoscope image classification model, image classification method and device
Technical Field
The application relates to the field of artificial intelligence, in particular to a training method of an endoscope image classification model based on contrast learning, an endoscope image classification method, an endoscope image classification device and a computer readable medium.
Background
Most colorectal cancers begin as neoplasms on the surface of the inner lining of the colorectum, called polyps, some of which may develop into cancer. Early detection and identification of polyp types is therefore critical for the prevention and treatment of cancer. However, visual classification of polyps is challenging: varying endoscope illumination conditions and differences in texture and appearance make identification difficult.
To alleviate the burden on physicians, there have been some efforts to automate polyp type identification using deep learning. However, these efforts are all based on fully supervised approaches, i.e., they require a large amount of annotated data, and annotation is very costly. Furthermore, they are trained using data from only a single modality, whereas in medical imaging the information observed in different modalities differs and is highly important.
Therefore, an improved method for training an endoscopic image classification model is desired, one that can better learn image features at an abstract semantic level and exploit multi-modal feature information when annotated data are limited.
Disclosure of Invention
The present disclosure has been made in view of the above problems. An object of the present disclosure is to provide a training method, apparatus and computer readable medium for semi-supervised training of an endoscopic image classification model based on contrast learning.
The embodiments of the present disclosure provide a training method of an endoscope image classification model based on contrast learning, the method including: acquiring a first image set, which is a set of first-modality images of one or more objects acquired by an endoscope operating in a first modality; acquiring a second image set, which is a set of second-modality images of the one or more objects acquired by an endoscope operating in a second modality different from the first modality, the second-modality images corresponding one-to-one with the first-modality images; and inputting the first image set and the second image set into the endoscope image classification model as training data sets, and training the endoscope image classification model to obtain a trained endoscope image classification model.
For example, in a method according to an embodiment of the present disclosure, the training method is a semi-supervised training method; images of a first subset of the first image set have labels marking endoscope image categories, while the other images of the first image set do not; and the images of a second subset of the second image set, which correspond one-to-one with the images of the first subset, carry the same labels marking endoscope image categories, while the other images of the second image set do not.
For example, in a method according to an embodiment of the present disclosure, the endoscope image classification model comprises: a contrast learning submodel, the contrast learning submodel comprising: a first learning module for receiving the first image set and learning it to obtain a first feature representation and a second feature representation of the first image set; a second learning module for receiving the second image set and learning it to obtain a first feature representation and a second feature representation of the second image set; and a memory queue for storing second feature representations of the first image set generated by the first learning module and second feature representations of the second image set generated by the second learning module; and a classifier submodel comprising: a first classifier submodel for performing classification learning on the first feature representation of the first image set generated by the first learning module to generate a classification prediction probability distribution for each image in the first image set; and a second classifier submodel for performing classification learning on the first feature representation of the second image set generated by the second learning module to generate a classification prediction probability distribution for each image in the second image set.
For example, in a method according to an embodiment of the present disclosure, the first learning module includes a first encoder and a first nonlinear mapper connected in sequence, and the second learning module includes a second encoder and a second nonlinear mapper connected in sequence, wherein the first encoder and the second encoder have the same structure, and the first nonlinear mapper and the second nonlinear mapper have the same structure;
the first classifier submodel comprises a first classifier connected to an output of the first encoder, and the second classifier submodel comprises a second classifier connected to an output of the second encoder, wherein the first classifier and the second classifier are structurally identical.
For example, in a method according to an embodiment of the present disclosure, inputting the first image set and the second image set as a training data set into the endoscope image classification model comprises, at each training iteration: selecting a first batch of first-modality images from the first image set and inputting them into the first learning module; and selecting, from the second image set, a second batch of second-modality images corresponding one-to-one with the first batch of first-modality images and inputting them into the second learning module.
For example, in a method according to an embodiment of the present disclosure, training the endoscope image classification model to obtain a trained endoscope image classification model comprises: training the endoscope image classification model until the joint loss function of the endoscope image classification model converges, to obtain the trained endoscope image classification model.
For example, in a method according to an embodiment of the present disclosure, training the endoscope image classification model until the joint loss function of the endoscope image classification model converges comprises: performing unsupervised contrast learning with the contrast learning submodel to generate a first feature representation and a second feature representation of the first batch for the first batch of first-modality images, and a first feature representation and a second feature representation of the second batch for the second batch of second-modality images; storing the second feature representation of the first batch and the second feature representation of the second batch in the memory queue based on a first-in-first-out rule; performing classification training with the classifier submodel to generate a first classification prediction probability distribution for each image in the first batch of first-modality images, thereby obtaining the first classification prediction probability distribution of the first batch, and a second classification prediction probability distribution for each image in the second batch of second-modality images, thereby obtaining the second classification prediction probability distribution of the second batch; calculating the joint loss function based on the second feature representation of the first batch and the second feature representation of the second batch, and on the first classification prediction probability distribution of the first batch and the second classification prediction probability distribution of the second batch, and adjusting parameters of the endoscope image classification model according to the joint loss function; determining whether credible pseudo-labels are generated for unlabeled images in the first batch of first-modality images and unlabeled images in the second batch of second-modality images; if it is determined that credible pseudo-labels are generated, adding the first-modality images for which credible pseudo-labels were generated and the corresponding second-modality images to the first image set and the second image set respectively to form a new first image set and a new second image set, so as to update the training data set; and continuing to iteratively train the adjusted endoscope image classification model with the new first image set and the new second image set as the new training data set.
For example, in a method according to an embodiment of the present disclosure, if it is determined that no credible pseudo-labels are generated for the unlabeled images in the first batch of first-modality images and the unlabeled images in the second batch of second-modality images, iterative training of the adjusted endoscope image classification model continues with the first image set and the second image set as the training data set.
For example, in a method according to an embodiment of the present disclosure, the joint loss function of the endoscope image classification model is the sum of: the contrast learning loss function, the loss function for classification training of the labeled images in the first batch of first-modality images, and the loss function for classification training of the labeled images in the second batch of second-modality images.
For example, in a method according to an embodiment of the present disclosure, the contrast learning loss function is the noise contrastive estimation loss function InfoNCE, and the loss functions for classification training of the labeled images in the first batch of first-modality images and of the labeled images in the second batch of second-modality images are focal loss functions.
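The disclosure does not spell out the focal loss formula here. Purely as an illustrative sketch, assuming the standard focal loss FL(p_t) = -(1 - p_t)^γ · log(p_t) and a PyTorch-style setting, the joint loss described above could be assembled roughly as follows; the function names, the value of γ and the masking convention (label -1 meaning unlabeled) are assumptions of this sketch, not terms of the disclosure.

    import torch
    import torch.nn.functional as F

    def focal_loss(logits, targets, gamma=2.0):
        # Standard focal loss FL(p_t) = -(1 - p_t)^gamma * log(p_t), averaged over the batch.
        log_p = F.log_softmax(logits, dim=1)
        log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)  # log-probability of the true class
        pt = log_pt.exp()
        return (-(1.0 - pt) ** gamma * log_pt).mean()

    def joint_loss(contrast_loss, logits_wl, logits_nbi, labels):
        # Sum of the contrast loss (InfoNCE) and the focal losses of the labeled images
        # in the first-modality and second-modality batches; label -1 marks an unlabeled image.
        labeled = labels >= 0
        loss = contrast_loss
        if labeled.any():
            loss = loss + focal_loss(logits_wl[labeled], labels[labeled])
            loss = loss + focal_loss(logits_nbi[labeled], labels[labeled])
        return loss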
For example, in a method according to an embodiment of the present disclosure, performing unsupervised contrast learning with the contrast learning submodel to generate the first and second feature representations of the first batch for the first batch of first-modality images and the first and second feature representations of the second batch for the second batch of second-modality images includes: converting each image in the first batch of first-modality images into a first feature representation with the first encoder to obtain the first feature representation of the first batch, and nonlinearly mapping each first feature representation of the first batch with the first nonlinear mapper to obtain the second feature representation of the first batch; and converting each image in the second batch of second-modality images into a first feature representation with the second encoder to obtain the first feature representation of the second batch, and nonlinearly mapping each first feature representation of the second batch with the second nonlinear mapper to obtain the second feature representation of the second batch.
For example, in a method according to an embodiment of the present disclosure, determining whether credible pseudo-labels are generated for unlabeled images in the first batch of first-modality images and unlabeled images in the second batch of second-modality images comprises: for each unlabeled first-modality image, determining a first label prediction value based on the first classification prediction probability distribution generated for that image; for the unlabeled second-modality image corresponding one-to-one with that unlabeled first-modality image, determining a second label prediction value based on the second classification prediction probability distribution generated for it; determining whether the first label prediction value and the second label prediction value are consistent; if they are not consistent, not generating a credible pseudo-label; and if they are consistent, fusing the first label prediction value and the second label prediction value, generating a credible pseudo-label when the fused label prediction value is greater than a preset threshold, and otherwise not generating a credible pseudo-label.
For example, in a method according to an embodiment of the present disclosure, fusing the first label prediction value and the second label prediction value comprises: computing a weighted average of the first label prediction value and the second label prediction value to obtain the fused label prediction value.
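As a minimal sketch of the pseudo-label decision just described (consistency check, weighted-average fusion, confidence threshold); the equal weights and the 0.9 threshold below are assumptions for illustration, not values prescribed by the disclosure.

    import torch

    def credible_pseudo_label(p_wl, p_nbi, threshold=0.9, w_wl=0.5, w_nbi=0.5):
        # p_wl, p_nbi: classification prediction probability distributions of the same unlabeled
        # object under the two modalities. Returns the pseudo-label, or None if it is not credible.
        pred_wl = int(torch.argmax(p_wl))
        pred_nbi = int(torch.argmax(p_nbi))
        if pred_wl != pred_nbi:              # the two label predictions disagree
            return None
        fused = w_wl * p_wl + w_nbi * p_nbi  # weighted average of the two prediction values
        return pred_wl if float(fused[pred_wl]) > threshold else None

    # Example over the classes (hyperplasia, adenoma, cancer):
    label = credible_pseudo_label(torch.tensor([0.05, 0.92, 0.03]),
                                  torch.tensor([0.02, 0.95, 0.03]))   # -> 1 (adenoma)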
For example, according to a method of an embodiment of the present disclosure, the object is a polyp, and the endoscopic image is a polyp endoscopic image.
For example, in a method according to an embodiment of the present disclosure, the label comprises at least one of hyperplasia, adenoma and cancer.
For example, in a method according to an embodiment of the present disclosure, the first-modality image is a white light image and the second-modality image is a narrow-band light image.
For example, in a method according to an embodiment of the present disclosure, the first-modality image is a white light image and the second-modality image is an autofluorescence image.
For example, a method according to an embodiment of the present disclosure, wherein the encoder is a convolutional layer part of a residual neural network ResNet, the nonlinear mapper is composed of a two-layer multi-layer perceptron MLP, and the classifier is composed of a two-layer multi-layer perceptron MLP.
Embodiments of the present disclosure further provide an endoscope image classification method, including: acquiring an endoscope image to be identified; extracting an image feature representation of the endoscope image with an encoder in a trained endoscope image classification model; and inputting the extracted image feature representation into the corresponding classifier in the endoscope image classification model to obtain a classification result for the endoscope image; wherein the trained endoscope image classification model is obtained with the training method of an endoscope image classification model based on contrast learning according to an embodiment of the disclosure.
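As a hedged sketch of this classification method, assuming a trained PyTorch encoder and classifier following the structure of fig. 4 and an already preprocessed input tensor; the attribute and class names used here are illustrative assumptions.

    import torch

    @torch.no_grad()
    def classify_endoscope_image(image, encoder, classifier,
                                 class_names=("hyperplasia", "adenoma", "cancer")):
        # image: preprocessed tensor of shape (3, H, W).
        # Returns the predicted class name and the classification prediction probability distribution.
        encoder.eval(); classifier.eval()
        features = encoder(image.unsqueeze(0))            # image feature representation from the encoder
        features = torch.flatten(features, start_dim=1)   # flatten in case the encoder ends with pooling
        probs = torch.softmax(classifier(features), dim=1).squeeze(0)
        return class_names[int(torch.argmax(probs))], probs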
Embodiments of the present disclosure further provide an endoscope image classification system, comprising: an image acquisition component for acquiring an endoscope image to be recognized; a processing component for extracting an image feature representation of the endoscope image with an encoder in a trained endoscope image classification model and inputting the extracted image feature representation into the corresponding classifier in the endoscope image classification model to obtain a classification result for the endoscope image; and an output component for outputting the classification result for the image to be recognized, wherein the trained endoscope image classification model is obtained with the training method of an endoscope image classification model based on contrast learning according to an embodiment of the disclosure.
Embodiments of the present disclosure also provide a training apparatus for an endoscope image classification model based on contrast learning, the apparatus including: an image acquisition component for acquiring a first image set, which is a set of first-modality images of one or more objects acquired by an endoscope operating in a first modality, and for acquiring a second image set, which is a set of second-modality images of the one or more objects acquired by an endoscope operating in a second modality different from the first modality, the second-modality images corresponding one-to-one with the first-modality images; and a training component for inputting the first image set and the second image set into the endoscope image classification model as training data sets and training the endoscope image classification model to obtain a trained endoscope image classification model.
Embodiments of the present disclosure also provide an electronic device comprising a memory and a processor, wherein the memory has stored thereon a program code readable by the processor, which when executed by the processor performs the method according to any of the above methods.
Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer-executable instructions for performing the method according to any one of the above-described methods.
The training method of the semi-supervised endoscope image classification model based on contrast learning according to the embodiments of the present disclosure provides a new way of selecting positive and negative examples, making better use of the information in images of different endoscope modalities to enhance the classification accuracy of endoscope images. In addition, unlike the traditional SimCLR-based contrast learning approach, and in order to reduce the computation of the model, the embodiments of the present disclosure add a memory queue for dynamically storing negative examples. Finally, the embodiments of the disclosure provide a new semi-supervised learning mode in which data labels are dynamically added in the form of pseudo-labels to assist training, which saves labeling cost.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings of the embodiments of the present disclosure will be briefly described below. It is to be expressly understood that the drawings in the following description are directed to only some embodiments of the disclosure and are not intended as limitations of the disclosure.
FIG. 1 is a schematic diagram illustrating an architecture for applying the endoscopic image classification model training and the endoscopic image classification method in the embodiment of the present disclosure;
fig. 2 shows a schematic diagram of a conventional SimCLR-based contrast learning network architecture;
FIG. 3 shows images of the same polyp in two modalities according to an embodiment of the present disclosure;
FIG. 4 shows a schematic structure of an endoscopic image classification model 400 based on contrast learning according to an embodiment of the present disclosure;
FIG. 5 shows a flow diagram of a method of training an endoscopic image classification model according to an embodiment of the present disclosure;
FIG. 6 shows a specific exemplary illustration of the implementation described in step S505 of FIG. 5;
FIG. 7 depicts a flow chart of an endoscopic image classification method according to an embodiment of the present disclosure;
FIG. 8 is a schematic diagram illustrating the structure of an endoscopic image classification system in an embodiment of the present disclosure;
FIG. 9 illustrates a training apparatus for an endoscopic image classification model according to an embodiment of the present disclosure; and
FIG. 10 shows a schematic diagram of a storage medium according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely with reference to the accompanying drawings, and obviously, the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without any creative effort also belong to the protection scope of the present application.
The terms used in the present specification are those general terms currently widely used in the art in consideration of functions related to the present disclosure, but they may be changed according to the intention of a person having ordinary skill in the art, precedent, or new technology in the art. Also, specific terms may be selected by the applicant, and in this case, their detailed meanings will be described in the detailed description of the present disclosure. Therefore, the terms used in the specification should not be construed as simple names but based on the meanings of the terms and the overall description of the present disclosure.
Although various references are made herein to certain modules in a system according to embodiments of the present application, any number of different modules may be used and run on a user terminal and/or server. The modules are merely illustrative and different aspects of the systems and methods may use different modules.
Flowcharts are used herein to illustrate the operations performed by systems according to embodiments of the present application. It should be understood that the preceding or following operations are not necessarily performed in the exact order shown. Rather, the various steps may be processed in reverse order or simultaneously, as desired. Meanwhile, other operations may be added to the processes, or a certain step or several steps may be removed from them.
In the diagnosis of digestive tract diseases, an image of a lesion inside the digestive tract is usually acquired with a diagnostic tool such as an endoscope, and medical staff determine the type of lesion by observing it with the naked eye. To reduce the burden on doctors, some efforts have been made to automatically identify the lesion type by deep learning, but these efforts are based on fully supervised methods, i.e., they require a large amount of labeled image data, and labeling image data is enormously costly. Furthermore, they are trained using data from only a single modality, whereas in medical imaging the information observed in different modalities differs and is highly important.
Therefore, the present disclosure provides a training method for an endoscope image classification model based on contrast learning, which makes better use of the information in images of different endoscope modalities by adopting a new way of selecting positive and negative examples to learn image features at an abstract semantic level, thereby enhancing the classification accuracy of endoscope images. In addition, when labeled data are limited, data labels are dynamically added in the form of pseudo-labels to assist training, which better addresses the cost of manually collecting and labeling large training sets.
Fig. 1 is a schematic diagram illustrating an application architecture of an endoscopic image classification model training and an endoscopic image classification method in an embodiment of the present disclosure, and includes a server 100 and a terminal device 200.
The terminal device 200 may be a medical device, and for example, the user may view the endoscope image classification result based on the terminal device 200.
The terminal device 200 and the server 100 can be connected via a network to communicate with each other. Optionally, the network uses standard communication techniques and/or protocols. The network is typically the Internet, but can be any network, including but not limited to a Local Area Network (LAN), Metropolitan Area Network (MAN), Wide Area Network (WAN), a mobile, wired or wireless network, a private network, or any combination of virtual private networks. In some embodiments, data exchanged over the network is represented using techniques and/or formats including Hypertext Markup Language (HTML), Extensible Markup Language (XML), and the like. All or some of the links may also be encrypted using conventional encryption techniques such as Secure Sockets Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN), Internet Protocol Security (IPsec), and so on. In other embodiments, custom and/or dedicated data communication techniques may also be used in place of, or in addition to, the data communication techniques described above.
The server 100 may provide various network services for the terminal device 200, wherein the server 100 may be a server, a server cluster composed of several servers, or a cloud computing center.
Specifically, the server 100 may include a processor 110 (CPU), a memory 120, an input device 130, an output device 140, and the like; the input device 130 may include a keyboard, a mouse, a touch screen, and the like, and the output device 140 may include a display device, such as a Liquid Crystal Display (LCD), a Cathode Ray Tube (CRT), and the like.
Memory 120 may include Read Only Memory (ROM) and Random Access Memory (RAM), and provides processor 110 with program instructions and data stored in memory 120. In the embodiment of the present disclosure, the memory 120 may be used to store a program of an endoscopic image classification model training method or an endoscopic image classification method in the embodiment of the present disclosure.
The processor 110 is configured to execute the steps of any one of the endoscope image classification model training methods or endoscope image classification methods according to the obtained program instructions by calling the program instructions stored in the memory 120.
For example, in the embodiment of the present disclosure, the endoscope image classification model training method or the endoscope image classification method is mainly performed by the server 100. For example, for the endoscope image classification method, the terminal device 200 may transmit acquired images of multiple modalities of a digestive tract lesion (e.g., a polyp) to the server 100, the server 100 performs type recognition on the lesion images, and the lesion classification result may be returned to the terminal device 200.
As shown in fig. 1, the application architecture is described taking execution on the server 100 side as an example, but the endoscope image classification method in the embodiments of the present disclosure may of course also be executed by the terminal device 200; for example, the terminal device 200 may obtain a trained endoscope image classification model from the server 100 and perform type recognition on lesion images based on that model to obtain lesion classification results, which is not limited by the embodiments of the present disclosure.
In addition, the application architecture diagram in the embodiment of the present disclosure is for more clearly illustrating the technical solution in the embodiment of the present disclosure, and does not limit the technical solution provided by the embodiment of the present disclosure, and of course, for other application architectures and business applications, the technical solution provided by the embodiment of the present disclosure is also applicable to similar problems.
The various embodiments of the present disclosure are schematically illustrated as applied to the application architecture diagram shown in fig. 1.
First, in order to make the principles of the present disclosure more clearly understood by those skilled in the art, a brief description of the basic concept of contrast learning is given below.
Contrast learning belongs to unsupervised learning; it is characterized by not requiring manually annotated category label information, instead directly using the data themselves as supervision information to learn feature representations of sample data that are then used for downstream tasks, such as classifying the types of polyp images. In contrast learning, representations are learned by making comparisons between input samples. Contrast learning does not learn a signal from a single data sample at a time, but learns by comparing different samples. Comparisons are made between positive pairs of "similar" inputs and negative pairs of "different" inputs. Contrast learning learns by simultaneously maximizing the agreement between different transformed views (e.g., cropping, flipping, color transformation, etc.) of the same image and minimizing the agreement between transformed views of different images. In short, after the same image has been subjected to various transformations, contrast learning should still be able to recognize it, so the similarity between its transformed views is maximized (because they come from the same image). Conversely, if the images are different (even if they appear very similar after various transformations), the similarity between them is minimized. With such contrastive training, the encoder can learn higher-level generic features of the image (e.g., image-level features) rather than a generative model of the image (e.g., pixel-level generation).
Fig. 2 shows a schematic diagram of a conventional SimCLR-based contrast learning network architecture.
As shown in fig. 2, the conventional SimCLR model architecture is composed of two symmetric branches (Branch), each provided with an encoder and a nonlinear mapper. SimCLR proposes a way of constructing positive and negative examples, and its basic idea is as follows: a batch of N images X = {x_1, x_2, x_3, …, x_N} is input (N being a positive integer larger than 1). One of the images x is randomly transformed twice (by image enhancement including, for example, cropping, flipping, color transformation and Gaussian blur) to obtain two images x_i and x_j. Enhancing all N images X of the batch in this way yields two batches of images X_i and X_j; the two batches X_i and X_j each contain N images, and there is a one-to-one correspondence between the images of the two batches. For example, the data pair <x_i, x_j> obtained by transforming the image x are positive examples of each other, and for x_i the remaining 2N-2 images are negative examples. After transformation, the enhanced images are projected into the representation space. Taking the upper branch as an example, the enhanced image x_i first passes through a feature encoder (Encoder, typically using a deep residual network (ResNet) as the model structure, represented here by the function f(·)) and is converted into a corresponding feature representation h_i. A nonlinear projector (Non-linear Projector, consisting of a two-layer multi-layer perceptron (MLP), represented here by the function g(·)) then maps the feature representation h_i further to a vector z_i in another space. Thus, through the two nonlinear transformations g(f(·)), the enhanced image is projected into the representation space. The process of the lower branch is similar and is not repeated here.
Unsupervised learning of image features can be achieved by computing and maximizing the similarity between the mapped features of positive examples while minimizing the similarity between the mapped features of negative examples. In SimCLR the similarity between two enhanced images is computed using cosine similarity: for the two enhanced images x_i and x_j, the cosine similarity is computed on their projected representations z_i and z_j. Ideally, the similarity between an enhanced pair of images (referred to here as a positive example, e.g. <x_i, x_j>) will be high, while the similarity between either image of the pair and the other images in the two batches will be low.
The loss function for contrast learning may be defined based on the similarity between positive and negative examples; SimCLR uses the contrast loss InfoNCE, as shown in equation (1) below:

$$\mathcal{L}_{\mathrm{InfoNCE}} = -\log \frac{\exp\left(z_i \cdot z_j / \tau\right)}{\sum_{k \in I,\, k \neq i} \exp\left(z_i \cdot z_k / \tau\right)} \qquad (1)$$

where z_i denotes a feature after the nonlinear mapping, z_j denotes the positive example corresponding to z_i, z_k ranges over all features other than z_i (including the positive example and the negative examples), I denotes all images, "·" denotes the dot product, and τ is a temperature parameter that helps prevent the model from falling into a local optimum early in training and aids convergence as training proceeds.
By optimizing the above contrast loss function InfoNCE, it is possible to maximize the similarity between positive examples and minimize the similarity between negative examples, and the essential features of the image can be learned in an unsupervised environment.
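A minimal PyTorch-style sketch of equation (1), assuming L2-normalized projected features so that the dot product corresponds to the cosine similarity used by SimCLR; the temperature value and tensor shapes are assumptions of this sketch.

    import torch
    import torch.nn.functional as F

    def info_nce(z_i, z_j, z_negatives, tau=0.07):
        # z_i, z_j: projected features of a positive pair, shape (D,).
        # z_negatives: the other projected features (negative examples), shape (M, D).
        z_i, z_j = F.normalize(z_i, dim=0), F.normalize(z_j, dim=0)
        z_negatives = F.normalize(z_negatives, dim=1)
        pos = torch.exp(torch.dot(z_i, z_j) / tau)          # similarity with the positive example
        neg = torch.exp(z_negatives @ z_i / tau).sum()      # similarities with all other features
        return -torch.log(pos / (pos + neg))                # denominator runs over all k != i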
Conventional contrast learning models (such as the SimCLR model introduced above) obtain a positive pair by enhancing the same image. However, image enhancement methods such as cropping, flipping, color transformation and Gaussian blurring are essentially only a data augmentation of the real image, i.e., they generate artificial data that provide no more feature information than the original image. Such conventional image enhancement is ill-suited to classifying endoscope images: varying endoscope illumination conditions and differences in texture and appearance make recognition difficult, and polyps, for example, differ greatly in color, shape and size, show large color variation between polyps, and have limited visibility of surface texture, so polyp inspection based only on image enhancement leads to a high false detection rate.
Since in medical imaging the information observed in different modalities differs and is very important, the present disclosure proposes a new way of selecting positive and negative examples for contrast learning in order to better learn the essential features of endoscope imagery. Specifically, unlike the conventional contrast learning method based on image enhancement, the method of the disclosure uses images of the same digestive tract lesion in different modalities as a pair of positive examples for contrast learning, so that richer features of the same lesion in different modalities can be obtained, which is more conducive to learning the essential features of the lesion. Hereinafter, the technical solutions of the embodiments of the present disclosure are schematically described using polyp images as an example. It should be noted that the technical solutions provided by the embodiments of the present disclosure are also applicable to other endoscope images.
Fig. 3 shows images of the same polyp in two modalities, according to an embodiment of the present disclosure.
As shown in fig. 3, the image on the left is an observation of a polyp acquired by operating the endoscope in White Light (WL) Imaging mode, and the image on the right is another observation of the same polyp acquired by operating the endoscope in Narrow Band Imaging (NBI) mode.
The broadband spectrum of white light is composed of three kinds of light, R/G/B (red/green/blue), with wavelengths of 605 nm, 540 nm and 415 nm respectively. The white light imaging mode presents a bright, sharp white-light endoscope image, which facilitates observation of the structure of the deep mucosal layer. The narrow-band light mode replaces the traditional broadband filter with a narrow-band filter, restricting the light of different wavelengths and leaving only the green and blue narrow-band light waves at 540 nm and 415 nm. The contrast of blood vessels relative to the mucosa in images generated in the narrow-band light mode is markedly enhanced, making this mode suitable for observing the vessel morphology and mucosal structure of the superficial mucosa. The high contrast between blood vessels and surrounding mucosa helps detect and characterize lesions, even suspicious lesions that show high vascularization in deeper tissue layers. Images of the capillaries are less blurred than in white-light endoscopy, reducing the likelihood of missed lesions.
According to one embodiment of the present disclosure, by replacing the conventional enhanced images with images of the same polyp in different modalities (e.g., a white light image and a narrow-band light image), richer features of the polyp can be learned, which benefits classifying polyp images based on the learned features.
It should be understood that the modality image herein may also be any other type of modality image, such as autofluorescence image, I-SCAN image, etc., and the present disclosure is not limited thereto.
Fig. 4 shows a schematic structure of an endoscopic image classification model 400 based on contrast learning according to an embodiment of the present disclosure.
As shown in fig. 4, the structure of the endoscope image classification model 400 according to the embodiment of the present disclosure is divided into a contrast learning submodel 401 and a classifier submodel 402. As shown in the figure, the contrast learning submodel 401 may include, for example, an upper branch and a lower branch. Here, for convenience of description, the upper and lower branches are referred to as a first learning module 401-1 and a second learning module 401-2, respectively. For example, the first learning module 401-1 includes a first encoder and a first nonlinear mapper connected in sequence, and the second learning module 401-2 includes a second encoder and a second nonlinear mapper connected in sequence.
According to an embodiment of the present disclosure, for example, the first encoder and the second encoder may have the same structure. For example, the encoder here may be a convolutional layer part of a ResNet network. For example, the first nonlinear mapper and the second nonlinear mapper may have the same structure. For example, the nonlinear mapper may be a two-layer Multilayer Perceptron (MLP).
In addition, the contrast learning submodel 401 includes a memory queue for storing feature vectors of a plurality of recently trained batches.
The other classifier submodel 402 comprises two classifiers coupled to the outputs of the two encoders in the contrast learning submodel 401, respectively, for performing further classification tasks based on the feature representations generated by the encoders.
According to one embodiment of the present disclosure, the classifiers herein may have the same structure, for example. For example, the classifier here may be a two-layered multi-layered perceptron MLP.
Those skilled in the art will appreciate that the encoder, nonlinear mapper and classifier used here may be replaced with other architectures, and the disclosure is not limited in this respect.
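For concreteness only, one branch of model 400 (encoder = convolutional part of a ResNet, nonlinear mapper = two-layer MLP, classifier = two-layer MLP) could be instantiated roughly as follows; the ResNet depth and the hidden, projection and class dimensions are assumptions rather than values fixed by the disclosure.

    import torch.nn as nn
    from torchvision.models import resnet50

    def build_branch(proj_dim=128, num_classes=3):
        # Returns (encoder, nonlinear mapper, classifier) for one branch of model 400.
        backbone = resnet50(weights=None)
        feat_dim = backbone.fc.in_features                            # 2048 for ResNet-50
        encoder = nn.Sequential(*list(backbone.children())[:-1],      # convolutional layers + pooling
                                nn.Flatten())
        mapper = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(inplace=True),
                               nn.Linear(feat_dim, proj_dim))         # two-layer MLP, g(.)
        classifier = nn.Sequential(nn.Linear(feat_dim, 512), nn.ReLU(inplace=True),
                                   nn.Linear(512, num_classes))       # two-layer MLP classifier head
        return encoder, mapper, classifier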
In the following, a method for training an endoscope image classification model and an endoscope classification method provided according to at least one embodiment of the present disclosure are described in a non-limiting manner by using several examples or embodiments, and as described below, different features of these specific examples or embodiments may be combined with each other without mutual conflict, so as to obtain new examples or embodiments, which also belong to the scope of protection of the present disclosure.
Currently, the mainstream methods for automatically identifying polyps based on deep learning are mostly fully supervised learning methods, which rely on manually annotated labels. In practice, however, the polyp images obtained are unlabeled, and labeling the data is enormously costly. Therefore, the present disclosure proposes a semi-supervised training mode that assists training by dynamically adding data labels in the form of pseudo-labels. In addition, by using images of the same polyp in different modalities, richer feature information can be extracted.
Fig. 5 shows a flowchart of a method of training an endoscopic image classification model according to an embodiment of the present disclosure. The endoscopic image classification model is, for example, the endoscopic image classification model 400 as described above with reference to fig. 4. For example, the training method of the endoscope image classification model 400 may be performed by a server, which may be the server 100 shown in fig. 1.
First, in step S501, a first image set is acquired, which is a set of first-modality images of one or more objects acquired by an endoscope operating in a first modality. Next, in step S503, a second image set is acquired, which is a set of second-modality images of the one or more objects acquired by an endoscope operating in a second modality different from the first modality, the second-modality images corresponding one-to-one with the first-modality images.
For example, one or more of the objects here may be polyps. For example, the first-modality image may be a white light image, and the second-modality image may be a narrow-band light image. Of course, other modality images may be used, for example white light imaging for the first modality and autofluorescence imaging or I-SCAN imaging for the second modality, and so on, which is not limited by the present disclosure. For example, the multi-modal images may be obtained by operating an endoscope, by downloading over a network, or in other ways, which is not limited by the embodiments of the present disclosure.
It should be understood that embodiments of the present disclosure may also be equally applicable to image classification of other digestive tract lesions besides polyps, such as inflammation, ulcers, vascular malformations, and diverticula, etc., and the present disclosure is not limited thereto.
For example, in order to mimic the reality that polyp data largely lack labels, a large amount of the data in the first and second image sets is unlabeled; since the first-modality images in the first set and the second-modality images in the second set correspond one-to-one, the presence or absence of a label also corresponds one-to-one. For example, according to the embodiments of the present disclosure, polyps can be classified according to the NICE classification criteria into hyperplastic polyps, adenomas (including mucosal carcinoma and superficial submucosal invasive carcinoma), and deep submucosal invasive carcinoma, and the training data may be labeled briefly as hyperplasia, adenoma and cancer.
For example, in one implementation of the training method of an endoscope image classification model according to an embodiment of the present disclosure, the first and second data sets may include 1302 white light images and the corresponding 1302 narrow-band light images, respectively. To reflect the situation in which a large proportion of labels are absent from real data sets, 90% of the labels can be removed at random and only 10% retained, realizing semi-supervised learning.
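The 90%/10% split above can be emulated as follows; the marker value -1 for unlabeled images and the fixed random seed are assumptions of this sketch.

    import numpy as np

    def mask_labels(labels, keep_ratio=0.10, seed=0):
        # labels: integer class indices. Randomly keep only keep_ratio of the labels and mark the
        # rest as -1 (unlabeled), emulating a real data set in which most images carry no annotation.
        labels = np.asarray(labels).copy()
        rng = np.random.default_rng(seed)
        keep = rng.random(len(labels)) < keep_ratio
        labels[~keep] = -1
        return labels

Because the white light and narrow-band images correspond one-to-one, the same mask applies to both image sets.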
It should be understood that the size of the data sets and the label ratio used for training the endoscope image classification model according to the embodiments of the present disclosure may be adjusted according to the actual situation, and the present disclosure does not limit this. For unlabeled images, the embodiments of the present disclosure dynamically add data labels to assist training by means of pseudo-labels; details are described later with reference to fig. 6.
Next, in step S505, the first image set and the second image set are input as training data sets into the endoscope image classification model, and the endoscope image classification model is trained to obtain a trained endoscope image classification model.
As is well known to those skilled in the art, machine learning algorithms typically rely on a process of maximizing or minimizing an objective function, often referred to as a loss function. For example, in the training method of the endoscope image classification model according to the embodiment of the present disclosure, training the endoscope image classification model to obtain the trained endoscope image classification model may include: and training the endoscope image classification model until the joint loss function of the endoscope image classification model converges to obtain the trained endoscope image classification model.
As described above, in conventional contrast learning, at each training iteration N images are randomly selected from the training set to form a batch, and for each image in the batch a positive pair is constructed by the image enhancement method described above, i.e., two enhanced views are generated for each image. Two batches of images are thus generated, each comprising N images, with a one-to-one correspondence between the images of the two batches, where each pair of images consists of enhanced views of the same original image. In conventional contrast learning, the 2N images of the two batches are obtained by applying image enhancement to the original images, but the data generated this way are artificial. Accordingly, the disclosed embodiments use images of the same digestive tract lesion (e.g., a polyp) in two different modalities instead of the two enhanced views of conventional contrast learning, which provides a richer representation of the lesion's features, so that a network well trained on such a training set can classify polyps more accurately.
For example, at each training iteration, a first batch of first-modality images is selected from the first image set and input into the first learning module 401-1 of fig. 4, and a second batch of second-modality images corresponding one-to-one with the first batch of first-modality images is selected from the second image set and input into the second learning module 401-2 of fig. 4.
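A sketch of how such paired batches could be drawn while keeping the one-to-one correspondence between the two modalities; the class and variable names are assumptions, not terms of the disclosure.

    from torch.utils.data import Dataset, DataLoader

    class PairedModalityDataset(Dataset):
        # Each item is (first-modality image, corresponding second-modality image, shared label);
        # a label of -1 marks an unlabeled pair.
        def __init__(self, wl_images, nbi_images, labels):
            assert len(wl_images) == len(nbi_images) == len(labels)
            self.wl, self.nbi, self.labels = wl_images, nbi_images, labels

        def __len__(self):
            return len(self.wl)

        def __getitem__(self, idx):
            return self.wl[idx], self.nbi[idx], self.labels[idx]

    # Each batch drawn from such a dataset then contains matched pairs for the two learning modules:
    # loader = DataLoader(PairedModalityDataset(wl, nbi, y), batch_size=32, shuffle=True)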
The endoscope classification method based on contrast learning according to the embodiments of the present disclosure adopts a new way of selecting positive and negative examples, makes better use of the information in images of different endoscope modalities to learn image features at an abstract semantic level, and enhances the classification accuracy of endoscope images. When labeled data are limited, data labels are dynamically added in the form of pseudo-labels to assist training, which better addresses the cost of manually collecting and labeling large training sets.
Referring to fig. 6, the implementation described in step S505 is illustrated in a specific, exemplary manner in conjunction with the endoscope image classification model 400 shown in fig. 4.
As shown in fig. 6, in step S601, unsupervised contrast learning is performed using the contrast learning submodel to generate a first feature representation and a second feature representation of the first batch for the first batch of first-modality images, and a first feature representation and a second feature representation of the second batch for the second batch of second-modality images.
For example, the contrast learning process here is generally similar to the conventional SimCLR process described above. Specifically, referring to fig. 4 and taking the first learning module 401-1 (i.e., the upper branch) as an example, after the first batch of first-modality images is selected from the first image set and input into the first learning module 401-1, the first encoder converts each image in the batch into a first feature representation to obtain the first feature representation of the first batch, and the first nonlinear mapper then nonlinearly maps each first feature representation to obtain the second feature representation of the first batch. The first feature representation here may be, for example, the encoder output h described above, and the second feature representation may be, for example, the nonlinearly mapped output z described above.
The processing in the second learning module 401-2 (i.e., the lower branch) is the same: after the second batch of second-modality images is selected from the second image set and input into the second learning module 401-2, the second encoder converts each image in the batch into a first feature representation to obtain the first feature representation of the second batch, and the second nonlinear mapper then nonlinearly maps each first feature representation to obtain the second feature representation of the second batch.
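A minimal sketch of one such branch is shown below, assuming a recent PyTorch/torchvision environment, a ResNet backbone as the encoder, and a two-layer MLP as the nonlinear mapper; the exact backbone, layer sizes, and class name are assumptions rather than details taken from the disclosure.

```python
import torch.nn as nn
import torchvision.models as models

class LearningBranch(nn.Module):
    """One contrast-learning branch: encoder followed by a nonlinear mapper.
    Sketch only; backbone choice and projection size are assumptions."""
    def __init__(self, feature_dim=2048, proj_dim=128):
        super().__init__()
        backbone = models.resnet101(weights=None)
        backbone.fc = nn.Identity()               # expose the pooled features
        self.encoder = backbone                   # produces the first feature representation h
        self.mapper = nn.Sequential(              # produces the second feature representation z
            nn.Linear(feature_dim, feature_dim), nn.ReLU(inplace=True),
            nn.Linear(feature_dim, proj_dim))

    def forward(self, x):
        h = self.encoder(x)
        z = self.mapper(h)
        return h, z
```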
For example, unsupervised contrast learning according to embodiments of the present disclosure employs the unsupervised contrast loss function InfoNCE described above as the loss function. For example, the contrast-learned loss function InfoNCE is based on a similarity between the second feature representation of the first batch and the second feature representation of the second batch and a similarity between the second feature representation of the first batch and a plurality of second feature representations stored in a memory queue generated during a previous iteration of training.
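The following is a minimal sketch of such an InfoNCE computation, where the positive pair is the cross-modality pair and the negatives come from the memory queue; the tensor names and the temperature value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce_with_queue(z_first, z_second, queue_features, temperature=0.07):
    """InfoNCE sketch: z_first / z_second are the second feature representations
    of the two modality batches (N x D); queue_features (K x D) are projections
    stored from earlier iterations and used as negatives."""
    z_first = F.normalize(z_first, dim=1)
    z_second = F.normalize(z_second, dim=1)
    queue_features = F.normalize(queue_features, dim=1)

    pos = torch.sum(z_first * z_second, dim=1, keepdim=True)   # (N, 1) cross-modal positives
    neg = z_first @ queue_features.t()                         # (N, K) queue negatives

    logits = torch.cat([pos, neg], dim=1) / temperature
    targets = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, targets)                    # the positive sits at index 0
```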
In step S603, the second feature representation of the first batch and the second feature representation of the second batch are stored in the memory queue based on a first-in-first-out rule.
As described above, at each training iteration the conventional SimCLR takes, within the two input batches of 2N images, the 2N-2 images other than the two enhanced views of the current image as negative examples. Unlike conventional SimCLR, the disclosed embodiments add a memory queue that stores the image features of previously trained batches (e.g., the second feature representation of the first batch and the second feature representation of the second batch) as additional negative examples. More negative examples cover the underlying distribution more effectively and therefore give a better training signal, which helps the model extract good features. The memory queue follows a first-in-first-out rule, that is, it is dynamic: after a new batch of training features is enqueued, the oldest batch is dequeued.
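A first-in-first-out queue of this kind could be sketched as follows; the queue length is an assumed hyperparameter.

```python
from collections import deque
import torch

class MemoryQueue:
    """FIFO store for the second feature representations of previously trained
    batches; the oldest batch is dropped automatically when the queue is full."""
    def __init__(self, max_batches=64):
        self.batches = deque(maxlen=max_batches)

    def enqueue(self, z_batch):
        self.batches.append(z_batch.detach())   # negatives carry no gradient

    def features(self):
        # assumes at least one batch has already been enqueued
        return torch.cat(list(self.batches), dim=0)
```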
In step S605, classification training is performed using the classifier submodel to generate a first classification prediction probability distribution for each image in the first batch of first-modality images, thereby obtaining the first classification prediction probability distributions of the first batch, and to generate a second classification prediction probability distribution for each image in the second batch of second-modality images, thereby obtaining the second classification prediction probability distributions of the second batch.
As shown in fig. 4, the outputs of the two encoders of the contrast learning submodel are connected to two classifiers, respectively; for example, the first classifier may receive the first feature representation of the first batch from the first encoder, and the second classifier may receive the first feature representation of the second batch from the second encoder. The first classifier and the second classifier can then be used for classification training based on the received feature representations.
Each classifier outputs a prediction probability distribution for every input image. Specifically, the first classifier outputs a prediction probability distribution for each image of the first batch of first-modality images based on the first feature representation of the first batch received from the first encoder. Similarly, the second classifier outputs a prediction probability distribution for each image of the second batch of second-modality images based on the first feature representation of the second batch received from the second encoder. For example, suppose polyps are to be classified as hyperplasia, adenoma, or cancer; when an image labeled as hyperplasia is input and the classifier outputs the probability distribution [0.6, 0.3, 0.1], this means the classifier predicts a probability of 0.6 for hyperplasia, 0.3 for adenoma, and 0.1 for cancer.
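A classifier of this kind can be sketched as a small head on top of the encoder output that ends in a softmax; the layer sizes and class count below are assumptions.

```python
import torch.nn as nn

class ClassifierHead(nn.Module):
    """Maps the first feature representation h to a probability distribution
    over the classes (e.g. hyperplasia / adenoma / cancer). Sketch only."""
    def __init__(self, feature_dim=2048, num_classes=3):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feature_dim, 512), nn.ReLU(inplace=True),
            nn.Linear(512, num_classes))

    def forward(self, h):
        return self.mlp(h).softmax(dim=1)   # e.g. [0.6, 0.3, 0.1]
```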
For a labeled image, the loss function for classification training can be determined from the true label and the predicted probability distribution of the image. Classification prediction is also performed on unlabeled images, but the prediction result is only used later to determine a pseudo label for the unlabeled image; once a trusted pseudo label is determined, the image is added to the training set as labeled data for subsequent iterative training, so no loss value needs to be computed for unlabeled images. This process is described in more detail in the following paragraphs.
For example, because the distribution of polyp classes is imbalanced, embodiments of the present disclosure may use a focal loss function as the loss function for classification training, as shown in equation (2) below.
FL(p_t) = -(1 - p_t)^γ · log(p_t)    (2)

where p_t is the predicted probability of the true class and γ is an adjustable weight.
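A focal loss of the form in equation (2) could be sketched as follows, operating directly on the predicted probability distributions; the default value of gamma is an assumption.

```python
import torch

def focal_loss(probs, targets, gamma=2.0, eps=1e-8):
    """Focal loss sketch: probs is an (N, C) tensor of predicted probability
    distributions, targets is an (N,) tensor of true class indices, and gamma
    is the adjustable weight of equation (2)."""
    p_t = probs.gather(1, targets.unsqueeze(1)).squeeze(1)   # probability of the true class
    return (-(1.0 - p_t) ** gamma * torch.log(p_t + eps)).mean()
```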
Of course, other types of loss functions, such as cross-entropy loss functions, may be adopted according to the distribution of the training set, and the disclosure is not limited thereto.
For example, the focal loss determined by classification training on the white light images is denoted L_WL, and the focal loss determined by classification training on the narrow-band light images is denoted L_NBI.
In step S607, a joint loss function is calculated based on the second feature representation of the first batch and the second feature representation of the second batch, together with the first classification prediction probability distributions of the first batch and the second classification prediction probability distributions of the second batch, and the parameters of the endoscope image classification model are adjusted according to the joint loss function.
For example, the joint loss function herein may be determined as the sum of the loss function of the contrast learning submodel and the loss function of the classifier submodel, as shown in equation (3) below:
L_joint = L_InfoNCE + L_WL + L_NBI    (3)
Accordingly, the parameters of the endoscope image classification model shown in fig. 4 may be adjusted based on the joint loss function above, so that the joint loss function is ultimately minimized as iterative training continues.
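Putting the pieces together, one training step might look like the sketch below, reusing the illustrative info_nce_with_queue, focal_loss, and MemoryQueue helpers above and assuming the memory queue already holds features from earlier iterations; only labeled images contribute to the classification terms, and all argument names are assumptions.

```python
def training_step(z_first, z_second, probs_first, probs_second,
                  labels, labeled_mask, queue, optimizer):
    """One parameter update per equation (3): contrast loss plus the two
    focal classification losses. Sketch only; names are illustrative."""
    loss = info_nce_with_queue(z_first, z_second, queue.features())
    if labeled_mask.any():
        loss = loss + focal_loss(probs_first[labeled_mask], labels[labeled_mask])
        loss = loss + focal_loss(probs_second[labeled_mask], labels[labeled_mask])

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # step S603: store the new projections as future negatives (FIFO)
    queue.enqueue(z_first)
    queue.enqueue(z_second)
    return loss.item()
```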
In step S609, it is determined whether trusted pseudo labels are generated for the unlabeled images in the first batch of first-modality images and the unlabeled images in the second batch of second-modality images.
As mentioned above, because real datasets often contain many unlabeled samples, a semi-supervised training method is proposed herein: during training, trusted pseudo labels are generated for unlabeled data, which are then added to the training set and used as labeled data for continued training.
For example, a trusted pseudo label may be generated for each pair of input images by combining the outputs of the two classifiers. As described above, the first classifier generates first prediction probability distributions for the first batch of white light images, and the second classifier generates second prediction probability distributions for the second batch of narrow-band light images. For an unlabeled image, a label prediction value is first determined from the prediction probability distribution. For example, if for one unlabeled white light image in the first batch the first classifier predicts 60% hyperplasia, 20% adenoma, and 10% cancer, the probability value of the most probable class (here 60%, for hyperplasia) can be taken as the label prediction value of that unlabeled image. Likewise, if for the corresponding unlabeled narrow-band light image the second classifier predicts 60% hyperplasia, 10% adenoma, and 20% cancer, the probability value of the most probable class (again 60%, for hyperplasia) can be taken as its label prediction value.

For each pair of corresponding unlabeled images, it is then judged whether the label prediction values produced by the two classifiers agree. If they do not, no trusted pseudo label is generated for the pair. If they agree (for example, both label prediction values are 60%), the two values are fused, for instance by adding them linearly and dividing by 2; other data fusion schemes may of course be used, and the disclosure is not limited in this respect. A trusted pseudo label is generated when the fused label prediction value exceeds a predetermined threshold (e.g., 0.85), and is not generated otherwise.
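For a single pair of unlabeled images, this check could be sketched as follows; here the agreement test is read as both classifiers predicting the same class, the fusion is the simple average described above, and the 0.85 threshold is the example value from the text.

```python
def trusted_pseudo_label(probs_wl, probs_nbi, threshold=0.85):
    """Pseudo-label sketch for one unlabeled image pair: probs_wl / probs_nbi
    are the two classifiers' probability vectors (torch tensors) for the pair.
    Returns the fused class index, or None if no trusted pseudo label arises."""
    p_wl, cls_wl = probs_wl.max(dim=0)
    p_nbi, cls_nbi = probs_nbi.max(dim=0)
    if int(cls_wl) != int(cls_nbi):
        return None                          # the two predictions disagree
    fused = (p_wl + p_nbi) / 2               # linear fusion of the label prediction values
    return int(cls_wl) if fused > threshold else None
```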
Next, in step S611, if it is determined that trusted pseudo labels have been generated for unlabeled images in the first batch of first-modality images and the corresponding unlabeled images in the second batch of second-modality images, the first-modality images and corresponding second-modality images for which trusted pseudo labels were generated are added to the first image set and the second image set, respectively, to form a new first image set and a new second image set, thereby updating the training data set.
Finally, in step S613, iterative training is continued on the adjusted endoscope image classification model using the new first image set and the new second image set as a new training data set.
The joint loss function is continuously optimized during training; when it is minimized and converges, training of the image classification model is determined to be complete. Of course, if no trusted pseudo label is generated for any unlabeled image in the first batch of first-modality images or any unlabeled image in the second batch of second-modality images, the next training iteration is still performed with the original first image set and second image set as the training set.
The contrast-learning-based endoscope classification method according to embodiments of the present disclosure therefore adopts a new way of selecting positive and negative examples, makes better use of the information carried by images in different endoscope modalities, learns features at an abstract semantic level, and improves the classification accuracy of white light images. Meanwhile, a dynamic memory queue is added to the conventional SimCLR model so that contrast learning can store more negative samples, covering the underlying distribution more effectively and yielding a better training effect. In addition, when labeled data are limited, pseudo labels are used to dynamically add data labels that assist training, which alleviates the cost of manually collecting and labeling a large training set.
Based on the endoscope image classification model trained as above, the embodiments of the present disclosure further provide an endoscope image classification method. Taking a white light image as the image to be recognized as an example, a flowchart of the endoscope image classification method in an embodiment of the present disclosure is described with reference to fig. 7. The method includes:
in step S701, an endoscopic image to be recognized is acquired.
For example, if the trained image classification model is for polyp type recognition, the endoscope image to be recognized is an acquired polyp image.
With the method of training the endoscope image classification model in the above embodiments, the embodiments of the present disclosure classify endoscope images using only the encoder and the classifier in the trained endoscope image classification model, since images of different modalities complement each other in their features and assist recognition during training. For example, if the upper and lower branches were trained on white light images and narrow-band light images respectively, embodiments of the present disclosure use the encoder and classifier of the upper branch or those of the lower branch depending on whether the endoscope image to be recognized is a white light image or a narrow-band light image.
In step S703, an image feature representation of the endoscopic image is extracted based on an encoder in the trained endoscopic image classification model. The encoder here may be, for example, a ResNet101 network. The specific feature representation extraction process is well known to those skilled in the art and will not be described herein.
In step S705, the extracted image feature representation is input to the corresponding classifier in the trained endoscope image classification model to obtain the classification result of the endoscope image.
The encoder and the classifier are obtained through mutually assisted training on endoscope images of different modalities of the same lesion. Specifically, for example, the encoder and classifier in the upper branch, which classify white light images, are trained with the assistance of the encoder and classifier in the lower branch, which are based on narrow-band light images, so the upper-branch encoder and classifier can achieve more accurate and reliable classification results on white light images. For example, when a white light image acquired by an endoscope operating in white light mode is recognized with the trained endoscope image classification model of the present disclosure, the white light image may be input to the first encoder in the upper branch to extract a first feature representation, and that first feature representation may be input to the first classifier connected to the first encoder for classification. For a white light image of an adenoma, for instance, the first classifier may output a predicted probability distribution of 10% hyperplasia, 80% adenoma, and 10% cancer.
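At inference time the upper branch alone is used for a white light image, which might look like the sketch below; the module and function names are illustrative assumptions.

```python
import torch

def classify_white_light_image(image_tensor, first_encoder, first_classifier):
    """Inference sketch: run one white light image through the trained
    upper-branch encoder and classifier only."""
    first_encoder.eval()
    first_classifier.eval()
    with torch.no_grad():
        h = first_encoder(image_tensor.unsqueeze(0))   # first feature representation
        probs = first_classifier(h)                    # e.g. [0.10, 0.80, 0.10]
    return probs.squeeze(0)
```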
Similarly, the encoder and classifier in the lower branch achieve more accurate and reliable classification results when classifying narrow-band light images, and the details are not repeated here. Moreover, if the trained endoscope image classification model was learned on other modality images, for example a first modality of autofluorescence images and a second modality of I-SCAN images, then the upper-branch encoder and its connected classifier achieve more accurate and reliable results when classifying autofluorescence images, and the lower-branch encoder and its connected classifier achieve more accurate and reliable results when classifying I-SCAN images.
Based on the above embodiments, fig. 8 shows a schematic structural diagram of an endoscope image classification system 800 according to an embodiment of the present disclosure. The endoscope image classification system 800 includes at least an image acquisition component 801, a processing component 802, and an output component 803. In the embodiment of the present disclosure, the image acquisition component 801, the processing component 802, and the output component 803 are related medical devices. They may be integrated in the same medical device, or distributed over a plurality of devices that communicate with one another to form a medical system. For example, for diagnosing digestive tract diseases, the image acquisition component 801 may be an endoscope, and the processing component 802 and the output component 803 may be computer devices communicating with the endoscope.
Specifically, the image acquisition section 801 is used to acquire an image to be recognized. The processing component 802 is configured to extract image feature information of an image to be recognized, and obtain a lesion classification result of the image to be recognized based on the feature information of the image to be recognized. The output section 803 is used to output the classification result of the image to be recognized.
Fig. 9 shows a training apparatus of an endoscopic image classification model according to an embodiment of the present disclosure, which specifically includes a training data set acquisition component 901 and a training component 903.
The training data set acquisition section 901 is configured to: acquiring a first set of images, the first set of images being a set of first modality imagery images of one or more objects acquired by an endoscope operating at a first modality; and acquiring a second set of images, the second set of images being a set of second modality imagery images of the one or more objects acquired by an endoscope operating in a second modality different from the first modality, the second modality imagery images corresponding one-to-one to the first modality imagery images; and training component 903 for: and inputting the first image set and the second image set into the endoscope image classification model as training data sets, and training the endoscope image classification model to obtain a trained endoscope image classification model.
For example, the training component 903 is a semi-supervised training component, images of a first subset of the first set of images have labels labeling endoscopic image classes, and other images of the first set of images have no labels labeling endoscopic image classes; and the images of the second subset in the second image set, which correspond to the images of the first subset one by one, have the same label marking the endoscope image category, and the other images of the second image set do not have the label marking the endoscope image category.
For example, wherein the endoscope image classification model comprises: a comparative learning submodel, the comparative learning submodel comprising: a first learning module for receiving the first set of images and learning the first set of images to obtain a first feature representation and a second feature representation of the first set of images; a second learning module for receiving the second set of images and learning the second set of images to obtain a first feature representation and a second feature representation of the second set of images; a memory queue for storing second feature representations of the first set of images generated by the first learning module and second feature representations of the second set of images generated by the second learning module; a classifier submodel comprising: a first classifier submodel for performing classification learning according to the first feature representation of the first image set generated by the first learning module to generate a classification prediction probability distribution of each image in the first image set; and the second classifier submodel is used for performing classification learning according to the first feature representation of the second image set generated by the second learning module so as to generate a classification prediction probability distribution of each image in the second image set.
For example, the first learning module comprises a first encoder and a first nonlinear mapper connected in sequence, and the second learning module comprises a second encoder and a second nonlinear mapper connected in sequence, wherein the first encoder and the second encoder have the same structure and the first nonlinear mapper and the second nonlinear mapper have the same structure; the first classifier submodel comprises a first classifier connected to an output of the first encoder, and the second classifier submodel comprises a second classifier connected to an output of the second encoder, wherein the first classifier and the second classifier have the same structure.
For example, the training component 903 includes an input component 903_1 that, at each iteration of training: the input component 903_1 selects a first batch of first modality image images from the first image set, and inputs the first batch of first modality image images into the first learning module; and the input component 903_1 selects a second batch of second modality image images corresponding to the first batch of first modality image images one by one from the second image set, and inputs the second batch of second modality image images into the second learning module.
For example, the training component 903 training the endoscope image classification model to obtain a trained endoscope image classification model includes: the training component 903 trains the endoscope image classification model until the joint loss function of the endoscope image classification model converges to obtain a trained endoscope image classification model.
For example, the training component 903 further comprises: an unsupervised learning component 903_2, configured to perform unsupervised contrast learning by using the contrast learning submodel to generate a first feature representation of a first batch and a second feature representation of the first batch for the first-batch first-modality image images, and generate a first feature representation of a second batch and a second feature representation of the second batch for the second-batch second-modality image images; a storage unit 903_3 for storing the second characteristic representation of the first batch and the second characteristic representation of the second batch in the memory queue based on a first-in-first-out rule; a classification training component 903_4, configured to perform classification training using the classifier submodel to generate a first classification prediction probability distribution for each image in the first batch of first-modality image images, so as to obtain a first classification prediction probability distribution in the first batch, and generate a second classification prediction probability distribution for each image in the second batch of second-modality image images, so as to obtain a second classification prediction probability distribution in the second batch; a parameter adjusting unit 903_5 that calculates a joint loss function based on the second feature representation of the first lot and the second feature representation of the second lot, and the first classification prediction probability distribution of the first lot and the second classification prediction probability distribution of the second lot, and adjusts a parameter of the endoscopic image classification model according to the joint loss function; a trusted pseudo tag determination unit 903_6 that determines whether or not trusted pseudo tags are generated for the non-tag images in the first-batch first-modality video images and the non-tag images in the second-batch second-modality video images; a training data set updating component 903_7, configured to, if it is determined that a trusted pseudo label is generated for an unlabeled image in the first batch of first-modality image images and an unlabeled image in the second batch of second-modality image images, add the first-modality image and the corresponding second-modality image that generate the trusted pseudo label to the first image set and the second image set respectively to form a new first image set and a new second image set, so as to update a training data set; and the training component 903 continues to iteratively train the adjusted endoscope image classification model using the new first image set and the new second image set as a new training data set.
For example, if the trusted pseudo-label determination component 903_6 determines that no trusted pseudo-label is generated for the unlabeled image in the first batch of first-modality image images and the unlabeled image in the second batch of second-modality image images, then iterative training of the adjusted endoscopic image classification model continues based on the first set of images and the second set of images as training data sets.
For example, the joint loss function of the endoscope image classification model is the sum of the following loss functions: the loss function of the contrast learning, the loss function when performing classification training for the labeled images in the first batch of first-mode image images, and the loss function when performing classification training for the labeled images in the second batch of second-mode image images.
For example, the loss function learned for the contrast is a noise contrast estimation loss function InfoNCE, and the loss function trained for classifying the labeled images in the first-batch first-modality image images and the loss function trained for classifying the labeled images in the second-batch second-modality image images are focus loss functions.
For example, performing unsupervised contrast learning using the contrast learning submodel to generate a first batch of first feature representations and a first batch of second feature representations for the first batch of first-modality imagery images, and a second batch of first feature representations and a second batch of second feature representations for the second batch of second-modality imagery images includes: converting each image in the first batch of first modality image images into a first feature representation based on the first encoder to obtain a first feature representation of a first batch, and nonlinearly mapping each first feature representation in the first feature representation of the first batch based on the first nonlinear mapper to obtain a second feature representation of the first batch; and converting each image in the second batch of second modality image images into a first feature representation based on the second encoder to obtain a first feature representation of the second batch, and performing nonlinear mapping on each first feature representation in the first feature representation of the second batch based on the second nonlinear mapper to obtain a second feature representation of the second batch.
For example, wherein the trusted pseudo tag determining component 903_6 determines whether to generate a trusted pseudo tag for an unlabeled image in the first batch of first-modality imagery images and an unlabeled image in the second batch of second-modality imagery images comprises: for each unlabeled first modality video image, determining a first label prediction value for the unlabeled first modality video image based on a first classification prediction probability distribution generated for the unlabeled first modality video image; and determining a second label prediction value of the unlabeled second modality video image for an unlabeled second modality video image that corresponds one-to-one with the unlabeled first modality video image based on a second classification prediction probability distribution generated for the unlabeled second modality video image; determining whether the first tag prediction value and the second tag prediction value are consistent; if not, not generating the credible pseudo label; and if the predicted value of the first label is consistent with the predicted value of the second label, fusing the predicted value of the first label and the predicted value of the second label, generating the credible pseudo label when the fused predicted value of the label is greater than a preset threshold value, and otherwise, not generating the credible pseudo label.
For example, the fusing the first label prediction value and the second label prediction value by the trusted pseudolabel determination component 903_6 includes: and carrying out weighted average on the first label predicted value and the second label predicted value to obtain the fused label predicted value.
For example, the object is a polyp, and the endoscopic image is a polyp endoscopic image.
For example, wherein the signature comprises at least one of hyperplasia, adenoma, and cancer.
For example, the first modality picture image is a white light picture image and the second modality picture image is a narrow band light picture image.
Based on the above embodiments, the embodiments of the present disclosure also provide electronic devices of another exemplary implementation. In some possible embodiments, an electronic device in the embodiments of the present disclosure may include a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor may implement the steps of the endoscope image classification model training method or the endoscope image recognition method in the embodiments described above when executing the program.
For example, taking an electronic device as the server 100 in fig. 1 of the present disclosure as an example for explanation, a processor in the electronic device is the processor 110 in the server 100, and a memory in the electronic device is the memory 120 in the server 100.
Embodiments of the present disclosure also provide a computer-readable storage medium. Fig. 10 shows a schematic diagram 1000 of a storage medium according to an embodiment of the disclosure. As shown in fig. 10, the computer-readable storage medium 1000 has stored thereon computer-executable instructions 1001. When the computer-executable instructions 1001 are executed by a processor, the training method of the contrast learning-based endoscopic image classification model and the endoscopic image classification method according to the embodiments of the present disclosure described with reference to the above drawings may be performed. The computer-readable storage medium includes, but is not limited to, volatile memory and/or non-volatile memory, for example. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc.
Embodiments of the present disclosure also provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the training method of the contrast learning-based endoscopic image classification model and the endoscopic image classification method according to the embodiments of the present disclosure.
Those skilled in the art will appreciate that the disclosure of the present disclosure is susceptible to numerous variations and modifications. For example, the various devices or components described above may be implemented in hardware, or may be implemented in software, firmware, or a combination of some or all of the three.
Further, while the present disclosure makes various references to certain elements of a system according to embodiments of the present disclosure, any number of different elements may be used and run on a client and/or server. The units are illustrative only, and different aspects of the systems and methods may use different units.
It will be understood by those skilled in the art that all or part of the steps of the above methods may be implemented by instructing the relevant hardware through a program, and the program may be stored in a computer readable storage medium, such as a read only memory, a magnetic or optical disk, and the like. Alternatively, all or part of the steps of the above embodiments may be implemented using one or more integrated circuits. Accordingly, each module/unit in the above embodiments may be implemented in the form of hardware, and may also be implemented in the form of a software functional module. The present disclosure is not limited to any specific form of combination of hardware and software.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The foregoing is illustrative of the present disclosure and is not to be construed as limiting thereof. Although illustrative embodiments of the present disclosure have been described, those skilled in the art will readily appreciate that many modifications are possible in the illustrative embodiments without materially departing from the novel teachings and advantages of the present disclosure. Accordingly, all such modifications are intended to be included within the scope of this disclosure as defined in the claims. It is to be understood that the foregoing is illustrative of the present disclosure and is not to be construed as limited to the specific embodiments disclosed, and that modifications to the disclosed embodiments, as well as other embodiments, are intended to be included within the scope of the appended claims. The present disclosure is defined by the claims and their equivalents.

Claims (20)

1. A method of training an endoscopic image classification model based on contrast learning, the method comprising:
acquiring a first set of images, the first set of images being a set of first modality imagery images of one or more objects acquired by an endoscope operating at a first modality;
acquiring a second set of images, the second set of images being a set of second modality imagery images of the one or more objects acquired by an endoscope operating at a second modality different from the first modality, the second modality imagery images corresponding one-to-one with the first modality imagery images; and
inputting the first image set and the second image set into the endoscope image classification model as training data sets, training the endoscope image classification model until a joint loss function of the endoscope image classification model converges to obtain a trained endoscope image classification model,
wherein training the endoscope image classification model until a joint loss function of the endoscope image classification model converges comprises:
carrying out unsupervised contrast learning by utilizing a contrast learning submodel to generate a first characteristic representation of a first batch and a second characteristic representation of the first batch for a first-batch first-mode image, and generate a first characteristic representation of a second batch and a second characteristic representation of the second batch for a second-batch second-mode image;
storing the second feature representation of the first batch and the second feature representation of the second batch into a memory queue based on a first-in-first-out rule;
performing classification training by using a classifier sub-model to generate a first classification prediction probability distribution for each image in the first batch of first modality image images so as to obtain a first classification prediction probability distribution of the first batch, and generate a second classification prediction probability distribution for each image in the second batch of second modality image images so as to obtain a second classification prediction probability distribution of the second batch;
calculating a joint loss function based on the second feature representation of the first batch and the second feature representation of the second batch, and the first classification prediction probability distribution of the first batch and the second classification prediction probability distribution of the second batch, and adjusting parameters of the endoscope image classification model according to the joint loss function;
determining whether a trusted pseudo-tag is generated for an unlabeled image in the first batch of first modality imagery images and an unlabeled image in the second batch of second modality imagery images;
if the credible pseudo labels are determined to be generated for the unlabeled images in the first batch of first-modality image images and the unlabeled images in the second batch of second-modality image images, adding the first-modality image images and the corresponding second-modality image images which generate the credible pseudo labels into the first image set and the second image set respectively to form a new first image set and a new second image set so as to update the training data set; and
using the new first image set and the new second image set as a new training data set to continuously carry out iterative training on the adjusted endoscope image classification model,
wherein determining whether to generate a trusted pseudo-tag for unlabeled images in the first batch of first-modality imagery images and unlabeled images in the second batch of second-modality imagery images comprises:
for each unlabeled first modality video image, determining a first label prediction value for the unlabeled first modality video image based on a first classification prediction probability distribution generated for the unlabeled first modality video image; and
determining a second label prediction value of the unlabeled second modality video image based on a second classification prediction probability distribution generated for the unlabeled second modality video image for an unlabeled second modality video image that corresponds one-to-one with the unlabeled first modality video image;
determining whether the first tag prediction value and the second tag prediction value are consistent;
if not, not generating the credible pseudo label;
and if the predicted value of the first label is consistent with the predicted value of the second label, fusing the predicted value of the first label and the predicted value of the second label, generating the credible pseudo label when the fused predicted value of the label is greater than a preset threshold value, and otherwise, not generating the credible pseudo label.
2. The method of claim 1, wherein the training method is a semi-supervised training method, images of a first subset of the first set of images having labels labeling endoscopic image classes, and other images of the first set of images having no labels labeling endoscopic image classes; and
the images of the second subset in the second image set, which correspond to the images of the first subset one by one, have the same label marking the endoscope image category, and the other images of the second image set do not have the label marking the endoscope image category.
3. The method of claim 1 or 2, wherein the endoscopic image classification model comprises:
a comparative learning submodel, the comparative learning submodel comprising:
a first learning module for receiving the first set of images and learning the first set of images to obtain a first feature representation and a second feature representation of the first set of images;
a second learning module for receiving the second set of images and learning the second set of images to obtain a first feature representation and a second feature representation of the second set of images; and
a memory queue for storing second feature representations of the first set of images generated by the first learning module and second feature representations of the second set of images generated by the second learning module;
a classifier submodel comprising:
a first classifier submodel for performing classification learning according to the first feature representation of the first image set generated by the first learning module to generate a classification prediction probability distribution of each image in the first image set; and
and the second classifier submodel is used for performing classification learning according to the first feature representation of the second image set generated by the second learning module so as to generate a classification prediction probability distribution of each image in the second image set.
4. The method of claim 3, wherein
The first learning module comprises a first encoder and a first nonlinear mapper which are connected in sequence,
The second learning module comprises a second encoder and a second nonlinear mapper which are connected in sequence, wherein the first encoder and the second encoder have the same structure, and the first nonlinear mapper and the second nonlinear mapper have the same structure,
The first classifier submodel comprises a first classifier connected to an output of the first encoder, and
The second classifier submodel comprises a second classifier connected to an output of the second encoder, wherein the first classifier and the second classifier are structurally identical.
5. The method of claim 4, wherein inputting the first set of images and the second set of images as a training data set into an endoscopic image classification model comprises:
at each iterative training:
selecting a first batch of first modality image images from the first image set and inputting the first batch of first modality image images into the first learning module; and
and selecting second-batch second-mode image images which correspond to the first-batch first-mode image images one by one from the second image set, and inputting the second-batch second-mode image images into the second learning module.
6. The method of claim 1, wherein if it is determined that authentic pseudo-labels are not generated for unlabeled images in the first batch of first-modality image images and unlabeled images in the second batch of second-modality image images, continuing iterative training of the adjusted endoscopic image classification model based on the first set of images and the second set of images as a training data set.
7. The method of claim 1, wherein the joint loss function of the endoscope image classification model is a sum of:
the loss function of the contrast learning, the loss function when performing classification training for the labeled images in the first batch of first-mode image images, and the loss function when performing classification training for the labeled images in the second batch of second-mode image images.
8. The method of claim 7, wherein the loss function for the contrast learning is a noise contrastive estimation loss function InfoNCE,
the loss function for classification training of the labeled images in the first batch of first modality image images and the loss function for classification training of the labeled images in the second batch of second modality image images are focus loss functions.
9. The method of claim 5, wherein performing unsupervised contrast learning with the contrast learning submodel to generate a first batch of first feature representations and a first batch of second feature representations for the first batch of first modality imagery images and a second batch of first feature representations and a second batch of second feature representations for the second batch of second modality imagery images comprises:
converting each image in the first batch of first modality image images into a first feature representation based on the first encoder to obtain a first feature representation of a first batch, and nonlinearly mapping each first feature representation in the first feature representation of the first batch based on the first nonlinear mapper to obtain a second feature representation of the first batch; and
based on the second encoder, each image in the second batch of second modality image images is converted into a first feature representation to obtain a first feature representation of the second batch, and based on the second nonlinear mapper, each first feature representation in the first feature representation of the second batch is subjected to nonlinear mapping to obtain a second feature representation of the second batch.
10. The method of claim 1, wherein fusing the first tag predictor and the second tag predictor comprises:
and carrying out weighted average on the first label predicted value and the second label predicted value to obtain the fused label predicted value.
11. The method of claim 1, wherein the object is a polyp and the endoscopic image is a polyp endoscopic image.
12. The method of claim 2, wherein the signature comprises at least one of hyperplasia, adenoma, and cancer.
13. The method of claim 2, wherein the first modality picture image is a white light picture image and the second modality picture image is a narrowband light picture image.
14. The method of claim 2, wherein the first modality imagery image is a white light imagery image and the second modality imagery image is an autofluorescence imagery image.
15. The method of claim 4, wherein the encoder is a convolutional layer portion of a residual neural network ResNet, the nonlinear mapper is comprised of a two-layer multi-layer perceptron MLP, and the classifier is comprised of a two-layer multi-layer perceptron MLP.
16. An endoscopic image classification method comprising:
acquiring an endoscope image to be identified;
extracting an image feature representation of the endoscopic image based on an encoder in a trained endoscopic image classification model;
inputting the extracted image feature representation into a corresponding classifier in a trained endoscope image classification model to obtain a classification result of the endoscope image;
wherein the trained endoscopic image classification model is obtained based on the training method of the contrast learning based endoscopic image classification model according to any one of claims 1-15.
17. An endoscopic image classification system comprising:
an image acquisition section for acquiring an endoscopic image to be recognized;
the processing component is used for extracting image characteristic representations of the endoscope images based on an encoder in the trained endoscope image classification model and inputting the extracted image characteristic representations into corresponding classifiers in the trained endoscope image classification model to obtain classification results of the endoscope images;
an output section for outputting a classification result of the image to be recognized,
wherein the trained endoscopic image classification model is obtained based on the training method of the contrast learning based endoscopic image classification model according to any one of claims 1-15.
18. A training apparatus for an endoscopic image classification model based on contrast learning, the apparatus comprising:
a training data set acquisition component for acquiring a first set of images, the first set of images being a set of first modality imagery images of one or more subjects acquired by an endoscope operating at a first modality; and acquiring a second set of images, the second set of images being a set of second modality imagery images of the one or more objects acquired by an endoscope operating in a second modality different from the first modality, the second modality imagery images corresponding one-to-one to the first modality imagery images; and
a training section configured to input the first image set and the second image set as a training data set into the endoscope image classification model, train the endoscope image classification model until a joint loss function of the endoscope image classification model converges to obtain a trained endoscope image classification model,
wherein training the endoscope image classification model until a joint loss function of the endoscope image classification model converges comprises:
carrying out unsupervised contrast learning by utilizing a contrast learning submodel to generate a first characteristic representation of a first batch and a second characteristic representation of the first batch for a first-batch first-mode image, and generate a first characteristic representation of a second batch and a second characteristic representation of the second batch for a second-batch second-mode image;
storing the second feature representation of the first batch and the second feature representation of the second batch into a memory queue based on a first-in-first-out rule;
performing classification training by using a classifier sub-model to generate a first classification prediction probability distribution for each image in the first batch of first modality image images so as to obtain a first classification prediction probability distribution of the first batch, and generate a second classification prediction probability distribution for each image in the second batch of second modality image images so as to obtain a second classification prediction probability distribution of the second batch;
calculating a joint loss function based on the second feature representation of the first batch and the second feature representation of the second batch, and the first classification prediction probability distribution of the first batch and the second classification prediction probability distribution of the second batch, and adjusting parameters of the endoscope image classification model according to the joint loss function;
determining whether a trusted pseudo-tag is generated for an unlabeled image in the first batch of first modality imagery images and an unlabeled image in the second batch of second modality imagery images;
if the credible pseudo labels are determined to be generated for the unlabeled images in the first batch of first-modality image images and the unlabeled images in the second batch of second-modality image images, adding the first-modality image images and the corresponding second-modality image images which generate the credible pseudo labels into the first image set and the second image set respectively to form a new first image set and a new second image set so as to update the training data set; and
using the new first image set and the new second image set as a new training data set to continuously carry out iterative training on the adjusted endoscope image classification model,
wherein determining whether to generate a trusted pseudo-tag for unlabeled images in the first batch of first-modality imagery images and unlabeled images in the second batch of second-modality imagery images comprises:
for each unlabeled first modality video image, determining a first label prediction value for the unlabeled first modality video image based on a first classification prediction probability distribution generated for the unlabeled first modality video image; and
determining a second label prediction value of the unlabeled second modality video image based on a second classification prediction probability distribution generated for the unlabeled second modality video image for an unlabeled second modality video image that corresponds one-to-one with the unlabeled first modality video image;
determining whether the first tag prediction value and the second tag prediction value are consistent;
if not, not generating the credible pseudo label;
and if the predicted value of the first label is consistent with the predicted value of the second label, fusing the predicted value of the first label and the predicted value of the second label, generating the credible pseudo label when the fused predicted value of the label is greater than a preset threshold value, and otherwise, not generating the credible pseudo label.
19. An electronic device comprising a memory and a processor, wherein the memory has stored thereon program code readable by the processor, which when executed by the processor, performs the method of any of claims 1-16.
20. A computer-readable storage medium having stored thereon computer-executable instructions for performing the method of any of claims 1-16.
CN202111039387.8A 2021-09-06 2021-09-06 Training method of endoscope image classification model, image classification method and device Active CN113496489B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111039387.8A CN113496489B (en) 2021-09-06 2021-09-06 Training method of endoscope image classification model, image classification method and device
PCT/CN2022/117048 WO2023030521A1 (en) 2021-09-06 2022-09-05 Endoscope image classification model training method and device, and endoscope image classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111039387.8A CN113496489B (en) 2021-09-06 2021-09-06 Training method of endoscope image classification model, image classification method and device

Publications (2)

Publication Number Publication Date
CN113496489A CN113496489A (en) 2021-10-12
CN113496489B true CN113496489B (en) 2021-12-24

Family

ID=77997132

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111039387.8A Active CN113496489B (en) 2021-09-06 2021-09-06 Training method of endoscope image classification model, image classification method and device

Country Status (2)

Country Link
CN (1) CN113496489B (en)
WO (1) WO2023030521A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113496489B (en) * 2021-09-06 2021-12-24 北京字节跳动网络技术有限公司 Training method of endoscope image classification model, image classification method and device
CN113642537B (en) * 2021-10-14 2022-01-04 武汉大学 Medical image recognition method and device, computer equipment and storage medium
CN113706526B (en) * 2021-10-26 2022-02-08 北京字节跳动网络技术有限公司 Training method and device for endoscope image feature learning model and classification model
CN115719415B (en) * 2022-03-28 2023-11-10 南京诺源医疗器械有限公司 Visual field adjustable double-video fusion imaging method and system
CN114758360B (en) * 2022-04-24 2023-04-18 北京医准智能科技有限公司 Multi-modal image classification model training method and device and electronic equipment
CN114782719B (en) * 2022-04-26 2023-02-03 北京百度网讯科技有限公司 Training method of feature extraction model, object retrieval method and device
CN114937178B (en) * 2022-06-30 2023-04-18 抖音视界有限公司 Multi-modality-based image classification method and device, readable medium and electronic equipment
CN115240036B (en) * 2022-09-22 2023-02-03 武汉珈鹰智能科技有限公司 Training method, application method and storage medium of crack image recognition network
CN116758562B (en) * 2023-08-22 2023-12-08 杭州实在智能科技有限公司 Universal text verification code identification method and system
CN117577258B (en) * 2024-01-16 2024-04-02 北京大学第三医院(北京大学第三临床医学院) PETCT (pulse-based transmission control test) similar case retrieval and prognosis prediction method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109948733A (en) * 2019-04-01 2019-06-28 深圳大学 More classification methods, sorter and the storage medium of alimentary tract endoscope image
CN110427994A (en) * 2019-07-24 2019-11-08 腾讯医疗健康(深圳)有限公司 Digestive endoscope image processing method, device, storage medium, equipment and system
CN112381116A (en) * 2020-10-21 2021-02-19 福州大学 Self-supervision image classification method based on contrast learning
CN112668627A (en) * 2020-12-24 2021-04-16 四川大学 Large-scale image online clustering system and method based on contrast learning

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109222865A (en) * 2018-10-17 2019-01-18 卓外(上海)医疗电子科技有限公司 Multi-modality imaging endoscopic system
CN110490856B (en) * 2019-05-06 2021-01-15 腾讯医疗健康(深圳)有限公司 Method, system, machine device, and medium for processing medical endoscope image
CN110689025B (en) * 2019-09-16 2023-10-27 腾讯医疗健康(深圳)有限公司 Image recognition method, device and system and endoscope image recognition method and device
JP7278202B2 (en) * 2019-11-27 2023-05-19 富士フイルム株式会社 Image learning device, image learning method, neural network, and image classification device
CN112741651B (en) * 2020-12-25 2022-11-25 上海交通大学烟台信息技术研究院 Method and system for processing ultrasonic image of endoscope
CN112766323A (en) * 2020-12-30 2021-05-07 清华大学 Image identification method and device
CN112990297B (en) * 2021-03-10 2024-02-02 北京智源人工智能研究院 Training method, application method and device of multi-mode pre-training model
CN113011485B (en) * 2021-03-12 2023-04-07 北京邮电大学 Multi-mode multi-disease long-tail distribution ophthalmic disease classification model training method and device
CN113496489B (en) * 2021-09-06 2021-12-24 北京字节跳动网络技术有限公司 Training method of endoscope image classification model, image classification method and device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109948733A (en) * 2019-04-01 2019-06-28 深圳大学 Multi-classification method, classifier and storage medium for digestive tract endoscope images
CN110427994A (en) * 2019-07-24 2019-11-08 腾讯医疗健康(深圳)有限公司 Digestive endoscope image processing method, device, storage medium, equipment and system
CN112381116A (en) * 2020-10-21 2021-02-19 福州大学 Self-supervised image classification method based on contrastive learning
CN112668627A (en) * 2020-12-24 2021-04-16 四川大学 Large-scale online image clustering system and method based on contrastive learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Contrastive Learning (MoCo, SimCLR, BYOL, SimSiam, SimCSE); 上杉翔二; https://blog.csdn.net/qq_39388410/article/details/108941999; 2020-10-06; pages 1-9 *
Contrastive Multiview Coding; Yonglong Tian; arXiv:1906.05849v5; 2020-12-18; Sections 1-3 *
Systematic Study of Machine Learning: Weakly Supervised Learning (II) - A Survey of Semi-Supervised Learning; Eason.wxd; https://blog.csdn.net/App_12062011/article/details/93314823; 2019-06-22; Section 2 *

Also Published As

Publication number Publication date
CN113496489A (en) 2021-10-12
WO2023030521A1 (en) 2023-03-09

Similar Documents

Publication Publication Date Title
CN113496489B (en) Training method of endoscope image classification model, image classification method and device
CN113706526B (en) Training method and device for endoscope image feature learning model and classification model
CN113486990B (en) Training method of endoscope image classification model, image classification method and device
US11633084B2 (en) Image diagnosis assistance apparatus, data collection method, image diagnosis assistance method, and image diagnosis assistance program
CN109523532B (en) Image processing method, image processing device, computer readable medium and electronic equipment
CN109523522B (en) Endoscopic image processing method, device, system and storage medium
US20210228071A1 (en) System and method of otoscopy image analysis to diagnose ear pathology
Pogorelov et al. Deep learning and hand-crafted feature based approaches for polyp detection in medical videos
Jain et al. Detection of abnormality in wireless capsule endoscopy images using fractal features
Goel et al. Investigating the significance of color space for abnormality detection in wireless capsule endoscopy images
CN113470029B (en) Training method and device, image processing method, electronic device and storage medium
EP4120186A1 (en) Computer-implemented systems and methods for object detection and characterization
Masmoudi et al. Optimal feature extraction and ulcer classification from WCE image data using deep learning
CN114399465A (en) Benign and malignant ulcer identification method and system
CN113781489A (en) Polyp image semantic segmentation method and device
Du et al. Improving the classification performance of esophageal disease on small dataset by semi-supervised efficient contrastive learning
US20230260652A1 (en) Self-Supervised Machine Learning for Medical Image Analysis
CN115511861A (en) Identification method based on artificial neural network
KR20220078495A (en) Method, apparatus and program to read lesion of small intestine based on capsule endoscopy image
US20240087115A1 (en) Machine learning enabled system for skin abnormality interventions
Huang et al. TongueMobile: automated tongue segmentation and diagnosis on smartphones
CN116740475B (en) Digestive tract image recognition method and system based on state classification
Yao Machine Learning and Image Processing for Clinical Outcome Prediction: Applications in Medical Data from Patients with Traumatic Brain Injury, Ulcerative Colitis, and Heart Failure
Manjunath et al. Deep Learning Architectures for Abnormality Detection in Endoscopy Videos.
WO2023285407A1 (en) Computer-implemented systems and methods for object detection and characterization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20211012

Assignee: Xiaohe medical instrument (Hainan) Co.,Ltd.

Assignor: BEIJING BYTEDANCE NETWORK TECHNOLOGY Co.,Ltd.

Contract record no.: X2021990000694

Denomination of invention: Training method of endoscope image classification model, image classification method and device

License type: Common License

Record date: 20211117

GR01 Patent grant