WO2023030520A1 - Training method and apparatus of endoscope image classification model, and image classification method - Google Patents

Training method and apparatus of endoscope image classification model, and image classification method

Info

Publication number
WO2023030520A1
Authority
WO
WIPO (PCT)
Prior art keywords
classification model
image classification
endoscopic image
network
loss function
Prior art date
Application number
PCT/CN2022/117043
Other languages
French (fr)
Chinese (zh)
Inventor
边成
李永会
杨延展
Original Assignee
北京字节跳动网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字节跳动网络技术有限公司
Publication of WO2023030520A1 publication Critical patent/WO2023030520A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10068Endoscopic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30101Blood vessel; Artery; Vein; Vascular

Definitions

  • Embodiments of the present disclosure relate to a training method of an endoscope image classification model integrated with knowledge distillation, an image classification method, a device, and a computer-readable medium.
  • Colorectal cancer is the third most common cancer and the fourth most deadly cancer in the world, and more than 95% of colorectal cancers are caused by colonic polyps.
  • Among detected polyps, adenomas account for the majority, about 10.86% to 80%. It is generally believed that colorectal cancer originates from adenomatous polyps, whose cancerous rate is 1.4% to 9.2%.
  • Other types of polyps, such as hyperplastic polyps and inflammatory polyps (2.32% to 13.8%), account for only a small proportion, showing a long-tailed distribution.
  • Embodiments of the present disclosure provide a method for training an endoscopic image classification model, an endoscopic image classification method, an apparatus, and a computer-readable medium.
  • Embodiments of the present disclosure provide a multi-expert-decision-based training method for an endoscopic image classification model, wherein the endoscopic image classification model includes a plurality of expert sub-networks, and the method includes: obtaining a training data set, the training data set including a plurality of endoscopic images and annotation labels of the plurality of endoscopic images, wherein the training data set presents a long-tail distribution; and training the endoscopic image classification model based on the training data set until a target loss function of the endoscopic image classification model converges, to obtain a trained endoscopic image classification model, wherein the target loss function is determined based at least on the corresponding output results of the plurality of expert sub-networks.
  • training the endoscope image classification model based on the training data set includes: inputting image samples in the training image sample set into each of the plurality of expert sub-networks; using the plurality of expert sub-networks to generate a corresponding plurality of expert sub-network output results for the image sample; generating a final output result of the endoscopic image classification model based on the plurality of expert sub-network output results; and calculating a loss value through a target loss function based on at least the plurality of expert sub-network output results and the final output result, and adjusting parameters of the endoscopic image classification model based on the loss value.
  • the endoscopic image classification model further includes a shared sub-network
  • training the endoscopic image classification model based on the training data set includes: inputting image samples in the training image sample set into the shared sub-network to extract shallow feature representations; based on the extracted shallow feature representations, using the multiple expert sub-networks to generate corresponding output results of the multiple expert sub-networks for the image sample; generating a final output result of the endoscopic image classification model based on the output results of the multiple expert sub-networks; and calculating a loss value through a target loss function based on at least the output results of the multiple expert sub-networks and the final output result, and adjusting parameters of the endoscopic image classification model based on the loss value.
  • the target loss function of the endoscope image classification model includes: a cross-entropy loss function determined based on the final output result of the endoscope image classification model and the annotation labels of the image samples, and a KL divergence determined based on the output results of the multiple expert sub-networks.
  • generating the final output result of the endoscope image classification model includes: fusing the output results of the plurality of expert sub-networks as the final output result of the endoscope image classification model.
  • merging the output results of the multiple expert sub-networks includes: performing a weighted average on the output results of the multiple expert sub-networks.
  • the endoscopic image classification model further includes a student network having the same structure as the expert sub-networks, wherein the plurality of expert sub-networks form a teacher network, and the teacher network is used to train the student network based on knowledge distillation, the method further comprising utilizing the student network to generate a corresponding student network output result for the image sample.
  • calculating the loss value through the target loss function includes: calculating the loss value through the target loss function based on the output results of the plurality of expert sub-networks, the final output result, and the output result of the student network.
  • the target loss function is a weighted sum of the loss function of the teacher network and the loss function of the student network.
  • the sum of the weight value of the loss function of the teacher network and the weight value of the loss function of the student network is 1, wherein the weight value of the loss function of the teacher network decreases continuously with the training iterations until it finally decreases to 0, and the weight value of the loss function of the student network increases continuously with the training iterations until it finally increases to 1.
  • the loss function of the teacher network includes: a cross-entropy loss function determined based on the final output result of the endoscope image classification model and the annotation labels of the image samples, and a KL divergence determined based on the output results of the multiple expert sub-networks;
  • the loss function of the student network includes: a cross-entropy loss function determined based on the student network output result of the student network and the final output result of the endoscope image classification model, and a KL divergence determined based on the student network output result of the student network and the plurality of expert sub-network output results generated by the plurality of expert sub-networks.
  • each of the plurality of expert sub-networks includes a multi-layer Transformer encoder connected in sequence, and a classifier.
  • a method for classifying endoscopic images, including: acquiring an endoscopic image to be identified; and inputting the endoscopic image into a trained endoscopic image classification model to obtain a classification result of the endoscopic image; wherein the trained endoscopic image classification model is obtained based on the training method of the endoscopic image classification model as described above.
  • a method for classifying endoscopic images, including: acquiring an endoscopic image to be recognized; and obtaining a classification result of the endoscopic image based on the student network in the trained endoscopic image classification model; wherein the trained endoscopic image classification model is obtained based on the above-mentioned training method of the endoscopic image classification model.
  • an endoscope image classification system including: an image acquisition component used to acquire an endoscope image to be recognized; a processing component used to obtain a classification result of the endoscope image based on a trained endoscope image classification model; and an output component used to output the classification result of the image to be recognized, wherein the trained endoscopic image classification model is obtained by the training method of the endoscopic image classification model as described above.
  • an endoscope image classification system including: an image acquisition component used to acquire an endoscope image to be recognized; a processing component used to obtain a classification result of the endoscope image based on the student network in a trained endoscope image classification model; and an output component used to output the classification result of the image to be recognized, wherein the trained endoscopic image classification model is obtained by the above-mentioned training method of the endoscopic image classification model.
  • a training device for an endoscopic image classification model based on multi-expert decision-making, wherein the endoscopic image classification model includes a plurality of expert sub-networks, and the device includes: a training data set acquisition component used to obtain a training data set, the training data set including a plurality of endoscopic images and annotation labels of the plurality of endoscopic images, wherein the training data set presents a long-tail distribution; and a training component for training the endoscope image classification model based on the training data set until the target loss function of the endoscope image classification model converges, to obtain a trained endoscope image classification model, wherein the target loss function is determined based at least on the corresponding output results of the multiple expert sub-networks.
  • An embodiment of the present disclosure also provides an electronic device, including a memory and a processor, wherein the memory stores program codes readable by the processor, and when the processor executes the program codes, the above-mentioned method is performed.
  • Embodiments of the present disclosure also provide a computer-readable storage medium, on which computer-executable instructions are stored, and the computer-executable instructions are used to execute the method as described above.
  • a method based on multi-expert joint decision-making is proposed to learn the unbalanced data distribution in line with the actual situation. It does not need to know the data distribution in advance, and can improve the model's prediction accuracy on both head and tail data at the same time without introducing bias. In addition, the model is compressed by knowledge distillation to make it more concise.
  • Fig. 1 shows a schematic diagram of the application architecture of the endoscopic image classification model training and the endoscopic image classification method in the embodiment of the present disclosure
  • Fig. 2 shows an exemplary block diagram of Vision Transformer (ViT);
  • Fig. 3 shows a schematic diagram of ViT in Fig. 2 flattening the original picture into a sequence
  • Fig. 4 shows a polyp imaging image according to an embodiment of the present disclosure
  • FIG. 5A shows a schematic structure of an endoscopic image classification model 500A according to an embodiment of the present disclosure
  • FIG. 5B shows a schematic structure of an endoscope image classification model 500B according to another embodiment of the present disclosure
  • FIG. 5C shows a schematic structure of an endoscopic image classification model 500C using Transformer as a feature extractor according to yet another embodiment of the present disclosure
  • FIG. 6A shows a flowchart of a method for training an endoscopic image classification model according to one embodiment of the present disclosure
  • FIG. 6B shows a more specific exemplary description of step S603 in FIG. 6A;
  • FIG. 7A shows a schematic diagram of an endoscopic image classification model 700A incorporating knowledge distillation according to an embodiment of the present disclosure
  • FIG. 7B shows a schematic diagram of an endoscopic image classification model 700B incorporating knowledge distillation according to another embodiment of the present disclosure
  • FIG. 7C shows a schematic diagram of an endoscopic image classification model 700C incorporating knowledge distillation according to yet another embodiment of the present disclosure
  • FIG. 8 shows a flowchart of a method for training an endoscopic image classification model incorporating knowledge distillation according to one embodiment of the present disclosure
  • FIG. 9 depicts a flowchart of a method for classifying endoscopic images according to an embodiment of the present disclosure.
  • FIG. 10 is a schematic structural diagram of an endoscope image classification system in an embodiment of the present disclosure.
  • FIG. 11 shows a training device for an endoscopic image classification model according to an embodiment of the present disclosure.
  • FIG. 12 shows a schematic diagram of a storage medium according to an embodiment of the disclosure.
  • any number of different modules may be used and run on the user terminal and/or the server.
  • the modules are illustrative only, and different aspects of the systems and methods may use different modules.
  • images of lesions inside the gastrointestinal tract are usually obtained based on diagnostic tools such as endoscopes, and relevant medical personnel judge the type of lesions by observing with human eyes.
  • some work has tried to use deep learning to automatically identify lesion categories, but these lesion types usually have long-tail distribution characteristics.
  • Existing work usually does not consider the characteristics of the polyp type distribution: either a convolutional neural network is trained directly, or the distribution of the data set is rebalanced before training, which is obviously not in line with reality.
  • Given the properties of polyp data, direct training without considering the imbalance of the data easily makes the model unable to identify the tail data well, while rebalancing the data set before training easily leads to overfitting of the tail data, causing certain losses to the accuracy on the head data.
  • Therefore, this disclosure proposes a multi-expert joint algorithm that adapts to the long-tail data distribution and can improve the accuracy of the head and tail at the same time, and uses knowledge distillation to integrate it into a more compact model.
  • FIG. 1 shows a schematic diagram of the application architecture of the endoscopic image classification model training method and the endoscopic image classification method according to an embodiment of the present disclosure, including a server 100 and a terminal device 200.
  • the terminal device 200 may be a medical device, for example, a user may view endoscopic image classification results based on the terminal device 200 .
  • the terminal device 200 and the server 100 may be connected through the Internet to realize mutual communication.
  • the aforementioned Internet uses standard communication technologies and/or protocols.
  • the network is usually the Internet, but can be any network, including but not limited to a Local Area Network (LAN), Metropolitan Area Network (MAN), Wide Area Network (WAN), mobile, wired or wireless network, private network, or any combination of virtual private networks.
  • data exchanged over a network is represented using technologies and/or formats including Hyper Text Markup Language (HTML), Extensible Markup Language (XML), and the like.
  • In addition, all or some links may be encrypted using conventional encryption technologies such as Secure Sockets Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN), and Internet Protocol Security (IPsec).
  • customized and/or dedicated data communication technologies may also be used to replace or supplement the above data communication technologies.
  • the server 100 may provide various network services for the terminal device 200, wherein the server 100 may be a server, a server cluster composed of several servers, or a cloud computing center.
  • the server 100 may include a processor 110 (Central Processing Unit, CPU), a memory 120, an input device 130, an output device 140, etc.
  • the input device 130 may include a keyboard, a mouse, a touch screen, etc.
  • the output device 140 may include a display device, such as a liquid crystal display (Liquid Crystal Display, LCD), a cathode ray tube (Cathode Ray Tube, CRT), and so on.
  • the memory 120 may include Read Only Memory (ROM) and Random Access Memory (RAM), and provides program instructions and data stored in the memory 120 to the processor 110 .
  • the memory 120 may be used to store the program of the endoscope image classification model training method or of the endoscope image classification method based on the trained endoscope image classification model in the embodiments of the present disclosure.
  • the processor 110 calls the program instructions stored in the memory 120 and, according to the obtained program instructions, executes the steps of any endoscopic image classification model training method or of any endoscopic image classification method based on a trained endoscopic image classification model in the embodiments of the present disclosure.
  • the endoscopic image classification model training method or the endoscopic image classification method based on the trained endoscopic image classification model is mainly executed on the server 100 side. For example, for the endoscopic image classification method, the terminal device 200 can send collected images of gastrointestinal lesions (for example, polyps) to the server 100; the server 100 identifies the type of the lesion images and can return the lesion classification result to the terminal device 200.
  • the application architecture shown in FIG. 1 is described by taking the application on the server 100 side as an example.
  • the endoscopic image classification method in the embodiments of the present disclosure can also be executed by the terminal device 200; for example, the terminal device 200 can download from the server 100 side the trained endoscopic image classification model fused with knowledge distillation, and, based on the student network in that model, identify the types of lesion images and obtain the lesion classification result.
  • the disclosed embodiments are not limited in this respect.
  • Various embodiments of the present disclosure are schematically described below by taking the application architecture diagram shown in FIG. 1 as an example.
  • Knowledge distillation usually adopts a teacher-student architecture, using the knowledge learned by a large model (the teacher) to guide the training of a small model (the student), so that the small model achieves performance comparable to the large model while the number of parameters is greatly reduced, enabling model compression and acceleration.
  • The full name of KL divergence is Kullback-Leibler divergence, which is generally used to measure the "distance" between two probability distribution functions. For two probability distributions P and Q of a discrete random variable, their KL divergence is defined as:
    D_KL(P ‖ Q) = Σ_x P(x) · log(P(x) / Q(x))
  • KL divergence is a commonly used loss function in the field of machine learning.
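As a concrete illustration, the following is a minimal sketch of this definition in plain Python; the two distributions are arbitrary example values, not data from the disclosure:

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) for two discrete probability distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]
print(kl_divergence(p, q))   # > 0; note D_KL(P||Q) != D_KL(Q||P) in general
print(kl_divergence(p, p))   # 0.0: the divergence vanishes only for identical distributions
```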
  • a typical Transformer includes a multi-head attention (Multi-head Attention) module and a multi-layer perceptron (MLP, Multilayer Perceptron) module.
  • a multi-head attention module helps the encoder to look at other words while encoding a specific word.
  • Each module is preceded by a layer normalization (Layer Normalization) module, and residual connections are used to connect the modules.
  • The layer normalization module imposes constraints on the "scale" problem that the accumulation of multiple embeddings may bring to the Transformer learning process, which is equivalent to imposing constraints on the polysemy space expressing each word, and effectively reduces the variance of the model.
  • Vision Transformer is a technology that transfers Transformer from natural language processing to image processing.
  • FIG. 2 shows an exemplary block diagram of ViT. Similar to the series of word embeddings used when applying the Transformer to text, ViT divides the original image into a grid of squares; each square is flattened into a single vector by concatenating all pixel channels in the square and then linearly projecting the result to the desired input dimension using a linear mapper. ViT is agnostic to the structure of the input elements, so a position encoder is further used to add a learnable position embedding to each square vector, enabling the model to understand the image structure. Finally, the flattened sequence is input into the encoder part of the original Transformer model (such as the m-layer Transformer encoder block shown in FIG. 2) for feature extraction, and a fully connected layer is connected at the end to perform tasks such as image classification or segmentation.
  • Fig. 3 shows a schematic diagram of ViT in Fig. 2 flattening the original picture into a sequence.
  • the image input into ViT is a polyp white light image of H × W × C, where H and W are the numbers of pixels in the length and width directions, respectively, and C is the number of channels.
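To make the flattening step concrete, the following is a minimal sketch in PyTorch; the library choice, the 16×16 patch size, the 768 embedding dimension, and the 224×224 input are illustrative assumptions, not values from the disclosure:

```python
import torch
import torch.nn as nn

H, W, C, patch, dim = 224, 224, 3, 16, 768
n_patches = (H // patch) * (W // patch)                    # 14 * 14 = 196 squares

to_patches = nn.Unfold(kernel_size=patch, stride=patch)    # flatten each square's pixels
linear_mapper = nn.Linear(C * patch * patch, dim)          # project to the input dimension
pos_embed = nn.Parameter(torch.zeros(1, n_patches, dim))   # learnable position embeddings

x = torch.randn(1, C, H, W)                # e.g. one polyp white-light image tensor
seq = to_patches(x).transpose(1, 2)        # (1, n_patches, C*patch*patch)
tokens = linear_mapper(seq) + pos_embed    # sequence fed into the m-layer encoder
print(tokens.shape)                        # torch.Size([1, 196, 768])
```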
  • the Vision Transformer can be used as a backbone network (backbone) to extract features, so as to obtain key information in the image more accurately.
  • In computer vision tasks, image features are generally extracted first. This part is the foundation of the entire CV task, because subsequent downstream tasks (such as classification, generation, etc.) are based on the extracted image features, so this part of the network structure is called the backbone network.
  • embodiments of the present disclosure may also use other network architectures as the backbone network, such as VggNet and ResNet architectures, etc., and the present disclosure is not limited here.
  • FIG. 4 shows a polyp imaging image according to an embodiment of the present disclosure.
  • the endoscope enters the human body through the natural orifice of the human body or through a small surgical incision to obtain images of the lesion, which are subsequently used for diagnosis and treatment of the disease.
  • FIG. 4 shows polyp images captured by the endoscope. The image on the left is an observation of the polyp obtained by the endoscope operating in white light (WL) imaging mode, and the image on the right is another observation of the same polyp obtained with the endoscope operating in Narrow Band Imaging (NBI) mode.
  • the broadband spectrum of white light is composed of three kinds of light, R/G/B (red/green/blue), and their wavelengths are 605nm, 540nm, and 415nm respectively.
  • the narrow-band light mode uses a narrow-band filter to replace the traditional broadband filter to limit the light of different wavelengths, leaving only the green and blue narrow-band light waves with wavelengths of 540nm and 415nm.
  • the image generated under the narrow-band light mode has significantly enhanced contrast between blood vessels and mucosa, which is suitable for observing the morphology of blood vessels and mucosal structure on the surface of the mucosa.
  • The existing automatic recognition work for endoscopic image classification is basically based on ordinary convolutional neural networks, usually using an off-the-shelf convolutional neural network such as ResNet, VGG, or Inceptionv3. However, these works all use only traditional training methods and do not take into account the uneven distribution of certain endoscopic image types. For example, among detected polyps, adenomas usually account for the majority, while other polyp types such as hyperplastic polyps and inflammatory polyps account for only a small proportion, showing a long-tailed distribution.
  • this disclosure proposes a multi-expert joint algorithm that is suitable for long-tail data distribution and can improve the accuracy of the head and tail at the same time.
  • white light images of polyps are used to construct a data set exhibiting a long-tailed distribution.
  • the trained endoscope image classification model can better identify polyp images exhibiting long-tail distribution.
  • any other endoscopic images of gastrointestinal lesions with uneven distribution can also be used to construct a data set on which, for example, the endoscopic image classification model is trained according to the method of the present disclosure.
  • These endoscopic images may be images acquired by the endoscope in any suitable mode, such as narrow-band light images, autofluorescence images, I-SCAN images, and the like.
  • the above various modal images may also be mixed to construct a data set, which is not limited in the present disclosure.
  • the embodiment of the present disclosure aims at the long-tail distribution of polyp images, and proposes a multi-expert decision-making endoscopic image classification model.
  • On the one hand, the overall prediction accuracy is improved by fusing the decision results of multiple experts; on the other hand, maximizing the distribution distance between the prediction results of the experts allows different experts to pay attention to different data distributions, thereby improving the learning ability on unbalanced data sets.
  • FIG. 5A shows a schematic structure of an endoscopic image classification model 500A according to one embodiment of the present disclosure.
  • an endoscopic image classification model 500A includes n expert sub-networks, where n is an integer greater than 2, for example.
  • Each expert sub-network consists of a feature extractor and a classifier.
  • each expert sub-network here can have the same network structure, and the structure of each expert sub-network can be any deep learning network structure that can be used to perform classification tasks. Such a network structure usually includes a feature extractor for extracting feature representations and a classifier for classification.
  • the feature extractor here can be the Vision Transformer shown in Figure 2.
  • the input image is first flattened into N one-dimensional vectors based on the linear mapping module and the position encoder, and then feature extraction is performed through the m-layer Transformer encoder block.
  • the classifier here can be a multi-head normalized classifier; based on the feature representation of the image sample received from the Vision Transformer, the classifier can output the predicted classification probability value of the image sample.
  • the feature extractor and classifier in the multi-expert sub-network in the embodiment of the present disclosure may be any other structures that can perform similar functions.
  • the feature extractor here can also be a deep residual network (Deep residual network, ResNet), for example, the classifier here can also be the convolutional layer part of the ResNet network, and this disclosure is not limited here.
  • the final optimization objective of the endoscopic image classification model can be determined as the following two: one is to minimize the loss between the final classification prediction output by the endoscopic image classification model and the real label, so as to improve the prediction accuracy of the endoscope image classification model.
  • the other is to maximize the distribution distance between the classification prediction values output by multiple experts, so that multiple experts can focus on different data distributions of the dataset.
  • the loss between the final output classification prediction value of the endoscope image classification model and the true label may be calculated based on a cross-entropy loss function.
  • the difference between different experts can be maximized by maximizing the KL divergence between the classification prediction values output by different experts.
  • the embodiment of the present disclosure constructs the target loss function for training the endoscopic image classification model based on the cross-entropy loss function and KL divergence.
  • the target loss function is continuously optimized until it is minimized and converges, at which point the training of the endoscope image classification model is complete.
  • Since each expert sub-network in the above-mentioned endoscopic image classification model 500A starts from the original image, it first extracts a shallow feature representation based on the shallower layers of the network, and then extracts task-specific deeper feature representations based on the deeper network structure.
  • Because shallow feature representations are usually generic, these expert sub-networks can share the shallow feature representation extracted by the same shallow feature extractor, and then further learn specific deep features for the classification task based on their own deep feature extractors.
  • the present disclosure proposes a variation of the endoscopic image classification model 500A, as shown in FIG. 5B .
  • In the endoscopic image classification model 500B of FIG. 5B, multiple expert sub-networks share a shallow feature extractor, and each expert sub-network has its own deep feature extractor and final classifier. By sharing some common shallow feature extractors, the endoscopic image classification model 500B has a more compact structure than the endoscopic image classification model 500A.
  • the shallow feature extractors here may be some common shallow structures in the feature extractors of multiple expert sub-networks in the endoscopic image classification model 500A of FIG. 5A .
  • the shallow feature extractor here can be the linear mapper layer, the position encoder layer, and one Transformer encoder block of the Vision Transformer.
  • These expert sub-networks can share this common shallow feature extractor to obtain common shallow features, and use the remaining (m-1) layers of Transformer encoder blocks as deep feature extractors to extract specific deep features, as shown in the endoscope classification model 500C in FIG. 5C.
  • the shared sub-network and deep feature extractor here can also be any other suitable feature extractors for extracting image features.
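For concreteness, the following is a minimal sketch of this shared-trunk multi-expert structure; PyTorch, the single-layer encoder blocks, the token pooling, and the default dimensions are all illustrative assumptions (in the disclosure the shared and deep extractors are the Vision Transformer layers described above):

```python
import torch
import torch.nn as nn

class MultiExpertClassifier(nn.Module):
    def __init__(self, dim=768, n_experts=3, n_classes=4):
        super().__init__()
        # shared sub-network: extracts the common shallow feature representation
        self.shared = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        # each expert: its own deep feature extractor plus its own classifier
        self.deep = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
            for _ in range(n_experts))
        self.heads = nn.ModuleList(nn.Linear(dim, n_classes) for _ in range(n_experts))

    def forward(self, tokens):                    # tokens: (B, N, dim) patch sequence
        shallow = self.shared(tokens)             # shallow features, computed once
        return [head(deep(shallow).mean(dim=1))   # pool tokens, then classify
                for deep, head in zip(self.deep, self.heads)]

model = MultiExpertClassifier()
logits_list = model(torch.randn(2, 196, 768))     # one logits tensor per expert
print(len(logits_list), logits_list[0].shape)     # 3 torch.Size([2, 4])
```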
  • FIG. 6A shows a flowchart of a method 600 for training an endoscopic image classification model according to one embodiment of the present disclosure.
  • the endoscopic image classification model here is the endoscopic image classification model 500A shown above with reference to FIG. 5A .
  • the training method 600 of the endoscopic image classification model 500A can be executed by a server, which can be the server 100 shown in FIG. 1 .
  • a training data set is obtained; the training data set includes a plurality of endoscopic images and annotation labels of the plurality of endoscopic images, wherein the training data set presents a long-tail distribution.
  • the training data set here can be prepared by simulating the long-tailed distribution of polyp types in the real situation.
  • the training data set here may include 2131 white light images of polyps with four kinds of labels, namely adenoma, hyperplasia, inflammation and cancer, where images with the adenoma label account for the majority (e.g., 65%), while images with other label types such as hyperplastic polyps, inflammatory polyps, and cancer account for only a small proportion (e.g., only 13%, 12%, and 10%, respectively), so that the entire training data set presents a long-tailed distribution.
  • the training data set here may be obtained by operating an endoscope, downloaded from a network, or obtained in other ways, which is not limited in the embodiments of the present disclosure.
  • embodiments of the present disclosure may also be applicable to image classification of other digestive tract lesions other than polyps, such as inflammation, ulcer, vascular malformation, and diverticulum, and the present disclosure is not limited thereto.
  • step S603 the endoscope image classification model is trained based on the training data set until the target loss function of the endoscope image classification model converges, so as to obtain a trained endoscope image classification model.
  • The goal here is, on the one hand, to improve the overall accuracy of the prediction by fusing the decision results of multiple experts and, on the other hand, to maximize the distribution distance between the prediction results of the multiple experts so that different experts can focus on different data distributions, thereby improving the learning ability on data sets with imbalanced distribution. Therefore, based on the multi-expert-decision endoscopic image classification model 500A, minimizing the cross-entropy loss between the final output classification prediction and the real label, and maximizing the KL divergence between the classification predictions output by different expert sub-networks, are used as the training targets for training the endoscopic image classification model according to the embodiment of the present application.
  • the training of the endoscope image classification model based on the training data set in step S603 may include the following sub-steps S603_1-S603_4.
  • step S603_1 the image samples in the training image sample set are input into each of the plurality of expert sub-networks.
  • step S603_2 using the plurality of expert sub-networks to generate corresponding output results of the plurality of expert sub-networks for the image sample.
  • Let the input image be x.
  • The feature extractor here is the Vision Transformer as described above, represented by a function F(·; θ_i), where θ_i represents the parameters of the i-th expert sub-network.
  • The extracted features are then expressed as F(x; θ_i).
  • When the shared sub-network is used, the extracted features can also be expressed as F_i(f(x); θ_i), where f(x) represents the output of the shared sub-network and F_i represents the deep feature extractor of the i-th expert sub-network.
  • The classifier here can be a multi-head normalized classifier. Based on the multi-head normalized classifier, the logits of the i-th expert sub-network are calculated as in equation (1), where γ and τ are parameters, K is the number of heads, and w_i is the weight parameter of the classifier in the i-th expert sub-network.
  • Normalizing the logits of equation (1) with softmax gives the probability value of the predicted classification, as shown in equation (2) below:
    p_i(x; θ_i) = softmax(logits_i)    (2)
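The following sketch shows one plausible form of such a multi-head normalized (cosine) classifier followed by the softmax of equation (2); the exact parameterization of γ and τ in the disclosure's equation (1) is not reproduced, so the classifier body is an illustrative assumption (PyTorch):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadNormClassifier(nn.Module):
    def __init__(self, dim=768, n_classes=4, K=2, gamma=30.0):
        super().__init__()
        self.K, self.gamma = K, gamma
        # w_i: one weight matrix per head (an assumed layout)
        self.w = nn.Parameter(torch.randn(K, n_classes, dim))

    def forward(self, feat):                      # feat: (B, dim) expert features
        f = F.normalize(feat, dim=-1)             # normalize the feature vector
        logits = 0.0
        for k in range(self.K):                   # average cosine logits over K heads
            w = F.normalize(self.w[k], dim=-1)    # normalize each class weight vector
            logits = logits + (self.gamma / self.K) * (f @ w.t())
        return logits

clf = MultiHeadNormClassifier()
p = F.softmax(clf(torch.randn(2, 768)), dim=-1)   # equation (2)
print(p.sum(dim=-1))                              # each row sums to 1
```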
  • step S603_3 based on the output results of the plurality of expert sub-networks, a final output result of the endoscopic image classification model is generated.
  • the output results of multiple expert sub-networks can be fused to obtain the final result of the endoscopic image classification model.
  • the fusion here can be a linear average, as shown in equation (3) below:
    p_soft(x) = (1/n) · Σ_{i=1..n} p_i(x; θ_i)    (3)
  • n is the number of expert sub-networks in the endoscopic image classification model
  • p soft (x) is the final prediction result of the endoscopic image classification model.
  • step S603_4 a loss value is calculated through a target loss function, and parameters of the endoscopic image classification model are adjusted based on the loss value.
  • One goal is that the final result of multi-expert fusion be closer to the real label; the other goal is to maximize the distribution distance between the output results of the multiple experts, so that the experts can focus on different distributions of the data.
  • the objective function can include two parts. The first part is the cross-entropy loss function between the fused classification prediction probability and the real label of the image sample, for example, as shown in equation (4) below:
    L_1 = L_ce(p_soft(x), y)    (4)
  • where L_ce represents the cross-entropy loss function, p_soft(x) is the final prediction result of the endoscopic image classification model obtained after fusing the prediction results of the multiple expert sub-networks, and y is the true label of the image sample.
  • The second part of the objective function is the negative KL divergence between the classification prediction probabilities output by the multiple expert sub-networks.
  • The smaller the KL divergence, the closer the distances between different distributions. Since the ultimate optimization goal of a loss function is minimization, the difference between the output distributions of the expert sub-networks is increased by minimizing the negative KL divergence, for example as in equation (5):
    L_KL(θ_i) = (1/(n-1)) · Σ_{j≠i} D_KL(p_i(x; θ_i) ‖ p_j(x; θ_j))    (5)
  • Equation (5) above expresses the average of the KL divergences between the output of the i-th expert sub-network and the outputs of the remaining (n-1) expert sub-networks, where n represents the number of expert sub-networks, θ_i represents the parameters of the i-th expert sub-network, and c is the number of label categories over which each divergence is summed.
  • Based on the two parts above, the total loss function of the training method of the endoscopic image classification model according to one embodiment of the present disclosure can be defined as shown in equation (7) below:
    L_total = L_1 − (1/n) · Σ_{i=1..n} L_KL(θ_i)    (7)
  • the parameters of the endoscope image classification model of the embodiment of the present disclosure can be adjusted, so that as the iterative training continues, the total loss function is minimized to obtain a trained endoscope image classification model.
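Putting equations (3)–(7) together, the following is a minimal sketch of this target loss; PyTorch is an assumption, and `logits_list` stands for the outputs of the n expert sub-networks:

```python
import torch
import torch.nn.functional as F

def multi_expert_loss(logits_list, target):
    probs = [F.softmax(l, dim=-1) for l in logits_list]
    p_soft = torch.stack(probs).mean(dim=0)                  # equation (3): fused prediction
    l_ce = F.nll_loss(torch.log(p_soft + 1e-12), target)     # equation (4): CE to true label
    n, kl = len(probs), 0.0
    for i in range(n):                                       # equation (5): pairwise KL
        for j in range(n):
            if i != j:
                # F.kl_div(log q, p) computes D_KL(p || q)
                kl = kl + F.kl_div(torch.log(probs[j] + 1e-12), probs[i],
                                   reduction="batchmean")
    kl = kl / (n * (n - 1))
    return l_ce - kl                                         # equation (7): minimize CE, maximize KL

logits = [torch.randn(2, 4) for _ in range(3)]               # 3 experts, 4 classes
print(multi_expert_loss(logits, torch.tensor([0, 2])))
```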
  • The embodiment of the present disclosure is thus based on multi-expert joint decision-making, taking as training goals that the fused final result of the multiple experts be closest to the real label and that the distribution distance between the output results of the multiple experts be maximized, so that the trained endoscope image classification model can adapt to the data distribution and can improve the prediction accuracy on the head and the tail simultaneously.
  • this disclosure further compresses the endoscopic image classification model structure composed of multiple expert sub-networks based on knowledge distillation, so that it is integrated into a more concise student network.
  • FIG. 7A shows a schematic diagram of an endoscopic image classification model 700A incorporating knowledge distillation according to another embodiment of the present disclosure.
  • an endoscopic image classification model 700A incorporating knowledge distillation includes two sub-networks, namely a teacher network 703A and a student network 705A.
  • the teacher network 703A here may be a plurality of expert sub-networks in the endoscopic image classification model 500A described in FIG. 5A .
  • the student network 705A here may have the same structure as each expert sub-network.
  • a student network 705A with the same structure as each expert sub-network is designed. Based on the principle of knowledge distillation, the multiple expert sub-networks are used as the teacher network to train the student network, so that a trained student network is finally obtained which, compared with the original multi-expert network structure, has a simpler structure and fewer parameters while achieving an accuracy close to that of the multi-expert classification network.
  • FIG. 7B shows a schematic diagram of an endoscopic image classification model 700B incorporating knowledge distillation according to another embodiment of the present disclosure.
  • the endoscopic image classification model 700B incorporating knowledge distillation includes a shared sub-network 701B in addition to a teacher network 703B and a student network 705B.
  • the teacher network 703B here may be a plurality of expert sub-networks constituting the endoscopic image classification model 500B described in FIG. 5B .
  • Both the teacher network 703B and the student network 705B are connected to a shared sub-network 701B, and further deep feature extraction is performed based on the shallow feature representation extracted by the shared sub-network 701B to perform classification tasks.
  • the shallow feature extractor in the shared subnetwork 701B and the deep feature extractors in the multiple expert subnetworks here may also be any other suitable feature extractors for extracting image features.
  • FIG. 7C shows an exemplary endoscopic image classification model 700C incorporating knowledge distillation using Transformer as a feature extractor.
  • the shared sub-network 701C here can be a Vision Transformer, which includes a linear mapper layer, a position encoder layer and a traditional Transformer encoder block.
  • These expert sub-networks in the teacher network 703C and the student network 705C can share this common shallow feature extractor (i.e., the shared sub-network 701C) to obtain common shallow features, and use multiple layers of traditional Transformer encoder blocks (e.g., the layers shown in FIG. 7C) as deep feature extractors to extract specific deep features for classification and recognition, as shown in FIG. 7C.
  • FIG. 8 shows a flowchart of a method 800 for training an endoscopic image classification model incorporating knowledge distillation according to one embodiment of the present disclosure.
  • step S801 the image samples in the training image sample set are input into each of the plurality of expert sub-networks of the teacher network and into the student network.
  • the endoscopic image classification model fused with knowledge distillation here may be the model 700A shown in FIG. 7A .
  • step S803 use the multiple expert sub-networks to generate corresponding output results of multiple expert sub-networks for the image sample, and use the student network to generate a corresponding student network for the image sample Output the result.
  • the process of generating the network output result here is similar to step S603_2 in FIG. 6B , and its repeated description will be omitted here.
  • step S805 a final output result of the teacher network is generated based on the output results of the plurality of expert sub-networks.
  • the process of generating the final output result of the teacher network here is similar to step S603_3 in FIG. 6B , and its repeated description will be omitted here.
  • step S807 a loss value is calculated through a target loss function, and parameters of the endoscopic image classification model fused with knowledge distillation are adjusted based on the loss value.
  • the training method 800 for an endoscope image classification model incorporating knowledge distillation uses the model 500A, 500B or 500C as a teacher network, and trains a student network with a relatively simplified structure and parameters based on knowledge distillation.
  • in addition to the goals 1) and 2) above, the training method 800 of the endoscopic image classification model fused with knowledge distillation is also expected to achieve the following two further goals: 3) make the output results of the student network closer to the output results of the teacher network, and 4) make the output distribution of the student network closer to the distribution of the output results of each expert sub-network in the teacher network.
  • the embodiment of the present disclosure constructs the loss function of the teacher network based on the above objectives 1) and 2), as shown in equation (8):
    L_teacher = L_ce(p_soft(x), y) − (1/n) · Σ_{i=1..n} L_KL(θ_i)    (8)
  • and constructs the loss function of the student network based on the above objectives 3) and 4), as shown in equation (9) below:
    L_student = L_ce(p_stu(x), p_soft(x)) + (1/n) · Σ_{i=1..n} D_KL(p_i(x; θ_i) ‖ p_stu(x))    (9)
  • where p_soft is the classification prediction probability finally output by the teacher network, n is the number of expert sub-networks in the teacher network, and p_stu(x) = softmax(logits_stu), logits_stu being the logits output by the student network; those skilled in the art will understand that normalizing the logits by softmax yields the probability distribution of the predicted classification.
  • Based on equations (8) and (9), the total loss function of the training method of the endoscopic image classification model incorporating knowledge distillation according to an embodiment of the present disclosure can be defined as shown in equation (10) below:
    L_total = α · L_teacher + (1 − α) · L_student    (10)
  • where α is the weight parameter, which is set to 1 in the initial stage, gradually decreases as training proceeds, and finally drops to 0.
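The following sketch shows how equations (9) and (10) can be combined in a training step, with the teacher weight decaying from 1 to 0; PyTorch and the linear decay schedule are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def student_loss(stu_logits, expert_probs, p_soft):
    """Equation (9): CE to the teacher's fused output plus mean KL to each expert."""
    log_p_stu = F.log_softmax(stu_logits, dim=-1)
    ce = -(p_soft * log_p_stu).sum(dim=-1).mean()            # soft-label cross-entropy
    kl = sum(F.kl_div(log_p_stu, p, reduction="batchmean")   # D_KL(p_i || p_stu)
             for p in expert_probs) / len(expert_probs)
    return ce + kl

def total_loss(teacher_l, student_l, step, total_steps):
    """Equation (10): alpha decays from 1 to 0 over the training iterations."""
    alpha = max(0.0, 1.0 - step / total_steps)
    return alpha * teacher_l + (1.0 - alpha) * student_l

# usage sketch: expert_probs come from the teacher's n experts, p_soft is their average
expert_probs = [F.softmax(torch.randn(2, 4), dim=-1) for _ in range(3)]
p_soft = torch.stack(expert_probs).mean(dim=0)
print(total_loss(torch.tensor(1.0),
                 student_loss(torch.randn(2, 4), expert_probs, p_soft),
                 step=500, total_steps=1000))
```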
  • Based on the total loss function of equation (10), the parameters of the endoscopic image classification model fused with knowledge distillation can be adjusted, so that as iterative training continues the total loss function is minimized, thereby obtaining a trained endoscopic image classification model fused with knowledge distillation. In the model obtained by this training, the student network has a small number of parameters and a relatively simple structure, yet achieves prediction accuracy close to that of the complex teacher network, so subsequent classification applications can be based directly on the trained student network.
  • an embodiment of the present disclosure also provides a method for classifying endoscopic images.
  • the method includes:
  • step S901 an endoscopic image to be identified is acquired.
  • for example, the acquired endoscopic image to be identified is a collected polyp image.
  • step S903 the endoscopic image to be recognized is input into a trained endoscopic image classification model to obtain a classification result of the endoscopic image.
  • the endoscope image classification model here may be the endoscope image classification model 500A, 500B or 500C trained by the above method.
  • when the trained endoscopic image classification model is the model shown in FIG. 5B, the endoscopic image to be recognized can first be input to the shared sub-network in the trained endoscopic image classification model to extract shallow features, which are then fed into the expert sub-networks of the trained endoscopic image classification model.
  • for an endoscope image classification model fused with knowledge distillation, such as the above-mentioned endoscope image classification model 700A, 700B or 700C: because of the small number of parameters of the student network, its relatively simple model structure, and its ability to achieve prediction accuracy close to that of the complex teacher network, the endoscopic image to be recognized can be directly input into the student network in the trained endoscopic image classification model fused with knowledge distillation.
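As a usage illustration, inference then reduces to a single forward pass through the student network; in this minimal sketch, `student`, `preprocess`, and the class names are hypothetical placeholders:

```python
import torch

class_names = ["adenoma", "hyperplastic", "inflammatory", "cancer"]  # assumed label set

@torch.no_grad()
def classify(student, image_tensor):
    """Single forward pass through the distilled student network."""
    student.eval()
    probs = torch.softmax(student(image_tensor.unsqueeze(0)), dim=-1)[0]
    idx = int(probs.argmax())
    return class_names[idx], float(probs[idx])

# label, confidence = classify(student, preprocess(endoscopic_image))
```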
  • FIG. 10 is a schematic structural diagram of an endoscope image classification system 1000 in an embodiment of the present disclosure.
  • the endoscopic image classification system 1000 at least includes an image acquisition unit 1001 , a processing unit 1002 and an output unit 1003 .
  • the image acquisition unit 1001, the processing unit 1002, and the output unit 1003 are related medical devices, which can be integrated into the same medical device, or can be divided among multiple devices that are connected and communicate with each other to form a medical system for use, etc.
  • the image acquisition unit 1001 can be an endoscope
  • the processing unit 1002 and the output unit 1003 can be a computer device in communication with the endoscope, etc.
  • the image acquiring component 1001 is used to acquire an image to be recognized.
  • the processing component 1002 is, for example, configured to execute the method steps shown in FIG. 9 , extract image feature information of the image to be recognized, and obtain a lesion classification result of the image to be recognized based on the feature information of the image to be recognized.
  • the output unit 1003 is used to output the classification result of the image to be recognized.
  • FIG. 11 shows a training device 1100 for an endoscope image classification model according to an embodiment of the present disclosure, which specifically includes a training data set acquisition component 1101 and a training component 1103 .
  • the training data set acquisition component 1101 is used to acquire a training data set, the training data set including a plurality of endoscopic images and annotation labels of the plurality of endoscopic images, wherein the training data set presents a long-tail distribution; and the training component 1103 is configured to train the endoscope image classification model based on the training data set until the target loss function of the endoscope image classification model converges, so as to obtain a trained endoscope image classification model.
  • the target loss function is determined based at least on the corresponding multiple output results of the multiple expert sub-networks.
  • the training component 1103 includes: an input subcomponent 1103_1, which is used to input image samples in the training image sample set into each of the plurality of expert sub-networks; an output result generation subcomponent 1103_2, which utilizes the plurality of expert sub-networks to generate a corresponding plurality of expert sub-network output results for the image sample and generates a final output result of the endoscopic image classification model based on the plurality of expert sub-network output results;
  • a loss function calculation subcomponent 1103_3, which calculates a loss value through a target loss function based on at least the output results of the plurality of expert sub-networks and the final output result; and a parameter adjustment subcomponent 1103_4, which adjusts the parameters of the endoscope image classification model based on the loss value.
  • the endoscope image classification model also includes a shared sub-network, wherein the training component 1103 includes: an input subcomponent 1103_1, which inputs image samples in the training image sample set into the shared sub-network to extract shallow feature representations; an output result generation subcomponent 1103_2, which, based on the extracted shallow feature representations, utilizes the multiple expert sub-networks to generate corresponding output results of the multiple expert sub-networks for the image sample and generates the final output result of the endoscopic image classification model based on the output results of the multiple expert sub-networks; a loss function calculation subcomponent 1103_3, which calculates a loss value through the target loss function based on at least the output results of the multiple expert sub-networks and the final output result; and a parameter adjustment subcomponent 1103_4, which adjusts the parameters of the endoscopic image classification model based on the loss value.
  • the target loss function of the endoscope image classification model includes: a cross-entropy loss function determined based on the final output result of the endoscope image classification model and the annotation labels of the image samples, and a KL divergence determined based on the output results of the multiple expert sub-networks.
  • the output result generating subcomponent 1103_2 fuses the output results of the plurality of expert sub-networks as the final output result of the endoscope image classification model.
  • the fusing of the output results of the multiple expert sub-networks by the output result generation subcomponent 1103_2 includes performing a weighted average on the output results of the multiple expert sub-networks.
  • the endoscopic image classification model further includes a student network with the same structure as the expert sub-networks, wherein the plurality of expert sub-networks constitute a teacher network and the teacher network is used to train the student network; the output result generation subcomponent 1103_2 further utilizes the student network to generate a corresponding student network output result for the image sample.
  • the loss function calculation subcomponent 1103_3 calculates the loss value through the target loss function based on the output results of the plurality of expert subnetworks, the final output result and the output result of the student network, and the parameter adjustment subcomponent 1103_4 is based on The loss value adjusts parameters of the endoscopic image classification model.
  • the target loss function is a weighted sum of the loss function of the teacher network and the loss function of the student network.
  • the sum of the weight value of the loss function of the teacher network and the weight value of the loss function of the student network is 1, wherein the weight value of the loss function of the teacher network decreases continuously over training iterations until it finally reaches 0, and the weight value of the loss function of the student network increases continuously over training iterations until it finally reaches 1.
  • the loss function of the teacher network includes: a cross-entropy loss function determined based on the final output result of the endoscopic image classification model and the annotation labels of the image samples, and a KL divergence determined based on the output results of the multiple expert sub-networks.
  • the loss function of the student network includes: a cross-entropy loss function determined based on the student network output result and the final output result of the endoscopic image classification model, and a KL divergence determined based on the student network output result and the plurality of expert sub-network output results generated by the plurality of expert sub-networks.
  • each of the plurality of expert sub-networks includes multiple sequentially connected Transformer encoder layers and a classifier.
  • an electronic device is also provided in another exemplary embodiment of the present disclosure.
  • the electronic device in the embodiments of the present disclosure may include a memory, a processor, and a computer program stored on the memory and operable on the processor, wherein, when the processor executes the program, the steps of the method for training an endoscopic image classification model or the method for endoscopic image classification in the above embodiments may be implemented.
  • for example, the electronic device may be the server 100 in FIG. 1.
  • Embodiments of the present disclosure also provide a computer-readable storage medium.
  • FIG. 12 shows a schematic diagram 1200 of a storage medium according to an embodiment of the disclosure.
  • computer-executable instructions 1201 are stored on the computer-readable storage medium 1200.
  • when the computer-executable instructions 1201 are executed, the training method of the endoscopic image classification model incorporating knowledge distillation and the endoscopic image classification method according to the embodiments of the present disclosure described with reference to the above figures can be performed.
  • the computer-readable storage medium includes, for example but not limited to, volatile memory and/or non-volatile memory.
  • the volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache).
  • the non-volatile memory may include, for example, a read-only memory (ROM), a hard disk, a flash memory, and the like.
  • Embodiments of the present disclosure also provide a computer program product or computer program, the computer program product or computer program including computer instructions stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the training method of the endoscopic image classification model incorporating knowledge distillation and the endoscopic image classification method according to the embodiments of the present disclosure.

Abstract

A training method and apparatus of an endoscope image classification model, and an image classification method. The endoscope image classification model comprises a plurality of expert sub-networks. The method comprises: obtaining a training data set, wherein the training data set comprises a plurality of endoscopic images and annotation labels of the plurality of endoscopic images, and the training data set presents a long-tail distribution; and training the endoscope image classification model on the basis of the training data set until a target loss function of the endoscope image classification model converges, to obtain a trained endoscope image classification model, wherein the target loss function is determined at least on the basis of a corresponding plurality of output results of the plurality of expert sub-networks.

Description

Training Method of Endoscopic Image Classification Model, Image Classification Method and Apparatus
This application claims priority to Chinese Patent Application No. 202111039189.1 filed on September 6, 2021, the entire disclosure of which is incorporated herein by reference as part of this application.
Technical Field
Embodiments of the present disclosure relate to a training method of an endoscopic image classification model incorporating knowledge distillation, an image classification method, an apparatus, and a computer-readable medium.
Background Art
Colorectal cancer has the third-highest incidence and fourth-highest mortality among cancers worldwide, and more than 95% of colorectal cancers arise from the malignant transformation of colonic polyps. Among detected polyps, adenomas account for the majority, roughly 10.86% to 80%. Colorectal cancer is generally believed to originate from adenomatous polyps, whose malignant transformation rate is 1.4% to 9.2%. Other polyp types, such as hyperplastic polyps and inflammatory polyps (accounting for 2.32% to 13.8%), each make up only a small proportion, presenting a long-tail distribution.
To reduce the burden on doctors, some work has attempted to automatically identify polyp types using deep learning. Existing work on polyp classification is basically based on ordinary convolutional neural networks, usually an off-the-shelf network such as ResNet, VGG, or Inception-v3. However, these approaches only use conventional training methods and do not take into account the imbalanced distribution of polyp types.
A great deal of research has addressed the long-tail problem. For example, some studies solve it by resampling the data set, including undersampling the head classes, oversampling the tail classes, or sampling in a class-balanced manner according to the distribution of each class. However, these methods assume prior knowledge of the future data distribution, which does not match reality, and they easily cause overfitting to the tail data. Some studies address the long-tail problem by assigning different weights to different classes or samples, modifying the loss to give higher weights to the tail data. Although such methods are simpler than resampling-based ones, they face the same problems: they easily cause underfitting/overfitting to the head/tail data and do not match real-world conditions. Some studies transfer features learned from the head data to the underrepresented tail data, but such methods usually involve complex models and heavy computation. Still other works try to combine the above methods or tackle the long-tail problem from other angles, for example by modifying the momentum of the classifier update and removing its bias toward the head data. However, this approach cannot guarantee that the accuracy on part of the head data will not be sacrificed.
Existing methods and studies for classifying polyps usually do not consider the long-tail distribution of polyp types: they either train a convolutional neural network directly or adjust the distribution of the data set before training, which clearly does not match the characteristics of real polyp data. Training directly without considering the data imbalance tends to leave the model unable to recognize the tail data well, while rebalancing the data set before training tends to overfit the tail data and causes a certain loss of accuracy on the head data.
Therefore, it is desirable to propose an improved polyp classification method that adapts to long-tail data distributions and can simultaneously improve the accuracy on both head and tail classes.
Summary of the Invention
Embodiments of the present disclosure provide a method for training an endoscopic image classification model, an endoscopic image classification method, an apparatus, and a computer-readable medium.
Embodiments of the present disclosure provide a training method for an endoscopic image classification model based on multi-expert decision-making, wherein the endoscopic image classification model includes a plurality of expert sub-networks. The method includes: acquiring a training data set, the training data set including a plurality of endoscopic images and annotation labels of the plurality of endoscopic images, wherein the training data set presents a long-tail distribution; and training the endoscopic image classification model based on the training data set until a target loss function of the endoscopic image classification model converges, to obtain a trained endoscopic image classification model, wherein the target loss function is determined based at least on corresponding multiple output results of the plurality of expert sub-networks.
For example, training the endoscopic image classification model based on the training data set includes: inputting image samples in the training image sample set into each of the plurality of expert sub-networks; using the plurality of expert sub-networks to generate corresponding expert sub-network output results for the image sample; generating a final output result of the endoscopic image classification model based on the plurality of expert sub-network output results; and calculating a loss value through the target loss function based on at least the plurality of expert sub-network output results and the final output result, and adjusting parameters of the endoscopic image classification model based on the loss value.
For example, the endoscopic image classification model further includes a shared sub-network, and training the endoscopic image classification model based on the training data set includes: inputting image samples in the training image sample set into the shared sub-network to extract shallow feature representations; using the plurality of expert sub-networks to generate corresponding expert sub-network output results for the image sample based on the extracted shallow feature representations; generating a final output result of the endoscopic image classification model based on the plurality of expert sub-network output results; and calculating a loss value through the target loss function based on at least the plurality of expert sub-network output results and the final output result, and adjusting parameters of the endoscopic image classification model based on the loss value.
For example, the target loss function of the endoscopic image classification model includes: a cross-entropy loss function determined based on the final output result of the endoscopic image classification model and the annotation labels of the image samples, and a KL divergence determined based on the output results of the multiple expert sub-networks.
For example, generating the final output result of the endoscopic image classification model based on the plurality of expert sub-network output results includes: fusing the plurality of expert sub-network output results to serve as the final output result of the endoscopic image classification model.
For example, fusing the plurality of expert sub-network output results includes: performing a weighted average of the plurality of expert sub-network output results.
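For illustration only, a minimal sketch of such a fusion step is given below (equal weights are used as the default; the disclosure only requires some weighted average, so the function name and the equal-weight choice are assumptions):

```python
import torch

def fuse_expert_outputs(expert_probs, weights=None):
    """Weighted average of per-expert class-probability tensors of shape (B, K)."""
    if weights is None:
        weights = [1.0 / len(expert_probs)] * len(expert_probs)  # equal weighting
    stacked = torch.stack(expert_probs)            # (n_experts, B, K)
    w = torch.tensor(weights).view(-1, 1, 1)       # broadcastable fusion weights
    return (w * stacked).sum(dim=0)                # (B, K) fused final output
```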
For example, the endoscopic image classification model further includes a student network with the same structure as the expert sub-networks, wherein the plurality of expert sub-networks constitute a teacher network, and the teacher network is used to train the student network based on knowledge distillation. The method further includes using the student network to generate a corresponding student network output result for the image sample.
For example, calculating the loss value through the target loss function based on at least the plurality of expert sub-network output results and the final output result includes: calculating the loss value through the target loss function based on the plurality of expert sub-network output results, the final output result, and the student network output result.
For example, the target loss function is a weighted sum of the loss function of the teacher network and the loss function of the student network.
For example, the sum of the weight value of the loss function of the teacher network and the weight value of the loss function of the student network is 1, wherein the weight value of the loss function of the teacher network decreases continuously over training iterations until it finally reaches 0, and the weight value of the loss function of the student network increases continuously over training iterations until it finally reaches 1.
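A minimal sketch of one such weight schedule follows; the disclosure does not fix a particular decay rule, so the linear ramp and the function name are assumptions:

```python
def loss_weights(epoch, total_epochs):
    """Teacher weight decays linearly from 1 to 0; student weight rises from 0 to 1.
    The two weights always sum to 1, matching the constraint described above."""
    w_teacher = max(0.0, 1.0 - epoch / total_epochs)
    w_student = 1.0 - w_teacher
    return w_teacher, w_student
```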
For example, the loss function of the teacher network includes: a cross-entropy loss function determined based on the final output result of the endoscopic image classification model and the annotation labels of the image samples, and a KL divergence determined based on the output results of the multiple expert sub-networks. The loss function of the student network includes: a cross-entropy loss function determined based on the student network output result and the final output result of the endoscopic image classification model, and a KL divergence determined based on the student network output result and the plurality of expert sub-network output results generated by the plurality of expert sub-networks.
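The following PyTorch-style sketch combines the teacher and student terms just described into one objective. It is an illustration under stated assumptions: the diversity weight, the numerical epsilon, and the exact pairwise form of the KL terms are not the disclosure's verbatim formulas.

```python
import torch
import torch.nn.functional as F

def distillation_target_loss(expert_logits, student_logits, labels,
                             w_teacher, w_student, diversity_weight=0.1):
    eps = 1e-8
    expert_probs = [F.softmax(z, dim=-1) for z in expert_logits]
    fused = torch.stack(expert_probs).mean(dim=0)        # final (teacher-side) output
    student_probs = F.softmax(student_logits, dim=-1)

    # Teacher loss: CE(final output, hard labels) minus pairwise expert KL,
    # so that minimizing the loss maximizes diversity between experts.
    ce_teacher = F.nll_loss((fused + eps).log(), labels)
    diversity = sum(F.kl_div((q + eps).log(), p, reduction='batchmean')
                    for i, p in enumerate(expert_probs)
                    for j, q in enumerate(expert_probs) if i != j)
    loss_teacher = ce_teacher - diversity_weight * diversity

    # Student loss: soft cross-entropy against the fused final output, plus KL
    # between the student's prediction and each expert's prediction.
    ce_student = -(fused * (student_probs + eps).log()).sum(dim=-1).mean()
    distill = sum(F.kl_div((student_probs + eps).log(), p, reduction='batchmean')
                  for p in expert_probs)
    loss_student = ce_student + distill

    return w_teacher * loss_teacher + w_student * loss_student
```

The weights `w_teacher` and `w_student` would follow the schedule sketched above, so the objective shifts smoothly from training the multi-expert teacher to distilling the student.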
For example, the shared sub-network includes a Vision Transformer, and each of the plurality of expert sub-networks includes multiple sequentially connected Transformer encoder layers and a classifier.
According to another embodiment of the present disclosure, an endoscopic image classification method is provided, including: acquiring an endoscopic image to be recognized; and obtaining a classification result of the endoscopic image based on a trained endoscopic image classification model, wherein the trained endoscopic image classification model is obtained based on the training method of the endoscopic image classification model as described above.
According to another embodiment of the present disclosure, an endoscopic image classification method is provided, including: acquiring an endoscopic image to be recognized; and obtaining a classification result of the endoscopic image based on the student network in a trained endoscopic image classification model, wherein the trained endoscopic image classification model is obtained based on the training method of the endoscopic image classification model as described above.
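For illustration, inference with the distilled student network might look as follows. This is a hypothetical sketch: `StudentNet` is a stand-in placeholder for the student network (which, per the disclosure, has the same structure as a single expert sub-network), and the input size is an assumed value; only the student network is needed at test time.

```python
import torch
import torch.nn as nn

class StudentNet(nn.Module):
    """Placeholder student network; in practice this would mirror one expert."""
    def __init__(self, dim=768, n_classes=4):
        super().__init__()
        self.backbone = nn.Sequential(nn.Flatten(), nn.LazyLinear(dim), nn.GELU())
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):
        return self.head(self.backbone(x))

model = StudentNet()
# model.load_state_dict(torch.load("student_net.pt"))  # in practice: load trained weights (path assumed)
model.eval()
with torch.no_grad():
    image = torch.rand(1, 3, 224, 224)              # placeholder endoscopic image tensor
    probs = model(image).softmax(dim=-1)
    predicted_class = probs.argmax(dim=-1).item()   # index of the predicted polyp type
```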
According to another embodiment of the present disclosure, an endoscopic image classification system is provided, including: an image acquisition component configured to acquire an endoscopic image to be recognized; a processing component configured to obtain a classification result of the endoscopic image based on a trained endoscopic image classification model; and an output component configured to output the classification result of the image to be recognized, wherein the trained endoscopic image classification model is obtained based on the training method of the endoscopic image classification model as described above.
According to another embodiment of the present disclosure, an endoscopic image classification system is provided, including: an image acquisition component configured to acquire an endoscopic image to be recognized; a processing component configured to obtain a classification result of the endoscopic image based on the student network in a trained endoscopic image classification model; and an output component configured to output the classification result of the image to be recognized, wherein the trained endoscopic image classification model is obtained based on the training method of the endoscopic image classification model as described above.
According to another embodiment of the present disclosure, a training apparatus for an endoscopic image classification model based on multi-expert decision-making is provided, wherein the endoscopic image classification model includes a plurality of expert sub-networks. The apparatus includes: a training data set acquisition component configured to acquire a training data set, the training data set including a plurality of endoscopic images and annotation labels of the plurality of endoscopic images, wherein the training data set presents a long-tail distribution; and a training component configured to train the endoscopic image classification model based on the training data set until a target loss function of the endoscopic image classification model converges, to obtain a trained endoscopic image classification model, wherein the target loss function is determined based at least on corresponding multiple output results of the plurality of expert sub-networks.
Embodiments of the present disclosure also provide an electronic device, including a memory and a processor, wherein the memory stores processor-readable program code, and when the processor executes the program code, the method as described above is performed.
Embodiments of the present disclosure also provide a computer-readable storage medium on which computer-executable instructions are stored, the computer-executable instructions being used to execute the method as described above.
In view of real-world conditions, the training method of the endoscopic image classification model according to the embodiments of the present disclosure proposes a multi-expert joint decision-making approach to learn imbalanced data distributions. It does not require prior knowledge of the data distribution and can simultaneously improve the model's prediction accuracy on both head and tail data without introducing bias. In addition, the model is compressed through knowledge distillation, making it more compact.
Brief Description of the Drawings
In order to explain the technical solutions of the embodiments of the present disclosure more clearly, the accompanying drawings of the embodiments are briefly introduced below. Obviously, the drawings described below relate only to some embodiments of the present disclosure and do not limit the present disclosure.
FIG. 1 shows a schematic diagram of the application architecture of the endoscopic image classification model training and endoscopic image classification methods in an embodiment of the present disclosure;
FIG. 2 shows an exemplary block diagram of a Vision Transformer (ViT);
FIG. 3 shows a schematic diagram of the ViT in FIG. 2 flattening an original image into a sequence;
FIG. 4 shows a polyp image according to an embodiment of the present disclosure;
FIG. 5A shows a schematic structure of an endoscopic image classification model 500A according to an embodiment of the present disclosure;
FIG. 5B shows a schematic structure of an endoscopic image classification model 500B according to another embodiment of the present disclosure;
FIG. 5C shows a schematic structure of an endoscopic image classification model 500C using a Transformer as the feature extractor according to yet another embodiment of the present disclosure;
FIG. 6A shows a flowchart of a method for training an endoscopic image classification model according to an embodiment of the present disclosure;
FIG. 6B shows a more specific exemplary illustration of step S603 in FIG. 6A;
FIG. 7A shows a schematic diagram of an endoscopic image classification model 700A incorporating knowledge distillation according to an embodiment of the present disclosure;
FIG. 7B shows a schematic diagram of an endoscopic image classification model 700B incorporating knowledge distillation according to another embodiment of the present disclosure;
FIG. 7C shows a schematic diagram of an endoscopic image classification model 700C incorporating knowledge distillation according to yet another embodiment of the present disclosure;
FIG. 8 shows a flowchart of a method for training an endoscopic image classification model incorporating knowledge distillation according to an embodiment of the present disclosure;
FIG. 9 depicts a flowchart of an endoscopic image classification method according to an embodiment of the present disclosure;
FIG. 10 is a schematic structural diagram of an endoscopic image classification system in an embodiment of the present disclosure;
FIG. 11 shows a training apparatus for an endoscopic image classification model according to an embodiment of the present disclosure; and
FIG. 12 shows a schematic diagram of a storage medium according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. Based on the embodiments of the present application, all other embodiments obtained by persons of ordinary skill in the art without creative effort also fall within the protection scope of the present application.
The terms used in this specification are general terms currently widely used in the art in view of the functions of the present disclosure, but the terms may change according to the intention of those of ordinary skill in the art, precedents, or new technologies in the art. In addition, specific terms may be selected by the applicant, in which case their detailed meanings will be described in the detailed description of the present disclosure. Therefore, the terms used in the specification should not be understood as simple names, but interpreted based on the meanings of the terms and the overall description of the present disclosure.
Although this application makes various references to certain modules in the system according to the embodiments of this application, any number of different modules may be used and run on the user terminal and/or the server. The modules are illustrative only, and different aspects of the systems and methods may use different modules.
Flowcharts are used in this application to illustrate the operations performed by the system according to the embodiments of this application. It should be understood that the preceding or following operations are not necessarily performed in exact order; instead, various steps may be processed in reverse order or concurrently, as needed. Other operations may also be added to these processes, or one or more steps may be removed from them.
Regarding the diagnosis of gastrointestinal diseases, images of lesions inside the gastrointestinal tract are usually obtained with diagnostic tools such as endoscopes, and medical staff judge the lesion category by visual observation. To reduce the burden on doctors, some work has attempted to automatically identify lesion categories using deep learning; however, these lesion types usually exhibit a long-tail distribution. For example, among detected polyps, adenomas account for the majority, roughly 10.86% to 80%; colorectal cancer is generally believed to originate from adenomatous polyps, whose malignant transformation rate is 1.4% to 9.2%. Other polyp types, such as hyperplastic polyps and inflammatory polyps (2.32% to 13.8%), each account for only a small proportion, presenting a long-tail distribution. Existing methods for classifying polyps usually do not consider this characteristic of the polyp type distribution: they either train a convolutional neural network directly or adjust the distribution of the data set before training, which clearly does not match the characteristics of real polyp data. Training directly without considering the data imbalance tends to leave the model unable to recognize the tail data well, while rebalancing the data set before training tends to overfit the tail data, causing a certain loss of accuracy on the head data.
Therefore, in view of the long-tail distribution of polyp image data, the present disclosure proposes a multi-expert joint algorithm that adapts to long-tail data distributions and can simultaneously improve head and tail accuracy, and integrates it into a more compact model through an end-to-end knowledge distillation method.
FIG. 1 shows a schematic diagram of the application architecture of the endoscopic image classification model training method and the endoscopic image classification method according to an embodiment of the present disclosure, including a server 100 and a terminal device 200.
The terminal device 200 may be a medical device; for example, a user may view endoscopic image classification results on the terminal device 200.
The terminal device 200 and the server 100 may be connected through the Internet to communicate with each other. Optionally, the above Internet uses standard communication technologies and/or protocols. The network is usually the Internet, but may be any network, including but not limited to a Local Area Network (LAN), a Metropolitan Area Network (MAN), a Wide Area Network (WAN), or any combination of mobile, wired or wireless networks, private networks, or virtual private networks. In some embodiments, technologies and/or formats including Hyper Text Markup Language (HTML), Extensible Markup Language (XML), and the like are used to represent data exchanged over the network. In addition, conventional encryption technologies such as Secure Sockets Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN), and Internet Protocol Security (IPsec) may be used to encrypt all or some links. In other embodiments, customized and/or dedicated data communication technologies may be used in place of or in addition to the above data communication technologies.
The server 100 may provide various network services for the terminal device 200, and the server 100 may be a single server, a server cluster composed of several servers, or a cloud computing center.
Specifically, the server 100 may include a processor 110 (Central Processing Unit, CPU), a memory 120, an input device 130, an output device 140, and the like. The input device 130 may include a keyboard, a mouse, a touch screen, and so on; the output device 140 may include a display device, such as a Liquid Crystal Display (LCD) or a Cathode Ray Tube (CRT).
The memory 120 may include a read-only memory (ROM) and a random access memory (RAM), and provides the processor 110 with the program instructions and data stored in the memory 120. In the embodiments of the present disclosure, the memory 120 may be used to store the program of the endoscopic image classification model training method or of the endoscopic image classification method based on a trained endoscopic image classification model.
By calling the program instructions stored in the memory 120, the processor 110 is configured to execute, according to the obtained program instructions, the steps of any endoscopic image classification model training method or endoscopic image classification method based on a trained endoscopic image classification model in the embodiments of the present disclosure.
For example, in the embodiments of the present disclosure, the endoscopic image classification model training method or the endoscopic image classification method based on a trained model is mainly executed on the server 100 side. For example, for the endoscopic image classification method, the terminal device 200 may send collected images of gastrointestinal lesions (for example, polyps) to the server 100, the server 100 performs type recognition on the lesion images, and the lesion classification result may be returned to the terminal device 200.
The application architecture shown in FIG. 1 is described by taking application on the server 100 side as an example. Of course, the endoscopic image classification method in the embodiments of the present disclosure may also be executed by the terminal device 200. For example, the terminal device 200 may obtain the trained endoscopic image classification model incorporating knowledge distillation from the server 100 side, and perform type recognition on lesion images based on the student network in that model to obtain the lesion classification results. The embodiments of the present disclosure are not limited in this respect.
In addition, the application architecture diagrams in the embodiments of the present disclosure are intended to illustrate the technical solutions of the embodiments more clearly and do not constitute a limitation on those technical solutions. Of course, for other application architectures and business applications, the technical solutions provided by the embodiments of the present disclosure are also applicable to similar problems.
Various embodiments of the present disclosure are schematically described by taking application to the application architecture shown in FIG. 1 as an example.
First, in order to enable those skilled in the art to understand the principles of the present disclosure more clearly, some technical terms involved in the present disclosure are briefly described below.
Knowledge distillation: Knowledge distillation usually adopts a teacher-student architecture, using the knowledge learned by a large model (the teacher) to guide the training of a small model (the student), so that the small model achieves performance comparable to the large model with a greatly reduced number of parameters, thereby achieving model compression and acceleration.
KL divergence: The full name of KL divergence is Kullback-Leibler divergence. It is generally used to measure the "distance" between two probability distribution functions. For two probability distributions P and Q of a discrete random variable, their KL divergence is defined as:
$D_{KL}(P\|Q)=\sum_{x}P(x)\log\frac{P(x)}{Q(x)}$
Minimizing the KL divergence makes the distributions P and Q close to each other; likewise, minimizing the negative KL divergence maximizes the distance between the distributions of P and Q. KL divergence is a commonly used loss function in the field of machine learning.
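As a quick worked example with illustrative numbers (not taken from the disclosure), for $P=(0.5,0.5)$ and $Q=(0.9,0.1)$: $D_{KL}(P\|Q)=0.5\ln\frac{0.5}{0.9}+0.5\ln\frac{0.5}{0.1}\approx-0.294+0.805=0.511$ nats, whereas $D_{KL}(P\|P)=0$; the divergence grows as the two distributions move apart.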
Transformer: The Transformer was proposed in the Google paper "Attention is All You Need" to solve natural language translation problems. It is based on the attention mechanism to improve model training speed. A typical Transformer includes a Multi-head Attention module and a Multilayer Perceptron (MLP) module. The multi-head attention module helps the encoder look at other words while encoding a specific word. Each module is preceded by a Layer Normalization module, and residual connections are used to connect the modules. The layer normalization module imposes constraints on the "scale" problem that may arise from the accumulation of multiple token embeddings during Transformer learning, which is equivalent to constraining the space expressing the polysemy of each word, effectively reducing model variance.
Vision Transformer (ViT): The Vision Transformer is a technique that transfers the Transformer from natural language processing to image processing.
FIG. 2 shows an exemplary block diagram of a ViT. Similar to the series of word embeddings used when applying the Transformer to text, the ViT divides the original image into a grid of patches, flattens each patch into a single vector by concatenating all pixel channels in the patch, and then linearly projects it to the desired input dimension with a linear mapper. Since the ViT is agnostic to the structure of the input elements, a position encoder is further used to add a learnable position embedding to each patch vector, enabling the model to understand the image structure. Finally, the flattened sequence is input into the encoder part of the original Transformer model (for example, the m-layer Transformer encoder block shown in FIG. 2) for feature extraction, and a fully connected layer is finally attached to perform tasks such as image classification or segmentation.
FIG. 3 shows a schematic diagram of the ViT in FIG. 2 flattening the original image into a sequence.
As shown in FIG. 3, the image input into the ViT is an H×W×C polyp white-light image, where H and W are the numbers of pixels in the height and width directions, respectively, and C is the number of channels. The image is first divided into patches and then flattened. Assuming each patch has size P×P, the number of patches is N = H×W/(P×P). Each image patch is then flattened into a one-dimensional vector of size P×P×C, so the total input of the N patches is transformed into N×(P×P×C). A linear mapper then applies a linear transformation (i.e., a fully connected layer) to each vector to reshape the matrix and compress the dimension to D; this is called patch embedding. This yields an N×D embedding sequence, where N is the length of the resulting embedding sequence and D is the dimension of each vector in the sequence. Thus, the three-dimensional H×W×C image is converted into a two-dimensional (N×D) input. Subsequently, a position encoder adds position information to the sequence, and the sequence with position information can then be input into the Transformer encoder for feature extraction. It should be understood that the structures of the Transformer and the Vision Transformer and their feature extraction techniques are well known in the art and will not be described in detail here.
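Purely as an illustration of the flattening just described (patch size, image size, and embedding dimension are assumed values, not fixed by the disclosure), a PyTorch-style sketch might be:

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Flattens an HxWxC image into N = H*W/(P*P) patch vectors of size P*P*C,
    linearly projects each to dimension D, and adds a learnable position embedding."""
    def __init__(self, img_size=224, patch_size=16, in_channels=3, dim=768):
        super().__init__()
        num_patches = (img_size // patch_size) ** 2
        self.patch_size = patch_size
        self.proj = nn.Linear(patch_size * patch_size * in_channels, dim)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, dim))

    def forward(self, x):                          # x: (B, C, H, W)
        B, C, H, W = x.shape
        P = self.patch_size
        x = x.unfold(2, P, P).unfold(3, P, P)      # (B, C, H/P, W/P, P, P)
        x = x.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * P * P)
        return self.proj(x) + self.pos_embed       # (B, N, D) embedding sequence
```

For instance, `PatchEmbedding()(torch.rand(1, 3, 224, 224))` yields a (1, 196, 768) sequence ready for the Transformer encoder.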
According to an embodiment of the present disclosure, the Vision Transformer may be used as the backbone network to extract features, so as to capture the key information in the image more accurately. In neural networks, especially in the Computer Vision (CV) field, features are generally extracted from the image first. This part is the foundation of the entire CV task, because subsequent downstream tasks (such as classification, generation, and so on) are performed based on the extracted image features; this part of the network structure is therefore called the backbone network.
Of course, it should be noted that the embodiments of the present disclosure may also use other network architectures as the backbone network, such as VggNet and ResNet, and the present disclosure is not limited in this respect.
FIG. 4 shows a polyp image according to an embodiment of the present disclosure.
An endoscope enters the human body through a natural orifice or through a small surgical incision to obtain images of lesions, and these images are subsequently used for the diagnosis and treatment of diseases. FIG. 4 shows polyp images captured by an endoscope: the left image is an observation of a polyp obtained by an endoscope operating in white light (WL) imaging mode, and the right image is another observation of the same polyp obtained by an endoscope operating in Narrow Band Imaging (NBI) mode.
The broadband spectrum of white light is composed of three kinds of light, R/G/B (red/green/blue), with wavelengths of 605 nm, 540 nm, and 415 nm, respectively. The white light imaging mode presents a high-brightness, sharp white-light endoscopic image, which is conducive to observing the deep structure of the mucosa. The narrow-band light mode uses a narrow-band filter instead of the traditional broadband filter to restrict light of different wavelengths, leaving only the green and blue narrow-band light waves at 540 nm and 415 nm. In images generated in the narrow-band light mode, the contrast of blood vessels relative to the mucosa is significantly enhanced, which is suitable for observing the vascular morphology and mucosal structure of the mucosal surface.
To reduce the burden on doctors, some existing work attempts to use deep learning to automatically identify the lesion category of lesions in images acquired by endoscopy. However, existing automatic recognition work on endoscopic image classification is basically based on ordinary convolutional neural networks, usually an off-the-shelf network such as ResNet, VGG, or Inception-v3. These approaches only use conventional training methods and do not take into account the imbalanced distribution of certain endoscopic image types. For example, among detected polyps, adenomas usually account for the majority, while other polyp types such as hyperplastic polyps and inflammatory polyps each account for only a small proportion, presenting a long-tail distribution.
Therefore, in view of the long-tail distribution of polyp image data, the present disclosure proposes a multi-expert joint algorithm that adapts to long-tail data distributions and can simultaneously improve head and tail accuracy.
In the following, the technical solutions of the embodiments of the present disclosure are schematically described by taking the polyp image classification problem as an example. It should be noted that the technical solutions provided by the embodiments of the present disclosure are also applicable to some other endoscopic images with imbalanced distributions.
For example, according to an embodiment of the present disclosure, white-light images of polyps are used to construct a data set exhibiting a long-tail distribution. By using the training method of the endoscopic image classification model proposed in this application, the trained endoscopic image classification model can better recognize polyp images exhibiting a long-tail distribution.
It should be understood that if other endoscopic images of gastrointestinal lesions with imbalanced distributions are to be classified and recognized, any other such endoscopic images may also be used to construct the data set and train the endoscopic image classification model according to the embodiments of the present disclosure. These endoscopic images may be images acquired by the endoscope in any suitable mode, such as narrow-band light images, autofluorescence images, and I-SCAN images. For example, the above various modality images may also be mixed to construct the data set, and the present disclosure is not limited in this respect.
Aiming at the long-tail distribution of polyp images, the embodiments of the present disclosure propose a multi-expert decision-making endoscopic image classification model. On the one hand, the overall prediction accuracy is improved by fusing the decision results of multiple experts; on the other hand, by maximizing the distribution distance between the prediction results of the multiple experts, different experts can focus on different data distributions, thereby improving the ability to learn from imbalanced data sets.
FIG. 5A shows a schematic structure of an endoscopic image classification model 500A according to an embodiment of the present disclosure.
As shown in FIG. 5A, the endoscopic image classification model 500A according to an embodiment of the present disclosure includes n expert sub-networks, where n is, for example, an integer greater than 2. Each expert sub-network includes a feature extractor and a classifier.
According to the embodiments of the present disclosure, each expert sub-network here may have the same network structure, and the structure of each expert sub-network may be any deep learning network structure that can be used to perform classification tasks; such a network structure usually includes a feature extractor for extracting feature representations and a classifier for classification.
For example, the feature extractor here may be the Vision Transformer shown in FIG. 2. For example, when the Vision Transformer of FIG. 2 is used as the feature extractor, the input image is first flattened into N one-dimensional vectors based on the linear mapping module and the position encoder, and features are then extracted through the m layers of Transformer encoder blocks.
For example, the classifier here may be a multi-head normalized classifier. Based on the feature representation of an image sample received from the Vision Transformer, the classifier can output the predicted classification probability values of the image sample.
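The disclosure does not spell out the normalization. One common reading of a "normalized classifier", used here purely as an assumption, L2-normalizes both the feature vector and the class weight vectors, so the logits become scaled cosine similarities (often helpful under class imbalance):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalizedClassifier(nn.Module):
    """Sketch of a normalized (cosine) classifier head; dim/scale are assumed."""
    def __init__(self, dim=768, n_classes=4, scale=16.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_classes, dim))
        self.scale = scale

    def forward(self, x):                               # x: (B, dim) features
        return self.scale * F.linear(F.normalize(x, dim=-1),
                                     F.normalize(self.weight, dim=-1))
```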
It should be understood that the feature extractor and classifier in the multi-expert sub-networks of the embodiments of the present disclosure may be any other structures that can perform similar functions. For example, the feature extractor here may also be a deep residual network (ResNet), and the classifier here may also be the convolutional layer part of a ResNet network; the present disclosure is not limited in this respect.
For example, the final optimization objectives of the endoscopic image classification model can be determined here as the following two. One is to minimize the loss between the final classification prediction output by the endoscopic image classification model and the true labels, so as to improve the prediction accuracy of the endoscopic image classification model. The other is to maximize the distribution distance between the classification predictions output by the multiple experts, so that the multiple experts can focus on different data distributions of the data set.
For example, according to the embodiments of the present disclosure, the loss between the final classification prediction output by the endoscopic image classification model and the true labels may be calculated based on a cross-entropy loss function. For example, according to the embodiments of the present disclosure, the difference between different experts may be maximized by maximizing the KL divergence between the classification predictions output by the different experts.
In this way, the embodiments of the present disclosure construct the target loss function for training the endoscopic image classification model based on the cross-entropy loss function and the KL divergence. During training, the target loss function is continuously optimized until it is minimized and converges, at which point training of the endoscopic image classification model is determined to be complete.
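For illustration, the sketch below combines these two objectives for the model of FIG. 5A in PyTorch style; the diversity weight, the numerical epsilon, and the exact pairwise form of the KL term are assumptions, not the disclosure's verbatim formulas:

```python
import torch
import torch.nn.functional as F

def multi_expert_target_loss(expert_logits, labels, diversity_weight=0.1):
    """Cross-entropy on the fused prediction, minus a pairwise-KL term so that
    minimizing the loss *maximizes* the divergence between experts."""
    probs = [F.softmax(z, dim=-1) for z in expert_logits]
    fused = torch.stack(probs).mean(dim=0)              # weighted-average fusion
    ce = F.nll_loss((fused + 1e-8).log(), labels)       # CE(final output, labels)
    kl = sum(F.kl_div((q + 1e-8).log(), p, reduction='batchmean')
             for i, p in enumerate(probs)
             for j, q in enumerate(probs) if i != j)
    return ce - diversity_weight * kl                   # encourage expert diversity
```

Note the minus sign in front of the KL term: because the optimizer minimizes the loss, subtracting the divergence drives the experts' predicted distributions apart.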
此外,由于上述内窥镜图像分类模型500A中的每个专家子网络都需要从原始图片开始,先基于网络的较浅的层次来提取浅层特征表示,再基于更深层次的网络结构来提取具有特异性的更深层次的特征表示。事实上,由于浅层特征表示对分类决策的影响不大,为了进一步简化模型复杂度,这些专家子网络可以共享同一个浅层特征提取器所提取的浅层特征表示,再基于深层特征提取器来进一步地学习特异性的深层特征,以进行分类任务。In addition, since each expert sub-network in the above-mentioned endoscopic image classification model 500A needs to start from the original image, it first extracts the shallow feature representation based on the shallower layer of the network, and then extracts the feature representation based on the deeper network structure. Deeper feature representation of specificity. In fact, since the shallow feature representation has little influence on the classification decision, in order to further simplify the model complexity, these expert sub-networks can share the shallow feature representation extracted by the same shallow feature extractor, and then based on the deep feature extractor To further learn specific deep features for classification tasks.
因此,本公开提出了内窥镜图像分类模型500A的一个变型,如图5B所示。在图5B的内窥镜图像分类模型500B中,多个专家子网络共享一个浅层特征提取器,同时每个专家子网络具有各自的深层次的特征提取器,以及最后的一个分类器,通过共享一些共同的浅层的特征提取器,内窥镜图像分类模型500B具有比内窥镜图像分类模型500A更简洁的结构。Accordingly, the present disclosure proposes a variation of the endoscopic image classification model 500A, as shown in FIG. 5B . In the endoscopic image classification model 500B of Fig. 5B, multiple expert sub-networks share a shallow feature extractor, and each expert sub-network has its own deep-level feature extractor, and the last classifier, through Sharing some common shallow feature extractors, the endoscopic image classification model 500B has a more compact structure than the endoscopic image classification model 500A.
For example, the shallow feature extractor here may be the common shallow structure of the feature extractors of the multiple expert sub-networks of the endoscopic image classification model 500A of FIG. 5A.
For example, when the feature extractor in each expert sub-network of the endoscopic image classification model 500A is the Vision Transformer shown in FIG. 2, the shallow feature extractor here may be the linear mapper layer, the position encoder layer, and one Transformer encoder block of that Vision Transformer. The expert sub-networks can share this common shallow feature extractor to obtain common shallow features, and use the remaining (m-1) Transformer encoder blocks as deep feature extractors to extract expert-specific deep features, as shown in the endoscope classification model 500C of FIG. 5C. Alternatively, the shared sub-network and the deep feature extractors here may be any other feature extractor suitable for extracting image features.
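To make the shared-stem design concrete, the following PyTorch-style sketch wires one shared shallow encoder block to several expert branches, each with its own deeper blocks and classifier. It is a minimal illustration under assumed sizes; the class name, the patch projection, the class-token readout, and all hyperparameters are assumptions for illustration rather than the disclosed implementation.

```python
import torch
import torch.nn as nn

class SharedStemExperts(nn.Module):
    """Sketch of models 500B/500C: a shared shallow ViT stem (patch projection,
    position codes, one encoder block) feeding n expert branches of (m-1) deeper
    blocks plus a classifier each. Sizes and names are illustrative assumptions."""
    def __init__(self, n_experts=3, m=4, dim=384, heads=6, n_patches=196, n_classes=4):
        super().__init__()
        self.proj = nn.Linear(16 * 16 * 3, dim)              # linear mapper for 16x16 patches
        self.pos = nn.Parameter(torch.zeros(1, n_patches + 1, dim))  # position codes
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))      # class token
        block = lambda: nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.shared = block()                                # one shared shallow block
        self.experts = nn.ModuleList(
            nn.Sequential(*[block() for _ in range(m - 1)]) for _ in range(n_experts))
        self.heads = nn.ModuleList(nn.Linear(dim, n_classes) for _ in range(n_experts))

    def forward(self, patches):                              # patches: (B, n_patches, 768)
        x = self.proj(patches)
        x = torch.cat([self.cls.expand(x.size(0), -1, -1), x], dim=1) + self.pos
        shallow = self.shared(x)                             # common shallow features
        return [h(e(shallow)[:, 0]) for e, h in zip(self.experts, self.heads)]
```

Each forward pass yields one logits tensor per expert; fusing them and computing the losses follows in the training step sketched later.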
FIG. 6A shows a flowchart of a method 600 for training an endoscopic image classification model according to one embodiment of the present disclosure. For example, the endoscopic image classification model here is the endoscopic image classification model 500A described above with reference to FIG. 5A. The training method 600 may be executed by a server, for example the server 100 shown in FIG. 1.
First, in step S601, a training data set is acquired. The training data set includes a plurality of endoscopic images and annotation labels of the plurality of endoscopic images, and presents a long-tailed distribution.
The training data set here may be prepared to mimic the long-tailed distribution of polyp types observed in practice. For example, in one specific implementation of an embodiment of the present disclosure, the training data set includes 2131 white-light images of polyps with four annotation labels: adenoma, hyperplasia, inflammation, and cancer. Images labeled adenoma form the majority (for example, 65%), while the other polyp types, such as hyperplastic polyps, inflammatory polyps, and cancer, each account for only a small proportion (for example, 13%, 12%, and 10%, respectively), so that the training data set as a whole presents a long-tailed distribution.
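For concreteness, the stated percentages imply roughly the following per-class image counts; the exact split is not given in the disclosure, so this is only an illustrative back-of-the-envelope check.

```python
# Illustrative class counts implied by the example ratios over 2131 images.
total = 2131
ratios = {"adenoma": 0.65, "hyperplastic": 0.13, "inflammatory": 0.12, "cancer": 0.10}
counts = {name: round(total * r) for name, r in ratios.items()}
print(counts)  # {'adenoma': 1385, 'hyperplastic': 277, 'inflammatory': 256, 'cancer': 213}
```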
It should be understood that the size of the training data set and the proportions of the labels used to train the endoscopic image classification model according to the embodiments of the present disclosure may be adjusted according to the actual situation; the present disclosure places no limit on this.
For example, the training data set here may be obtained by operating an endoscope, downloaded over a network, or acquired in other ways; the embodiments of the present disclosure place no limit on this.
It should be understood that the embodiments of the present disclosure are equally applicable to the image classification of digestive tract lesions other than polyps, such as inflammation, ulcers, vascular malformations, and diverticula; the present disclosure places no limit on this.
In step S603, the endoscopic image classification model is trained based on the training data set until the target loss function of the endoscopic image classification model converges, so as to obtain a trained endoscopic image classification model.
As described above, the goal here is, on the one hand, to improve the overall prediction accuracy by fusing the decisions of multiple experts and, on the other hand, to maximize the distribution distance between the experts' predictions so that different experts attend to different data distributions, thereby improving the ability to learn from datasets with imbalanced distributions. Accordingly, minimizing the cross-entropy loss between the final classification prediction of the multi-expert endoscopic image classification model 500A and the ground-truth labels, together with maximizing the KL divergence between the classification predictions of different expert sub-networks, may serve as the training objective for the endoscopic image classification model according to the embodiments of the present application.
Referring to FIG. 6B, the step of training the endoscopic image classification model based on the training data set in step S603 is described below in a more specific, exemplary manner.
As shown in FIG. 6B, training the endoscopic image classification model based on the training data set in step S603 may include the following sub-steps S603_1-S603_4.
Specifically, in step S603_1, the image samples of the training image sample set are input into each of the plurality of expert sub-networks.
As an alternative embodiment, when classification training is performed with the endoscopic image classification model 500B shown in FIG. 5B, the shallow features of the image sample may first be extracted by a shared sub-network, and these shallow features (rather than the original image sample itself) are then input into each of the multiple expert sub-networks of the model 500B. As described above, by sharing a common shallow feature extractor, the endoscopic image classification model 500B has a more compact structure than the endoscopic image classification model 500A.
Next, in step S603_2, the plurality of expert sub-networks are used to generate corresponding expert sub-network output results for the image sample.
For example, let the input image be x. Each expert sub-network first extracts a feature representation $u_i$ of the image sample with its feature extractor (for example, the feature extractor here is the Vision Transformer described above, denoted by the function $f_{\theta_i}(\cdot)$, where $\theta_i$ denotes the parameters of the i-th expert sub-network). The extracted feature representation is then:

$$u_i = f_{\theta_i}(x)$$
As an alternative embodiment, when classification training is performed with the endoscopic image classification model 500B shown in FIG. 5B, the extracted features may instead be expressed as:

$$u_i = g_{\theta_i}\big(f(x)\big)$$

where $f(x)$ denotes the shared sub-network and $g_{\theta_i}(\cdot)$ denotes the deep feature extractor of the i-th expert sub-network.
Then, based on the feature representation $u_i$, a classifier is used to classify the image sample. For example, the classifier here may be a multi-head normalized classifier, with which the logits of the i-th expert sub-network are computed as:

$$z_i = \frac{\gamma}{K}\sum_{k=1}^{K}\frac{(w_i^k)^{\top} u_i^k}{\tau\,\lVert w_i^k\rVert\,\lVert u_i^k\rVert} \tag{1}$$

where $\gamma$ and $\tau$ are parameters, K is the number of heads, $w_i^k$ is the weight parameter of the k-th head of the classifier in the i-th expert sub-network, $u_i^k$ is the corresponding slice of the feature representation, and $z_i$ is the logits computed by the i-th expert sub-network for the input image sample. As is known to those skilled in the art, normalizing these logits with softmax yields the predicted classification probabilities, as shown in equation (2) below:

$$p_i(x) = \operatorname{softmax}(z_i) = \frac{\exp(z_i)}{\sum_{c=1}^{C}\exp\big(z_i^{(c)}\big)} \tag{2}$$
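As a sketch of how such a classifier might be realized, the snippet below implements a cosine-similarity reading of equation (1); since the equation is reconstructed from context, the exact placement of γ, τ, and the per-head normalization is an assumption, as are the class name and default values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadNormClassifier(nn.Module):
    """Multi-head normalized classifier in the spirit of equation (1):
    per-head cosine similarity between feature slices and class weights,
    averaged over K heads and scaled by gamma / tau."""
    def __init__(self, dim, n_classes, n_heads=2, gamma=30.0, tau=1.0):
        super().__init__()
        assert dim % n_heads == 0
        self.K, self.gamma, self.tau = n_heads, gamma, tau
        self.w = nn.Parameter(torch.randn(n_heads, n_classes, dim // n_heads))

    def forward(self, u):                          # u: (batch, dim) feature of one expert
        logits = 0.0
        for k, uk in enumerate(u.chunk(self.K, dim=-1)):
            logits = logits + F.normalize(uk, dim=-1) @ F.normalize(self.w[k], dim=-1).t()
        return self.gamma / (self.K * self.tau) * logits   # z_i; softmax gives p_i(x)
```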
In step S603_3, the final output result of the endoscopic image classification model is generated based on the output results of the plurality of expert sub-networks. For example, the outputs of the multiple expert sub-networks may be fused to obtain the final result of the endoscopic image classification model; the fusion here may be a linear average, as shown in equation (3) below:

$$p_{\text{soft}}(x) = \frac{1}{n}\sum_{i=1}^{n} p_i(x) \tag{3}$$

where n is the number of expert sub-networks in the endoscopic image classification model and $p_{\text{soft}}(x)$ is the final prediction of the endoscopic image classification model.
In step S603_4, a loss value is calculated through the target loss function, and the parameters of the endoscopic image classification model are adjusted based on the loss value.
As described above, the model optimization here has two goals: one is that the final fused multi-expert result be closer to the ground-truth label, and the other is that the distribution distance between the outputs of the multiple experts be maximized, so that the experts attend to different distributions of the data.
Accordingly, the objective function may include two parts. The first part is the cross-entropy loss function between the fused classification prediction probability and the ground-truth label of the image sample, for example as shown in equation (4) below:

$$L_{ce}\big(p_{\text{soft}}(x),\, y\big) = -\sum_{c=1}^{C} y_c \log p_{\text{soft}}^{(c)}(x) \tag{4}$$

where $L_{ce}$ denotes the cross-entropy loss function, $p_{\text{soft}}(x)$ is the final prediction of the endoscopic image classification model obtained by fusing the predictions of the multiple expert sub-networks, and $y$ is the ground-truth label of the image sample.
The second part of the objective function is the negative KL divergence between the classification prediction probabilities output by the multiple expert sub-networks. As those skilled in the art will appreciate, the smaller the KL divergence, the closer two distributions are. Since optimization against a loss function ultimately minimizes that loss, the differences between the output distributions of the expert sub-networks are enlarged here by minimizing the negative KL divergence, for example as in equation (5) below:
$$L_{KL}^{(i)} = \frac{1}{n-1}\sum_{\substack{j=1 \\ j\neq i}}^{n} D_{KL}\big(p_i(x)\,\Vert\, p_j(x)\big) \tag{5}$$

Equation (5) above averages the KL divergence between the output of the i-th expert sub-network and the outputs of the remaining (n-1) expert sub-networks, where

$$D_{KL}\big(p_i \Vert p_j\big) = \sum_{c=1}^{C} p_i^{(c)}(x)\,\log\frac{p_i^{(c)}(x)}{p_j^{(c)}(x)}$$
The divergence loss function over all expert sub-networks is then defined as shown in equation (6):

$$L_{div}(\theta_1,\dots,\theta_n) = -\frac{1}{n}\sum_{i=1}^{n} L_{KL}^{(i)} \tag{6}$$

where n denotes the number of expert sub-networks, $\theta_i$ denotes the parameters of the i-th expert sub-network, and C is the number of label categories.
Therefore, the total loss function of the training method of the endoscopic image classification model according to one embodiment of the present disclosure can be defined as shown in equation (7):

$$L_{total} = L_{ce}\big(p_{\text{soft}}(x),\, y\big) + L_{div} \tag{7}$$
Based on the above total loss function, the parameters of the endoscopic image classification model of the embodiments of the present disclosure can be adjusted so that the total loss function is minimized as iterative training proceeds, yielding a trained endoscopic image classification model.
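The following sketch ties equations (2)-(7) together in one loss routine; the function name and the numerical-stability epsilon are illustrative assumptions, and the double loop is written for clarity rather than speed.

```python
import torch
import torch.nn.functional as F

def multi_expert_loss(expert_logits, labels, eps=1e-12):
    """Total loss of equation (7): cross-entropy on the fused prediction,
    equations (3)-(4), plus the negated mean KL divergence, equations (5)-(6)."""
    probs = [F.softmax(z, dim=-1) for z in expert_logits]      # p_i(x), equation (2)
    p_soft = torch.stack(probs).mean(dim=0)                    # fusion, equation (3)
    l_ce = F.nll_loss((p_soft + eps).log(), labels)            # equation (4)

    n, l_div = len(probs), 0.0
    for i in range(n):                                         # equations (5)-(6):
        for j in range(n):                                     # minimize -KL to push
            if i != j:                                         # the experts apart
                kl = (probs[i] * ((probs[i] + eps) / (probs[j] + eps)).log()).sum(-1).mean()
                l_div = l_div - kl
    l_div = l_div / (n * (n - 1))
    return l_ce + l_div                                        # equation (7)
```

A training step would run the image batch through every expert, call this routine on the list of logits, and backpropagate the returned scalar.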
The embodiments of the present disclosure rely on joint multi-expert decision-making and take as the training objectives that the final fused multi-expert result be closest to the ground-truth label and that the distribution distance between the outputs of the multiple experts be maximized, so that the trained endoscopic image classification model adapts to the data distribution and improves prediction accuracy on the head and tail classes simultaneously.
Furthermore, because the number of expert sub-networks is large and the model correspondingly complex, the present disclosure further compresses the endoscopic image classification model composed of multiple expert sub-networks by means of knowledge distillation, condensing it into a more compact student network.
FIG. 7A shows a schematic diagram of an endoscopic image classification model 700A incorporating knowledge distillation according to another embodiment of the present disclosure.
As shown in FIG. 7A, the endoscopic image classification model 700A incorporating knowledge distillation according to an embodiment of the present disclosure includes two sub-networks: a teacher network 703A and a student network 705A.
For example, the teacher network 703A here may be the multiple expert sub-networks of the endoscopic image classification model 500A described with reference to FIG. 5A, and the student network 705A may have the same structure as each expert sub-network.
The embodiments of the present disclosure design a student network 705A with the same structure as the expert sub-networks and, based on the principle of knowledge distillation, use the multiple expert sub-networks as a teacher network to train it. The trained student network ultimately obtained has a simpler structure and fewer parameters than the original multi-expert network, while achieving an accuracy close to that of the multi-expert classification network.
Similarly, each expert sub-network of the teacher network 703A in FIG. 7A, as well as the student network, must start from the original image, first extracting a shallow feature representation with the shallower layers and then extracting a deeper, network-specific feature representation with the deeper network structure. In fact, because the shallow feature representation has little influence on classification, in a variant of the endoscopic image classification model 700A incorporating knowledge distillation according to the embodiments of the present disclosure, the teacher network and the student network may share the same shallow feature extractor and then learn specific deep features with their own deep feature extractors for the classification task, which further reduces model complexity. FIG. 7B shows a schematic diagram of such an endoscopic image classification model 700B incorporating knowledge distillation according to another embodiment of the present disclosure.
As shown in FIG. 7B, the endoscopic image classification model 700B incorporating knowledge distillation includes, in addition to a teacher network 703B and a student network 705B, a shared sub-network 701B.
As described with reference to FIG. 5B, the teacher network 703B here may be the multiple expert sub-networks constituting the endoscopic image classification model 500B of FIG. 5B. Both the teacher network 703B and the student network 705B are connected to the shared sub-network 701B and perform further deep feature extraction on the shallow feature representation it extracts in order to carry out the classification task.
Alternatively, the shallow feature extractor in the shared sub-network 701B and the deep feature extractors in the multiple expert sub-networks may be any other feature extractor suitable for extracting image features.
FIG. 7C shows an exemplary endoscopic image classification model 700C incorporating knowledge distillation and using a Transformer as the feature extractor. For example, the shared sub-network 701C here may be a Vision Transformer comprising a linear mapper layer, a position encoder layer, and one conventional Transformer encoder block. The expert sub-networks of the teacher network 703C and the student network 705C may share this common shallow feature extractor (that is, the shared sub-network 701C) to obtain common shallow features, and use multiple layers of conventional Transformer encoder blocks (for example, three layers as shown in FIG. 7C, though other numbers of layers are possible and the present disclosure is not limited in this respect) as deep feature extractors to extract specific deep features for classification and recognition, as shown in FIG. 7C.
FIG. 8 shows a flowchart of a method 800 for training an endoscopic image classification model incorporating knowledge distillation according to one embodiment of the present disclosure.
First, in step S801, the image samples of the training image sample set are input into each of the plurality of expert sub-networks of the teacher network and into the student network.
For example, the endoscopic image classification model incorporating knowledge distillation here may be the model 700A shown in FIG. 7A.
As an alternative embodiment, when classification training is performed with the endoscopic image classification model 700B incorporating knowledge distillation shown in FIG. 7B, the shallow features of the image sample may first be extracted by a shared sub-network, and these shallow features (rather than the original image sample itself) are then input into each of the plurality of expert sub-networks and into the student network, which further apply their deep feature extractors to extract more specific deep features.
Next, in step S803, the plurality of expert sub-networks are used to generate corresponding expert sub-network output results for the image sample, and the student network is used to generate a corresponding student network output result for the image sample. The generation of these network outputs is similar to step S603_2 of FIG. 6B, and its repeated description is omitted here.
In step S805, the final output result of the teacher network is generated based on the output results of the plurality of expert sub-networks. The generation of the final output result of the teacher network is similar to step S603_3 of FIG. 6B, and its repeated description is omitted here.
In step S807, a loss value is calculated through the target loss function, and the parameters of the endoscopic image classification model incorporating knowledge distillation are adjusted based on the loss value.
As described above, the optimization of the endoscopic image classification model 500A, 500B, or 500C has two goals: 1) the final fused multi-expert result should be closer to the ground-truth label, and 2) the distribution distance between the outputs of the multiple experts should be maximized so that the experts attend to different distributions of the data. The training method 800 for the endoscopic image classification model incorporating knowledge distillation takes the model 500A, 500B, or 500C as the teacher network and trains, by knowledge distillation, a student network that is leaner in both structure and parameters. Therefore, in addition to goals 1) and 2) above, the training method 800 is also expected to achieve the following two further goals: 3) the output of the student network should be closer to the output of the teacher network, and 4) the output distribution of the student network should be closer to the distribution of the outputs of the individual expert sub-networks of the teacher network.
Based on goals 1) and 2) above, the embodiments of the present disclosure construct the loss function of the teacher network as shown in equation (8):

$$L_{tea} = L_{ce}\big(p_{\text{soft}}(x),\, y\big) + L_{div} \tag{8}$$

Here $L_{ce}\big(p_{\text{soft}}(x), y\big)$ is the cross-entropy loss function, described above with reference to FIG. 6B, between the final output of the teacher network obtained by fusing the outputs of the multiple expert sub-networks (for example, the classification prediction probability) and the ground-truth label of the image sample, and $L_{div}$ is the divergence loss function over the outputs of the multiple expert sub-networks described above with reference to FIG. 6B.
Based on goals 3) and 4) above, the embodiments of the present disclosure construct the loss function of the student network as shown in equation (9):

$$L_{stu} = L_{ce}\big(p_{stu}(x),\, p_{\text{soft}}(x)\big) + \frac{1}{n}\sum_{i=1}^{n} D_{KL}\big(p_{stu}(x)\,\Vert\, \operatorname{softmax}(z_i)\big) \tag{9}$$

where $p_{\text{soft}}$ is the final classification prediction probability output by the teacher network and $p_{stu}$ is the classification prediction probability output by the student network. $L_{ce}\big(p_{stu}(x), p_{\text{soft}}(x)\big)$ denotes the cross-entropy loss function between the classification prediction probability output by the student network and the final classification prediction probability output by the teacher network. $z_i$ is the logits output by the i-th expert sub-network of the teacher network, and n is the number of expert sub-networks in the teacher network; as those skilled in the art will appreciate, normalizing logits (including the student network's own logits) with softmax yields the predicted probability distribution. The second term of equation (9) is thus the KL divergence between the output distribution of the student network and the multiple outputs of the multiple expert sub-networks of the teacher network.
Therefore, the total loss function of the training method of the endoscopic image classification model incorporating knowledge distillation according to one embodiment of the present disclosure can be defined as shown in equation (10):

$$L = \alpha\, L_{tea} + (1-\alpha)\, L_{stu} \tag{10}$$

where $\alpha$ is a weight parameter that is set to 1 at the start of training, decreases gradually as training proceeds, and finally falls to 0.
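A hedged sketch of how equations (8)-(10) could be combined in one distillation step follows; the helper names and the linear α schedule are assumptions chosen to match the stated 1-to-0 decay, not the disclosed implementation.

```python
import torch
import torch.nn.functional as F

def kl(p, q, eps=1e-12):
    """KL(p || q) for batched class-probability tensors."""
    return (p * ((p + eps) / (q + eps)).log()).sum(-1).mean()

def distillation_loss(expert_logits, student_logits, labels, alpha, eps=1e-12):
    """Equation (10): alpha * teacher loss (8) + (1 - alpha) * student loss (9)."""
    probs = [F.softmax(z, dim=-1) for z in expert_logits]
    p_soft = torch.stack(probs).mean(dim=0)                 # fused teacher output, eq. (3)
    p_stu = F.softmax(student_logits, dim=-1)

    # Teacher loss, equation (8): cross-entropy of the fused output
    # plus the negated mean pairwise KL divergence between the experts.
    n = len(probs)
    l_div = -sum(kl(probs[i], probs[j]) for i in range(n)
                 for j in range(n) if i != j) / (n * (n - 1))
    l_tea = F.nll_loss((p_soft + eps).log(), labels) + l_div

    # Student loss, equation (9): cross-entropy to the fused teacher output
    # plus the mean KL divergence from the student to each expert.
    l_stu = -(p_soft * (p_stu + eps).log()).sum(-1).mean() \
            + sum(kl(p_stu, p) for p in probs) / n

    return alpha * l_tea + (1 - alpha) * l_stu              # equation (10)

def alpha_schedule(step, total_steps):
    """Assumed linear decay of alpha from 1 to 0 over training."""
    return max(0.0, 1.0 - step / total_steps)
```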
Based on the above total loss function, the parameters of the endoscopic image classification model incorporating knowledge distillation of the embodiments of the present disclosure can be adjusted so that the total loss function is minimized as iterative training proceeds, yielding a trained endoscopic image classification model incorporating knowledge distillation. In the trained model, the student network has few parameters and a relatively simple structure, yet achieves prediction accuracy close to that of the complex teacher network; subsequent classification applications can therefore run directly on the trained student network alone.
Based on the student network trained in the above manner, the embodiments of the present disclosure further provide an endoscopic image classification method. A flowchart of the endoscopic image classification method in the embodiments of the present disclosure is described with reference to FIG. 9; the method includes:
In step S901, an endoscopic image to be recognized is acquired.
For example, if the image classification model was trained for polyp type recognition, the acquired endoscopic image to be recognized is a captured polyp image.
In step S903, the endoscopic image to be recognized is input into a trained endoscopic image classification model to obtain a classification result of the endoscopic image.
For example, the endoscopic image classification model here may be the endoscopic image classification model 500A, 500B, or 500C trained with the methods described above.
For example, alternatively, if the trained endoscopic image classification model is the model shown in FIG. 5B, the endoscopic image to be recognized may first be input into the shared sub-network of the trained endoscopic image classification model to extract shallow features, which are then fed into the remainder of the trained endoscopic image classification model.
For example, alternatively, the trained model may be an endoscopic image classification model incorporating knowledge distillation, such as the model 700A, 700B, or 700C described above. Because the student network has few parameters, a relatively simple structure, and prediction accuracy close to that of the complex teacher network, the endoscopic image to be recognized can be input directly into the student network of the trained endoscopic image classification model incorporating knowledge distillation.
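At inference time only the distilled student is needed, as the following minimal sketch shows; the preprocessing, the model handle, and the class names are illustrative assumptions.

```python
import torch

@torch.no_grad()
def classify_endoscopic_image(student, image, class_names):
    """Run one preprocessed image tensor through the trained student network."""
    student.eval()
    probs = torch.softmax(student(image.unsqueeze(0)), dim=-1)[0]
    return class_names[int(probs.argmax())], probs

# Hypothetical usage with the four polyp labels from the training set:
# label, probs = classify_endoscopic_image(
#     student, image, ["adenoma", "hyperplastic", "inflammatory", "cancer"])
```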
Based on the above embodiments, FIG. 10 is a schematic structural diagram of an endoscopic image classification system 1000 in an embodiment of the present disclosure. The endoscopic image classification system 1000 includes at least an image acquisition component 1001, a processing component 1002, and an output component 1003. In the embodiments of the present disclosure, the image acquisition component 1001, the processing component 1002, and the output component 1003 are related medical devices; they may be integrated in the same medical device, or divided among multiple devices that connect and communicate with one another to form a medical system. For example, for the diagnosis of digestive tract diseases, the image acquisition component 1001 may be an endoscope, while the processing component 1002 and the output component 1003 may be computer equipment communicating with the endoscope.
Specifically, the image acquisition component 1001 is used to acquire the image to be recognized. The processing component 1002 is used, for example, to execute the method steps shown in FIG. 9, extracting image feature information of the image to be recognized and obtaining a lesion classification result of the image based on that feature information. The output component 1003 is used to output the classification result of the image to be recognized.
FIG. 11 shows a training apparatus 1100 for an endoscopic image classification model according to an embodiment of the present disclosure, which specifically includes a training data set acquisition component 1101 and a training component 1103.
The training data set acquisition component 1101 is used to acquire a training data set, the training data set including a plurality of endoscopic images and annotation labels of the plurality of endoscopic images, wherein the training data set presents a long-tailed distribution. The training component 1103 is used to train the endoscopic image classification model based on the training data set until the target loss function of the endoscopic image classification model converges, so as to obtain a trained endoscopic image classification model.
For example, the target loss function is determined based at least on the corresponding output results of the plurality of expert sub-networks.
For example, the training component 1103 includes: an input sub-component 1103_1 for inputting the image samples of the training image sample set into each of the plurality of expert sub-networks; an output result generation sub-component 1103_2 for generating, with the plurality of expert sub-networks, corresponding expert sub-network output results for the image sample, and for generating the final output result of the endoscopic image classification model based on those expert sub-network output results; a loss function calculation sub-component 1103_3 for calculating a loss value through the target loss function based on at least the expert sub-network output results and the final output result; and a parameter adjustment sub-component 1103_4 for adjusting the parameters of the endoscopic image classification model based on the loss value.
For example, the endoscopic image classification model further includes a shared sub-network, in which case the training component 1103 includes: an input sub-component 1103_1 for inputting the image samples of the training image sample set into the shared sub-network to extract shallow feature representations; an output result generation sub-component 1103_2 for generating, with the plurality of expert sub-networks and based on the extracted shallow feature representations, corresponding expert sub-network output results for the image sample, and for generating the final output result of the endoscopic image classification model based on those expert sub-network output results; a loss function calculation sub-component 1103_3 for calculating a loss value through the target loss function based on at least the expert sub-network output results and the final output result; and a parameter adjustment sub-component 1103_4 for adjusting the parameters of the endoscopic image classification model based on the loss value.
For example, the target loss function of the endoscopic image classification model includes a cross-entropy loss function determined based on the final output result of the endoscopic image classification model and the annotation labels of the image samples, and a KL divergence determined based on the output results of the multiple expert sub-networks.
For example, the output result generation sub-component 1103_2 fuses the output results of the plurality of expert sub-networks to serve as the final output result of the endoscopic image classification model.
For example, the fusing of the output results of the plurality of expert sub-networks by the output result generation sub-component 1103_2 includes performing a weighted average of those output results.
For example, the endoscopic image classification model further includes a student network with the same structure as the expert sub-networks, the plurality of expert sub-networks constitute a teacher network, the teacher network is used to train the student network based on knowledge distillation, and the output result generation sub-component 1103_2 further uses the student network to generate a corresponding student network output result for the image sample.
For example, the loss function calculation sub-component 1103_3 calculates a loss value through the target loss function based on the expert sub-network output results, the final output result, and the student network output result, and the parameter adjustment sub-component 1103_4 adjusts the parameters of the endoscopic image classification model based on the loss value.
For example, the target loss function is a weighted sum of the loss function of the teacher network and the loss function of the student network.
For example, the weight value of the loss function of the teacher network and the weight value of the loss function of the student network sum to 1; the weight value of the loss function of the teacher network decreases continually as training iterates, eventually reaching 0, while the weight value of the loss function of the student network increases continually, eventually reaching 1.
For example, the loss function of the teacher network includes a cross-entropy loss function determined based on the final output result of the endoscopic image classification model and the annotation labels of the image samples, and a KL divergence determined based on the output results of the multiple expert sub-networks; the loss function of the student network includes a cross-entropy loss function determined based on the student network output result and the final output result of the endoscopic image classification model, and a KL divergence determined based on the student network output result and the multiple expert sub-network output results generated by the plurality of expert sub-networks.
For example, the shared sub-network includes a Vision Transformer, and each of the plurality of expert sub-networks includes multiple sequentially connected Transformer encoders and a classifier.
Based on the above embodiments, an embodiment of the present disclosure further provides an electronic device of another exemplary implementation. In some possible implementations, the electronic device in the embodiments of the present disclosure may include a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the processor, when executing the program, may implement the steps of the endoscopic image classification model training method or the endoscopic image recognition method of the above embodiments.
For example, taking the electronic device as the server 100 in FIG. 1 of the present disclosure, the processor in the electronic device is the processor 110 in the server 100, and the memory in the electronic device is the memory 120 in the server 100.
Embodiments of the present disclosure also provide a computer-readable storage medium. FIG. 12 shows a schematic diagram 1200 of a storage medium according to an embodiment of the disclosure. As shown in FIG. 12, computer-executable instructions 1201 are stored on the computer-readable storage medium 1200. When the computer-executable instructions 1201 are run by a processor, the training method of the endoscopic image classification model incorporating knowledge distillation and the endoscopic image classification method according to the embodiments of the present disclosure described with reference to the above figures may be executed. The computer-readable storage medium includes, but is not limited to, volatile memory and/or non-volatile memory; the volatile memory may include, for example, random access memory (RAM) and/or cache memory, and the non-volatile memory may include, for example, read-only memory (ROM), a hard disk, or flash memory.
Embodiments of the present disclosure also provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the training method of the endoscopic image classification model incorporating knowledge distillation and the endoscopic image classification method according to the embodiments of the present disclosure.
Those skilled in the art will understand that the content disclosed in the present disclosure admits many variations and improvements. For example, the various devices or components described above may be implemented in hardware, or in software, firmware, or a combination of some or all of the three.
Furthermore, although the present disclosure makes various references to certain units of a system according to embodiments of the present disclosure, any number of different units may be used and run on a client and/or server. The units described are illustrative only, and different aspects of the systems and methods may use different units.
Those of ordinary skill in the art will understand that all or part of the steps of the above methods may be completed by instructing relevant hardware through a program, and the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc. Optionally, all or part of the steps of the above embodiments may also be implemented using one or more integrated circuits. Correspondingly, each module or unit in the above embodiments may be implemented in the form of hardware or in the form of a software functional module. The present disclosure is not limited to any specific combination of hardware and software.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. It should also be understood that terms such as those defined in common dictionaries should be interpreted as having meanings consistent with their meanings in the context of the relevant technology, and should not be interpreted in an idealized or excessively formalized sense unless expressly so defined herein.
The above is a description of the present disclosure and should not be considered a limitation thereof. Although several exemplary embodiments of the present disclosure have been described, those skilled in the art will readily appreciate that many modifications are possible in the exemplary embodiments without departing from the novel teachings and advantages of this disclosure. Accordingly, all such modifications are intended to be included within the scope of this disclosure as defined in the claims. It should be understood that the above describes the present disclosure and should not be considered limited to the particular embodiments disclosed; modifications to the disclosed embodiments, as well as other embodiments, are intended to be within the scope of the appended claims. The present disclosure is defined by the claims and their equivalents.

Claims (19)

  1. A training method of an endoscopic image classification model based on multi-expert decision-making, wherein the endoscopic image classification model comprises a plurality of expert sub-networks, the method comprising:
    acquiring a training data set, the training data set comprising a plurality of endoscopic images and annotation labels of the plurality of endoscopic images, wherein the training data set presents a long-tailed distribution; and
    training the endoscopic image classification model based on the training data set until a target loss function of the endoscopic image classification model converges, so as to obtain a trained endoscopic image classification model,
    wherein the target loss function is determined based at least on corresponding output results of the plurality of expert sub-networks.
  2. The method according to claim 1, wherein training the endoscopic image classification model based on the training data set comprises:
    inputting image samples of the training image sample set into each of the plurality of expert sub-networks;
    generating, with the plurality of expert sub-networks, corresponding expert sub-network output results for the image samples;
    generating a final output result of the endoscopic image classification model based on the plurality of expert sub-network output results; and
    calculating a loss value through the target loss function based on at least the plurality of expert sub-network output results and the final output result, and adjusting parameters of the endoscopic image classification model based on the loss value.
  3. The method according to claim 1, wherein the endoscopic image classification model further comprises a shared sub-network, and wherein training the endoscopic image classification model based on the training data set comprises:
    inputting image samples of the training image sample set into the shared sub-network to extract shallow feature representations;
    generating, with the plurality of expert sub-networks and based on the extracted shallow feature representations, corresponding expert sub-network output results for the image samples;
    generating a final output result of the endoscopic image classification model based on the plurality of expert sub-network output results; and
    calculating a loss value through the target loss function based on at least the plurality of expert sub-network output results and the final output result, and adjusting parameters of the endoscopic image classification model based on the loss value.
  4. The method according to claim 2 or 3, wherein the target loss function of the endoscopic image classification model comprises: a cross-entropy loss function determined based on the final output result of the endoscopic image classification model and the annotation labels of the image samples, and a Kullback-Leibler divergence determined based on the output results of the plurality of expert sub-networks.
  5. The method according to any one of claims 2-4, wherein generating the final output result of the endoscopic image classification model based on the plurality of expert sub-network output results comprises:
    fusing the plurality of expert sub-network output results to serve as the final output result of the endoscopic image classification model.
  6. The method according to claim 5, wherein fusing the plurality of expert sub-network output results comprises:
    performing a weighted average of the plurality of expert sub-network output results.
  7. The method according to any one of claims 2-6, wherein the endoscopic image classification model further comprises a student network having the same structure as the expert sub-networks, wherein the plurality of expert sub-networks constitute a teacher network and the teacher network is used to train the student network based on knowledge distillation, the method further comprising:
    generating, with the student network, a corresponding student network output result for the image samples.
  8. The method according to claim 7, wherein calculating a loss value through the target loss function based on at least the plurality of expert sub-network output results and the final output result comprises:
    calculating the loss value through the target loss function based on the plurality of expert sub-network output results, the final output result, and the student network output result.
  9. The method according to claim 8, wherein the target loss function is a weighted sum of a loss function of the teacher network and a loss function of the student network.
  10. The method according to claim 9, wherein the weight value of the loss function of the teacher network and the weight value of the loss function of the student network sum to 1, and wherein the weight value of the loss function of the teacher network decreases continually with training iterations until it finally reaches 0, while the weight value of the loss function of the student network increases continually with training iterations until it finally reaches 1.
  11. The method according to claim 9 or 10, wherein
    the loss function of the teacher network comprises: a cross-entropy loss function determined based on the final output result of the endoscopic image classification model and the annotation labels of the image samples, and a Kullback-Leibler divergence determined based on the output results of the plurality of expert sub-networks, and
    the loss function of the student network comprises: a cross-entropy loss function determined based on the student network output result of the student network and the final output result of the endoscopic image classification model, and a Kullback-Leibler divergence determined based on the student network output result of the student network and the plurality of expert sub-network output results generated by the plurality of expert sub-networks.
  12. The method according to claim 3, wherein the shared sub-network comprises a Vision Transformer, and each of the plurality of expert sub-networks comprises multiple sequentially connected Transformer encoders and a classifier.
  13. An endoscopic image classification method, comprising:
    acquiring an endoscopic image to be recognized; and
    obtaining a classification result of the endoscopic image based on a trained endoscopic image classification model,
    wherein the trained endoscopic image classification model is obtained with the training method of the endoscopic image classification model according to any one of claims 1-12.
  14. An endoscopic image classification method, comprising:
    acquiring an endoscopic image to be recognized; and
    obtaining a classification result of the endoscopic image based on a student network in a trained endoscopic image classification model,
    wherein the trained endoscopic image classification model is obtained with the training method of the endoscopic image classification model according to any one of claims 7-12.
  15. An endoscopic image classification system, comprising:
    an image acquisition component configured to acquire an endoscopic image to be recognized;
    a processing component configured to obtain a classification result of the endoscopic image based on a trained endoscopic image classification model; and
    an output component configured to output the classification result of the image to be recognized,
    wherein the trained endoscopic image classification model is obtained with the training method of the endoscopic image classification model according to any one of claims 1-12.
  16. An endoscopic image classification system, comprising:
    an image acquisition component configured to acquire an endoscopic image to be recognized;
    a processing component configured to obtain a classification result of the endoscopic image based on a student network in a trained endoscopic image classification model; and
    an output component configured to output the classification result of the image to be recognized,
    wherein the trained endoscopic image classification model is obtained with the training method of the endoscopic image classification model according to any one of claims 7-12.
  17. A training apparatus for an endoscopic image classification model based on multi-expert decision-making, wherein the endoscopic image classification model comprises a plurality of expert sub-networks, the apparatus comprising:
    a training data set acquisition component configured to acquire a training data set, the training data set comprising a plurality of endoscopic images and annotation labels of the plurality of endoscopic images, wherein the training data set presents a long-tailed distribution; and
    a training component configured to train the endoscopic image classification model based on the training data set until a target loss function of the endoscopic image classification model converges, so as to obtain a trained endoscopic image classification model,
    wherein the target loss function is determined at least based on the respective output results of the plurality of expert sub-networks.
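Operationally, the training component of claim 17 amounts to an optimize-until-convergence loop over the long-tailed training set. A schematic sketch reusing the multi-expert model above; the optimizer, learning rate, and the crude loss-plateau convergence test are assumptions, since the claim only requires iterating until the target loss function converges:

```python
import torch

def train_until_convergence(model, loader, target_loss_fn,
                            max_epochs=50, tol=1e-4):
    """Train the multi-expert classification model until the target
    loss (built from the expert sub-network outputs) stops improving."""
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    prev_loss = float("inf")
    for epoch in range(max_epochs):
        running = 0.0
        for images, labels in loader:
            final, expert_logits = model(images)
            loss = target_loss_fn(final, expert_logits, labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
            running += loss.item()
        running /= len(loader)
        if abs(prev_loss - running) < tol:  # loss plateau ~ convergence
            break
        prev_loss = running
    return model
```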
  18. An electronic device, comprising a memory and a processor, wherein the memory stores program code readable by the processor, and when the processor executes the program code, the method according to any one of claims 1-14 is performed.
  19. A computer-readable storage medium having computer-executable instructions stored thereon, the computer-executable instructions being used to perform the method according to any one of claims 1-14.
PCT/CN2022/117043 2021-09-06 2022-09-05 Training method and apparatus of endoscope image classification model, and image classification method WO2023030520A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111039189.1A CN113486990B (en) 2021-09-06 2021-09-06 Training method of endoscope image classification model, image classification method and device
CN202111039189.1 2021-09-06

Publications (1)

Publication Number Publication Date
WO2023030520A1

Family

ID=77946539

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/117043 WO2023030520A1 (en) 2021-09-06 2022-09-05 Training method and apparatus of endoscope image classification model, and image classification method

Country Status (2)

Country Link
CN (1) CN113486990B (en)
WO (1) WO2023030520A1 (en)


Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113486990B (en) * 2021-09-06 2021-12-21 北京字节跳动网络技术有限公司 Training method of endoscope image classification model, image classification method and device
CN113706526B (en) * 2021-10-26 2022-02-08 北京字节跳动网络技术有限公司 Training method and device for endoscope image feature learning model and classification model
CN113822373B (en) * 2021-10-27 2023-09-15 南京大学 Image classification model training method based on integration and knowledge distillation
CN113743384B (en) * 2021-11-05 2022-04-05 广州思德医疗科技有限公司 Stomach picture identification method and device
CN113822389B (en) * 2021-11-24 2022-02-22 紫东信息科技(苏州)有限公司 Digestive tract disease classification system based on endoscope picture
CN114464152B (en) * 2022-04-13 2022-07-19 齐鲁工业大学 Music genre classification method and system based on visual transformation network
CN115019183B (en) * 2022-07-28 2023-01-20 北京卫星信息工程研究所 Remote sensing image model migration method based on knowledge distillation and image reconstruction
CN115905533B (en) * 2022-11-24 2023-09-19 湖南光线空间信息科技有限公司 Multi-label text intelligent classification method
CN116152612B (en) * 2023-04-21 2023-08-15 粤港澳大湾区数字经济研究院(福田) Long-tail image recognition method and related device
CN117455878A (en) * 2023-11-08 2024-01-26 中国医学科学院北京协和医院 CCTA image-based coronary vulnerable plaque identification method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109523522A (en) * 2018-10-30 2019-03-26 腾讯科技(深圳)有限公司 Processing method, device, system and the storage medium of endoscopic images
CN110288597A (en) * 2019-07-01 2019-09-27 哈尔滨工业大学 Wireless capsule endoscope saliency detection method based on attention mechanism
CN111666998A (en) * 2020-06-03 2020-09-15 电子科技大学 Endoscope intelligent intubation decision-making method based on target point detection
CN113034500A (en) * 2021-05-25 2021-06-25 紫东信息科技(苏州)有限公司 Digestive tract endoscope picture focus identification system based on multi-channel structure
CN113486990A (en) * 2021-09-06 2021-10-08 北京字节跳动网络技术有限公司 Training method of endoscope image classification model, image classification method and device

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180268292A1 (en) * 2017-03-17 2018-09-20 Nec Laboratories America, Inc. Learning efficient object detection models with knowledge distillation
CN109726619A (en) * 2017-10-31 2019-05-07 深圳市祈飞科技有限公司 A kind of convolutional neural networks face identification method and system based on parameter sharing
CN108280488B (en) * 2018-02-09 2021-05-07 哈尔滨工业大学 Grippable object identification method based on shared neural network
US10643602B2 (en) * 2018-03-16 2020-05-05 Microsoft Technology Licensing, Llc Adversarial teacher-student learning for unsupervised domain adaptation
CN110059717A (en) * 2019-03-13 2019-07-26 山东大学 Convolutional neural networks automatic division method and system for breast molybdenum target data set
CN110472730A (en) * 2019-08-07 2019-11-19 交叉信息核心技术研究院(西安)有限公司 A kind of distillation training method and the scalable dynamic prediction method certainly of convolutional neural networks
CN110680326B (en) * 2019-10-11 2022-05-06 北京大学第三医院(北京大学第三临床医学院) Pneumoconiosis identification and grading judgment method based on deep convolutional neural network
CN111062951B (en) * 2019-12-11 2022-03-25 华中科技大学 Knowledge distillation method based on semantic segmentation intra-class feature difference
CN111782937A (en) * 2020-05-15 2020-10-16 北京三快在线科技有限公司 Information sorting method and device, electronic equipment and computer readable medium
CN111695698B (en) * 2020-06-12 2023-09-12 北京百度网讯科技有限公司 Method, apparatus, electronic device, and readable storage medium for model distillation
CN112183818A (en) * 2020-09-02 2021-01-05 北京三快在线科技有限公司 Recommendation probability prediction method and device, electronic equipment and storage medium
CN112200795A (en) * 2020-10-23 2021-01-08 苏州慧维智能医疗科技有限公司 Large intestine endoscope polyp detection method based on deep convolutional network
CN112862095B (en) * 2021-02-02 2023-09-29 浙江大华技术股份有限公司 Self-distillation learning method and device based on feature analysis and readable storage medium
CN113065558B (en) * 2021-04-21 2024-03-22 浙江工业大学 Lightweight small target detection method combined with attention mechanism
CN113239985B (en) * 2021-04-25 2022-12-13 北京航空航天大学 Distributed small-scale medical data set-oriented classification detection method
CN113344206A (en) * 2021-06-25 2021-09-03 江苏大学 Knowledge distillation method, device and equipment integrating channel and relation feature learning


Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116703953A (en) * 2023-03-17 2023-09-05 南通大学 Inferior mesenteric artery segmentation method based on CT image
CN116703953B (en) * 2023-03-17 2024-05-24 南通大学 Inferior mesenteric artery segmentation method based on CT image
CN116168255A (en) * 2023-04-10 2023-05-26 武汉大学人民医院(湖北省人民医院) Retina OCT (optical coherence tomography) image classification method with robust long tail distribution
CN116168255B (en) * 2023-04-10 2023-12-08 武汉大学人民医院(湖北省人民医院) Retina OCT (optical coherence tomography) image classification method with robust long tail distribution
CN116258914A (en) * 2023-05-15 2023-06-13 齐鲁工业大学(山东省科学院) Remote sensing image classification method based on machine learning and local and global feature fusion
CN116258914B (en) * 2023-05-15 2023-08-25 齐鲁工业大学(山东省科学院) Remote Sensing Image Classification Method Based on Machine Learning and Local and Global Feature Fusion
CN116612336B (en) * 2023-07-19 2023-10-03 浙江华诺康科技有限公司 Method, apparatus, computer device and storage medium for classifying smoke in endoscopic image
CN117056678A (en) * 2023-10-12 2023-11-14 北京宝隆泓瑞科技有限公司 Machine pump equipment operation fault diagnosis method and device based on small sample
CN117056678B (en) * 2023-10-12 2024-01-02 北京宝隆泓瑞科技有限公司 Machine pump equipment operation fault diagnosis method and device based on small sample
CN117197472A (en) * 2023-11-07 2023-12-08 四川农业大学 Efficient teacher and student semi-supervised segmentation method and device based on endoscopic images of epistaxis
CN117197472B (en) * 2023-11-07 2024-03-08 四川农业大学 Efficient teacher and student semi-supervised segmentation method and device based on endoscopic images of epistaxis

Also Published As

Publication number Publication date
CN113486990A (en) 2021-10-08
CN113486990B (en) 2021-12-21

Similar Documents

Publication Publication Date Title
WO2023030520A1 (en) Training method and apparatus of endoscope image classification model, and image classification method
WO2023030521A1 (en) Endoscope image classification model training method and device, and endoscope image classification method
WO2023071680A1 (en) Endoscope image feature learning model training method and apparatus, and endoscope image classification model training method and apparatus
EP3876190A1 (en) Endoscopic image processing method and system and computer device
WO2020098539A1 (en) Image processing method and apparatus, computer readable medium, and electronic device
TW201922174A (en) Image diagnosis assistance apparatus, data collection method, image diagnosis assistance method, and image diagnosis assistance program
WO2021103938A1 (en) Medical image processing method, apparatus and device, medium and endoscope
CN113470029B (en) Training method and device, image processing method, electronic device and storage medium
WO2020224153A1 (en) Nbi image processing method based on deep learning and image enhancement, and application thereof
Oliveira et al. Deep transfer learning for segmentation of anatomical structures in chest radiographs
JP7363883B2 (en) Image processing methods, devices and computer readable storage media
EP4120186A1 (en) Computer-implemented systems and methods for object detection and characterization
Jia et al. Face spoofing detection under super-realistic 3D wax face attacks
TWI728369B (en) Method and system for analyzing skin texture and skin lesion using artificial intelligence cloud based platform
Lin et al. A desmoking algorithm for endoscopic images based on improved U‐Net model
CN113177940A (en) Gastroscope video part identification network structure based on Transformer
WO2023165332A1 (en) Tissue cavity positioning method, apparatus, readable medium, and electronic device
EP4241650A1 (en) Image processing method, and electronic device and readable storage medium
CN115100723A (en) Face color classification method, device, computer readable program medium and electronic equipment
Gangrade et al. Colonoscopy polyp segmentation using deep residual u-net with bottleneck attention module
CN106296631A (en) A kind of gastroscope video summarization method based on attention priori
CN117338378B (en) Articulated laparoscopic forceps and rapid abdominal image segmentation method based on SBB U-NET
US20240087115A1 (en) Machine learning enabled system for skin abnormality interventions
Pan et al. Real‐time coloring method of laser surgery video based on generative adversarial network
US20240013509A1 (en) Computer-implemented systems and methods for intelligent image analysis using spatio-temporal information

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE