CN111932561A - Real-time enteroscopy image segmentation method and device based on integrated knowledge distillation - Google Patents

Real-time enteroscopy image segmentation method and device based on integrated knowledge distillation

Info

Publication number
CN111932561A
Authority
CN
China
Prior art keywords
segmentation
teacher
model
training
image
Prior art date
Legal status
Pending
Application number
CN202010997859.XA
Other languages
Chinese (zh)
Inventor
Li Jianqiang (李坚强)
Chen Jie (陈杰)
Huang Zhichao (黄志超)
Current Assignee
Shenzhen University
Original Assignee
Shenzhen University
Priority date
Filing date
Publication date
Application filed by Shenzhen University
Priority to CN202010997859.XA
Publication of CN111932561A
Priority to PCT/CN2020/130114 (WO2022057078A1)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H30/00ICT specially adapted for the handling or processing of medical images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30028Colon; Small intestine

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Biology (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Epidemiology (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a real-time enteroscopy image segmentation method and device based on integrated knowledge distillation. The method comprises the following steps: acquiring a plurality of training images, wherein the training images are divided into a plurality of training image sets and the training images of the same training image set come from the same data set; first training teacher models, wherein different teacher models each obtain a first segmentation map from a different training image set; and then using the trained teacher models jointly to distill a student model. The training images are enteroscopy image screenshots, and the trained student model can generate a real-time enteroscopy segmentation map from a real-time enteroscopy image. This solves the problem that the data sets of different hospitals are disjoint and cannot be pooled together to train an automatic colonoscopy image segmentation model.

Description

Real-time enteroscopy image segmentation method and device based on integrated knowledge distillation
Technical Field
The invention relates to the field of image segmentation, in particular to a real-time enteroscopy image segmentation method and device based on integrated knowledge distillation.
Background
Minimally invasive surgery suffers from a limited field of view; in colonoscopy in particular, blind spots are common. Real-time automatic segmentation of colonoscopy images therefore plays an important role in intestinal surgery. Existing automatic colonoscopy image segmentation models usually need data sets from different hospitals during training; however, the data sets of different hospitals are disjoint and cannot be pooled together to train such a model.
The prior art therefore remains to be improved.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a real-time enteroscopy image segmentation method based on integrated knowledge distillation and a storage medium thereof, aiming at solving the problem that, in the prior art, the data sets of different hospitals cannot be pooled together to train an automatic colonoscopy image segmentation model.
The technical scheme adopted by the invention for solving the problems is as follows:
in a first aspect, an embodiment of the present invention provides a real-time enteroscopy image segmentation method based on integrated knowledge distillation, where the method includes:
acquiring a training image; the training image is a colonoscopy image screenshot used for training a teacher model and a student model; the training images are divided into a plurality of training image sets, and the training images of the same training image set come from the same data set;
inputting the training image into the teacher model to obtain a first segmentation graph; wherein the number of the teacher models is greater than or equal to two; different teacher models respectively obtain a first segmentation graph according to different training image sets;
correcting parameters of the teacher model according to the first segmentation graph and the first real label, and continuously executing the step of inputting the training image to the teacher model to obtain the first segmentation graph until preset training conditions of the teacher model are met to obtain a trained teacher model; the first real label is used for reflecting a real classification condition corresponding to the pixels on the training image under a first preset classification condition;
inputting the training image into the student model to obtain a second segmentation graph;
correcting parameters of the student model according to the second segmentation graph, a teacher label and a second real label, and continuously executing the step of inputting the training image to the student model to obtain a second segmentation graph until preset training conditions of the student model are met to obtain a trained student model; the second real label is used for reflecting the real classification condition corresponding to the pixels on the training image under a second preset classification condition; the teacher label is used for reflecting the classification condition of the training images in the trained teacher model;
and inputting the real-time enteroscopy image into the trained student model to generate a real-time enteroscopy image segmentation map.
In one embodiment, the acquiring a training image, which is a colonoscopy video screenshot for training a teacher model and a student model, includes:
acquiring an enteroscope image screenshot;
compressing according to the enteroscope image screenshot to obtain the training image; the height, the width and the number of channels of the training images are all constant.
In one embodiment, the teacher model includes a first down-sampling encoder and a first up-sampling decoder; the inputting the training image into the teacher model to obtain a first segmentation graph includes:
extracting features of the training image according to the first down-sampling encoder to obtain a first feature map; the first feature map contains feature information of the training image;
analyzing the first feature map according to the first up-sampling decoder to obtain the first segmentation map;
wherein the first segmentation map comprises first standard probabilities and first abnormal probabilities corresponding to pixels in the training image; the first standard probability is the probability that the pixel belongs to a standard under a first preset classification condition, and the first abnormal probability is the probability that the pixel belongs to an abnormal under the first preset classification condition; the sum of the probability values of the first anomaly probability and the first standard probability is 1.
In one embodiment, the modifying the parameters of the teacher model according to the first segmentation graph and the first real label, and continuing to perform the step of inputting the training image to the teacher model to obtain the first segmentation graph until a preset training condition of the teacher model is met to obtain a trained teacher model includes:
calculating a first loss value from the first segmentation map and the first real label;
adjusting parameters of the first upsampling decoder according to the first loss value to update the teacher model;
and continuing to execute the step of inputting the training image into the teacher model to obtain a first segmentation graph until a preset training condition of the teacher model is met, so as to obtain a trained teacher model.
In one embodiment, the student model includes a second downsampling encoder and a second upsampling decoder; the inputting the training image into the student model to obtain a second segmentation map comprises:
performing feature extraction on the training image according to the second downsampling encoder and outputting a second feature map; the second feature map contains feature information of the training image;
analyzing the second feature map according to the second up-sampling decoder to obtain the second segmentation map;
wherein the second segmentation map comprises second standard probabilities and second abnormal probabilities corresponding to pixels in the training image; the second standard probability is the probability that the pixel belongs to the standard under a second preset classification condition, and the second abnormal probability is the probability that the pixel belongs to the abnormal under the second preset classification condition; the number of categories of the second preset classification condition is greater than that of the first preset classification condition.
In one embodiment, the modifying the parameters of the student model according to the second segmentation map, the teacher label and the second real label, and continuing to perform the step of generating the second segmentation map according to the training image until a preset training condition of the student model is met to obtain a trained student model includes:
calculating a second loss value according to the second segmentation graph, the teacher label and a second real label;
adjusting parameters of the second upsampling decoder according to the second loss value to update the student model;
and continuing to execute the step of generating a second segmentation graph according to the training image until the preset training condition of the student model is met, so as to obtain the trained student model.
In one embodiment, said calculating a second loss value from said second segmentation graph, said teacher label and a second true label comprises:
obtaining a total probability value according to the first segmentation graphs output by all the trained teacher models;
and adjusting the first segmentation chart output by all the trained teacher models according to the total probability value to obtain a teacher label.
In a second aspect, an embodiment of the present invention further provides an apparatus for real-time enteroscopic image segmentation based on integrated knowledge distillation, wherein the apparatus includes:
the image acquisition module is used for acquiring a training image;
a teacher model unit for obtaining a first segmentation graph from the training image;
the first parameter correction module is used for correcting the parameters of the teacher model according to the first segmentation graph and the first real label;
the student model unit is used for obtaining a second segmentation graph according to the training image;
the second parameter correction module is used for correcting the parameters of the student model according to the second segmentation chart, the teacher label and the second real label;
the teacher model unit further includes:
the first down-sampling encoder module is used for extracting the features of the training image to obtain a first feature map;
a first upsampling decoder module, configured to parse the first feature map to obtain the first segmentation map;
the student model unit further includes:
the second downsampling encoder module is used for extracting the features of the training image to obtain a second feature map;
and the second up-sampling decoder module is used for analyzing the second characteristic diagram to obtain the second segmentation diagram.
In a third aspect, an embodiment of the present invention further provides a terminal, which includes a processor and a memory in which one or more programs are stored, the one or more programs being configured to be executed by the processor so as to perform any of the methods described above.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement any of the steps of the method for real-time enteroscopic image segmentation based on integrated knowledge distillation described above.
The invention has the beneficial effects that: the method acquires a plurality of training images, the training images are divided into a plurality of training image sets, and the training images of the same training image set come from the same data set; teacher models are trained first, with different teacher models each obtaining a first segmentation map from a different training image set; the trained teacher models are then used jointly to distill a student model. The training images are enteroscopy image screenshots, and the trained student model can generate a real-time enteroscopy segmentation map from a real-time enteroscopy image. This solves the problem that the data sets of different hospitals are disjoint and cannot be pooled together to train an automatic colonoscopy image segmentation model.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a first flowchart of a real-time enteroscopy image segmentation method based on integrated knowledge distillation according to an embodiment of the present invention.
Fig. 2 is a second flowchart of the real-time enteroscopy image segmentation method based on integrated knowledge distillation according to the embodiment of the present invention.
Fig. 3 is a third flowchart of the real-time enteroscopy image segmentation method based on integrated knowledge distillation according to the embodiment of the present invention.
Fig. 4 is a schematic diagram of a connection relationship between a down-sampling encoder and an up-sampling decoder according to an embodiment of the present invention.
Fig. 5 is a fourth flowchart of a real-time enteroscopy image segmentation method based on integrated knowledge distillation according to an embodiment of the present invention.
Fig. 6 is a fifth flowchart illustrating a real-time enteroscopy image segmentation method based on integrated knowledge distillation according to an embodiment of the present invention.
Fig. 7 is a sixth flowchart of a real-time enteroscopy image segmentation method based on integrated knowledge distillation according to an embodiment of the present invention.
Fig. 8 is a seventh flowchart illustrating a real-time enteroscopy image segmentation method based on integrated knowledge distillation according to an embodiment of the present invention.
Fig. 9 is a schematic structural diagram of a hardware operating environment according to an embodiment of the present invention.
Fig. 10 is a schematic diagram of an internal structure of a real-time enteroscopy image segmentation device based on integrated knowledge distillation according to an embodiment of the present invention.
Fig. 11 is a diagram of the predicted effect of the student model provided by the embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer and clearer, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It should be noted that, if directional indications (such as up, down, left, right, front and back) are involved in the embodiments of the present invention, the directional indications are only used to explain the relative positional relationship, movement and so on between the components in a specific posture (as shown in the drawings); if the specific posture changes, the directional indications change accordingly.
With the development of minimally invasive surgery, artificial-intelligence-assisted surgery, typified by robotic systems, is becoming more and more widespread. Assisted surgery uses a robot to help a doctor complete an operation, mainly in order to overcome the limited field of view of existing minimally invasive surgery, and it is particularly important in colonoscopy. Colonoscopy is one of the important techniques in intestinal surgery; however, many colon lesions have properties similar to normal mucosa, such as a similar color or an overly flat shape, and such deceptive lesions are often difficult to find without special assistance. Real-time automatic segmentation of colonoscopy images therefore plays an important role in colonoscopy. In past studies, many attempts have been made to perform medical image detection, such as automatic endoscopic detection and classification of colorectal polyps, using deep neural network models from natural images or convolutional networks for biomedical image segmentation, with considerable success: studies have shown that deep learning can locate and identify polyps in screening colonoscopy in real time with high accuracy (e.g., using YOLO to detect polyps in colonoscopy video can locate and identify polyps in real time with 96% accuracy).
Currently, automatic colonoscopy image segmentation models usually require data sets from different hospitals during training; however, the data sets of different hospitals are disjoint and cannot be directly pooled together to train the models. Furthermore, prior studies of automatic colonoscopy image segmentation have focused primarily on polyp detection and lack work on the automatic detection of ulcers, bleeding and Meckel's diverticula.
Based on the defects of the prior art, the invention provides a real-time enteroscopy image segmentation method based on integrated knowledge distillation, a technology for assisting a doctor in assessing a patient's colon images. Briefly, in a colon image, diseased tissue (a lesion) differs morphologically from normal tissue, for example in color, contour, texture and other features. Therefore, after an automatic detection model has learned from a large number of colonoscopy images with known results, it can analyze an input colonoscopy image, give a prediction result, and thereby provide a reference opinion to the doctor. However, training an automatic detection model requires a large amount of computing resources to extract information from a very large and highly redundant data set, so the trained model is very large, and a large-scale model is inconvenient to deploy in practical applications; compressing the model is therefore an important problem. Knowledge distillation is a model compression method whose main idea is to train a small network model to imitate a pre-trained large network or ensemble of networks. In knowledge distillation, the teacher imparts knowledge to the student as follows: a loss function that takes the probability distribution predicted by the teacher as its target is added while training the student.
Briefly, the method first trains several binary classification models on the data sets of different hospitals respectively; these binary classification models can each detect one common problem in colonoscopy, such as polyps, ulcers, bleeding and Meckel's diverticula. The trained binary classification models are then used jointly to distill a multi-class classification model, so that the multi-class model can automatically detect polyps, ulcers, bleeding and Meckel's diverticula. In the invention, the teacher models are the binary classification models and the student model is the multi-class classification model; the purpose of training the teacher models and the student model is to determine their optimal parameters and achieve the best classification effect. Different teacher models are trained on different training image sets, yielding teacher models with different classification conditions. The training images of the same teacher model come from the data set of the same hospital, which solves the problem that the data sets of different hospitals are disjoint and cannot be pooled together to train an automatic colonoscopy image segmentation model. Finally, knowledge distillation (distilling the knowledge contained in a trained model into another model) is used to extract the knowledge contained in the trained teacher models into the student model, thereby effectively compressing the student model and reducing its size.
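By way of illustration, the two-phase procedure can be sketched in PyTorch-style code: binary teachers are trained on their own hospitals' data, then the frozen teachers jointly distill a multi-class student. This is a minimal sketch under assumptions, not the patented implementation: the loaders, model constructors, learning rate, epoch counts, and the MSE-based distillation term with a 0.5 weight are all placeholders.

```python
import torch
import torch.nn.functional as F

def train_teachers(teachers, hospital_loaders, epochs=30):
    # Phase 1: each binary teacher sees only its own hospital's data set,
    # so data never has to be pooled across hospitals.
    for teacher, loader in zip(teachers, hospital_loaders):
        opt = torch.optim.SGD(teacher.parameters(), lr=0.01, momentum=0.9)
        for _ in range(epochs):
            for image, mask in loader:            # mask: 0 = normal, 1 = lesion
                loss = F.cross_entropy(teacher(image), mask)
                opt.zero_grad(); loss.backward(); opt.step()

def distill_student(student, teachers, loader, epochs=30):
    # Phase 2: the frozen teachers jointly refine one multi-class student.
    for t in teachers:
        t.eval()
    opt = torch.optim.SGD(student.parameters(), lr=0.01, momentum=0.9)
    for _ in range(epochs):
        for image, mask in loader:                # mask: 0 = normal, 1..N = lesion type
            logits = student(image)               # (B, N + 1, H, W)
            with torch.no_grad():                 # soft targets: each teacher's lesion channel
                soft = [torch.softmax(t(image), dim=1)[:, 1] for t in teachers]
            hard_loss = F.cross_entropy(logits, mask)
            p = torch.softmax(logits, dim=1)
            # match the student's k-th lesion channel to teacher k's lesion channel;
            # an MSE match stands in here for the patent's distillation term
            kd_loss = sum(F.mse_loss(p[:, k + 1], s) for k, s in enumerate(soft))
            loss = hard_loss + 0.5 * kd_loss
            opt.zero_grad(); loss.backward(); opt.step()
```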
As shown in fig. 1, the real-time enteroscopy image segmentation method based on integrated knowledge distillation provided in this embodiment includes the following steps:
s100, acquiring a training image; the training image is a colonoscopy image screenshot used for training a teacher model and a student model; the training images are divided into a plurality of training image sets, and the training images of the same training image set come from the same data set.
Briefly, the first step in training a model is to acquire available training images, which are then used to train the teacher models and the student model. The data in the same training image set all come from the same hospital, which solves the problem that the data sets of different hospitals are disjoint and cannot be pooled together to train an automatic colonoscopy image segmentation model.
In one implementation, the step S100 shown in fig. 2 further includes the following steps:
s110, acquiring a colonoscopy image screenshot;
s120, compressing according to the enteroscope image screenshot to obtain the training image; the height, the width and the number of channels of the training images are all constant.
Specifically, since the dimensions of the network parameters in the teacher models and the student model are fixed, the size of their input images must match those dimensions to avoid dynamic changes in the networks; that is, the size of the input images of the teacher models and the student model needs to be fixed. The size of an input image is determined by its height, width and number of channels, so fixing the input size means keeping the height, width and number of channels constant. In a specific implementation, after an enteroscopy image screenshot is obtained, it is compressed so that its height, width and number of channels meet the input standard of the teacher models and the student model. The compressed enteroscopy screenshots form the training images, which can be input directly into the teacher models and the student model for training. In addition, the training images can be divided into a plurality of training image sets, with the data of the same training image set all coming from the data set of the same hospital.
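A minimal preprocessing sketch follows. The patent fixes the height, width and channel count but does not state their values, so the 256x256x3 shape and the [0, 1] intensity scaling here are illustrative assumptions.

```python
import cv2
import numpy as np

def screenshot_to_training_image(screenshot_bgr, height=256, width=256):
    # Compress the enteroscopy screenshot so every training image has the
    # same (constant) height, width and number of channels.
    img = cv2.resize(screenshot_bgr, (width, height), interpolation=cv2.INTER_AREA)
    img = img.astype(np.float32) / 255.0      # illustrative intensity scaling
    return np.transpose(img, (2, 0, 1))       # HWC -> CHW, 3 channels kept
```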
After the step S100 is completed, as shown in fig. 1, the method further includes a step S200 of inputting the training image to the teacher model to obtain a first segmentation graph; wherein the number of the teacher models is greater than or equal to two; and different teacher models respectively obtain a first segmentation graph according to different training image sets.
The invention first trains several binary classification models and then uses the trained binary classification models jointly to distill a multi-class classification model; that is, it first trains the teacher models and then uses the trained teacher models jointly to distill the student model. The teacher models must therefore be trained first. For example, suppose there are two teacher models A and B: teacher model A automatically detects polyps and teacher model B automatically detects bleeding. Teacher models A and B are trained on different training image sets; the data in the training image set of teacher model A may come from the hospital with the strongest local polyp treatment expertise, and the data in the training image set of teacher model B may come from the hospital with the strongest local colonic bleeding treatment expertise. A teacher model collects specific feature information in the training images, classifies the pixels of the training images according to its preset classification condition, and outputs the classification result, which is the first segmentation map.
The specific classification process is as follows, in one implementation, the teacher model includes a first down-sampling encoder and a first up-sampling decoder, and as shown in fig. 3, the step S200 further includes the following steps:
step S210, extracting the features of the training image according to the first down-sampling encoder to obtain a first feature map; the first feature map contains feature information of the training image;
step S220, analyzing the first feature map according to the first up-sampling decoder to obtain the first segmentation map;
wherein the first segmentation map comprises first standard probabilities and first abnormal probabilities corresponding to pixels in the training image; the first standard probability is the probability that the pixel belongs to a standard under a first preset classification condition, and the first abnormal probability is the probability that the pixel belongs to an abnormal under the first preset classification condition; the sum of the probability values of the first anomaly probability and the first standard probability is 1.
Briefly, the teacher model is trained mainly with a stochastic gradient descent algorithm. The teacher model consists mainly of a first down-sampling encoder and a first up-sampling decoder, composed of four down-sampling layers and four up-sampling layers respectively. A connection relationship exists between the first down-sampling encoder and the first up-sampling decoder: the four down-sampling layers are connected with the four up-sampling layers in one-to-one correspondence, and the outputs of the four down-sampling layers are respectively added into the corresponding up-sampling layers to participate in the up-sampling process, so as to maintain the gradient of the teacher model. After the training image is input into the teacher model, it passes in turn through the four down-sampling layers of the first down-sampling encoder to obtain the first feature map. The first feature map then passes in turn through the four up-sampling layers of the first up-sampling decoder, and the final output result is the first segmentation map.
Furthermore, since deep learning models generally rely on powerful computing capability, it is difficult to deploy them on devices with limited computing resources and limited storage space. To address this issue, the first down-sampling encoder is constructed using the lightweight network MobileNetv2 when constructing the teacher model. MobileNetv2 is a lightweight model designed for devices with limited computing resources; it builds a lightweight deep neural network using depthwise separable convolutions, which simplifies the network structure while retaining high accuracy and good model compression capability. In one implementation, as shown in fig. 4, this embodiment adopts four stages of MobileNetv2 as the first down-sampling layer 10, the second down-sampling layer 20, the third down-sampling layer 30 and the fourth down-sampling layer 40 of the first down-sampling encoder. A training image 1 input into the first down-sampling encoder first enters the first down-sampling layer 10; the output of the first down-sampling layer 10 is then used as the input of the second down-sampling layer 20, and so on, with the output image of each down-sampling layer used as the input image of the next, continuing the feature extraction step until the fourth down-sampling layer 40 finishes extracting features and outputs the first feature map 2. Each down-sampling layer in the first down-sampling encoder consists of an inverted residual module built from depthwise separable convolutions. Specifically, the inverted residual module first performs a pointwise convolution on the input image to expand the number of channels; it then performs a depthwise convolution to extract image features; and it finally performs another pointwise convolution to compress the number of channels. This reduces the size of the teacher model without losing accuracy in extracting image features.
The first down-sampling encoder performs feature extraction on an input training image, outputs the first feature map, uses the first feature map as an input image of the first up-sampling decoder, and then performs the step S220.
Specifically, each of the four up-sampling layers in the up-sampling decoder consists of a transposed convolutional layer and a normalization layer; the transposed convolutional layer expands the image and extracts image features, and the normalization layer prevents the parameters of different up-sampling layers from interfering with one another. By way of example, as shown in fig. 4: after the first feature map 2 is input to the up-sampling decoder, it passes through the first up-sampling layer 50, and the output of the first up-sampling layer 50 is combined with the output of the fourth down-sampling layer 40 in the down-sampling encoder and input to the second up-sampling layer 60 as its input image. The output of the second up-sampling layer 60 is then combined with the output of the third down-sampling layer 30 and input to the third up-sampling layer 70 as its input image. The output of the third up-sampling layer 70 is in turn combined with the output of the second down-sampling layer 20 and input to the fourth up-sampling layer 80 as its input image. Finally, the output of the fourth up-sampling layer 80 is combined with the output of the first down-sampling layer 10 to obtain the first segmentation map 3.
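The connection pattern of fig. 4 can be sketched as follows. This is an assumed reading, not the patent's exact configuration: the MobileNetv2 stage boundaries and channel widths are guesses, the skip pairing follows the usual encoder-decoder alignment (which differs slightly from the pairing recited for fig. 4), and the encoder outputs are merged by concatenation here, whereas the text says they are "added" into the up-sampling process.

```python
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

class TeacherSegNet(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        f = mobilenet_v2(weights=None).features      # lightweight encoder backbone
        # Four down-sampling layers (10, 20, 30, 40 in fig. 4); assumed stage cuts:
        self.d1, self.d2 = f[0:2], f[2:4]            # 16 ch @ 1/2, 24 ch @ 1/4
        self.d3, self.d4 = f[4:7], f[7:14]           # 32 ch @ 1/8, 96 ch @ 1/16

        def up(cin, cout):                           # transposed conv + normalization layer
            return nn.Sequential(
                nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1),
                nn.BatchNorm2d(cout), nn.ReLU(inplace=True))
        self.u1, self.u2, self.u3 = up(96, 32), up(64, 24), up(48, 16)
        self.u4 = nn.ConvTranspose2d(32, num_classes, 4, stride=2, padding=1)

    def forward(self, x):                            # x: (B, 3, H, W), H and W divisible by 16
        s1 = self.d1(x); s2 = self.d2(s1)
        s3 = self.d3(s2); s4 = self.d4(s3)
        y = self.u1(s4)                              # 1/16 -> 1/8
        y = self.u2(torch.cat([y, s3], 1))           # merge encoder skip, -> 1/4
        y = self.u3(torch.cat([y, s2], 1))           # -> 1/2
        return self.u4(torch.cat([y, s1], 1))        # -> full size, 2-channel logits
```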
In one implementation, the first segmentation map includes a first standard probability and a first abnormal probability corresponding to each pixel in the training image; the first standard probability is the probability that the pixel belongs to the standard under the first preset classification condition, and the first abnormal probability is the probability that the pixel belongs to the abnormal under the first preset classification condition; the sum of the first abnormal probability and the first standard probability is 1. For example, if the teacher model automatically detects polyps, the corresponding first preset classification condition is whether a polyp is present, the first standard probability is the probability that a pixel in the training image corresponds to normal tissue (i.e., no polyp), and the first abnormal probability is the probability that the pixel corresponds to a polyp.
Specifically, after the training image is input into the teacher model, the first down-sampling encoder down-samples the training image to obtain a first feature map of size $h \times w \times c$, where $h$, $w$ and $c$ are the height, width and number of channels of the first feature map. The first up-sampling decoder then up-samples the first feature map to obtain the first segmentation map

$$P^{T_k} \in \mathbb{R}^{2 \times H \times W},$$

where $T_k$ denotes the $k$-th teacher model, $k$ is the number of teacher models, each teacher model corresponds to one specific classification category, $H$ and $W$ are the height and width of the segmentation map, and $\mathbb{R}$ is the set of real numbers. The channel index $j$ selects the output: $j = 1$ denotes channel 1, which outputs $P^{T_k}_{1}$, the probability that each pixel is predicted to be normal under the first preset classification condition, i.e. the first standard probability; $j = 2$ denotes channel 2, which outputs $P^{T_k}_{2}$, the probability that each pixel is predicted to be abnormal under the first preset classification condition, i.e. the first abnormal probability. For every pixel, $P^{T_k}_{1} + P^{T_k}_{2} = 1$.
For example, suppose there are currently 4 teacher models: teacher model A, teacher model B, teacher model C and teacher model D, whose corresponding classification conditions are whether polyps are present, whether Meckel's diverticula are present, whether ulcers are present and whether bleeding is present, and which are trained to automatically detect polyps, Meckel's diverticula, ulcers and bleeding, respectively. Inputting the training image into teacher model A gives a first segmentation map (0.1, 0.9), where 0.1 is the probability output by channel 1 that the predicted pixel is normal and 0.9 is the probability output by channel 2 that the predicted pixel is a polyp; inputting the training image into teacher model B gives a first segmentation map (0.2, 0.8), where 0.2 is the probability that the predicted pixel is normal and 0.8 is the probability that the predicted pixel has a Meckel's diverticulum; inputting the training image into teacher model C gives a first segmentation map (0.3, 0.7), where 0.3 is the probability that the predicted pixel is normal and 0.7 is the probability that the predicted pixel is an ulcer; and inputting the training image into teacher model D gives a first segmentation map (0.4, 0.6), where 0.4 is the probability that the predicted pixel is normal and 0.6 is the probability that the predicted pixel has bleeding.
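The patent does not name the operation that makes the two channel outputs complementary, but a channel-wise softmax is the usual choice; a toy check, for illustration only:

```python
import torch

logits = torch.randn(1, 2, 8, 8)              # one teacher's two-channel output
first_seg = torch.softmax(logits, dim=1)      # channel 1: standard, channel 2: abnormal
# Per pixel, the two probabilities sum to 1, as required of the first segmentation map.
assert torch.allclose(first_seg.sum(dim=1), torch.ones(1, 8, 8))
```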
In order to measure the correctness of the teacher model's predictions during training, the method further comprises step S300: correcting parameters of the teacher model according to the first segmentation map and the first real label, and continuing to execute the step of inputting the training image into the teacher model to obtain the first segmentation map until a preset training condition of the teacher model is met, so as to obtain a trained teacher model; the first real label is used to reflect the real result corresponding to the pixels of the training image under the first preset classification condition.
In the actual training process, each training image has a corresponding real label used to evaluate the classification effect (prediction effect) of the model. The real label used for training the teacher model is the first real label, indicating the real result corresponding to the training image under the first preset classification condition. The goal of training is to make the output of the teacher model approach the real label ever more closely, so the teacher model continuously corrects its parameters during training, thereby controlling the training process and guiding it to converge in the optimal direction.
As shown in fig. 5, the step S300 specifically includes the following steps:
step S310, calculating a first loss value according to the first segmentation chart and the first real label;
step S320, adjusting parameters of the first up-sampling decoder according to the first loss value to update the teacher model;
and S330, continuing to input the training image into the teacher model to obtain a first segmentation graph until a preset training condition of the teacher model is met to obtain a trained teacher model.
By continuously comparing the first segmentation graph with the first real label, the difference between the prediction result and the real result of the teacher model can be obtained, so that the teacher model can determine how to correct the parameters according to the difference between the prediction result and the real result, and a better prediction effect is achieved. Specifically, the teacher model classifies the training images according to the first preset classification condition to obtain a first segmentation graph, and substitutes the first segmentation graph and the first real label into a calculation formula of the first loss value to obtain the first loss value, where the first loss value may represent a difference between the first segmentation graph and the first real label. The calculation formula of the first loss value is as follows:
$$\mathcal{L}_{T_k} = -\sum_{j=1}^{2}\sum_{x=1}^{H}\sum_{y=1}^{W} Y^{T_k}_{j,x,y}\,\log P^{T_k}_{j,x,y}$$
since the first loss value refers to the difference between the first segmentation graph and the first real label, the larger the value of the first loss value, i.e. the larger the difference between the first segmentation graph and the first real label, the poorer the classification effect of the teacher model; the smaller the value of the first loss value is, the smaller the difference between the first segmentation graph and the first real label is, the better the classification effect of the teacher model is.
Wherein
Figure 439092DEST_PATH_IMAGE013
Is a teacher model
Figure 594129DEST_PATH_IMAGE014
Output the first
Figure 738803DEST_PATH_IMAGE015
First of the channel
Figure 524356DEST_PATH_IMAGE016
And row and column
Figure 184008DEST_PATH_IMAGE017
First segmentation of column pixels
Figure 80420DEST_PATH_IMAGE018
To show the teacher model
Figure 345179DEST_PATH_IMAGE014
The predicted junction of the pixelAnd (5) fruit.
Figure 363950DEST_PATH_IMAGE019
Is the first of the training image
Figure 714160DEST_PATH_IMAGE015
First of the channel
Figure 414263DEST_PATH_IMAGE016
And row and column
Figure 736791DEST_PATH_IMAGE017
Corresponding real label of column pixel
Figure 129726DEST_PATH_IMAGE020
The pixel may be indicated in the teacher model
Figure 967232DEST_PATH_IMAGE014
True classification under the corresponding classification conditions.
Figure 267764DEST_PATH_IMAGE021
Wherein
Figure 241536DEST_PATH_IMAGE022
Is that the pixel is in the teacher model
Figure 539793DEST_PATH_IMAGE014
A normal tag of (1), and
Figure 864595DEST_PATH_IMAGE023
is that the pixel is in the teacher model
Figure 703238DEST_PATH_IMAGE014
The abnormal signature (diseased signature) in (1). The real label
Figure 531517DEST_PATH_IMAGE024
Is a one-hot vector, i.e. only one channel in the same label corresponds to a result that is not 0, the other channels all correspond to a result that is 0,in other words, the real label of the teacher model has only two forms, namely (1, 0) and (0, 1), wherein (1, 0) indicates that the real condition corresponding to the pixel is normal, and (0, 1) indicates that the real condition corresponding to the pixel is abnormal. According to the real label
Figure 266255DEST_PATH_IMAGE025
To evaluate the first segmentation graph output by the teacher model
Figure 812774DEST_PATH_IMAGE026
The predicted effect is obtained by
Figure 720687DEST_PATH_IMAGE027
And
Figure 669051DEST_PATH_IMAGE028
substituting the first loss value into the calculation formula of the first loss value to obtain a first loss value of the teacher model, and evaluating the prediction effect of the teacher model according to the obtained first loss value, wherein the larger the obtained first loss value is, the worse the prediction effect of the teacher model is; the smaller the first loss value obtained, the better the prediction effect of the teacher model.
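One consistent reading of steps S310-S330 as code is sketched below. It is an assumption-laden sketch: the optimizer settings and the `decoder` attribute name are placeholders, and the loss averages over pixels for numerical convenience rather than summing as in the formula above.

```python
import torch

def teacher_training_step(teacher, optimizer, image, first_real_label):
    # first_real_label: one-hot Y^{T_k}, shape (B, 2, H, W); image: (B, 3, H, W).
    first_seg = torch.softmax(teacher(image), dim=1)            # P^{T_k}
    first_loss = -(first_real_label *
                   torch.log(first_seg.clamp_min(1e-8))).sum(dim=1).mean()
    optimizer.zero_grad()
    first_loss.backward()
    optimizer.step()       # step S320: adjust the first up-sampling decoder
    return first_loss.item()

# Per step S320 only the decoder's parameters are adjusted, so the optimizer
# can be restricted to them (the attribute name `decoder` is hypothetical):
# optimizer = torch.optim.SGD(teacher.decoder.parameters(), lr=0.01, momentum=0.9)
```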
After training, a trained teacher model is obtained. The trained teacher models can then be used to distill the student model, and the process of distilling the student model is the training process of the student model.
Therefore, as shown in fig. 1, the method further includes step S400 of inputting the training image to the student model to obtain a second segmentation map. Specifically, the student model collects specific feature information in the training image, classifies pixels on the training image according to preset classification conditions, and outputs a classification result, wherein the classification result is the second segmentation map.
As shown in fig. 6, the step S400 specifically includes the following steps:
step S410, extracting the features of the training image according to the second down-sampling encoder and outputting a second feature map; the second feature map contains feature information of the training image;
and step S420, analyzing the second feature map according to the second up-sampling decoder to obtain the second segmentation map.
Wherein; the second segmentation map comprises second standard probabilities and second abnormal probabilities corresponding to pixels in the training image; the second standard probability is the probability that the pixel belongs to the standard under a second preset classification condition, and the second abnormal probability is the probability that the pixel belongs to the abnormal under the second preset classification condition; the number of categories of the second preset classification condition is more than that of the first preset classification condition.
Specifically, the student model is similar in construction to the teacher model: it comprises a down-sampling encoder and an up-sampling decoder. The down-sampling encoder in the student model is the second down-sampling encoder and the up-sampling decoder is the second up-sampling decoder; each likewise consists of four down-sampling layers and four up-sampling layers. A connection relationship exists between the second down-sampling encoder and the second up-sampling decoder: the four down-sampling layers are connected with the four up-sampling layers in one-to-one correspondence, and the outputs of the four down-sampling layers are respectively added into the corresponding up-sampling layers to participate in the up-sampling process, so as to maintain the gradient of the student model. After the training image is input into the student model, it passes in turn through the four down-sampling layers of the second down-sampling encoder to obtain the second feature map; the second feature map then passes in turn through the four up-sampling layers of the second up-sampling decoder, and the final output result is the second segmentation map. The student model differs from the teacher model mainly in that the number of classes in its classification condition is greater, so the dimension of the prediction result output by the student model is greater than that of the teacher model; moreover, the number of channels of the intermediate layers in the student model is smaller than in the teacher model, which reduces the overall size of the student model.
Specifically, this embodiment likewise adopts four stages of MobileNetv2 as the four down-sampling layers of the second down-sampling encoder, using the output image of each down-sampling layer as the input image of the next and continuing the feature extraction step until the fourth down-sampling layer finishes extracting features and outputs the second feature map (for the detailed process, refer to step S210). Likewise, each down-sampling layer in the second down-sampling encoder consists of an inverted residual module built from depthwise separable convolutions. Specifically, the inverted residual module first performs a pointwise convolution on the input image to expand the number of channels; it then performs a depthwise convolution to extract image features; and it finally performs another pointwise convolution to compress the number of channels. This reduces the size of the student model without losing accuracy in extracting image features.
The second downsampling encoder performs feature extraction on the input training image, outputs the second feature map, uses the second feature map as an input image of the second upsampling decoder, and then performs the step S420.
In specific implementation, the outputs of the four down-sampling layers in the second down-sampling encoder are added into the corresponding four up-sampling layers in the second up-sampling decoder one by one to participate in the up-sampling process, so as to maintain the gradient of the student model. The second feature map passes through the four upsampling layers of the second upsampling decoder in sequence, and the final output result is the second segmentation map (the detailed process may refer to step S220).
In one implementation, the second segmentation map includes a second standard probability and a second abnormal probability corresponding to pixels in the training image; the second standard probability is the probability that the pixel belongs to the standard under a second preset classification condition, and the second abnormal probability is the probability that the pixel belongs to the abnormal under the second preset classification condition; the number of categories of the second preset classification condition is more than that of the first preset classification condition.
Specifically, after the training image is input into the student model, the second down-sampling encoder down-samples the training image to obtain a second feature map of size $h \times w \times c$, where $h$, $w$ and $c$ are the height, width and number of channels of the second feature map. The second up-sampling decoder then up-samples the second feature map to obtain the second segmentation map

$$P^{S} = (p^{S}_{0}, p^{S}_{1}, \ldots, p^{S}_{N}) \in \mathbb{R}^{(N+1) \times H \times W},$$

where $S$ denotes the student model, $n$ is an image channel (for example, $n = 1$ indicates channel 1), $N$ is a positive integer with $N \geq 2$ whose value is related to the number of teacher models, and $\mathbb{R}$ is the set of real numbers. $p^{S}_{0}$ is the minimum standard probability corresponding to the pixel under the second preset classification condition, $p^{S}_{1}$ is the probability that the pixel belongs to the first class of anomaly under the second preset classification condition, $p^{S}_{2}$ is the probability that the pixel belongs to the second class of anomaly under the second preset classification condition, and so on.
For example, since the invention distills the knowledge contained in the trained teacher models into the student model, the classification conditions set for the student model are all related to the classification conditions of the trained teacher models. Suppose there are currently 4 teacher models: teacher model A, teacher model B, teacher model C and teacher model D, whose corresponding classification conditions are whether polyps are present, whether Meckel's diverticula are present, whether ulcers are present and whether bleeding is present, and which are trained to automatically detect polyps, Meckel's diverticula, ulcers and bleeding, respectively. The second preset classification condition of the student model distilled from these four teacher models then has four classes: the first class is whether polyps are present, the second class is whether Meckel's diverticula are present, the third class is whether ulcers are present, and the fourth class is whether bleeding is present. Inputting the training image into the student model gives a second segmentation map

$$P^{S} = (0.2,\ 0.8,\ 0.7,\ 0.5,\ 0.3),$$

which means that the predicted probability that the pixel has a polyp is 0.8, that it has a Meckel's diverticulum is 0.7, that it has an ulcer is 0.5, and that it has bleeding is 0.3. Correspondingly, the probability of not having a polyp is 1-0.8=0.2, of not having a Meckel's diverticulum is 1-0.7=0.3, of not having an ulcer is 1-0.5=0.5, and of not having bleeding is 1-0.3=0.7. The normal probability is the minimum of the normal probabilities over all diseases, so 0.2 is kept in the second segmentation map as the normal probability corresponding to the pixel; this avoids an inaccurate prediction by the student model caused by an excessively high normal probability.
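The minimum rule in this example can be sketched directly. Note that the aggregation of teacher outputs into a teacher label is described in step S510, which is truncated in this excerpt, so treating this min rule as the aggregation rule is an assumption:

```python
import torch

def aggregate_teacher_outputs(teacher_maps):
    # teacher_maps: list of N tensors, each (B, 2, H, W) after softmax;
    # channel 0 = normal probability, channel 1 = lesion probability.
    normals = torch.stack([t[:, 0] for t in teacher_maps], dim=1)   # (B, N, H, W)
    lesions = torch.stack([t[:, 1] for t in teacher_maps], dim=1)   # (B, N, H, W)
    # Keep the *smallest* normal probability so that an over-confident
    # "normal" vote from one teacher cannot mask another teacher's lesion.
    normal = normals.min(dim=1, keepdim=True).values                # (B, 1, H, W)
    return torch.cat([normal, lesions], dim=1)                      # (B, N+1, H, W)

# Single-pixel example with the probabilities used in the text above:
maps = [torch.tensor([[[[0.2]], [[0.8]]]]),   # polyp teacher
        torch.tensor([[[[0.3]], [[0.7]]]]),   # Meckel's diverticulum teacher
        torch.tensor([[[[0.5]], [[0.5]]]]),   # ulcer teacher
        torch.tensor([[[[0.7]], [[0.3]]]])]   # bleeding teacher
print(aggregate_teacher_outputs(maps).squeeze())  # tensor([0.2, 0.8, 0.7, 0.5, 0.3])
```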
In a specific implementation, the prediction effect of the student model is shown in fig. 11, where column A shows training images input into the student model, column B shows the corresponding second real labels (true category maps), and column C shows the corresponding output second segmentation maps (prediction effect maps).
In order to measure the correctness of the student model's classification during training, the method further comprises the following steps:
step S500, correcting parameters of the student model according to the second segmentation graph, a teacher label and a second real label, and continuously executing the step of inputting the training image to the student model to obtain a second segmentation graph until preset training conditions of the student model are met to obtain a trained student model; the second real label is used for reflecting the real classification condition corresponding to the pixels on the training image under a second preset classification condition; the teacher label is used for reflecting the classification condition of the training images in the trained teacher model.
The real label used for training the student model is the second real label, indicating the real classification corresponding to the pixels under the second preset classification condition. The goal of training is to make the output of the student model approach the real label ever more closely, so the student model continuously corrects its parameters during training, thereby controlling the training process and guiding it to converge in the optimal direction.
As shown in fig. 7, the step S500 specifically includes the following steps:
step S510, calculating a second loss value according to the second segmentation chart, the teacher label and a second real label;
step S520, adjusting parameters of the second upsampling decoder according to the second loss value to update the student model;
and step S530, continuing to execute the step of generating a second segmentation graph according to the training image until the preset training condition of the student model is met, so as to obtain the trained student model.
By continuously comparing the second segmentation map with the second real label, the difference between the prediction of the student model and the real classification can be obtained, so that the student model can determine how to correct its parameters from this difference and achieve a better prediction effect. Specifically, the student model classifies the training image according to the second preset classification condition to obtain the second segmentation map, and the second segmentation map, the teacher label and the second real label are substituted into the calculation formula of the second loss value; the second loss value represents the difference between the second segmentation map and the two labels. The calculation formula of the second loss value is as follows:
$$L_{2} = -\sum_{c}\sum_{i}\sum_{j}\left(y^{T}_{c,i,j} + y^{S}_{c,i,j}\right)\log p^{S}_{c,i,j}$$
Since the second loss value reflects the difference between the second segmentation map and the second real label, the larger the second loss value, the larger this difference and the poorer the classification effect of the student model; the smaller the second loss value, the smaller this difference and the better the classification effect of the student model.
wherein $p^{S}_{c,i,j}$ is the value of the second segmentation map output by the student model for the pixel in the $c$-th channel, $i$-th row and $j$-th column; $y^{T}_{c,i,j}$ is the teacher label of the pixel in the $c$-th channel, $i$-th row and $j$-th column of the training image, which represents the prediction result of that pixel in the trained teacher models; and $y^{S}_{c,i,j}$ is the real label of the pixel in the $c$-th channel, $i$-th row and $j$-th column, i.e. the second real label, which indicates the real classification condition of that pixel in the training image under the second preset classification condition.
In particular, the second real label $y^{S}$ takes the form $y^{S} = (y^{S}_{0},\, y^{S}_{1},\, y^{S}_{2},\, \ldots,\, y^{S}_{n})$, wherein $y^{S}_{0}$ is the label indicating that the pixel belongs to the standard (normal) class under the second preset classification condition, $y^{S}_{1}$ the label indicating that the pixel belongs to the first class of abnormality, $y^{S}_{2}$ the label for the second class of abnormality, and $y^{S}_{n}$ the label for the $n$-th class of abnormality under the second preset classification condition. The second real label is a one-hot vector, that is, only one channel of a given label is non-zero and the results corresponding to all other channels are 0; in other words, one label can only designate either a normal pixel or one of the $n$ classes of abnormality.
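As a small numerical check of the loss form above (our own sketch: the student probabilities are hypothetical, and the teacher label values are taken from the worked example later in this section):

```python
import numpy as np

# One pixel, 5 channels ordered (normal, polyp, diverticulum, ulcer, bleeding).
y_true = np.array([1.0, 0.0, 0.0, 0.0, 0.0])            # one-hot second real label: "normal"
y_teacher = np.array([0.1, 0.9, 0.8, 0.7, 0.6]) / 3.1   # teacher label (example below)
p_student = np.array([0.2, 0.3, 0.2, 0.2, 0.1])         # hypothetical student probabilities

loss = -np.sum((y_teacher + y_true) * np.log(p_student + 1e-8))
print(round(loss, 3))  # per-pixel second loss value, approximately 3.235
```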
In one implementation, the teacher label is derived from the first segmentation map output by all the trained teacher models, as shown in fig. 8, and the step S510 includes the following steps:
step S511, obtaining a total probability value according to the first segmentation maps output by all the trained teacher models;
and step S512, adjusting the first segmentation maps output by all the trained teacher models according to the total probability value to obtain the teacher label.
Since the dimensions of the first segmentation map output by a teacher model and of the second segmentation map output by the student model are different (equivalently, they have different numbers of channels), the first segmentation maps output by all teacher models must first be adjusted so that the teacher label matches the dimensions of the student model's output. In this embodiment, the first segmentation maps output by all teacher models are adjusted by the following formulas to obtain the teacher label:
$$\hat{y}^{T} = \Big(\min_{1 \le k \le i} p^{T_{k}}_{0},\; p^{T_{1}}_{1},\; p^{T_{2}}_{1},\; \ldots,\; p^{T_{i}}_{1}\Big)$$

$$D = \min_{1 \le k \le i} p^{T_{k}}_{0} + \sum_{k=1}^{i} p^{T_{k}}_{1}, \qquad y^{T} = \frac{\hat{y}^{T}}{D}$$

wherein $D$ is the total probability value, $y^{T}$ is the teacher label, and $p^{T_{k}}_{0}$ and $p^{T_{k}}_{1}$ are the standard and abnormal probabilities in the first segmentation map output by the $k$-th teacher model. Specifically, a first vector $\hat{y}^{T}$ is first obtained from the first segmentation maps output by the $i$ teacher models: the first vector retains the abnormal probability $p^{T_{k}}_{1}$ output by each of the $i$ teacher models, and keeps the smallest standard probability among the first segmentation maps output by the $i$ teacher models, $\min_{k} p^{T_{k}}_{0}$, as its first component. According to the formula for the total probability value $D$, this minimum standard probability and all the abnormal probabilities in the first vector are added together to obtain the total probability value. The first vector $\hat{y}^{T}$ is then divided by the total probability value $D$ to obtain a second vector, and the second vector is the teacher label $y^{T}$.
For example, suppose there are currently four teacher models A, B, C and D, whose first segmentation maps are (0.1, 0.9), (0.2, 0.8), (0.3, 0.7) and (0.4, 0.6) respectively (standard probability first). First, the first vector (0.1, 0.9, 0.8, 0.7, 0.6) is obtained from these first segmentation maps: it keeps the smallest standard probability, 0.1, together with the four abnormal probabilities. Then all probabilities in the first vector are added to obtain the total probability value 3.1, i.e. 0.1+0.9+0.8+0.7+0.6=3.1. Finally the first vector is divided by the total probability value, i.e. every probability in the first vector is divided by 3.1, to obtain the second vector, and the second vector is the teacher label.
The second vector is represented as $y^{T} = \tfrac{1}{3.1}\,(0.1,\, 0.9,\, 0.8,\, 0.7,\, 0.6) \approx (0.032,\, 0.290,\, 0.258,\, 0.226,\, 0.194)$.
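A minimal sketch of this aggregation step (our own illustration; the function name and the list-of-pairs input format are assumptions):

```python
import numpy as np

def teacher_label(first_seg_maps):
    """Combine the (standard, abnormal) pairs output by the trained teacher
    models for one pixel into a teacher label, as described above."""
    standards = [m[0] for m in first_seg_maps]           # standard (normal) probabilities
    abnormals = [m[1] for m in first_seg_maps]           # abnormal probabilities
    first_vec = np.array([min(standards)] + abnormals)   # first vector
    total = first_vec.sum()                              # total probability value D
    return first_vec / total                             # second vector = teacher label

# Worked example with teacher models A, B, C and D:
print(teacher_label([(0.1, 0.9), (0.2, 0.8), (0.3, 0.7), (0.4, 0.6)]))
# -> approximately [0.032 0.290 0.258 0.226 0.194]
```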
After the training of the student model is completed, a trained student model is obtained, and the trained student model can be used for real-time enteroscopy image segmentation, as shown in fig. 1.
Based on the above embodiment, as shown in fig. 10, the present invention further provides a device for real-time enteroscopy image segmentation based on integrated knowledge distillation, wherein the device comprises: an image acquisition module 120, wherein the image acquisition module 120 is configured to acquire a training image; a teacher model unit 130, the teacher model unit 130 configured to obtain a first segmentation map from the training image; a first parameter modification module 110, where the first parameter modification module 110 is configured to modify parameters of the teacher model according to the first segmentation map and the first real label; a student model unit 170, wherein the student model unit 170 is configured to obtain a second segmentation map according to the training image; a second parameter modification module 160, wherein the second parameter modification module 160 is configured to modify parameters of the student model according to the second segmentation map, the teacher label, and the second real label;
the teacher model unit 130 further includes: a first downsampling encoder module 90, where the first downsampling encoder module 90 is configured to perform feature extraction on the training image to obtain a first feature map; a first upsampling decoder module 100, where the first upsampling decoder module 100 is configured to parse the first feature map to obtain the first segmentation map;
the student model unit 170 further includes: a second downsampling encoder module 140, where the second downsampling encoder module 140 is configured to perform feature extraction on the training image to obtain a second feature map; a second upsampling decoder module 150, wherein the second upsampling decoder module 150 is configured to parse the second feature map to obtain the second segmentation map.
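For illustration, a minimal sketch of one encoder-decoder unit of the kind these modules describe (our own sketch: a PyTorch module with illustrative layer counts and channel sizes, not the patent's architecture):

```python
import torch
import torch.nn as nn

class SegUnit(nn.Module):
    """One teacher- or student-style unit: a downsampling encoder extracts a
    feature map, and an upsampling decoder parses it into a segmentation map."""
    def __init__(self, in_ch: int = 3, num_classes: int = 5):
        super().__init__()
        self.encoder = nn.Sequential(                     # downsampling encoder module
            nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(                     # upsampling decoder module
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, num_classes, 2, stride=2),
            nn.Softmax(dim=1),                            # per-pixel class probabilities
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))              # feature map -> segmentation map

seg = SegUnit()
print(seg(torch.rand(1, 3, 64, 64)).shape)  # torch.Size([1, 5, 64, 64])
```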
Based on the above embodiments, the present invention also provides a non-transitory computer-readable storage medium on which a data storage program is stored; when the data storage program is executed by a processor, it implements the steps of the real-time enteroscopy image segmentation method based on integrated knowledge distillation as described above.
Any reference to memory, storage, database or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
Based on the foregoing embodiments, the present invention further provides a terminal, which includes a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and include instructions for performing the real-time enteroscopy image segmentation method based on integrated knowledge distillation as described in any one of the above. A functional block diagram of the terminal may be as shown in fig. 9. The terminal comprises a processor, a memory and a network interface connected through a system bus, wherein the processor of the terminal is configured to provide computing and control capabilities. The memory of the terminal comprises a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for their operation. The network interface of the terminal is used to connect and communicate with external terminals through a network. The computer program is executed by a processor to implement the real-time enteroscopy image segmentation method based on integrated knowledge distillation.
It will be understood by those skilled in the art that the block diagram of fig. 9 is only a block diagram of the part of the structure related to the solution of the present invention and does not limit the intelligent terminal to which the solution is applied; a specific intelligent terminal may include more or fewer components than shown in the figure, combine certain components, or arrange the components differently. In addition, the method for real-time enteroscopy image segmentation based on integrated knowledge distillation described in any of the above embodiments may be implemented by a computer program, which may be stored in a non-volatile computer-readable storage medium and which, when executed, may include the processes of the above method embodiments.
In summary, in the present invention, a plurality of training images are obtained and divided into a plurality of training image sets, the training images of the same training image set coming from the same data set. The teacher models are trained first, different teacher models each obtaining a first segmentation map from a different training image set; the trained teacher models are then used jointly to distill a student model. The training images are enteroscopy image screenshots, and the trained student model can generate a real-time enteroscopy image segmentation map from a real-time enteroscopy image. This solves the problem that the data sets of different hospitals are isolated from one another and cannot be pooled to train an automatic enteroscopy image segmentation model.
It is to be understood that the invention is not limited to the examples described above, but that modifications and variations may be effected thereto by those of ordinary skill in the art in light of the foregoing description, and that all such modifications and variations are intended to be within the scope of the invention as defined by the appended claims.

Claims (10)

1. A method for real-time enteroscopy image segmentation based on integrated knowledge distillation, the method comprising:
acquiring a training image; the training image is a colonoscopy image screenshot used for training a teacher model and a student model; the training images are divided into a plurality of training image sets, and the training images of the same training image set come from the same data set;
inputting the training image into the teacher model to obtain a first segmentation graph; wherein the number of the teacher models is greater than or equal to two; different teacher models respectively obtain a first segmentation graph according to different training image sets;
correcting parameters of the teacher model according to the first segmentation graph and the first real label, and continuously executing the step of inputting the training image to the teacher model to obtain the first segmentation graph until preset training conditions of the teacher model are met to obtain a trained teacher model; the first real label is used for reflecting a real classification condition corresponding to the pixels on the training image under a first preset classification condition;
inputting the training image into the student model to obtain a second segmentation graph;
correcting parameters of the student model according to the second segmentation graph, a teacher label and a second real label, and continuously executing the step of inputting the training image to the student model to obtain a second segmentation graph until preset training conditions of the student model are met to obtain a trained student model; the second real label is used for reflecting the real classification condition corresponding to the pixels on the training image under a second preset classification condition; the teacher label is used for reflecting the classification condition of the training images in the trained teacher model;
and inputting the real-time enteroscopy image into the trained student model to generate a real-time enteroscopy image segmentation map.
2. The method of claim 1, wherein the obtaining of the training images, which are screenshots of the colonoscopy images for training the teacher model and the student model, comprises:
acquiring an enteroscope image screenshot;
compressing according to the enteroscope image screenshot to obtain the training image; the height, the width and the number of channels of the training images are all constant.
3. The method of claim 1, wherein the teacher model comprises a first downsampling encoder and a first upsampling decoder; the inputting the training image into the teacher model to obtain a first segmentation graph includes:
extracting features of the training image according to the first down-sampling encoder to obtain a first feature map; the first feature map contains feature information of the training image;
analyzing the first feature map according to the first up-sampling decoder to obtain the first segmentation map;
wherein the first segmentation map comprises first standard probabilities and first abnormal probabilities corresponding to pixels in the training image; the first standard probability is the probability that the pixel belongs to the standard under a first preset classification condition, and the first abnormal probability is the probability that the pixel belongs to an abnormality under the first preset classification condition; the sum of the first abnormal probability and the first standard probability is 1.
4. The method of claim 3, wherein the modifying the parameters of the teacher model according to the first segmentation map and the first real labels and continuing to input the training image into the teacher model to obtain the first segmentation map until a preset training condition of the teacher model is met to obtain the trained teacher model comprises:
calculating a first loss value from the first segmentation map and the first real label;
adjusting parameters of the first upsampling decoder according to the first loss value to update the teacher model;
and continuing to execute the step of inputting the training image into the teacher model to obtain a first segmentation graph until a preset training condition of the teacher model is met, so as to obtain a trained teacher model.
5. The method of claim 1, wherein the student model comprises a second downsampling encoder and a second upsampling decoder; the inputting the training image into the student model to obtain a second segmentation map comprises:
performing feature extraction on the training image according to the second downsampling encoder and outputting a second feature map; the second feature map contains feature information of the training image;
analyzing the second feature map according to the second up-sampling decoder to obtain the second segmentation map;
wherein the second segmentation map comprises second standard probabilities and second abnormal probabilities corresponding to pixels in the training image; the second standard probability is the probability that the pixel belongs to the standard under a second preset classification condition, and the second abnormal probability is the probability that the pixel belongs to an abnormality under the second preset classification condition; the number of categories of the second preset classification condition is greater than that of the first preset classification condition.
6. The method of claim 5, wherein the modifying the parameters of the student model according to the second segmentation map, the teacher label and the second real label and continuing to perform the step of generating the second segmentation map according to the training image until a preset training condition of the student model is met to obtain the trained student model comprises:
calculating a second loss value according to the second segmentation graph, the teacher label and a second real label;
adjusting parameters of the second upsampling decoder according to the second loss value to update the student model;
and continuing to execute the step of generating a second segmentation graph according to the training image until the preset training condition of the student model is met, so as to obtain the trained student model.
7. The method of claim 6, wherein said calculating a second loss value from said second segmentation graph, said teacher label, and a second true label comprises:
obtaining a total probability value according to the first segmentation graphs output by all the trained teacher models;
and adjusting the first segmentation chart output by all the trained teacher models according to the total probability value to obtain a teacher label.
8. An apparatus for real-time enteroscopic image segmentation based on integrated knowledge distillation, the apparatus comprising:
the image acquisition module is used for acquiring a training image;
a teacher model unit for obtaining a first segmentation graph from the training image;
the first parameter correction module is used for correcting the parameters of the teacher model according to the first segmentation graph and the first real label;
the student model unit is used for obtaining a second segmentation graph according to the training image;
the second parameter correction module is used for correcting the parameters of the student model according to the second segmentation chart, the teacher label and the second real label;
the teacher model unit further includes:
the first down-sampling encoder module is used for extracting the features of the training image to obtain a first feature map;
a first upsampling decoder module, configured to parse the first feature map to obtain the first segmentation map;
the student model unit further includes:
the second downsampling encoder module is used for extracting the features of the training image to obtain a second feature map;
and the second up-sampling decoder module is used for analyzing the second characteristic diagram to obtain the second segmentation diagram.
9. A terminal comprising a memory, one or more processors, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing the method of any one of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method for real-time enteroscopic image segmentation based on integrated knowledge distillation of any one of claims 1-7.
CN202010997859.XA 2020-09-21 2020-09-21 Real-time enteroscopy image segmentation method and device based on integrated knowledge distillation Pending CN111932561A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010997859.XA CN111932561A (en) 2020-09-21 2020-09-21 Real-time enteroscopy image segmentation method and device based on integrated knowledge distillation
PCT/CN2020/130114 WO2022057078A1 (en) 2020-09-21 2020-11-19 Real-time colonoscopy image segmentation method and device based on ensemble and knowledge distillation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010997859.XA CN111932561A (en) 2020-09-21 2020-09-21 Real-time enteroscopy image segmentation method and device based on integrated knowledge distillation

Publications (1)

Publication Number Publication Date
CN111932561A true CN111932561A (en) 2020-11-13

Family

ID=73335334

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010997859.XA Pending CN111932561A (en) 2020-09-21 2020-09-21 Real-time enteroscopy image segmentation method and device based on integrated knowledge distillation

Country Status (2)

Country Link
CN (1) CN111932561A (en)
WO (1) WO2022057078A1 (en)


Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114677565B (en) * 2022-04-08 2023-05-05 北京百度网讯科技有限公司 Training method and image processing method and device for feature extraction network
CN115760868A (en) * 2022-10-14 2023-03-07 广东省人民医院 Colorectal and colorectal cancer segmentation method, system, device and medium based on topology perception
CN115829983B (en) * 2022-12-13 2024-05-03 广东工业大学 High-speed industrial scene visual quality detection method based on knowledge distillation
CN115965609B (en) * 2023-01-03 2023-08-04 江南大学 Intelligent detection method for flaws of ceramic substrate by utilizing knowledge distillation
CN115908441B (en) * 2023-01-06 2023-10-10 北京阿丘科技有限公司 Image segmentation method, device, equipment and storage medium
CN116385274B (en) * 2023-06-06 2023-09-12 中国科学院自动化研究所 Multi-mode image guided cerebral angiography quality enhancement method and device
CN116993694B (en) * 2023-08-02 2024-05-14 江苏济远医疗科技有限公司 Non-supervision hysteroscope image anomaly detection method based on depth feature filling
CN116825130B (en) * 2023-08-24 2023-11-21 硕橙(厦门)科技有限公司 Deep learning model distillation method, device, equipment and medium
CN117765532B (en) * 2024-02-22 2024-05-31 中国科学院宁波材料技术与工程研究所 Cornea Langerhans cell segmentation method and device based on confocal microscopic image


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111932561A (en) * 2020-09-21 2020-11-13 深圳大学 Real-time enteroscopy image segmentation method and device based on integrated knowledge distillation

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107506740A (en) * 2017-09-04 2017-12-22 北京航空航天大学 A kind of Human bodys' response method based on Three dimensional convolution neutral net and transfer learning model
CN109325443A (en) * 2018-09-19 2019-02-12 南京航空航天大学 A kind of face character recognition methods based on the study of more example multi-tag depth migrations
CN110033026A (en) * 2019-03-15 2019-07-19 深圳先进技术研究院 A kind of object detection method, device and the equipment of continuous small sample image
CN110472681A (en) * 2019-08-09 2019-11-19 北京市商汤科技开发有限公司 The neural metwork training scheme and image procossing scheme of knowledge based distillation
CN111199242A (en) * 2019-12-18 2020-05-26 浙江工业大学 Image increment learning method based on dynamic correction vector
CN111428191A (en) * 2020-03-12 2020-07-17 五邑大学 Antenna downward inclination angle calculation method and device based on knowledge distillation and storage medium
CN111524124A (en) * 2020-04-27 2020-08-11 中国人民解放军陆军特色医学中心 Digestive endoscopy image artificial intelligence auxiliary system for inflammatory bowel disease

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SONG C F 等: "Mask-guided contrastive attention model for person re-identification", 《2018 IEEE/CVF》 *
ZHICHAO HUANG 等: "Real-time Colonoscopy Image Segmentation Based on Ensemble Knowledge Distillation", 《2020 5TH INTERNATIONAL CONFERENCE ON ADVANCED ROBOTICS AND MECHATRONICS (ICARM)》 *
袁配配 等: "基于深度学习的行人属性识别", 《激光与光电子学进展》 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022057078A1 (en) * 2020-09-21 2022-03-24 深圳大学 Real-time colonoscopy image segmentation method and device based on ensemble and knowledge distillation
CN113538480A (en) * 2020-12-15 2021-10-22 腾讯科技(深圳)有限公司 Image segmentation processing method and device, computer equipment and storage medium
CN112686856A (en) * 2020-12-29 2021-04-20 杭州优视泰信息技术有限公司 Real-time enteroscopy polyp detection device based on deep learning
CN112819831A (en) * 2021-01-29 2021-05-18 北京小白世纪网络科技有限公司 Segmentation model generation method and device based on convolution Lstm and multi-model fusion
CN112819831B (en) * 2021-01-29 2024-04-19 北京小白世纪网络科技有限公司 Segmentation model generation method and device based on convolution Lstm and multi-model fusion
CN112802023A (en) * 2021-04-14 2021-05-14 北京小白世纪网络科技有限公司 Knowledge distillation method and device for pleural lesion segmentation based on lifelong learning
CN113343803A (en) * 2021-05-26 2021-09-03 北京百度网讯科技有限公司 Model training method, device, equipment and storage medium
CN113343803B (en) * 2021-05-26 2023-08-22 北京百度网讯科技有限公司 Model training method, device, equipment and storage medium
CN113470025A (en) * 2021-09-02 2021-10-01 北京字节跳动网络技术有限公司 Polyp detection method, training method and related device
WO2023212997A1 (en) * 2022-05-05 2023-11-09 五邑大学 Knowledge distillation based neural network training method, device, and storage medium
CN114926471A (en) * 2022-05-24 2022-08-19 北京医准智能科技有限公司 Image segmentation method and device, electronic equipment and storage medium
CN116091773A (en) * 2023-02-02 2023-05-09 北京百度网讯科技有限公司 Training method of image segmentation model, image segmentation method and device
CN116091773B (en) * 2023-02-02 2024-04-05 北京百度网讯科技有限公司 Training method of image segmentation model, image segmentation method and device
CN117496509A (en) * 2023-12-25 2024-02-02 江西农业大学 Yolov7 grapefruit counting method integrating multi-teacher knowledge distillation
CN117496509B (en) * 2023-12-25 2024-03-19 江西农业大学 Yolov7 grapefruit counting method integrating multi-teacher knowledge distillation

Also Published As

Publication number Publication date
WO2022057078A1 (en) 2022-03-24

Similar Documents

Publication Publication Date Title
CN111932561A (en) Real-time enteroscopy image segmentation method and device based on integrated knowledge distillation
US10861134B2 (en) Image processing method and device
WO2020168934A1 (en) Medical image segmentation method, apparatus, computer device, and storage medium
CN111325739B (en) Method and device for detecting lung focus and training method of image detection model
CN110110808B (en) Method and device for performing target labeling on image and computer recording medium
CN111429421A (en) Model generation method, medical image segmentation method, device, equipment and medium
CN111091559A (en) Depth learning-based auxiliary diagnosis system for small intestine sub-scope lymphoma
CN110648331B (en) Detection method for medical image segmentation, medical image segmentation method and device
CN111667459B (en) Medical sign detection method, system, terminal and storage medium based on 3D variable convolution and time sequence feature fusion
CN113344896A (en) Breast CT image focus segmentation model training method and system
CN113902945A (en) Multi-modal breast magnetic resonance image classification method and system
CN115423754A (en) Image classification method, device, equipment and storage medium
CN110674726A (en) Skin disease auxiliary diagnosis method and system based on target detection and transfer learning
CN115223193B (en) Capsule endoscope image focus identification method based on focus feature importance
CN110570425B (en) Pulmonary nodule analysis method and device based on deep reinforcement learning algorithm
CN112733873A (en) Chromosome karyotype graph classification method and device based on deep learning
Wang et al. Automatic consecutive context perceived transformer GAN for serial sectioning image blind inpainting
KR102569285B1 (en) Method and system for training machine learning model for detecting abnormal region in pathological slide image
Fu et al. Deep supervision feature refinement attention network for medical image segmentation
CN111209946B (en) Three-dimensional image processing method, image processing model training method and medium
CN117314935A (en) Diffusion model-based low-quality fundus image enhancement and segmentation method and system
CN117095014A (en) Semi-supervised medical image segmentation method, system, equipment and medium
CN117151162A (en) Cross-anatomical-area organ incremental segmentation method based on self-supervision and specialized control
CN111047582A (en) Crohn's disease auxiliary diagnosis system under enteroscope based on degree of depth learning
CN114974522A (en) Medical image processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20201113