CN117197472A - Efficient teacher and student semi-supervised segmentation method and device based on endoscopic images of epistaxis - Google Patents
- Publication number: CN117197472A (application CN202311471582.7A)
- Authority: CN (China)
- Prior art keywords: data, loss, network, teacher, image data
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention relates to the field of epistaxis data processing, and in particular to an efficient teacher-student semi-supervised segmentation method and device based on epistaxis endoscopic images, which provide accurate prediction data for epistaxis treatment. The scheme comprises the following steps: collecting and enhancing a nasal hemorrhage medical image dataset; dividing it into two parts according to a set proportion, labeling one part of the image data and leaving the other part unlabeled; inputting the labeled image data into the U-Net detector of a student network and training the student network; performing positive and negative sample assignment, through a pseudo-label distributor, on the unlabeled data that has passed through the teacher network, based on the labeled and unlabeled data; setting the iteration parameters of the teacher-student network model; and inputting nasal hemorrhage medical image data acquired in real time into the trained teacher-student network model to predict the nasal hemorrhage condition. The invention is suitable for predicting epistaxis.
Description
Technical Field
The invention relates to the field of epistaxis data processing, and in particular to an efficient teacher-student semi-supervised segmentation method and device based on epistaxis endoscopic images.
Background
Nasal bleeding is a common condition in emergency medicine. When conservative treatment is ineffective, cauterization and occlusion of the local nasal blood vessels must be performed under nasal endoscopic guidance as soon as possible. In operators' clinical experience, complications such as nasal ulcers and perforation of the nasal septum sometimes occur due to insufficient or excessive cauterization.
CN112259220A discloses a prior-art system, device and storage medium for predicting diseases based on the symptoms accompanying nasal bleeding. The system comprises: a data acquisition module for acquiring the accompanying symptoms of nasal bleeding and the corresponding disease names of cases from a case library and establishing an original epistaxis disease dataset; a data dividing module for dividing epistaxis into local diseases and systemic diseases and partitioning the original epistaxis disease dataset accordingly; a vectorization module for vectorizing the symptoms associated with epistaxis; a data clustering module for clustering the local-disease and systemic-disease datasets respectively, using a clustering algorithm optimized by an improved whale optimization algorithm; and a disease prediction module for dividing and clustering the data of the case to be examined and predicting the disease within the cluster by computing semantic similarity.
That scheme clusters cases with the improved whale optimization algorithm and then subdivides diseases, which speeds up auxiliary diagnosis. However, its main purpose is to divide and cluster the data of the case to be examined; it lacks a fine-grained processing pipeline for epistaxis data and therefore cannot accurately predict the epistaxis condition from epistaxis data acquired in real time.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing an efficient teacher-student semi-supervised segmentation method and device based on epistaxis endoscopic images, which accurately predict the epistaxis condition from epistaxis data acquired in real time and provide accurate prediction data for epistaxis treatment.
To achieve this aim, the invention adopts the following technical scheme: an efficient teacher-student semi-supervised segmentation method based on epistaxis endoscopic images, comprising the following steps:
s1, acquiring a nasal hemorrhage medical image data set;
s2, enhancing the acquired nasal hemorrhage medical image data set;
s3, dividing the enhanced data set into two parts according to a set proportion, wherein one part of image data is marked, and the other part of image data is not marked;
s4, inputting the marked image data into a U-Net detector of a student network, and training the student network, wherein the student network uses a gradient descent method to update parameters;
s5, based on the trained student network, semi-supervised training a teacher network according to the marked data and the unmarked data;
s6, performing positive and negative sample assignment, through the pseudo-label distributor, on the unlabeled data that has passed through the teacher network;
s7, setting iteration times, batch size and initial learning rate of the teacher-student network model, and performing iterative training to obtain a trained teacher-student network model;
s8, inputting the medical image data of the nasal bleeding obtained in real time into a trained teacher-student network model, and predicting the nasal bleeding condition.
Further, the step S1 specifically includes:
the nasal hemorrhage image data is acquired through the nasal endoscopic equipment, and desensitization treatment is carried out on the acquired nasal hemorrhage image data, wherein the nasal hemorrhage image data comprises blurred view, light reflection, massive hemorrhage of a nasal cavity, punctiform and tendril hemorrhage and vascular malformations.
Further, the step S2 specifically includes:
enhancing the acquired nasal hemorrhage medical image dataset through random angular rotation, brightness adjustment, contrast enhancement, chroma sharpness enhancement and mirror image flip operations; meanwhile, the improved GAN network is used for enhancing and expanding the nasal hemorrhage medical image data set, the improved GAN network is used for fusing the nasal hemorrhage area probability map and the characteristic map, and channel-level connection and Dot-Product Attention mechanism are used for fusing the nasal hemorrhage area probability map with the characteristic representation generated by the encoder in the generator.
Further, in step S3, when the image is labeled, the bleeding sites and abnormal blood vessels are labeled on all the intranasal endoscopic nasal bleeding images, and the bleeding site types are classified into punctate bleeding and tendril bleeding.
Further, the step S5 specifically includes:
inputting unlabeled image data into a teacher network and labeling it through the teacher network, wherein the teacher network has exactly the same model structure as the student network but does not use the gradient descent method to update its parameters;
the teacher network is obtained by updating the student network by using the exponential average movement, and the teacher network obtains a new model by referring to the weights of the current model and the previous model, and the calculation formula is as follows:
θ′_t = α·θ′_(t−1) + (1 − α)·θ_t, where θ′_t represents the new model, θ_t represents the current model, θ′_(t−1) represents the previous model, and α represents the weight.
Further, the step S6 specifically includes:
setting a low threshold and a high threshold, and classifying the unlabeled data that has passed through the teacher network, through the pseudo-label distributor, into a trusted class and an uncertain class, wherein the trusted class comprises labels above the high threshold, is used for supervised learning, and participates in the calculation of the reconstruction Loss, cross-entropy Loss and Focal Loss; the uncertain class comprises labels between the low and high thresholds and is used for unsupervised learning;
the process of calculating the loss function from the pseudo tag is as follows:
the loss function in the detector is the sum over labeled and unlabeled data, with the calculation formula:
L = L_s + λ·L_u, where L_s represents the loss function of the labeled data obtained by training the student network, L_u represents the loss function of the unlabeled data trained through the teacher network, and λ represents a balance parameter for balancing the two loss functions;
L_s is the sum of the reconstruction Loss, cross-entropy Loss and Focal Loss of all labeled data, calculated as:
L_s = L_rec(x, ŷ) + L_ce(x, ŷ) + L_focal(x, ŷ), where x is the output of the student network and ŷ is the result of the pseudo-label distributor; the reconstruction loss is computed using the MAE loss, and a weighted cross-entropy loss is used to calculate the difference between the model's output and its input; a weighted Focal Loss is added as well;
L_u likewise comprises three parts, the reconstruction Loss, cross-entropy Loss and Focal Loss, calculated as:
L_u = L_rec + L_ce + L_focal, where L_rec represents the reconstruction loss, L_ce represents the cross-entropy loss, and L_focal represents the Focal Loss.
An efficient teacher-student semi-supervised segmentation device based on epistaxis endoscopic images, the device comprising:
the data acquisition module is used for acquiring a nasal hemorrhage medical image data set;
the data processing module is used for enhancing the acquired nasal hemorrhage medical image data set;
the data labeling module is used for dividing the enhanced data set into two parts according to a set proportion, wherein one part of image data is labeled, the other part of image data is not labeled, and when the image is labeled, the bleeding sites and deformed blood vessels of all the endoscopic nasal bleeding images are labeled, and the bleeding site types are divided into punctiform bleeding and tendril bleeding;
the model training module is used for inputting the marked image data into a U-Net detector of the student network to train the student network, and the student network updates parameters by using a gradient descent method;
based on the trained student network, semi-supervised training a teacher network according to the marked data and the unmarked data;
the data distribution module is used for performing positive and negative sample assignment, through the pseudo-label distributor, on the unlabeled data that has passed through the teacher network;
the iterative training module is used for setting the iterative times, batch size and initial learning rate of the teacher-student network model to carry out iterative training so as to obtain a trained teacher-student network model;
the prediction module is used for inputting the nasal hemorrhage medical image data acquired in real time into a trained teacher-student network model to predict the nasal hemorrhage condition.
Further, the data acquisition module is specifically used for acquiring and obtaining nasal bleeding image data through nasal endoscope equipment, and desensitizing the acquired nasal bleeding image data, wherein the nasal bleeding image data comprises blurred view, light reflection, massive nasal bleeding, punctiform and tendril bleeding and vascular malformations.
Further, the data processing module is specifically configured to enhance the collected epistaxis medical image dataset through random angular rotation, brightness adjustment, contrast enhancement, chroma sharpness enhancement, and mirror image flip operations; meanwhile, the improved GAN network is used for enhancing and expanding the nasal hemorrhage medical image data set, the improved GAN network is used for fusing the nasal hemorrhage area probability map and the characteristic map, and channel-level connection and Dot-Product Attention mechanism are used for fusing the nasal hemorrhage area probability map with the characteristic representation generated by the encoder in the generator.
The data distribution module is specifically configured to set a low threshold and a high threshold, and then classify the unlabeled data that has passed through the teacher network, through the pseudo-label distributor, into a trusted class and an uncertain class, wherein the trusted class comprises labels above the high threshold, is used for supervised learning, and participates in the calculation of the reconstruction Loss, cross-entropy Loss and Focal Loss; the uncertain class comprises labels between the low and high thresholds and is used for unsupervised learning;
the process of calculating the loss function from the pseudo labels is as follows:
the loss function in the detector is the sum over labeled and unlabeled data, with the calculation formula:
L = L_s + λ·L_u, where L_s represents the loss function of the labeled data obtained by training the student network, L_u represents the loss function of the unlabeled data trained through the teacher network, and λ represents a balance parameter for balancing the two loss functions;
L_s is the sum of the reconstruction loss, cross-entropy loss and Focal Loss of all labeled data, calculated as:
L_s = L_rec(x, ŷ) + L_ce(x, ŷ) + L_focal(x, ŷ), where x is the output of the student network and ŷ is the result of the pseudo-label distributor; the reconstruction loss is computed using the MAE loss, and a weighted cross-entropy loss is used to calculate the difference between the model's output and its input; a weighted Focal Loss is also added;
L_u likewise comprises three parts, the reconstruction Loss, cross-entropy Loss and Focal Loss, calculated as:
L_u = L_rec + L_ce + L_focal, where L_rec represents the reconstruction loss, L_ce represents the cross-entropy loss, and L_focal represents the Focal Loss.
The beneficial effects of the invention are as follows:
the method comprises the steps of collecting a nasal hemorrhage medical image data set, enhancing the collected nasal hemorrhage medical image data set, dividing the data set into two parts according to a set proportion after enhancement treatment, marking one part of image data, marking the other part of image data, inputting the marked image data into a U-Net detector of a student network, training the student network, updating parameters of the student network by using a gradient descent method, training a teacher network based on trained student networks and semi-supervised according to marked data and unmarked data, performing positive and negative sample distribution on the unmarked data passing through the teacher network through a pseudo-tag distributor, setting iteration times, batch size and initial learning rate of a teacher-student network model, obtaining a trained teacher-student network model, and finally inputting the nasal hemorrhage medical image data obtained in real time into the trained teacher-student network model to predict the nasal hemorrhage condition. The method realizes accurate prediction of the nasal bleeding condition through the nasal bleeding data acquired in real time, and provides accurate prediction data for nasal bleeding treatment.
By introducing the ideas of semi-supervised learning and transfer learning, the method and device make full use of unlabeled data, and real-time prediction provides higher flexibility and adaptability, giving obvious advantages in accurately predicting the nasal bleeding condition. At the same time, considerable savings are obtained in data labeling cost and time.
Drawings
Fig. 1 is a flowchart of a method for efficiently performing semi-supervised segmentation of teachers and students based on endoscopic images of epistaxis, which is provided by the embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more clear, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
As shown in fig. 1, the invention provides a high-efficiency teacher and student semi-supervised segmentation method based on a nasal hemorrhage endoscope image, which specifically comprises the following steps:
step one, data acquisition.
Nasal hemorrhage image data were acquired with professional nasal endoscopic equipment. After acquisition, all data were desensitized to ensure that no ethical or privacy issues are involved. The resulting dataset covers the various situations encountered during examination, including blurred views, light reflections, massive nasal cavity hemorrhage, punctate and tendril-like bleeding, and vascular malformations.
And step two, data processing.
The collected data undergo a series of data enhancement operations, including random angular rotation, brightness adjustment, contrast enhancement, chroma and sharpness enhancement, and mirror flipping. Data augmentation increases the diversity of the dataset and expresses more spatial variety, and the situations produced by the augmentation operations may genuinely occur in real clinical operation.
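A minimal sketch of these classical augmentations in pure NumPy (the parameter ranges are illustrative, and the 765 × 570 image size is taken from the dataset description later in this document):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    """Apply one randomly chosen transform to an HxWx3 uint8 endoscopic image."""
    choice = rng.integers(0, 4)
    if choice == 0:                                  # mirror flip
        return img[:, ::-1, :].copy()
    if choice == 1:                                  # random 90-degree rotation
        return np.rot90(img, k=int(rng.integers(1, 4))).copy()
    if choice == 2:                                  # brightness adjustment
        f = rng.uniform(0.7, 1.3)
        return np.clip(img.astype(np.float32) * f, 0, 255).astype(np.uint8)
    mean = img.mean()                                # contrast enhancement
    f = rng.uniform(1.1, 1.5)
    return np.clip((img.astype(np.float32) - mean) * f + mean, 0, 255).astype(np.uint8)

img = rng.integers(0, 256, size=(570, 765, 3), dtype=np.uint8)  # stand-in image
augmented = [augment(img) for _ in range(8)]
```

In practice a library such as torchvision or Albumentations would supply these transforms, including arbitrary-angle rotation; the sketch only shows the shape of the pipeline.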
Meanwhile, an improved GAN network structure is used for data enhancement. The improved GAN fuses the nasal hemorrhage region probability map and the feature map, using channel-level connection and a Dot-Product Attention mechanism to fuse the probability map with the feature representation generated by the encoder in the generator. This strengthens the generator's attention to key regions and improves the accuracy of the generated results. The GAN expands the dataset by generating virtual samples, which improves the robustness of the model; when the dataset is small, such data enhancement helps model performance considerably.
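A minimal sketch of the fusion step, with flattened spatial positions and illustrative shapes; the exact wiring of queries, keys and values inside the patent's generator is not specified, so this is only one plausible arrangement:

```python
import numpy as np

def dot_product_attention(q, k, v):
    """Scaled dot-product attention over flattened spatial positions."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)          # row-wise softmax
    return w @ v

rng = np.random.default_rng(0)
h, w_, c = 8, 8, 16
features = rng.random((h * w_, c))              # encoder feature representation
prob_map = rng.random((h * w_, 1))              # bleeding-region probability map

# channel-level connection: append the probability map as an extra channel
fused = np.concatenate([features, prob_map], axis=-1)

# attend over the features using the probability-augmented representation
attended = dot_product_attention(fused, fused, features)
```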
And thirdly, labeling data.
The original RGB color images were saved as JPG (765 × 570, 24-bit, 960 dpi). The images were divided into two parts at a 9:1 ratio: 10% of the images were labeled and 90% were left unlabeled. Labeling used the Labelme image annotation tool on the Anaconda Prompt platform to mark the bleeding sites and malformed blood vessels in all nasal endoscopic nasal bleeding images, with the bleeding site types classified as punctate bleeding and tendril-like bleeding.
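The 9:1 split can be sketched as follows (the file names and dataset size here are hypothetical):

```python
import random

random.seed(42)
image_ids = [f"img_{i:04d}.jpg" for i in range(500)]   # hypothetical file list
random.shuffle(image_ids)

n_labeled = len(image_ids) // 10       # 9:1 ratio -> 10% to be annotated
labeled_set = image_ids[:n_labeled]    # annotated with Labelme
unlabeled_set = image_ids[n_labeled:]  # left unlabeled for the teacher network
```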
After labeling, a JSON file is obtained for each image, containing all the image information and the coordinate points used to generate the mask, for a total of 13223027 labeled pixels. A custom Python script then generates a PNG-format mask for each label.
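A sketch of the JSON-to-mask step; the record layout mimics Labelme's `shapes` list, and the even-odd ray-casting rasterizer is a simple stand-in for Labelme's own mask utilities:

```python
import numpy as np

def polygon_mask(points, height, width):
    """Rasterize one labeled polygon (list of [x, y]) into a binary mask
    via even-odd ray casting."""
    pts = np.asarray(points, dtype=np.float64)
    ys, xs = np.mgrid[0:height, 0:width]
    inside = np.zeros((height, width), dtype=bool)
    n = len(pts)
    for i in range(n):
        x0, y0 = pts[i]
        x1, y1 = pts[(i + 1) % n]
        crosses = ((y0 > ys) != (y1 > ys)) & (
            xs < (x1 - x0) * (ys - y0) / (y1 - y0 + 1e-12) + x0
        )
        inside ^= crosses           # toggle on each edge crossing
    return inside.astype(np.uint8)

# hypothetical minimal Labelme-style record
record = {"shapes": [{"label": "punctate_bleeding",
                      "points": [[2, 2], [8, 2], [8, 8], [2, 8]]}]}
mask = np.zeros((12, 12), dtype=np.uint8)
for shape in record["shapes"]:
    mask |= polygon_mask(shape["points"], 12, 12)
```

A real pipeline would read each JSON with `json.load` and save the mask as PNG with an imaging library; only the geometry is shown here.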
And step four, training a student network on the data set by using supervised training.
The 10% labeled dataset from step three is input into the U-Net detector of the student network. The specific steps comprise data loading, U-Net network construction, dataset division, model training, performance evaluation, and tuning. The U-Net framework performs two convolutions, four down-sampling convolution stages, four up-sampling convolution stages with skip concatenation, and channel adjustment with a 1×1 convolution kernel, adjusting the channel count of the final feature layer to the number of classes in the dataset. The current model of the student network is updated using the gradient descent method, and the loss function contains a supervised loss (trained on labeled data) and an unsupervised loss (trained using pseudo labels generated by the teacher network).
The U-Net detector is also improved. The Encoder part of the improved U-Net consists of an initialization module and four Down Blocks. The initialization module is similar to the original U-Net down-sampling module, but the activation function is Swish. Each Down Block consists of two 1×1 convolution blocks, one block convolution, one SE Block, one max pooling, and two convolution blocks.
Skip connections are adopted between the encoder and the decoder, so that feature maps from the same level of the encoder and decoder can be concatenated; the feature maps recovered during up-sampling then contain more of the original image's semantic information, improving the fineness of the image segmentation.
To further improve the accuracy of medical image segmentation, an SC-Transformer is introduced into the improved U-Net's skip connections to realize multi-scale feature-fusion skip connections. The SC-Transformer consists of four Transformers, each containing a multi-channel cross-attention module, multi-scale feature embedding, and multiple MLPs.
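The supervised gradient-descent update of step four can be sketched with a toy per-pixel logistic classifier standing in for the improved U-Net detector; all data, the bias column, and the hyperparameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# toy per-pixel logistic classifier standing in for the U-Net detector
x = rng.random((200, 3))                       # 200 labeled "pixels", RGB values
x = np.hstack([x, np.ones((200, 1))])          # bias column
y = (x[:, 0] > 0.5).astype(np.float64)         # hypothetical bleeding mask

w = np.zeros(4)
lr = 0.5
for _ in range(500):                           # gradient descent on cross-entropy
    p = sigmoid(x @ w)
    w -= lr * x.T @ (p - y) / len(y)           # d(BCE)/dw

accuracy = float(((sigmoid(x @ w) > 0.5) == y).mean())
```

The real training step is identical in shape: forward pass, loss gradient, parameter update; only the model is far larger.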
And fifthly, training a teacher model based on the student network on the marked data and a large amount of unmarked data.
The teacher network receives the 90% unlabeled dataset from step three and labels the unlabeled data; the teacher network's predictions are used to guide the training of the student model. The model structures of the teacher and student networks are identical, with no difference in size. Instead of updating its parameters by gradient descent, the teacher model is updated from the student model using an exponential moving average (EMA); that is, a new model is obtained by referring to the weights of the current model and the previous model, as follows:
θ′_t = α·θ′_(t−1) + (1 − α)·θ_t, where θ′_t represents the new model, θ_t represents the current model, θ′_(t−1) represents the previous model, and α represents the weight.
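The EMA teacher update can be sketched directly; α = 0.9 here is illustrative, and the parameters are represented as a list of NumPy arrays:

```python
import numpy as np

def ema_update(teacher_params, student_params, alpha=0.99):
    """theta'_t = alpha * theta'_(t-1) + (1 - alpha) * theta_t, per tensor."""
    return [alpha * t + (1.0 - alpha) * s
            for t, s in zip(teacher_params, student_params)]

teacher = [np.zeros(4)]        # previous teacher weights
student = [np.ones(4)]         # current student weights
for _ in range(3):             # teacher drifts slowly toward the student
    teacher = ema_update(teacher, student, alpha=0.9)
```

Because the teacher is a smoothed copy of the student, no gradients ever flow through it, which matches the statement that the teacher does not use gradient descent.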
Step six, positive and negative sample assignment is performed by the pseudo-label distributor on the unlabeled data that has passed through the teacher model.
A positive sample is one the model can successfully identify or predict. For example, in a medical image recognition task, if the goal is for the model to identify bleeding sites in the image, then an image of a bleeding site is a positive sample; conversely, images without bleeding sites are negative samples.
Specifically, a low threshold T1 and a high threshold T2 are set, and the pseudo labels are divided into two categories, available and uncertain, according to their classification scores. Available: labels above the high threshold T2 are considered trusted and are used for supervised learning, participating in the calculation of the reconstruction Loss, cross-entropy Loss and Focal Loss. Uncertain: labels between the two thresholds represent uncertainty; an unsupervised loss is used to make effective use of these uncertain pseudo labels.
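A sketch of the distributor's decision rule; the threshold values, and the treatment of scores at or below T1 as discarded, are assumptions not fixed by the text:

```python
def assign_pseudo_labels(scores, t_low=0.3, t_high=0.8):
    """Split teacher confidence scores into trusted / uncertain / discarded."""
    trusted, uncertain, discarded = [], [], []
    for i, s in enumerate(scores):
        if s > t_high:
            trusted.append(i)      # supervised losses (reconstruction + CE + Focal)
        elif s > t_low:
            uncertain.append(i)    # unsupervised loss only
        else:
            discarded.append(i)    # assumed: not used in any loss
    return trusted, uncertain, discarded

trusted, uncertain, discarded = assign_pseudo_labels([0.95, 0.5, 0.1, 0.85])
```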
How the pseudo labels are utilized is explained at the loss-function level:
The loss in the detector is the sum over labeled and unlabeled data, with the formula:
L = L_s + λ·L_u;
L_s, the loss obtained by training the student network on labeled data, is composed of the reconstruction loss, cross-entropy loss and Focal Loss of all labeled data, calculated as:
L_s = L_rec(x, ŷ) + L_ce(x, ŷ) + L_focal(x, ŷ), where x is the output of the student network and ŷ is the result of the pseudo-label distributor. The reconstruction loss is computed using the MAE loss, and a weighted cross-entropy loss is used to calculate the difference between the model's output and its input. A weighted Focal Loss is added as well, which strengthens the model's ability to learn the hard-to-reconstruct parts in image reconstruction and effectively improves the reconstruction result.
L_u, the loss obtained by training the teacher network on unlabeled data, likewise comprises the reconstruction Loss, cross-entropy Loss and Focal Loss, with the formula:
L_u = L_rec + L_ce + L_focal, where L_rec represents the reconstruction loss, L_ce represents the cross-entropy loss, and L_focal represents the Focal Loss.
λ represents a balance parameter for balancing the two loss functions.
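The combined loss can be sketched as follows; λ is illustrative, and the weighted cross-entropy and weighted Focal Loss are simplified here to their unweighted forms:

```python
import numpy as np

def mae(p, y):
    return float(np.mean(np.abs(p - y)))           # reconstruction loss (MAE)

def bce(p, y):
    p = np.clip(p, 1e-7, 1 - 1e-7)                 # cross-entropy loss
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

def focal_loss(p, y, gamma=2.0):
    """Focal Loss down-weights easy examples; gamma=2 is the usual default."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    pt = np.where(y == 1, p, 1 - p)
    return float(np.mean(-((1 - pt) ** gamma) * np.log(pt)))

def branch_loss(pred, target):
    # per-branch loss: L_rec + L_ce + L_focal, as in the formulas above
    return mae(pred, target) + bce(pred, target) + focal_loss(pred, target)

y = np.array([1.0, 0.0, 1.0])                      # (pseudo) labels
student_pred = np.array([0.9, 0.2, 0.8])
teacher_pred = np.array([0.7, 0.3, 0.6])
lam = 0.5                                          # balance parameter lambda
total = branch_loss(student_pred, y) + lam * branch_loss(teacher_pred, y)
```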
And step seven, setting parameters such as iteration times, batch size, initial learning rate and the like of the teacher-student network model to carry out iterative training.
Step eight, the nasal bleeding medical image data acquired in real time are input into the trained teacher-student network model to predict the nasal bleeding condition.
After the nasal hemorrhage medical image data acquired in real time are input into the trained teacher-student network model, the regions with bleeding can be predicted, malformed blood vessels can be segmented, and punctate versus tendril-like bleeding can be distinguished.
The foregoing is merely a preferred embodiment of the invention, and it is to be understood that the invention is not limited to the form disclosed herein but is not to be construed as excluding other embodiments, but is capable of numerous other combinations, modifications and environments and is capable of modifications within the scope of the inventive concept, either as taught or as a matter of routine skill or knowledge in the relevant art. And that modifications and variations which do not depart from the spirit and scope of the invention are intended to be within the scope of the appended claims.
Claims (10)
1. The efficient teacher-student semi-supervised segmentation method based on the endoscopic images of the epistaxis is characterized by comprising the following steps of:
s1, acquiring a nasal hemorrhage medical image data set;
s2, enhancing the acquired nasal hemorrhage medical image data set;
s3, dividing the enhanced data set into two parts according to a set proportion, wherein one part of image data is marked, and the other part of image data is not marked;
s4, inputting the marked image data into a U-Net detector of a student network, and training the student network, wherein the student network uses a gradient descent method to update parameters;
s5, based on the trained student network, semi-supervised training a teacher network according to the marked data and the unmarked data;
s6, performing positive and negative sample assignment, through the pseudo-label distributor, on the unlabeled data that has passed through the teacher network;
s7, setting iteration times, batch size and initial learning rate of the teacher-student network model, and performing iterative training to obtain a trained teacher-student network model;
s8, inputting the medical image data of the nasal bleeding obtained in real time into a trained teacher-student network model, and predicting the nasal bleeding condition.
2. The method for highly efficient semi-supervised segmentation of teachers and students based on endoscopic images of epistaxis according to claim 1, wherein the step S1 specifically comprises:
the nasal hemorrhage image data is acquired through the nasal endoscopic equipment, and desensitization treatment is carried out on the acquired nasal hemorrhage image data, wherein the nasal hemorrhage image data comprises blurred view, light reflection, massive hemorrhage of a nasal cavity, punctiform and tendril hemorrhage and vascular malformations.
3. The method for highly efficient semi-supervised segmentation of teachers and students based on endoscopic images of epistaxis according to claim 1, wherein step S2 specifically comprises:
enhancing the acquired nasal hemorrhage medical image dataset through random angular rotation, brightness adjustment, contrast enhancement, chroma sharpness enhancement and mirror image flip operations; meanwhile, the improved GAN network is used for enhancing and expanding the nasal hemorrhage medical image data set, the improved GAN network is used for fusing the nasal hemorrhage area probability map and the characteristic map, and channel-level connection and Dot-Product Attention mechanism are used for fusing the nasal hemorrhage area probability map with the characteristic representation generated by the encoder in the generator.
4. The highly efficient teacher-student semi-supervised segmentation method based on endoscopic images of epistaxis according to claim 1, wherein in step S3, when labeling the images, the bleeding sites and malformed blood vessels are annotated on all endoscopic epistaxis images, and the bleeding-site types are divided into punctiform bleeding and tendril-like bleeding.
5. The highly efficient teacher-student semi-supervised segmentation method based on endoscopic images of epistaxis according to claim 1, wherein step S5 specifically comprises:
inputting the unlabeled image data into the teacher network, which labels it, wherein the teacher network has exactly the same model structure as the student network but does not update its parameters by the gradient descent method;
the teacher network is instead obtained by updating from the student network with an exponential moving average: the teacher derives a new model from the weights of the current model and the previous model, with the calculation formula
θ′_t = α · θ′_(t−1) + (1 − α) · θ_t,
where θ′_t denotes the new model, θ_t the current model, θ′_(t−1) the previous model, and α the weight.
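The EMA update above can be written weight-by-weight. A minimal sketch, assuming plain dicts of floats stand in for the networks' parameter tensors (any framework-specific detail is left out):

```python
# Sketch of the exponential-moving-average teacher update:
#   theta'_t = alpha * theta'_{t-1} + (1 - alpha) * theta_t
# Dicts of floats stand in for model parameters (an assumption).

def ema_update(teacher: dict, student: dict, alpha: float = 0.99) -> dict:
    # No gradients flow here: the teacher is a pure weighted copy.
    return {k: alpha * teacher[k] + (1 - alpha) * student[k] for k in teacher}
```

With alpha = 0.9, a teacher weight of 0.0 and a student weight of 1.0 blend to 0.1, i.e. the teacher follows the student slowly.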
6. The highly efficient teacher-student semi-supervised segmentation method based on endoscopic images of epistaxis according to claim 1, wherein step S6 specifically comprises:
setting a low threshold and a high threshold, and classifying the unlabeled data passed through the teacher network into a trusted class and an uncertain class via the pseudo-label assigner, wherein the trusted class comprises labels whose confidence exceeds the high threshold, which are used for supervised learning and participate in the calculation of the reconstruction loss, cross-entropy loss and Focal Loss; the uncertain class comprises labels whose confidence lies between the low and high thresholds, which are used for unsupervised learning;
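The two-threshold assignment can be sketched as a simple partition of teacher confidences. The threshold values below are illustrative assumptions; the claim does not fix them.

```python
# Sketch of the two-threshold pseudo-label assigner: confidences above the
# high threshold become trusted pseudo-labels (supervised losses), those
# between the thresholds are kept for unsupervised learning, the rest are
# discarded. Threshold values are illustrative assumptions.

def assign_pseudo_labels(confidences, low=0.3, high=0.9):
    trusted, uncertain = [], []
    for i, c in enumerate(confidences):
        if c > high:
            trusted.append(i)        # supervised: reconstruction / CE / Focal
        elif c > low:
            uncertain.append(i)      # unsupervised learning only
    return trusted, uncertain
```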
the process of calculating the loss function from the pseudo labels is as follows:
the loss function in the detector is the sum over labeled and unlabeled data, with the calculation formula
L = L_s + λ · L_u,
where L_s denotes the loss on labeled data obtained by training the student network, L_u denotes the loss on unlabeled data obtained by training the teacher network, and λ is a balancing parameter used to balance the two loss functions;
the labeled-data loss is the sum of the reconstruction loss, cross-entropy loss and Focal Loss over all labeled data, where x is the output of the student network and ŷ is the result of the pseudo-label assigner; the reconstruction loss, computed from an MAE loss and a weighted cross-entropy loss, measures the difference between the model's output and its input, and a weighted Focal Loss is added as well;
the labeled-data loss thus comprises three parts, with the calculation formula
L_s = L_rec + L_ce + L_focal,
where L_rec denotes the reconstruction loss, L_ce the cross-entropy loss, and L_focal the Focal Loss.
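For a binary bleeding mask, the three-part supervised loss can be sketched in numpy. This is a hedged illustration: MAE is used for the reconstruction term, the focal gamma and the uniform (unweighted) combination are assumptions, since the claim only names the three components.

```python
# Numpy sketch of L_s = L_rec + L_ce + L_focal for a binary bleeding mask.
# MAE as the reconstruction term, gamma = 2 and equal weighting are
# assumptions; the claim names the components but not their parameters.
import numpy as np

def supervised_loss(pred: np.ndarray, target: np.ndarray, gamma: float = 2.0) -> float:
    p = np.clip(pred, 1e-7, 1 - 1e-7)                         # numerical safety
    l_rec = np.abs(p - target).mean()                         # MAE reconstruction
    ce = -(target * np.log(p) + (1 - target) * np.log(1 - p))
    l_ce = ce.mean()                                          # cross-entropy
    pt = np.where(target == 1, p, 1 - p)                      # prob. of true class
    l_focal = ((1 - pt) ** gamma * ce).mean()                 # Focal Loss
    return float(l_rec + l_ce + l_focal)
```

A confident correct prediction yields a lower combined loss than a confident wrong one, which is the behavior the Focal term is meant to sharpen.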
7. An efficient teacher-student semi-supervised segmentation device based on endoscopic images of epistaxis, characterized in that the device comprises:
the data acquisition module is used for acquiring a nasal hemorrhage medical image data set;
the data processing module is used for enhancing the acquired nasal hemorrhage medical image data set;
the data labeling module is used for dividing the enhanced dataset into two parts according to a set proportion, labeling one part of the image data and leaving the other part unlabeled; when labeling the images, the bleeding sites and malformed blood vessels are annotated on all endoscopic epistaxis images, and the bleeding-site types are divided into punctiform bleeding and tendril-like bleeding;
the model training module is used for inputting the labeled image data into the U-Net detector of the student network to train the student network, the student network updating its parameters by the gradient descent method; and,
based on the trained student network, for training the teacher network in a semi-supervised manner using the labeled data and the unlabeled data;
the data distribution module is used for assigning positive and negative samples to the unlabeled data output by the teacher network via the pseudo-label assigner;
the iterative training module is used for setting the number of iterations, the batch size and the initial learning rate of the teacher-student network model and performing iterative training to obtain a trained teacher-student network model;
the prediction module is used for inputting epistaxis medical image data acquired in real time into the trained teacher-student network model to predict the nasal bleeding condition.
8. The efficient teacher-student semi-supervised segmentation device based on endoscopic images of epistaxis according to claim 7, wherein the data acquisition module is specifically configured to acquire epistaxis image data through a nasal endoscope device and to perform desensitization processing on the acquired data, wherein the epistaxis image data include blurred fields of view, light reflections, massive nasal bleeding, punctiform and tendril-like bleeding, and vascular malformations.
9. The efficient teacher-student semi-supervised segmentation device based on endoscopic images of epistaxis according to claim 7, wherein the data processing module is specifically configured to enhance the collected epistaxis medical image dataset through random-angle rotation, brightness adjustment, contrast enhancement, chroma and sharpness enhancement, and mirror flipping; meanwhile, an improved GAN network is used to augment and expand the dataset, fusing the epistaxis-region probability map with the feature map and using channel-wise concatenation and a Dot-Product Attention mechanism to fuse the probability map with the feature representation produced by the encoder in the generator.
10. The efficient teacher-student semi-supervised segmentation device based on endoscopic images of epistaxis according to claim 7, wherein the data distribution module is specifically configured to set a low threshold and a high threshold and to classify the unlabeled data passed through the teacher network into a trusted class and an uncertain class via the pseudo-label assigner, wherein the trusted class comprises labels whose confidence exceeds the high threshold, which are used for supervised learning and participate in the calculation of the reconstruction loss, cross-entropy loss and Focal Loss; the uncertain class comprises labels whose confidence lies between the low and high thresholds, which are used for unsupervised learning;
the process of calculating the loss function from the pseudo labels is as follows:
the loss function in the detector is the sum over labeled and unlabeled data, with the calculation formula
L = L_s + λ · L_u,
where L_s denotes the loss on labeled data obtained by training the student network, L_u denotes the loss on unlabeled data obtained by training the teacher network, and λ is a balancing parameter used to balance the two loss functions;
the labeled-data loss is the sum of the reconstruction loss, cross-entropy loss and Focal Loss over all labeled data, where x is the output of the student network and ŷ is the result of the pseudo-label assigner; the reconstruction loss, computed from an MAE loss and a weighted cross-entropy loss, measures the difference between the model's output and its input, and a weighted Focal Loss is added as well;
the labeled-data loss thus comprises three parts, with the calculation formula
L_s = L_rec + L_ce + L_focal,
where L_rec denotes the reconstruction loss, L_ce the cross-entropy loss, and L_focal the Focal Loss.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311471582.7A CN117197472B (en) | 2023-11-07 | 2023-11-07 | Efficient teacher and student semi-supervised segmentation method and device based on endoscopic images of epistaxis |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117197472A true CN117197472A (en) | 2023-12-08 |
CN117197472B CN117197472B (en) | 2024-03-08 |
Family
ID=88990973
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311471582.7A Active CN117197472B (en) | 2023-11-07 | 2023-11-07 | Efficient teacher and student semi-supervised segmentation method and device based on endoscopic images of epistaxis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117197472B (en) |
Citations (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200265273A1 (en) * | 2019-02-15 | 2020-08-20 | Surgical Safety Technologies Inc. | System and method for adverse event detection or severity estimation from surgical data |
CN111798462A (en) * | 2020-06-30 | 2020-10-20 | 电子科技大学 | Automatic delineation method for nasopharyngeal carcinoma radiotherapy target area based on CT image |
CN112232416A (en) * | 2020-10-16 | 2021-01-15 | 浙江大学 | Semi-supervised learning method based on pseudo label weighting |
CN112259220A (en) * | 2020-09-30 | 2021-01-22 | 吾征智能技术(北京)有限公司 | System, device and storage medium for predicting disease based on epistaxis accompanying symptoms |
CN113256639A (en) * | 2021-05-27 | 2021-08-13 | 燕山大学 | Coronary angiography blood vessel image segmentation method based on semi-supervised average teacher model |
CN113256646A (en) * | 2021-04-13 | 2021-08-13 | 浙江工业大学 | Cerebrovascular image segmentation method based on semi-supervised learning |
CN113674284A (en) * | 2021-07-26 | 2021-11-19 | 东北师范大学 | Curve segmentation network, segmentation method and image segmentation method adopting curve segmentation network |
CN113705375A (en) * | 2021-08-10 | 2021-11-26 | 武汉理工大学 | Visual perception device and method for ship navigation environment |
WO2022001237A1 (en) * | 2020-06-28 | 2022-01-06 | 广州柏视医疗科技有限公司 | Method and system for automatically recognizing image of primary tumor of nasopharyngeal carcinoma |
CN113936339A (en) * | 2021-12-16 | 2022-01-14 | 之江实验室 | Fighting identification method and device based on double-channel cross attention mechanism |
WO2022066259A1 (en) * | 2020-09-28 | 2022-03-31 | Microsoft Technology Licensing, Llc | Auxiliary model for predicting new model parameters |
CN114494296A (en) * | 2022-01-27 | 2022-05-13 | 复旦大学 | Brain glioma segmentation method and system based on fusion of Unet and Transformer |
CN114708255A (en) * | 2022-04-29 | 2022-07-05 | 浙江大学 | Multi-center children X-ray chest image lung segmentation method based on TransUNet model |
CN114783021A (en) * | 2022-04-07 | 2022-07-22 | 广州杰赛科技股份有限公司 | Intelligent detection method, device, equipment and medium for wearing of mask |
CN115019143A (en) * | 2022-06-16 | 2022-09-06 | 湖南大学 | Text detection method based on CNN and Transformer mixed model |
CN115482382A (en) * | 2022-09-17 | 2022-12-16 | 北京工业大学 | Image semantic segmentation method based on Transformer architecture |
CN115587985A (en) * | 2022-10-14 | 2023-01-10 | 复旦大学 | Method for dividing cell nucleus of histopathology image and normalizing dyeing style |
WO2023030520A1 (en) * | 2021-09-06 | 2023-03-09 | 北京字节跳动网络技术有限公司 | Training method and apparatus of endoscope image classification model, and image classification method |
WO2023038574A1 (en) * | 2021-09-10 | 2023-03-16 | Grabtaxi Holdings Pte. Ltd. | Method and system for processing a target image |
CN115953665A (en) * | 2023-03-09 | 2023-04-11 | 武汉人工智能研究院 | Target detection method, device, equipment and storage medium |
CN116051589A (en) * | 2022-09-05 | 2023-05-02 | 河北师范大学 | Method and device for segmenting lung parenchyma and pulmonary blood vessels in CT image |
CN116168324A (en) * | 2023-02-17 | 2023-05-26 | 上海海事大学 | Video emotion recognition method based on cyclic interaction Transformer and dimension cross fusion |
CN116188501A (en) * | 2023-03-02 | 2023-05-30 | 江南大学 | Medical image segmentation method based on multi-scale cross attention |
CN116206132A (en) * | 2023-03-21 | 2023-06-02 | 哈尔滨工业大学 | RGB-D visual saliency object detection method and system based on dynamic sparse mark transform architecture |
CN116309774A (en) * | 2022-12-23 | 2023-06-23 | 西北工业大学 | Dense three-dimensional reconstruction method based on event camera |
WO2023116635A1 (en) * | 2021-12-24 | 2023-06-29 | 中国科学院深圳先进技术研究院 | Mutual learning-based semi-supervised medical image segmentation method and system |
WO2023124877A1 (en) * | 2021-12-30 | 2023-07-06 | 小荷医疗器械(海南)有限公司 | Endoscope image processing method and apparatus, and readable medium and electronic device |
CN116402702A (en) * | 2023-02-21 | 2023-07-07 | 上海艾麒信息科技股份有限公司 | Old photo restoration method and system based on deep neural network |
CN116664588A (en) * | 2023-05-29 | 2023-08-29 | 华中科技大学 | Mask modeling-based 3D medical image segmentation model building method and application thereof |
CN116823850A (en) * | 2023-06-01 | 2023-09-29 | 武汉大学 | Cardiac MRI segmentation method and system based on U-Net and Transformer fusion improvement |
CN116862931A (en) * | 2023-09-04 | 2023-10-10 | 北京壹点灵动科技有限公司 | Medical image segmentation method and device, storage medium and electronic equipment |
CN116958534A (en) * | 2022-12-29 | 2023-10-27 | 腾讯科技(深圳)有限公司 | Image processing method, training method of image processing model and related device |
CN116993610A (en) * | 2023-07-25 | 2023-11-03 | 湖北珞珈实验室 | Method and equipment for reconstructing restricted CT image based on denoising diffusion model |
Non-Patent Citations (5)
Title |
---|
OLIVIER PETIT et al.: "U-Net Transformer: Self and Cross Attention for Medical Image Segmentation", arXiv, pages 1 - 10 *
PIUS KWAO GADOSEY et al.: "SD-UNet: Stripping down U-Net for Segmentation of Biomedical Images on Platforms with Low Computational Budgets", Diagnostics, vol. 10, no. 2, pages 1 - 12 *
HE Yue et al.: "Lumbar vertebrae CT image segmentation based on a semi-supervised multi-head network", Journal of Qingdao University (Natural Science Edition), vol. 36, no. 2, pages 36 - 42 *
FENG Zhiqiang et al.: "Real-time dense small-object detection algorithm for UAVs based on improved YOLOv5", Acta Aeronautica et Astronautica Sinica, vol. 44, no. 7, pages 1 - 15 *
CHEN Aiguo et al.: "Low-illumination image enhancement with an improved generative adversarial network", Journal of Chinese Computer Systems, pages 1 - 10 *
Also Published As
Publication number | Publication date |
---|---|
CN117197472B (en) | 2024-03-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20180157934A1 (en) | Inspection neural network for assessing neural network reliability | |
CN112307978B (en) | Target detection method and device, electronic equipment and readable storage medium | |
CN112070781A (en) | Processing method and device of craniocerebral tomography image, storage medium and electronic equipment | |
CN112801236B (en) | Image recognition model migration method, device, equipment and storage medium | |
CN111461213A (en) | Training method of target detection model and target rapid detection method | |
CN111191608A (en) | Improved traffic sign detection and identification method based on YOLOv3 | |
WO2024060605A1 (en) | Multi-task panoptic driving perception method and system based on improved yolov5 | |
CN111428664A (en) | Real-time multi-person posture estimation method based on artificial intelligence deep learning technology for computer vision | |
CN112633149A (en) | Domain-adaptive foggy-day image target detection method and device | |
CN114863348A (en) | Video target segmentation method based on self-supervision | |
CN110472673B (en) | Parameter adjustment method, fundus image processing device, fundus image processing medium and fundus image processing apparatus | |
CN112597996B (en) | Method for detecting traffic sign significance in natural scene based on task driving | |
CN114332473A (en) | Object detection method, object detection device, computer equipment, storage medium and program product | |
CN117197472B (en) | Efficient teacher and student semi-supervised segmentation method and device based on endoscopic images of epistaxis | |
WO2023160666A1 (en) | Target detection method and apparatus, and target detection model training method and apparatus | |
CN117079276A (en) | Semantic segmentation method, system, equipment and medium based on knowledge distillation | |
CN114463772B (en) | Deep learning-based traffic sign detection and identification method and system | |
CN113223037B (en) | Unsupervised semantic segmentation method and unsupervised semantic segmentation system for large-scale data | |
CN115116117A (en) | Learning input data acquisition method based on multi-mode fusion network | |
CN115222750A (en) | Remote sensing image segmentation method and system based on multi-scale fusion attention | |
Sun et al. | A Metaverse text recognition model based on character-level contrastive learning | |
CN111160219A (en) | Object integrity evaluation method and device, electronic equipment and storage medium | |
CN113706449B (en) | Pathological image-based cell analysis method, device, equipment and storage medium | |
Chen et al. | Improving semantic segmentation with knowledge reasoning network | |
CN116563840B (en) | Scene text detection and recognition method based on weak supervision cross-mode contrast learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||