CN115641443B - Method for training image segmentation network model, method for processing image and product - Google Patents

Info

Publication number
CN115641443B
Authority
CN
China
Prior art keywords
model
label
loss function
network model
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211573797.5A
Other languages
Chinese (zh)
Other versions
CN115641443A (en)
Inventor
宋凯敏
贺婉佶
史晓宇
王界闻
张弘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Airdoc Technology Co Ltd
Original Assignee
Beijing Airdoc Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Airdoc Technology Co Ltd filed Critical Beijing Airdoc Technology Co Ltd
Priority to CN202211573797.5A priority Critical patent/CN115641443B/en
Publication of CN115641443A publication Critical patent/CN115641443A/en
Application granted granted Critical
Publication of CN115641443B publication Critical patent/CN115641443B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The present disclosure provides a method for training an image segmentation network model, a method for processing images, and a related product. The method for training the image segmentation network model comprises: training the image segmentation network model with initial labels in a medical image to obtain initial training weights; initializing the network weights of a label denoising network model based on the initial training weights; performing label denoising on the medical image with the initialized label denoising network model; and determining the segmentation result of the image segmentation network model from the output of the label denoising network model, thereby realizing optimized training of the image segmentation network model. The disclosed scheme effectively reduces the influence of label noise on the training result and thus improves the accuracy of the model's segmentation results.

Description

Method for training image segmentation network model, method for processing image and product
Technical Field
The present disclosure relates generally to the field of image processing technology. More particularly, the present disclosure relates to a method of training an image segmentation network model, a method of processing an image using an image segmentation network model, an electronic device, and a computer-readable storage medium.
Background
In the field of machine learning, it is common to train neural networks on labeled data for classification, regression, and other tasks. However, if the training data contain a large amount of erroneous information, such as label noise, the accuracy of the resulting prediction model suffers. In medical image analysis, for example, lesions of different etiologies differ in severity and location, and even slightly deviating annotations degrade training accuracy. In particular, for segmentation tasks whose labels contain noise, current models produce segmentation results of poor accuracy that cannot meet practical application requirements.
In view of the above, it is desirable to provide a scheme for training an image segmentation network model that improves the accuracy of the model's segmentation results.
Disclosure of Invention
To address at least one or more of the technical problems mentioned above, the present disclosure proposes, in various aspects, a scheme of training an image segmentation network model.
In a first aspect, the present disclosure provides a method of training an image segmentation network model, the method comprising: training the image segmentation network model with initial labels in a medical image to obtain initial training weights; initializing the network weights of a label denoising network model based on the initial training weights; performing label denoising on the medical image with the initialized label denoising network model; and determining the segmentation result of the image segmentation network model according to the output result of the label denoising network model, so as to realize optimized training of the image segmentation network model.
In some embodiments, the label denoising network model comprises a teacher model and a student model, and initializing the network weights of the label denoising network model based on the initial training weights comprises: initializing the network weights of the teacher model and the student model, respectively, based on the initial training weights.
In some embodiments, performing label denoising on the medical image based on the initialized label denoising network model comprises: obtaining a target soft label loss function and a target hard label loss function for the label denoising stage; training the student model based on the target soft label loss function and the target hard label loss function; and determining the output result of the label denoising network model based on the student labels output by the trained student model.
In some embodiments, obtaining the target soft label loss function for the label denoising stage comprises: calculating a first soft label loss function between the teacher model and the student model for the background class and the non-background classes in the medical image; calculating a second soft label loss function between the teacher model and the student model for the non-background classes in the medical image; and determining the target soft label loss function based on the first soft label loss function and the second soft label loss function.
In some embodiments, the first soft label loss function comprises the soft label loss values within a first threshold range and the second soft label loss function comprises the soft label loss values within a second threshold range, and training the student model based on the target soft label loss function comprises: during training of the student model, selecting from the target soft label loss function the soft label loss values within the first threshold range and those within the second threshold range, and excluding the selected values from back-propagation of the loss.
In some embodiments, obtaining the target hard label loss function for the label denoising stage comprises: acquiring the teacher labels output by the teacher model; acquiring the student labels output by the student model during training; and calculating a hard label loss function between the teacher labels and the student labels to obtain the target hard label loss function.
In some embodiments, training the teacher model comprises: obtaining the exponential moving average weights of the student model; and updating the network weights of the teacher model based on the exponential moving average weights.
In a second aspect, the present disclosure provides a method of processing an image using an image segmentation network model trained according to the method of the first aspect of the present disclosure. The method comprises: performing segmentation on a medical image based on the image segmentation network model to obtain an initial segmentation result, wherein the initial segmentation result comprises marked lesion regions; and determining the pathology grade to which the medical image belongs according to the lesion regions in the initial segmentation result.
In some embodiments, the medical image comprises a fundus image, and determining the pathology grade to which the medical image belongs from the lesion regions in the initial segmentation result comprises: determining the pathological myopia grade of the fundus image according to the marked lesion regions and/or the positions of the lesion regions in the fundus image.
In a third aspect, the present disclosure provides an electronic device comprising: a processor; and a memory storing computer instructions for training an image segmentation network model and/or computer instructions for processing an image using an image segmentation network model, which when executed by the processor, cause the electronic device to perform the method according to the first aspect of the disclosure and/or the method according to the second aspect of the disclosure.
With the disclosed scheme for training an image segmentation network model, a label denoising network model is introduced into the training process to perform label denoising on the medical image, which effectively reduces the influence of label noise on the training result and thus improves the accuracy of the segmentation result. Further, in some embodiments, the student model is trained with a target soft label loss function determined for the background class versus the non-background classes and among the non-background classes, so that the loss values of samples that are easily confused can be discarded during back-propagation of the loss; this yields a good label denoising effect and lowers the false-positive rate of the overall model. Further, in some embodiments, the network weights of the teacher model are updated with the exponential moving average of the student model's weights during training, providing a faster feedback loop between the student and teacher models and thereby effectively improving the accuracy of the model's output; in particular, for large-scale training sets, errors in individual mini-batches are unlikely to bias training. In addition, in some embodiments, an initial segmentation result is generated by the trained image segmentation network model and the pathology grade of the medical image is determined from it; for grading pathological myopia in fundus images in particular, this avoids the large errors of manual grading and thus effectively improves both the efficiency and the accuracy of pathological myopia grading.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals refer to like or corresponding parts, and in which:
FIG. 1 illustrates an exemplary flow diagram of a method of training an image segmentation network model of an embodiment of the present disclosure;
FIG. 2 illustrates an exemplary flow diagram of a method of processing an image using an image segmentation network model according to an embodiment of the present disclosure;
FIG. 3 illustrates an architectural diagram of a system for processing an image using an image segmentation network model in accordance with an embodiment of the present disclosure;
FIG. 4 illustrates an architectural diagram of a system for training an image segmentation network model in accordance with an embodiment of the present disclosure;
FIG. 5 shows a schematic diagram of an initial segmentation result of an embodiment of the present disclosure; and
fig. 6 shows an exemplary structural block diagram of an electronic device of an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, not all embodiments of the present disclosure. All other embodiments, which can be derived by one skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the scope of protection of the present disclosure.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the disclosure herein is for the purpose of describing particular embodiments only, and is not intended to be limiting of the disclosure. As used in the specification and claims of this disclosure, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the specification and claims of this disclosure refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
As used in this specification and claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
In the field of machine learning, neural networks are trained on labeled data to accomplish various tasks. In such model learning, the accuracy of the model is affected as soon as the training data contain erroneous information. Research has shown that the erroneous information contained in data can be divided into feature noise and label noise. Feature noise refers to a deviation between the features of a training instance and its true features, for example Gaussian noise artificially added to existing features. Label noise refers to a deviation between the target label used for training and the true label of the corresponding instance, for example a sample assigned to the wrong class during annotation. Both noise types degrade the model, with label noise being the more harmful of the two. For example, in expression recognition, the uncertainty ranging from high-quality facial expressions to low-quality micro-expressions leads to inconsistent or even incorrect labels, which ultimately reduces the classification performance of the system. In medical image analysis, lesions of different etiologies differ in severity and location, and slightly deviating annotations affect the final analysis result. In the military domain, tanks and self-propelled artillery are easily mislabeled because of their similar appearance, causing false detections in target recognition. In semantic segmentation, unreasonable labeling rules likewise lead to poorly trained models. The importance of handling label noise is therefore evident. However, most existing label denoising methods learn according to the noise level, and such methods are not suitable for segmenting medical images that contain label noise.
In view of this, embodiments of the present disclosure provide a scheme for training an image segmentation network model in which an initialized label denoising network model performs label denoising on the medical image. Label denoising is thus carried out within the training process of the image segmentation network model, effectively reducing the influence of label noise on the training result and improving the accuracy of the segmentation result.
Specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
Fig. 1 illustrates an exemplary flow diagram of a method of training an image segmentation network model according to some embodiments of the present disclosure.
At step S101, the image segmentation network model may be trained with the initial labels in the medical image to obtain initial training weights. The image segmentation network model may adopt any of various segmentation networks; for example, in some embodiments, a U-shaped fully convolutional neural network (Unet), a pyramid scene parsing network (pspnet), an atrous-convolution network (deeplab), or the like may be employed. In application, the medical image is input into the image segmentation network model, which is trained with the initial labels in the medical image to obtain the initial training weights.
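For concreteness, the following is a minimal sketch of this initial training step, assuming PyTorch and torchvision; torchvision's DeepLabV3 is used only as a stand-in for the Unet/pspnet/deeplab models named above, and the class count and hyperparameters are illustrative rather than taken from the patent.

```python
# Minimal sketch of step S101 (assumptions: PyTorch/torchvision; DeepLabV3 as a
# stand-in segmentation network; class count and hyperparameters are illustrative).
import torch
import torch.nn as nn
from torchvision.models.segmentation import deeplabv3_resnet50

NUM_CLASSES = 6  # e.g., background + five lesion classes (illustrative)

seg_model = deeplabv3_resnet50(weights=None, weights_backbone=None, num_classes=NUM_CLASSES)
criterion = nn.CrossEntropyLoss()   # hard label loss (ce-loss) on the initial labels
optimizer = torch.optim.Adam(seg_model.parameters(), lr=1e-4)

def initial_training_step(images, initial_labels):
    """One optimization step on the initial (possibly noisy) labels; repeated over
    the dataset this yields the initial training weights used in step S102."""
    logits = seg_model(images)["out"]            # (B, C, H, W)
    loss = criterion(logits, initial_labels)     # initial_labels: (B, H, W) class indices
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# after training: initial_weights = seg_model.state_dict()
```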
Next, at step S102, the network weights of the label denoising network model may be initialized based on the initial training weights. In some embodiments, the initial labels in the medical image are used to perform initial training of an image segmentation network model such as pspnet to obtain the initial training weights, and the label denoising network model is then initialized with these weights, so that label noise in the medical image segmentation task can be effectively suppressed by the label denoising network model.
Next, at step S103, label denoising may be performed on the medical image based on the initialized label denoising network model. Because label noise easily causes confusion between the background class and the non-background classes, and among the non-background classes, in the medical image, the label denoising network model performs label denoising on the medical image in a targeted manner to reduce the influence of label noise on the segmentation result. A non-background class in the medical image can be understood as a lesion region, the distinction among non-background classes as the distinction between different lesion types, and the background class as the non-lesion regions.
Finally, in step S104, a segmentation result of the image segmentation network model may be determined according to the output result of the label denoising network model, so as to implement optimization training of the image segmentation network model.
In this way, by introducing the label denoising network model into the training process of the image segmentation network model and performing label denoising on the medical image, the influence of label noise on the training result is effectively reduced and the accuracy of the segmentation result is improved.
Further, in some embodiments, the aforementioned label denoising network model may adopt a mean-teacher model for semi-supervised learning, which comprises a teacher model and a student model. In some embodiments, the teacher model and the student model may each be initialized, simultaneously and separately, with the initial training weights. In some implementation scenarios, the initial training may use a hard label loss function, such as the cross-entropy loss (ce-loss). It should be noted that these details of the label denoising network model are merely exemplary and do not limit the technical solution of the present disclosure.
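Continuing the sketch above, the teacher-student pair could be initialized roughly as follows; this is an assumption-laden sketch, not the patent's implementation. Both networks copy the initial training weights, and the teacher is excluded from gradient updates because it will later be updated only via EMA.

```python
# Minimal sketch: initialize teacher and student from the same initial training weights
# (continuing the previous sketch; `seg_model`, `deeplabv3_resnet50`, NUM_CLASSES as above).
import copy

initial_weights = copy.deepcopy(seg_model.state_dict())

student = deeplabv3_resnet50(weights=None, weights_backbone=None, num_classes=NUM_CLASSES)
teacher = deeplabv3_resnet50(weights=None, weights_backbone=None, num_classes=NUM_CLASSES)
student.load_state_dict(initial_weights)
teacher.load_state_dict(copy.deepcopy(initial_weights))

for p in teacher.parameters():
    p.requires_grad_(False)   # the teacher is never updated by back-propagation
```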
Further, in some embodiments, in the process of performing label denoising on the medical image with the label denoising network model, a target soft label loss function and a target hard label loss function for the label denoising stage may be obtained, and the student model is then trained based on both. Finally, the output result of the label denoising network model is determined from the student labels output by the trained student model. The target soft label loss function operates on soft labels and the target hard label loss function on hard labels. Soft labels differ from traditional hard labels (which take only the values 0 or 1): a soft label weights a sample according to a probability and can therefore carry more useful information, namely the degree of support for each class. In some specific implementation scenarios, the target soft label loss function may be a soft label loss such as the KL divergence loss (kl-loss), and the target hard label loss function may be a hard label loss such as ce-loss. The specific types of the target soft label loss function and the target hard label loss function are not limited here and may be chosen as required.
Furthermore, when obtaining the target soft label loss function for the label denoising stage, a first soft label loss function between the teacher model and the student model may be calculated for the background class and the non-background classes in the medical image. Next, a second soft label loss function between the teacher model and the student model is calculated for the non-background classes in the medical image. The target soft label loss function is then determined from the first and second soft label loss functions. In some embodiments, the first soft label loss function comprises the soft label loss values within a first threshold range and the second soft label loss function comprises the soft label loss values within a second threshold range. When training the student model, the soft label loss values within the first threshold range and those within the second threshold range are selected from the target soft label loss function, and the selected values are excluded from back-propagation of the loss.
Taking the KL divergence loss as the soft label loss function as an example, the determination of the target soft label loss function and the training of the student model with it are explained further below.
First, KL-loss may be calculated between the teacher model and the student model for the background class and the non-background classes in the medical image. Within the same batch, the KL-loss values within the first threshold range may be selected; specifically, all calculated KL-loss values in the batch are ranked from high to low and those ranked in the last 10% are selected. KL-loss may also be calculated between the teacher model and the student model for each of the non-background classes. Within the same batch, the KL-loss values within the second threshold range may be selected; specifically, all calculated KL-loss values in the batch are ranked from high to low and those ranked in the last 5% are selected. Then, within the same batch, the union of the KL-loss values in the last 10% range and those in the last 5% range is taken; this union is treated as label noise and is excluded from back-propagation of the loss. It should be noted that the use of KL-loss and the specific range values (e.g., 10%, 5%) are merely exemplary; the type of soft label loss function and the corresponding ranges can be adjusted according to the segmentation task.
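As an illustration only, a per-pixel version of this selection could look as follows. This is a sketch under assumptions: PyTorch; the KL divergence is computed on the full class distributions rather than on the separate background/non-background groupings described above; and the drop direction follows the wording "ranked in the last 10%/5% after a high-to-low sort".

```python
# Minimal sketch: per-pixel KL-loss with a batch-ranked fraction excluded from
# back-propagation (assumption: the class-grouping details are simplified here).
import torch
import torch.nn.functional as F

def masked_kl_loss(student_logits, teacher_logits, drop_fraction):
    """KL(teacher || student) per pixel; the `drop_fraction` of values ranked last
    after a high-to-low sort within the batch is treated as label noise and
    contributes no gradient."""
    log_p_student = F.log_softmax(student_logits, dim=1)
    p_teacher = F.softmax(teacher_logits, dim=1)
    kl = F.kl_div(log_p_student, p_teacher, reduction="none").sum(dim=1)  # (B, H, W)
    flat = kl.flatten()
    k = int(drop_fraction * flat.numel())
    if k > 0:
        # values ranked last in a high-to-low sort = the k smallest values
        _, drop_idx = torch.topk(flat, k, largest=False)
        keep = torch.ones_like(flat)
        keep[drop_idx] = 0.0          # excluded from the backward pass
        flat = flat * keep
    return flat.sum() / max(flat.numel() - k, 1)
```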
In a general image segmentation task, segmentation objects of different classes have relatively large inter-class distances and can be effectively distinguished on that basis. In a medical image segmentation task, however, and in the pathological myopia segmentation task in particular, the inter-class distances among the non-background classes (i.e., among the lesion classes) are small. If the losses excluded from back-propagation were selected using only the inter-class distances among the non-background classes, some valid samples would be discarded as well. The inventors further found that the inter-class distance between the background class and the non-background classes is comparatively large, so that those potentially discarded valid samples can still be distinguished well between background and non-background. Therefore, in the label denoising process for the medical image, the scheme considers how well samples are distinguished both between the background class and the non-background classes and among the non-background classes, treats the union of the soft label loss values within the first threshold range and those within the second threshold range as label noise, and excludes this union from back-propagation of the loss, thereby effectively denoising the labels of the medical image.
Further, when obtaining the target hard label loss function for the label denoising stage, the teacher labels output by the teacher model and the student labels output by the student model during training may be acquired, and the target hard label loss function is then determined as a hard label loss between the teacher labels and the student labels. In some embodiments, the cross-entropy loss between the teacher labels and the student labels is calculated and used as the target hard label loss function.
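A minimal sketch of this hard label loss follows, assuming PyTorch and that the teacher label is taken as the per-pixel argmax of the teacher output (the patent does not specify how the teacher label is formed).

```python
# Minimal sketch: cross-entropy between the teacher's hard labels and the student output.
import torch
import torch.nn.functional as F

def target_hard_label_loss(student_logits, teacher_logits):
    with torch.no_grad():
        teacher_labels = teacher_logits.argmax(dim=1)   # (B, H, W) teacher labels
    return F.cross_entropy(student_logits, teacher_labels)
```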
Further, during training of the teacher model, the exponential moving average weights (EMA weights) of the student model may first be obtained, and the network weights of the teacher model are then updated based on these EMA weights. The teacher model thus uses the exponential moving average of the student model's weights instead of sharing weights with the student model. Information is therefore aggregated after every training step rather than only at longer intervals, and the whole model has better intermediate representations because the weight averaging improves the outputs of all layers, not just the top layer. A faster feedback loop is established between the student model and the teacher model, which improves test accuracy. In addition, for large-scale training sets, errors in individual mini-batches are unlikely to bias training.
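A minimal sketch of the EMA update, assuming PyTorch; the smoothing factor `alpha` is illustrative and not given in the patent.

```python
# Minimal sketch: teacher weights as an exponential moving average of student weights.
import torch

@torch.no_grad()
def update_teacher_ema(teacher, student, alpha=0.99):
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(alpha).add_(s_param, alpha=1.0 - alpha)
    for t_buf, s_buf in zip(teacher.buffers(), student.buffers()):
        t_buf.copy_(s_buf)   # e.g., batch-norm running statistics (an assumption)
```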
It can be seen that, by performing both explicit and implicit processing of the label noise in the medical image with the label denoising network model, the scheme effectively reduces the influence of label noise on the medical segmentation network and lowers the false-positive rate of the segmentation result. Specifically, the label denoising network model built from a teacher model and a student model handles the label noise explicitly, while training the student model with both the soft label loss function and the hard label loss function and updating the teacher model's network weights with the student model's EMA weights handles it implicitly.
FIG. 2 illustrates an exemplary flow chart of a method of processing an image using an image segmentation network model according to an embodiment of the disclosure. The image segmentation network model here can be obtained with the training method described above in connection with the method 100 of fig. 1, so the details of the image segmentation network model given above also apply below.
At step S201, a medical image may be segmented with the image segmentation network model to obtain an initial segmentation result. As described above, the image segmentation network model may be any of the Unet network, pspnet, deeplab, and the like. Segmenting the medical image with a model trained by the method of fig. 1 effectively reduces the influence of label noise in the medical image on the segmentation result and thus ensures the accuracy of the initial segmentation result. In some embodiments, the initial segmentation result may include marked lesion regions. Fig. 5 illustrates one possible form of the initial segmentation result: it shows the initial segmentation result of a fundus image, in which 501 and 503 denote non-background classes (i.e., lesion regions) and 502 denotes the background class (i.e., non-lesion regions). The different lesion regions may each be marked separately.
Then, at step S202, the pathology grade to which the medical image belongs may be determined from the lesion regions in the initial segmentation result. Once the initial segmentation result is obtained, the lesion regions marked in it can be used to determine the pathology grade. Accurate grading is thus achieved on the basis of an accurate initial segmentation result, without relying on manual, experience-based pathological classification of the medical image, which effectively avoids inaccurate grading caused by human factors.
In some embodiments, the aforementioned medical image may specifically be a fundus image, and the segmentation task for the fundus image may specifically be segmentation of pathological myopia lesion regions; that is, the image segmentation network model segments the fundus image to obtain a segmentation result for pathological myopia. In some implementations, this segmentation result may include a background class and non-background (lesion) classes such as pigment arc spots, choroidal arc spots, scleral arc spots, diffuse atrophy lesions, and patchy atrophy lesions. If the segmentation result contains such lesion regions, the lesions can be marked on the fundus image (for example, by outlining the lesion regions with lines). The pathological myopia grade of the fundus image is then determined from the marked lesion regions and/or the positions of the lesion regions in the fundus image. In some implementations, pathological myopia may be divided into several grades, for example four grades (the division here is merely illustrative). If the initial segmentation result contains arc spots, the fundus image is determined to belong to the first grade. If it contains diffuse atrophy but not patchy atrophy, the fundus image belongs to the second grade. If it contains patchy atrophy, the position of the patchy atrophy region must be determined further: if the patchy atrophy is not in the macular area, the fundus image belongs to the third grade; if it is in the macular area, the fundus image belongs to the fourth grade. A simple sketch of these rules is given below. It should be noted that the fundus image and the pathological myopia segmentation task are used here only as examples; the solution of the present disclosure is not limited to them. The medical image may also show other body parts for which image segmentation is needed, and the segmentation task may address segmentation and grading of other progressively developing pathologies in fundus or other medical images.
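The illustrative grading rules above can be summarized in a few lines; the sketch assumes lesion-class names and a set-of-strings representation, neither of which comes from the patent.

```python
# Minimal sketch of the illustrative four-grade rule (class names are assumptions).
def pathological_myopia_grade(lesions, patchy_atrophy_in_macula=False):
    """`lesions` is the set of lesion-class names present in the initial segmentation."""
    if "patchy_atrophy" in lesions:
        return 4 if patchy_atrophy_in_macula else 3
    if "diffuse_atrophy" in lesions:
        return 2
    if lesions & {"pigment_arc_spot", "choroidal_arc_spot", "scleral_arc_spot"}:
        return 1
    return 0  # no pathological-myopia lesion detected (assumption: grade 0)

# Example: diffuse atrophy plus a choroidal arc spot -> grade 2
# pathological_myopia_grade({"diffuse_atrophy", "choroidal_arc_spot"})
```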
FIG. 3 illustrates an architectural diagram of a system 300 for processing an image using an image segmentation network model according to an embodiment of the present disclosure. The system 300 can be understood as a concrete system implementing the methods described above in connection with figs. 1 and 2, so the details given there also apply below.
As shown in fig. 3, the system 300 includes an image segmentation network model 301, a label denoising network model (comprising a teacher model 302 and a student model 303), and a pathological myopia grading module 304. In the training phase, the image segmentation network model 301, the teacher model 302, and the student model 303 of the system 300 are used, while in the inference phase mainly the student model 303 and the pathological myopia grading module 304 are used.
In some embodiments, the image segmentation network model 301 may be any of a Unet network, pspnet, deeplab, and the like. Preferably, pspnet is employed, which has significant advantages over other network models for medical image segmentation (in particular for the pathological myopia segmentation task on fundus images). The last layer of the pspnet model may, for example, use a softmax activation function, and the model can then segment the fundus image into the background class and lesion regions such as pigment arc spots, choroidal arc spots, scleral arc spots, diffuse atrophy lesions, and patchy atrophy lesions.
The label denoising network model specifically comprises the teacher model 302 and the student model 303, which are used to optimize the image segmentation network model. Specifically, in the training stage, after the medical image serving as the training sample is input to the image segmentation network model 301, the model 301 is trained with the initial labels in the medical image to obtain the initial training weights. The teacher model 302 and the student model 303 are then initialized with these initial training weights. The initialized teacher model 302 and student model 303 subsequently perform label denoising on the medical image in both an explicit and an implicit manner, and the initial segmentation result is determined from the output of the student model 303, thereby optimizing the segmentation result of the image segmentation network model 301 (the detailed optimization process is described in connection with fig. 4 below).
In the inference phase, a medical image to be processed (e.g., a fundus image) is input to the student model 303, which outputs the initial segmentation result. This initial segmentation result is then fed into the pathological myopia grading module 304, a pre-trained neural network model that determines the pathological myopia grade of the fundus image from the lesion regions marked in the initial segmentation result and/or their positions in the fundus image. For example, if the initial segmentation result contains diffuse atrophy or macular atrophy, the grade of the fundus image can be determined directly from the marked lesion regions; if it contains patchy atrophy, the grade is determined by additionally considering the position of the patchy atrophy region in the fundus image. The system thus reduces the influence of label noise on the medical segmentation network, including lowering the false-positive rate of the segmentation network, reducing the impact of missing labels and label edge errors on segmentation performance, improving the mean intersection-over-union (mIoU) of the segmentation, and improving pathological myopia grading performance. In addition, because no manual assessment is required, the system avoids the back-and-forth that arises when two adjacent pathological myopia lesion grades are confused, which effectively improves the accuracy of pathological myopia grading and reduces the misdiagnosis rate.
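Put together, the inference path could look roughly like this (continuing the sketches above; the macular-position check for patchy atrophy is omitted for brevity, and all names are illustrative).

```python
# Minimal sketch of the inference phase: student model -> initial segmentation -> grading.
import torch

@torch.no_grad()
def infer_and_grade(fundus_image, student, class_names):
    """fundus_image: (3, H, W) tensor; class_names: list mapping class index to name,
    with index 0 assumed to be "background"."""
    student.eval()
    logits = student(fundus_image.unsqueeze(0))["out"]
    pred = logits.argmax(dim=1)[0]                       # (H, W) initial segmentation result
    present = {class_names[c] for c in pred.unique().tolist() if class_names[c] != "background"}
    return pred, pathological_myopia_grade(present)      # macula check omitted here
```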
The optimization process of the image segmentation network model 301 is further explained below in conjunction with fig. 4. FIG. 4 illustrates a specific architecture of a system 400 for training an image segmentation network model according to an embodiment of the present disclosure.
As shown in fig. 4, the system 400 specifically includes an image segmentation network model, a teacher model, and a student model. At step S1, the medical image is input into the image segmentation network model (e.g., pspnet). At step S2, the teacher model and the student model are each initialized, simultaneously, with the initial training weights. At step S3, KL-loss may be calculated between the teacher model and the student model for the background class and the non-background classes in the medical image, and within the same batch the KL-loss values in a first threshold range (e.g., ranked in the last 10%) may be selected. KL-loss may also be calculated between the teacher model and the student model for each of the non-background classes, and within the same batch the KL-loss values in a second threshold range (e.g., ranked in the last 5%) may be selected. Then, within the same batch, the union of the KL-loss values in the first threshold range and those in the second threshold range is taken; this union is treated as label noise and is excluded from back-propagation of the loss. As before, the use of KL-loss and the specific range values (e.g., 10%, 5%) are merely exemplary, and the type of soft label loss function and the corresponding ranges can be adjusted according to the segmentation task.
At step S4, the teacher model outputs the teacher labels and the student model outputs the student labels; the target hard label loss function (e.g., ce-loss) between the student labels and the teacher labels is calculated and used to train the student model. Note that training the student model comprises both the training with the target soft label loss function in S3 and the training with the target hard label loss function in S4. For the teacher model, the EMA weights of the student model are obtained first and the teacher model's network weights are then gradually updated based on these EMA weights; a combined sketch of one such training step follows.
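Combining the pieces sketched earlier, one denoising training step over a batch might look like this. It is an assumption-laden sketch: the equal loss weighting, drop fractions, and model API follow the earlier sketches, not a stated implementation of the patent.

```python
# Minimal sketch of one label-denoising training step (steps S3/S4 plus the EMA update).
def denoising_step(images, student, teacher, optimizer):
    student.train()
    student_logits = student(images)["out"]
    with torch.no_grad():
        teacher_logits = teacher(images)["out"]

    # step S3: target soft label loss with the batch-ranked fractions excluded
    soft_loss = (masked_kl_loss(student_logits, teacher_logits, drop_fraction=0.10)
                 + masked_kl_loss(student_logits, teacher_logits, drop_fraction=0.05))
    # step S4: target hard label loss between teacher labels and student output
    hard_loss = target_hard_label_loss(student_logits, teacher_logits)

    loss = soft_loss + hard_loss          # equal weighting is an assumption
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    update_teacher_ema(teacher, student)  # implicit denoising via EMA weights
    return loss.item()
```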
At step S5, the segmentation result is determined using the student labels output by the trained student model. In this way, introducing the label denoising network model (comprising the teacher model and the student model) into the training process of the image segmentation network model to perform label denoising on the medical image effectively reduces the influence of label noise on the training result and improves the accuracy of the segmentation result.
The disclosed embodiments also provide an electronic device. As shown in fig. 6, the electronic device 600 may include a processor 601 and a memory 602. The memory 602 stores computer instructions for training an image segmentation network model which, when executed by the processor 601, cause the electronic device 600 to perform the method described above in connection with fig. 1, and/or computer instructions for processing an image with an image segmentation network model which, when executed by the processor 601, cause the electronic device 600 to perform the method described above in connection with fig. 2. For example, in some embodiments, the electronic device 600 may introduce the label denoising network model into the training process of the image segmentation network model to perform label denoising on the medical image, and/or segment the medical image with the trained image segmentation network model to obtain an initial segmentation result and determine the pathology grade from it. On this basis, the electronic device 600 can effectively reduce the influence of label noise on the training result, thereby improving the accuracy of the segmentation result and/or effectively improving the efficiency and accuracy of pathological myopia grading.
While various embodiments of the present disclosure have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous modifications, changes, and substitutions will occur to those skilled in the art without departing from the spirit and scope of the present disclosure. It should be understood that various alternatives to the embodiments of the disclosure described herein may be employed in practicing the disclosure. It is intended that the following claims define the scope of the disclosure and that equivalents or alternatives within the scope of these claims be covered thereby.

Claims (9)

1. A method of training an image segmentation network model, the method comprising:
training the image segmentation network model by using an initial label in a medical image to obtain an initial training weight;
performing network weight initialization on a label denoising network model based on the initial training weight, wherein the label denoising network model comprises a teacher model and a student model, and the network weights of the teacher model and the student model are respectively initialized based on the initial training weight;
executing label denoising processing on the medical image based on the initialized label denoising network model, specifically comprising obtaining a target soft label loss function and a target hard label loss function in a label denoising processing stage, training the student model based on the target soft label loss function and the target hard label loss function, and determining an output result of the label denoising network model based on student labels output by the trained student model; and
determining the segmentation result of the image segmentation network model according to the output result of the label denoising network model, so as to realize optimized training of the image segmentation network model.
2. The method of claim 1, wherein obtaining a target soft label loss function for a label denoising stage comprises:
calculating a first soft label loss function for the teacher model and the student model for a background class and a non-background class in the medical image;
computing a second soft label loss function for the teacher model and the student model for non-background classes in the medical image;
and determining the target soft label loss function based on the first soft label loss function and the second soft label loss function.
3. The method of claim 2, wherein the first soft label loss function comprises a soft label loss function within a first threshold range, wherein the second soft label loss function comprises a soft label loss function within a second threshold range, and wherein training the student model based on the target soft label loss function comprises:
in the process of training the student model, selecting the soft label loss function within the first threshold range and the soft label loss function within the second threshold range from the target soft label loss functions, and excluding the selected soft label loss functions from back-propagation of the loss.
4. The method of claim 1, wherein obtaining a target hard label loss function for the label denoising stage comprises:
acquiring a teacher label output by the teacher model;
acquiring student labels output by the student model in a training stage; and
calculating a hard label loss function between the teacher labels and the student labels to obtain the target hard label loss function.
5. The method of claim 4, further comprising:
obtaining an exponential moving average weight of the student model; and
updating the teacher model's network weights based on the exponential moving average weights.
6. A method of processing an image using an image segmentation network model trained according to the method of any one of claims 1 to 5, comprising:
performing segmentation processing on the medical image based on the image segmentation network model to obtain an initial segmentation result, wherein the initial segmentation result comprises a marked lesion region; and
determining the pathology grade to which the medical image belongs according to the lesion region in the initial segmentation result.
7. The method of claim 6, wherein the medical image comprises a fundus image, and wherein determining the pathology grade to which the medical image belongs based on the lesion region in the initial segmentation result comprises:
determining the pathological myopia grade of the fundus image according to the marked lesion region and/or the position of the lesion region in the fundus image.
8. An electronic device, comprising:
a processor; and
a memory storing computer instructions for training an image segmentation network model and/or computer instructions for processing an image with an image segmentation network model, which when executed by the processor, cause the electronic device to perform the method of any one of claims 1-5 and/or the method of claim 6 or 7.
9. A computer-readable storage medium containing program instructions for training an image segmentation network model and/or program instructions for processing an image using an image segmentation network model, which program instructions, when executed by a processor, cause the method according to any one of claims 1-5 and/or the method according to claim 6 or 7 to be carried out.
CN202211573797.5A 2022-12-08 2022-12-08 Method for training image segmentation network model, method for processing image and product Active CN115641443B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211573797.5A CN115641443B (en) 2022-12-08 2022-12-08 Method for training image segmentation network model, method for processing image and product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211573797.5A CN115641443B (en) 2022-12-08 2022-12-08 Method for training image segmentation network model, method for processing image and product

Publications (2)

Publication Number Publication Date
CN115641443A CN115641443A (en) 2023-01-24
CN115641443B true CN115641443B (en) 2023-04-11

Family

ID=84948590

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211573797.5A Active CN115641443B (en) 2022-12-08 2022-12-08 Method for training image segmentation network model, method for processing image and product

Country Status (1)

Country Link
CN (1) CN115641443B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116993845B (en) * 2023-06-09 2024-03-15 西安交通大学 CT image artifact removal method based on integrated depth network DnCNN
CN116862885A (en) * 2023-07-14 2023-10-10 江苏济远医疗科技有限公司 Segmentation guide denoising knowledge distillation method and device for ultrasonic image lesion detection
CN117541798B (en) * 2024-01-09 2024-03-29 中国医学科学院北京协和医院 Medical image tumor segmentation model training method, device and segmentation method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021189901A1 (en) * 2020-10-15 2021-09-30 平安科技(深圳)有限公司 Image segmentation method and apparatus, and electronic device and computer-readable storage medium
CN114255237A (en) * 2021-11-12 2022-03-29 深圳大学 Semi-supervised learning-based image segmentation model training method and segmentation method
CN115063589A (en) * 2022-06-20 2022-09-16 平安科技(深圳)有限公司 Knowledge distillation-based vehicle component segmentation method and related equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106875395B (en) * 2017-01-12 2020-04-14 西安电子科技大学 Super-pixel-level SAR image change detection method based on deep neural network
WO2022032471A1 (en) * 2020-08-11 2022-02-17 香港中文大学(深圳) Method and apparatus for training neural network model, and storage medium and device
CN113792606B (en) * 2021-08-18 2024-04-26 清华大学 Low-cost self-supervision pedestrian re-identification model construction method based on multi-target tracking
CN114863223B (en) * 2022-06-30 2022-09-30 中国自然资源航空物探遥感中心 Hyperspectral weak supervision classification method combining denoising autoencoder and scene enhancement

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021189901A1 (en) * 2020-10-15 2021-09-30 平安科技(深圳)有限公司 Image segmentation method and apparatus, and electronic device and computer-readable storage medium
CN114255237A (en) * 2021-11-12 2022-03-29 深圳大学 Semi-supervised learning-based image segmentation model training method and segmentation method
CN115063589A (en) * 2022-06-20 2022-09-16 平安科技(深圳)有限公司 Knowledge distillation-based vehicle component segmentation method and related equipment

Also Published As

Publication number Publication date
CN115641443A (en) 2023-01-24

Similar Documents

Publication Publication Date Title
CN115641443B (en) Method for training image segmentation network model, method for processing image and product
CN111507335B (en) Method and device for automatically labeling training images used for deep learning network
CN110599448B (en) Migratory learning lung lesion tissue detection system based on MaskScoring R-CNN network
CN111489403B (en) Method and device for generating virtual feature map by using GAN
CN108053419B (en) Multi-scale target tracking method based on background suppression and foreground anti-interference
CN109145939B (en) Semantic segmentation method for small-target sensitive dual-channel convolutional neural network
CN109815979B (en) Weak label semantic segmentation calibration data generation method and system
CN110909618B (en) Method and device for identifying identity of pet
WO2022032471A1 (en) Method and apparatus for training neural network model, and storage medium and device
CN108765452A (en) A kind of detection of mobile target in complex background and tracking
CN111192294B (en) Target tracking method and system based on target detection
CN109636846B (en) Target positioning method based on cyclic attention convolution neural network
CN111680701A (en) Training method and device of image recognition model and image recognition method and device
CN113408537B (en) Remote sensing image domain adaptive semantic segmentation method
CN110245587B (en) Optical remote sensing image target detection method based on Bayesian transfer learning
CN116664840B (en) Semantic segmentation method, device and equipment based on mutual relationship knowledge distillation
CN114596440B (en) Semantic segmentation model generation method and device, electronic equipment and storage medium
CN116977633A (en) Feature element segmentation model training method, feature element segmentation method and device
CN112149689A (en) Unsupervised domain adaptation method and system based on target domain self-supervised learning
CN115311449A (en) Weak supervision image target positioning analysis system based on class reactivation mapping chart
CN114882204A (en) Automatic ship name recognition method
CN114549909A (en) Pseudo label remote sensing image scene classification method based on self-adaptive threshold
CN111724371A (en) Data processing method and device and electronic equipment
CN107247996A (en) A kind of Active Learning Method applied to different distributed data environment
CN116071719A (en) Lane line semantic segmentation method and device based on model dynamic correction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant