CN116503684A - Model training method and device, electronic equipment and storage medium


Info

Publication number
CN116503684A
Authority
CN
China
Prior art keywords
fundus image
model
stage
loss value
image sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310111927.1A
Other languages
Chinese (zh)
Inventor
黄秋婧
赵培泉
冯伟
琚烈
马彤
张大磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
XinHua Hospital Affiliated To Shanghai JiaoTong University School of Medicine
Beijing Airdoc Technology Co Ltd
Original Assignee
XinHua Hospital Affiliated To Shanghai JiaoTong University School of Medicine
Beijing Airdoc Technology Co Ltd
Application filed by XinHua Hospital Affiliated To Shanghai JiaoTong University School of Medicine and Beijing Airdoc Technology Co Ltd
Priority to CN202310111927.1A
Publication of CN116503684A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/7753: Incorporation of unlabelled data, e.g. multiple instance learning [MIL]
    • G06V10/82: Arrangements using neural networks
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V40/18: Eye characteristics, e.g. of the iris
    • G06V40/193: Preprocessing; feature extraction
    • G06V40/197: Matching; classification
    • Y02T10/40: Engine management systems (Y02T: climate change mitigation technologies related to transportation; Y02T10/00: road transport of goods or passengers; Y02T10/10: internal combustion engine [ICE] based vehicles)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Ophthalmology & Optometry (AREA)
  • Human Computer Interaction (AREA)
  • Eye Examination Apparatus (AREA)

Abstract

The application provides a model training method and device, an electronic device, and a storage medium. The method comprises the following steps: acquiring fundus image samples, comprising: a first number of stage-labeled ROP fundus images and a second number of ROP fundus images without stage labels; inputting the first number of fundus image samples into a fundus image stage prediction model to obtain stage prediction output, and calculating a classification loss value; calculating a prediction consistency loss value based on the prediction output of the fundus image stage prediction model on the first number of fundus image samples and the second number of fundus image samples; calculating a semantic association consistency loss value based on the features extracted from the fundus image samples by the fundus image stage prediction model; calculating a target loss value based on the classification loss value, the prediction consistency loss value and the semantic association consistency loss value; and, when the target loss value is within a preset range, obtaining a final stage prediction model. The method and device can improve the classification performance and recognition accuracy of the model.

Description

Model training method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of fundus image processing technologies, and in particular, to a model training method and device, an electronic device, and a storage medium.
Background
Retinopathy of prematurity (ROP) is a proliferative disease of the developing retina in premature infants and remains a major cause of childhood blindness worldwide. Ophthalmic examination of premature infants requires frequent and close monitoring, which generates a large amount of manual image-reading work. Clinically, however, the diagnosis of stage 1, stage 2, and stage 3 ROP from fundus images is subject to subjective differences between readers. If ROP is not detected and treated in time, it can progress to late-stage ROP, resulting in poor vision and even blindness. In addition, experienced pediatric ophthalmologists are in short supply and are mostly concentrated in metropolitan areas or large medical centers; infants in remote areas must travel long distances to be referred to such centers, which delays treatment. Therefore, telemedicine and computer-aided reading of ROP fundus images are of great significance.
Currently, in the field of ROP image analysis, many deep learning models have been proposed for computer-aided screening and diagnosis. However, deep learning methods usually require a large amount of annotated data for model training, and in a clinical setting annotating large amounts of data is time-consuming and laborious even for experienced doctors. When annotating ROP fundus images, a clinician typically determines the ROP stage from the shape and size of the ridge in the ROP fundus image. ROP fundus images at stage 4 or stage 5 are easy to distinguish, whereas normal fundus images and stage 1 or stage 2 ROP images show only subtle changes because the disease is still at an early stage, and they may be misread by the clinician, which further increases the clinician's burden.
Disclosure of Invention
The embodiments of the present application provide a model training method and device, an electronic device, and a storage medium, which are used to solve the problems in the prior art that deep learning methods require a large amount of annotated data, that annotation is time-consuming and laborious, and that early-stage fundus images may be misread by clinicians and thus mislabeled, which increases the clinicians' burden.
In order to solve the above technical problems, embodiments of the present application are implemented as follows:
in a first aspect, an embodiment of the present application provides a model training method, where the method includes:
acquiring a fundus image sample; the fundus image sample includes: a first number of stage-tagged ROP fundus images and a second number of stage-untagged ROP fundus images, the first number being less than the second number;
inputting a first number of fundus image samples into a fundus image stage prediction model to obtain stage prediction output of the first number of fundus images, and calculating a classification loss value;
processing the first number of fundus image samples and the second number of fundus image samples based on the fundus image stage prediction model to obtain prediction output, and calculating to obtain a prediction consistency loss value of the fundus image stage prediction model;
Calculating a semantic association consistency loss value of the fundus image stage prediction model according to the characteristics extracted from the fundus image sample based on the fundus image stage prediction model;
calculating a target loss value of the fundus image stage prediction model based on the classification loss value, the prediction consistency loss value and the semantic association consistency loss value;
and under the condition that the target loss value is in a preset range, taking the converged fundus image stage prediction model as a final stage prediction model.
Optionally, the acquiring a fundus image sample includes:
acquiring an original fundus image sample;
the size of the original fundus image sample is adjusted to be a set size, and a processed fundus image sample is obtained;
and performing data enhancement operation on the processed fundus image sample to obtain the fundus image sample.
Optionally, the fundus image stage prediction model includes: a first stage prediction sub-model and a second stage prediction sub-model, the first stage prediction sub-model comprising: a first feature extraction layer and a first fully connected layer having N output units, and the second stage prediction sub-model comprising: a second feature extraction layer and a second fully connected layer having N output units, where N is a positive integer,
The processing the first number of fundus image samples and the second number of fundus image samples based on the fundus image stage prediction model to obtain prediction output comprises the following steps:
applying first random noise to the ROP fundus image without the stage label to obtain a first fundus image sample;
applying second random noise to the ROP fundus image without the stage label to obtain a second fundus image sample;
inputting the first fundus image sample to the first phase predictor model and the second fundus image sample to the second phase predictor model;
invoking the first feature extraction layer to extract high-level semantic features from the first fundus image sample, invoking the first full-connection layer to conduct stage prediction on the first fundus image sample based on the high-level semantic features, and outputting a first stage prediction label of the first fundus image sample;
and calling the second feature extraction layer to extract high-level semantic features from the second fundus image sample, calling the second full-connection layer to carry out stage prediction on the second fundus image sample based on the high-level semantic features, and outputting a second stage prediction label of the second fundus image sample.
Optionally, the calculating obtains a predicted consistency loss value of the fundus image stage prediction model, including:
acquiring first model parameters of the first stage predictor model and second model parameters of the second stage predictor model;
and calculating the prediction consistency loss value between the first stage prediction sub-model and the second stage prediction sub-model based on the first model parameter, the second model parameter, the first quantity, the second quantity, the first random noise and the second random noise.
Optionally, the calculating, based on the fundus image stage prediction model, a semantic association consistency loss value of the fundus image stage prediction model according to the features extracted from the fundus image sample includes:
acquiring similarity between high-level semantic features of the first fundus image sample and the second fundus image sample;
and calculating to obtain the semantic association consistency loss value between the first stage prediction sub-model and the second stage prediction sub-model based on the first model parameter, the second model parameter, the first quantity, the second quantity, the first random noise, the second random noise and the similarity.
Optionally, the calculating, based on the classification loss value, the predicted consistency loss value, and the semantically-related consistency loss value, a target loss value of the fundus image stage prediction model includes:
and calculating the target loss value based on the classification loss value, the predicted consistency loss value, the semantically-related consistency loss value and a loss balance coefficient.
In a second aspect, embodiments of the present application provide a model training apparatus, the apparatus including:
the image sample acquisition module is used for acquiring fundus image samples; the fundus image sample includes: a first number of stage-tagged ROP fundus images and a second number of stage-untagged ROP fundus images, the first number being less than the second number;
the classifying loss calculation module is used for inputting a first number of fundus image samples into a fundus image stage prediction model, obtaining stage prediction output of the first number of fundus images and calculating classifying loss values;
the prediction consistency loss calculation module is used for processing the first number of fundus image samples and the second number of fundus image samples based on the fundus image stage prediction model to obtain prediction output, and calculating to obtain a prediction consistency loss value of the fundus image stage prediction model;
The semantic consistency loss calculation module is used for calculating a semantic association consistency loss value of the fundus image stage prediction model according to the characteristics extracted from the fundus image sample based on the fundus image stage prediction model;
the target loss value calculation module is used for calculating the target loss value of the fundus image stage prediction model based on the classification loss value, the prediction consistency loss value and the semantic association consistency loss value;
and the stage prediction model acquisition module is used for taking the converged fundus image stage prediction model as a final stage prediction model under the condition that the target loss value is in a preset range.
Optionally, the image sample acquisition module includes:
an original image sample acquisition unit for acquiring an original fundus image sample;
a processed image sample acquisition unit for adjusting the size of the original fundus image sample to a set size to obtain a processed fundus image sample;
a fundus image sample acquiring unit configured to perform a data enhancement operation on the processed fundus image sample to obtain the fundus image sample.
Optionally, the fundus image stage prediction model includes: a first stage prediction sub-model and a second stage prediction sub-model, the first stage prediction sub-model comprising: a first feature extraction layer and a first fully connected layer having N output units, and the second stage prediction sub-model comprising: a second feature extraction layer and a second fully connected layer having N output units, where N is a positive integer,
The classification loss calculation module includes:
a first image sample acquisition unit, configured to apply a first random noise to the ROP fundus image without the stage label, to obtain a first fundus image sample;
a second image sample acquisition unit, configured to apply a second random noise to the ROP fundus image without the stage label, to obtain a second fundus image sample;
a fundus image sample input unit configured to input the first fundus image sample to the first phase predictor model and input the second fundus image sample to the second phase predictor model;
the first prediction tag output unit is used for calling the first feature extraction layer to extract high-level semantic features from the first fundus image sample, calling the first full-connection layer to carry out stage prediction on the first fundus image sample based on the high-level semantic features, and outputting a first stage prediction tag of the first fundus image sample;
the second prediction label output unit is used for calling the second feature extraction layer to extract high-level semantic features from the second fundus image sample, calling the second full-connection layer to conduct stage prediction on the second fundus image sample based on the high-level semantic features, and outputting a second stage prediction label of the second fundus image sample.
Optionally, the predictive consistency loss calculation module includes:
a model parameter obtaining unit, configured to obtain a first model parameter of the first stage predictor model and a second model parameter of the second stage predictor model;
and the prediction consistency loss calculation unit is used for calculating the prediction consistency loss value between the first stage prediction sub-model and the second stage prediction sub-model based on the first model parameter, the second model parameter, the first quantity, the second quantity, the first random noise and the second random noise.
Optionally, the semantic consistency loss calculation module includes:
a semantic similarity obtaining unit configured to obtain a similarity between high-level semantic features of the first fundus image sample and the second fundus image sample;
the semantic consistency loss calculation unit is used for calculating the semantic association consistency loss value between the first stage prediction sub-model and the second stage prediction sub-model based on the first model parameter, the second model parameter, the first quantity, the second quantity, the first random noise, the second random noise and the similarity.
Optionally, the target loss value calculation module includes:
and the target loss value calculation unit is used for calculating the target loss value based on the classification loss value, the prediction consistency loss value, the semantic association consistency loss value and the loss balance coefficient.
In a third aspect, an embodiment of the present application provides an electronic device, including:
a memory, a processor, and a computer program stored on the memory and executable on the processor, which when executed by the processor, implements the model training method of any of the above.
In a fourth aspect, embodiments of the present application provide a readable storage medium, which when executed by a processor of an electronic device, enables the electronic device to perform the model training method of any one of the above.
In an embodiment of the present application, a fundus image sample is acquired, the fundus image sample including: a first number of stage-labeled ROP fundus images and a second number of stage-label-free ROP fundus images, the first number being less than the second number. And inputting the first number of fundus image samples into a fundus image stage prediction model to obtain stage prediction output of the first number of fundus images, and calculating a classification loss value. And processing the first number of fundus image samples and the second number of fundus image samples based on the fundus image stage prediction model to obtain prediction output, and calculating to obtain a prediction consistency loss value of the fundus image stage prediction model. And calculating a semantic association consistency loss value of the fundus image stage prediction model according to the characteristics extracted from the fundus image sample based on the fundus image stage prediction model. And calculating the target loss value of the fundus image stage prediction model based on the classification loss value, the prediction consistency loss value and the semantic association consistency loss value. And under the condition that the target loss value is in a preset range, taking the converged fundus image stage prediction model as a final stage prediction model. According to the embodiment of the application, the ROP automatic classification can be carried out by utilizing a small quantity of ROP fundus images with labels and a large quantity of ROP fundus images without labels, so that the labeling burden and cost of doctors can be relieved. Meanwhile, useful discrimination information is fully mined from the label-free data through predicting consistency loss, so that the classification performance of the deep learning model is greatly improved. Through the disease semantic association consistency loss, the evolution relation of the disease features in different periods of ROP is additionally considered, and the recognition accuracy of the model is improved.
The foregoing is only an overview of the technical solutions of the present application. In order that the technical means of the present application may be understood more clearly and implemented according to the content of the specification, and in order to make the above and other objects, features and advantages of the present application more apparent, specific embodiments of the present application are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person of ordinary skill in the art.
FIG. 1 is a flowchart illustrating steps of a model training method according to an embodiment of the present application;
fig. 2 is a flowchart of steps of a fundus image sample acquiring method according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating steps of a method for calculating classification loss according to an embodiment of the present disclosure;
FIG. 4 is a flowchart illustrating steps of a method for calculating a predicted consistency loss value according to an embodiment of the present disclosure;
FIG. 5 is a flowchart illustrating steps of a method for calculating a semantic association consistency loss value according to an embodiment of the present application;
FIG. 6 is a flowchart illustrating steps of a method for calculating a target loss value according to an embodiment of the present application;
fig. 7 is a schematic diagram of an ROP fundus image according to an embodiment of the present application;
FIG. 8 is a schematic diagram of an overall structure of a semi-supervised classification model according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a ResNet-50 model structure according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a model training device according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
Referring to fig. 1, a flowchart illustrating steps of a model training method provided in an embodiment of the present application is shown. As shown in fig. 1, the model training method may include: step 101, step 102, step 103, step 104, step 105 and step 106.
Step 101: acquiring a fundus image sample; the fundus image sample includes: a first number of stage-tagged ROP fundus images and a second number of stage-untagged ROP fundus images, the first number being less than the second number.
In this embodiment, the fundus image samples are ROP fundus images used for training the fundus image stage prediction model. In this example, the fundus image samples may include: a first number of stage-labeled ROP fundus images and a second number of ROP fundus images without stage labels, the first number being less than the second number. That is, in the present embodiment, a small number of annotated ROP fundus images and a large number of unannotated ROP fundus images are used for model training.
In a specific implementation, when the fundus image stage prediction model is trained, a preset number of fundus image samples may be acquired for model training. Specifically, an original fundus image sample may be acquired first, and then preprocessing, data enhancement operations and the like may be performed on the original fundus image sample to obtain the processed fundus image samples. This implementation is described in detail below in conjunction with fig. 2.
Referring to fig. 2, a flowchart illustrating steps of a fundus image sample acquiring method according to an embodiment of the present application is shown, and as shown in fig. 2, the fundus image sample acquiring method may include: step 201, step 202 and step 203.
Step 201: an original fundus image sample is acquired.
In the present embodiment, an original fundus image can be acquired at the time of training of the fundus image stage prediction model. In a specific implementation, the original fundus image may be an ROP fundus image stored in advance in a medical database in a medical institution, or may be an ROP fundus image downloaded from a medical website, or the like, and specifically, the acquiring manner of the original fundus image may be determined according to a service requirement, which is not limited in this embodiment.
After the original fundus image sample is acquired, step 202 is performed.
Step 202: And adjusting the size of the original fundus image sample to a set size to obtain a processed fundus image sample.
After the original fundus image sample is acquired, the size of the original fundus image sample is adjusted to a set size to obtain a processed fundus image sample. For example, the size of the original fundus image may be adjusted to 256×256 pixels or the like.
After the original fundus image sample is resized to the set size to obtain a processed fundus image sample, step 203 is performed.
Step 203: and performing data enhancement operation on the processed fundus image sample to obtain the fundus image sample.
After the size of the original fundus image sample is adjusted to the set size to obtain a processed fundus image sample, a data enhancement operation may be performed on the processed fundus image sample to obtain a fundus image sample. Specifically, data enhancement operations such as random rotation, random horizontal/vertical flipping, and the like may be performed on the processed fundus image samples to increase the generalization performance of the model.
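By way of illustration only, the preprocessing and data enhancement described above could be sketched as follows using the torchvision library; the 256×256 target size matches the example above, while the rotation range and flip probabilities are assumed values not specified by this application:

```python
from torchvision import transforms

# Sketch of the preprocessing pipeline: resize the original fundus image sample
# to the set size, then apply random rotation and random horizontal/vertical
# flips as data enhancement to improve generalization.
train_transform = transforms.Compose([
    transforms.Resize((256, 256)),           # adjust the sample to the set size
    transforms.RandomRotation(degrees=15),   # random rotation (range assumed)
    transforms.RandomHorizontalFlip(p=0.5),  # random horizontal flip
    transforms.RandomVerticalFlip(p=0.5),    # random vertical flip
    transforms.ToTensor(),
])
```

At inference time the same resize would typically be applied without the random operations.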
In this example, the fundus image samples used for model training may simultaneously contain fundus image samples of different stages. In practical applications, the stages of ROP fundus images may include: normal images (fig. 7 (a)), stage 1 ROP images (fig. 7 (b)), stage 2 ROP images (fig. 7 (c)), stage 3 ROP images (fig. 7 (d)), stage 4 ROP images (fig. 7 (e)), and stage 5 ROP images (fig. 7 (f)). In each training round, the fundus images input to the model should contain ROP fundus images of all six stages at the same time, so that the model learns sufficiently from every stage.
After the fundus image sample is acquired, step 102 is performed.
Step 102: inputting a first number of fundus image samples into a fundus image stage prediction model, obtaining a stage prediction output of the first number of fundus images, and calculating a classification loss value.
After the fundus image sample is acquired, the fundus image phase prediction model can be trained based on the fundus image sample to obtain a phase prediction label corresponding to the fundus image sample output by the fundus image phase prediction model.
After the fundus image samples are acquired, the first number of fundus image samples may be input to the fundus image stage prediction model, the stage prediction output of the first number of fundus images is obtained, and the classification loss value is calculated. Specifically, the first number of fundus image samples may be input into the fundus image stage prediction model and processed by the model to obtain the corresponding stage prediction labels, and the classification loss value may be calculated by comparing the stage prediction labels with the annotated stage labels. In this example, the classification loss value may be a cross-entropy loss value.
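As a minimal sketch of this step (assuming a model that maps a batch of images to stage logits; the names `model`, `labeled_images` and `stage_labels` are illustrative, not taken from this application), the classification loss on the stage-labeled samples could be computed as:

```python
import torch.nn.functional as F

def classification_loss(model, labeled_images, stage_labels):
    # labeled_images: the first number of stage-labeled fundus image samples
    # stage_labels:   their annotated stage labels (integers 0..5 for the six classes)
    logits = model(labeled_images)                # stage prediction output
    return F.cross_entropy(logits, stage_labels)  # cross-entropy classification loss L_ce
```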
Step 103: and processing the first number of fundus image samples and the second number of fundus image samples based on the fundus image stage prediction model to obtain prediction output, and calculating to obtain a prediction consistency loss value of the fundus image stage prediction model.
After obtaining the fundus image samples, the first number of fundus image samples and the second number of fundus image samples can be processed based on the fundus image stage prediction model to obtain prediction output, and a prediction consistency loss value of the fundus image stage prediction model is calculated.
In this example, the fundus image stage prediction model may include a first stage prediction sub-model and a second stage prediction sub-model, which have the same model structure but different model parameters. The two stage prediction sub-models may be trained simultaneously. As shown in FIG. 8, in the Training phase, the model includes two parts, namely a Student model (i.e., the first stage prediction sub-model) and a Teacher model (i.e., the second stage prediction sub-model).
The process by which the first and second stage prediction sub-models process fundus image samples is described in detail below in conjunction with fig. 3.
Referring to fig. 3, a flowchart of steps of a method for obtaining a stage prediction label according to an embodiment of the present application is shown. As shown in fig. 3, the method for obtaining a stage prediction label may include: step 301, step 302, step 303, step 304 and step 305.
Step 301: and applying first random noise to the ROP fundus image without the stage label to obtain a first fundus image sample.
Step 302: and applying second random noise to the ROP fundus image without the stage label to obtain a second fundus image sample.
In the present embodiment, after the fundus image samples are acquired, first random noise may be applied to the ROP fundus images without stage labels in the fundus image samples to obtain a first fundus image sample, and second random noise may be applied to the same ROP fundus images without stage labels to obtain a second fundus image sample.
After the first fundus image sample and the second fundus image sample are acquired, step 303 is performed.
Step 303: The first fundus image sample is input to the first stage prediction sub-model and the second fundus image sample is input to the second stage prediction sub-model.
After the first fundus image sample and the second fundus image sample are obtained, the first fundus image sample may be input to the first stage prediction sub-model and the second fundus image sample may be input to the second stage prediction sub-model.
In this example, the model structures of the first and second stage prediction sub-models are the same. The first stage prediction sub-model may include: the first feature extraction layer and the first fully connected layer having N class output units; the second stage prediction sub-model may include: the second feature extraction layer and the second fully connected layer having N class output units, where N is a positive integer.
The feature extraction layers (i.e., the first feature extraction layer and the second feature extraction layer) may be ResNet-50 networks, whose model structure may be as shown in fig. 9. Through residual learning, ResNet-50 alleviates the problems that learning efficiency decreases and accuracy cannot be effectively improved as the number of layers of a deep learning model increases; at the same time, the deeper network has stronger feature extraction capability and better classification performance, and performs well in fields such as image classification, segmentation and detection.
Of course, the feature extraction layer is not limited to the ResNet-50 model described above, and other model structures may be employed, which are not limited in this embodiment.
The fully connected layers (i.e., the first fully connected layer and the second fully connected layer) are connected to the feature extraction layers; that is, the first fully connected layer is connected to the first feature extraction layer and the second fully connected layer is connected to the second feature extraction layer. Each fully connected layer includes a number of output units equal to the number of classification categories, i.e., 6 output units per fully connected layer.
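A minimal sketch of one stage prediction sub-model under the structure described above (a ResNet-50 feature extraction layer followed by a fully connected layer with 6 output units); torchvision's ResNet-50 is used here purely for illustration, and the class and variable names are assumptions:

```python
import torch.nn as nn
from torchvision import models

class StagePredictor(nn.Module):
    """One stage prediction sub-model: feature extraction layer + fully connected layer."""
    def __init__(self, num_classes: int = 6):
        super().__init__()
        backbone = models.resnet50(weights=None)  # feature extraction layer (ResNet-50)
        # keep everything up to and including the global average pooling layer
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.fc = nn.Linear(backbone.fc.in_features, num_classes)  # N = 6 output units

    def forward(self, x):
        feats = self.features(x).flatten(1)  # high-level semantic features (B, 2048)
        logits = self.fc(feats)              # stage prediction over the six classes
        return logits, feats

# Two sub-models with the same structure but separate parameters
student = StagePredictor()  # first stage prediction sub-model (Student)
teacher = StagePredictor()  # second stage prediction sub-model (Teacher)
```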
After the first fundus image sample is input into the first stage prediction sub-model, step 304 is performed.
After the second fundus image sample is input into the second stage prediction sub-model, step 305 is performed.
Step 304: and calling the first feature extraction layer to extract high-level semantic features from the first fundus image sample, calling the first full-connection layer to conduct stage prediction on the first fundus image sample based on the high-level semantic features, and outputting a first stage prediction label of the first fundus image sample.
The first stage prediction label refers to a stage label of the first fundus image sample predicted by the first stage predictor model.
After the first fundus image sample is input to the first stage prediction sub-model, the first feature extraction layer may be invoked to extract high-level semantic features from the first fundus image sample, and the first fully connected layer may be invoked to perform stage prediction on the first fundus image sample based on the high-level semantic features and output the first stage prediction label of the first fundus image sample.
Step 305: and calling the second feature extraction layer to extract high-level semantic features from the second fundus image sample, calling the second full-connection layer to carry out stage prediction on the second fundus image sample based on the high-level semantic features, and outputting a second stage prediction label of the second fundus image sample.
The second stage prediction label refers to the stage label of the second fundus image sample predicted by the second stage prediction sub-model.
After the second fundus image sample is input to the second stage prediction sub-model, the second feature extraction layer may be invoked to extract high-level semantic features from the second fundus image sample, and the second fully connected layer may be invoked to perform stage prediction on the second fundus image sample based on the high-level semantic features and output the second stage prediction label of the second fundus image sample.
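The two perturbed forward passes of steps 301 to 305 could be sketched as follows, continuing the illustrative StagePredictor definition above; additive Gaussian noise is used only as an assumed form of the first and second random noise, and how the Teacher parameters are updated is not shown:

```python
import torch

def perturbed_forward(student, teacher, unlabeled_images, noise_std=0.05):
    # Apply first/second random noise to the ROP fundus images without stage labels
    tau_s = torch.randn_like(unlabeled_images) * noise_std  # first random noise
    tau_t = torch.randn_like(unlabeled_images) * noise_std  # second random noise
    x_s = unlabeled_images + tau_s                           # first fundus image sample
    x_t = unlabeled_images + tau_t                           # second fundus image sample

    logits_s, feats_s = student(x_s)  # first stage prediction (logits) and features
    logits_t, feats_t = teacher(x_t)  # second stage prediction (logits) and features
    return (logits_s, feats_s), (logits_t, feats_t)
```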
After the fundus image stage prediction model is trained based on the fundus image sample to obtain a stage prediction label corresponding to the fundus image sample output by the fundus image stage prediction model, a prediction consistency loss value can be calculated according to the stage prediction label output by the model.
The process of calculating the predicted consistency loss value may be described in detail below in conjunction with FIG. 4.
Referring to fig. 4, a flowchart illustrating steps of a method for calculating a predicted consistency loss value according to an embodiment of the present application is shown, where, as shown in fig. 4, the method for calculating a predicted consistency loss value may include: step 401 and step 402.
Step 401: and acquiring first model parameters of the first stage predictor model and second model parameters of the second stage predictor model.
In this embodiment, the first model parameters refer to the model parameters of the first stage prediction sub-model, and the second model parameters refer to the model parameters of the second stage prediction sub-model.
In calculating the prediction consistency loss value, the first model parameters of the first stage prediction sub-model and the second model parameters of the second stage prediction sub-model may be obtained.
After the first model parameters of the first stage prediction sub-model and the second model parameters of the second stage prediction sub-model are obtained, step 402 is performed.
Step 402: and calculating the prediction consistency loss value between the first stage prediction sub-model and the second stage prediction sub-model based on the first model parameter, the second model parameter, the first quantity, the second quantity, the first random noise and the second random noise.
The prediction consistency loss value is used to indicate the degree of consistency between the stage prediction labels output by the first stage prediction sub-model and the second stage prediction sub-model.
After the first model parameters of the first stage prediction sub-model and the second model parameters of the second stage prediction sub-model are obtained, the prediction consistency loss value between the first stage prediction sub-model and the second stage prediction sub-model can be calculated based on the first model parameters, the second model parameters, the first number, the second number, the first random noise and the second random noise. The calculation formula of the prediction consistency loss value may be as shown in the following formula (1):
$L_c = \frac{1}{N_l + N_u} \sum_{i=1}^{N_l + N_u} \left\| f_s(x_i, \tau_s; \theta_s) - f_t(x_i, \tau_t; \theta_t) \right\|^2$ (1)

In the above formula (1), $L_c$ is the prediction consistency loss value; $\tau_s$ and $\tau_t$ are the random data noises applied to the unlabeled ROP fundus images; $f_s$ and $f_t$ denote the first stage prediction sub-model and the second stage prediction sub-model, respectively; $\theta_s$ and $\theta_t$ denote the model parameters of the first and second stage prediction sub-models, respectively; $N_l$ is the first number; $N_u$ is the second number; and $x_i$ is the i-th fundus image sample.
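A sketch of this term, comparing the stage prediction distributions of the two sub-models with a mean squared difference; the softmax-plus-MSE form shown here is an assumption consistent with formula (1), not a form fixed by this application:

```python
import torch.nn.functional as F

def prediction_consistency_loss(logits_s, logits_t):
    # Compare the prediction outputs of the first and second stage prediction sub-models
    probs_s = F.softmax(logits_s, dim=1)
    probs_t = F.softmax(logits_t, dim=1)
    return F.mse_loss(probs_s, probs_t)  # prediction consistency loss value L_c
```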
Step 104: and calculating a semantic association consistency loss value of the fundus image stage prediction model according to the characteristics extracted from the fundus image sample based on the fundus image stage prediction model.
The semantic association consistency loss value may be used to indicate a degree of semantic consistency of the stage predictions output by the first stage predictor model and the second stage predictor model.
After the high-level semantic features of the fundus image stage prediction model are acquired, the semantic association consistency loss value of the fundus image stage prediction model can be calculated based on the high-level semantic features. Specifically, the calculation of the semantic association consistency loss value can be performed by combining the high-level semantic features respectively output by the first and second stage prediction sub-models. The manner in which the semantic association consistency loss value is calculated may be described in detail below in conjunction with FIG. 5.
Referring to fig. 5, a flowchart illustrating steps of a method for calculating a semantic association consistency loss value according to an embodiment of the present application is shown, where, as shown in fig. 5, the method for calculating a semantic association consistency loss value may include: step 501 and step 502.
Step 501: and obtaining the similarity between the high-level semantic features of the fundus image samples output by the first model and the second model.
In the present embodiment, after the output of the feature extraction layer is obtained, the similarity between the high-level semantic features of the fundus image samples output by the two stage prediction sub-models may be acquired. Specifically, after obtaining the output $Q$ of the feature extraction network, the Gram matrix between the samples in each batch is calculated as $G = Q \cdot Q^{T}$, where $G_{i,j}$ represents the similarity between samples $i$ and $j$; $G$ is then normalized to obtain the similarity matrices $P_s$ and $P_t$ used below.
after the similarity between the high-level semantic features of the fundus image samples output by the first model and the second model is acquired, step 502 is performed.
Step 502: and calculating to obtain the semantic association consistency loss value between the first stage prediction sub-model and the second stage prediction sub-model based on the first model parameter, the second model parameter, the first quantity, the second quantity, the first random noise, the second random noise and the similarity.
After the similarity between the high-level semantic features of the fundus image samples output by the two stage prediction sub-models is obtained, the semantic association consistency loss value between the first stage prediction sub-model and the second stage prediction sub-model can be calculated based on the first model parameters, the second model parameters, the first number, the second number, the first random noise, the second random noise and the similarity. The calculation formula of the semantic association consistency loss value can be shown in the following formula (3):
In the above formula (3), $L_s$ is the semantic association consistency loss value, $B$ is the number of fundus image samples in a batch, and $P_s$ and $P_t$ are the normalized similarity matrices of the high-level semantic features produced by the first and second stage prediction sub-models, respectively.
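The Gram-matrix computation and the comparison of the two resulting similarity matrices could be sketched as follows; the row-wise L1 normalization and the squared-difference form of the loss are illustrative assumptions, since the exact normalization and loss form are not spelled out here:

```python
import torch
import torch.nn.functional as F

def similarity_matrix(feats):
    # feats: (B, D) high-level semantic features of one batch (output Q of the feature extractor)
    g = feats @ feats.t()              # Gram matrix G = Q * Q^T, G[i, j] = similarity of samples i and j
    return F.normalize(g, p=1, dim=1)  # normalization of G (row-wise L1, assumed form)

def semantic_consistency_loss(feats_s, feats_t):
    p_s = similarity_matrix(feats_s)   # P_s from the first stage prediction sub-model
    p_t = similarity_matrix(feats_t)   # P_t from the second stage prediction sub-model
    return F.mse_loss(p_s, p_t)        # semantic association consistency loss value L_s
```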
According to the embodiment of the application, the useful discrimination information is fully mined from the label-free data through predicting the consistency loss, so that the classification performance of the deep learning model is greatly improved. Through semantic association consistency loss, the evolution relation of the disease features in different periods of ROP is additionally considered, and the relation is utilized to improve the recognition accuracy of the classification model.
After calculating the predicted and semantically associated consistency loss values, step 105 is performed.
Step 105: and calculating the target loss value of the fundus image stage prediction model based on the classification loss value, the prediction consistency loss value and the semantic association consistency loss value.
After the prediction consistency loss value and the semantic association consistency loss value are calculated, the target loss value of the fundus image stage prediction model can be calculated based on the classification loss value, the prediction consistency loss value and the semantic association consistency loss value. In this embodiment, the corresponding cross-entropy loss may be calculated using the fundus image samples with stage labels, and the target loss value may then be calculated from the cross-entropy loss value, the prediction consistency loss value and the semantic association consistency loss value. The calculation process of the target loss value is described in detail below in connection with fig. 6.
Referring to fig. 6, a flowchart illustrating steps of a target loss value calculation method according to an embodiment of the present application is shown, and as shown in fig. 6, the target loss value calculation method may include: step 601.
Step 601: and calculating the target loss value based on the classification loss value, the predicted consistency loss value, the semantically-related consistency loss value and a loss balance coefficient.
In this embodiment, the target loss value may be calculated based on the cross entropy loss value, the predicted consistency loss value, the semantically associated consistency loss value, and the loss balance coefficient. The calculation formula of the target loss value may be as shown in the following formula (4):
$L_{total} = L_{ce} + \lambda (L_c + L_s)$ (4)

In the above formula (4), $L_{total}$ is the target loss value, $L_{ce}$ is the cross-entropy (classification) loss value, and $\lambda$ is the loss balance coefficient.
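Combining the three terms according to formula (4) is then a one-liner; `lam` stands for the loss balance coefficient λ, whose value is not specified here:

```python
def target_loss(l_ce, l_c, l_s, lam=1.0):
    # L_total = L_ce + lambda * (L_c + L_s), formula (4)
    return l_ce + lam * (l_c + l_s)
```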
After the target loss value is calculated, step 106 is performed.
Step 106: and under the condition that the target loss value is in a preset range, taking the converged fundus image stage prediction model as a final stage prediction model.
After the target loss value is calculated, it may be determined whether the target loss value is within a preset range.
If the target loss value is not within the preset range, the fundus image stage prediction model has not converged, and training of the fundus image stage prediction model continues with the next round.
If the target loss value is within the preset range, the fundus image stage prediction model is converged, and at this time, the converged fundus image stage prediction model can be used as a final stage prediction model, and the final stage prediction model can be applied to the stage prediction process of the subsequent ROP fundus image.
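Putting the pieces together, one training iteration could be sketched as follows; this is a sketch that reuses the helper functions introduced above, and the optimizer choice, the way the Teacher model parameters are updated, and the convergence threshold are placeholders rather than details fixed by this application:

```python
import torch.nn.functional as F

def train_step(student, teacher, optimizer, labeled_images, stage_labels,
               unlabeled_images, lam=1.0):
    logits_l, _ = student(labeled_images)
    l_ce = F.cross_entropy(logits_l, stage_labels)            # classification loss

    (logits_s, feats_s), (logits_t, feats_t) = perturbed_forward(
        student, teacher, unlabeled_images)
    l_c = prediction_consistency_loss(logits_s, logits_t)     # prediction consistency loss
    l_s = semantic_consistency_loss(feats_s, feats_t)         # semantic association consistency loss

    l_total = target_loss(l_ce, l_c, l_s, lam)                # formula (4)
    optimizer.zero_grad()
    l_total.backward()
    optimizer.step()
    # (how the Teacher model parameters are updated is not shown here)
    return l_total.item()
```

Training would repeat such steps until the target loss value falls within the preset range, at which point the converged model is kept as the final stage prediction model.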
It can be understood that during model training, two parallel model structures (i.e., the first and second stage prediction sub-models) are trained together, whereas only one model structure, namely the first stage prediction sub-model, is used in the model inference process. As shown in fig. 8, in the Test phase, the model inference process may be performed using the Student model to perform stage prediction on ROP fundus images.
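At test time, a minimal inference sketch using only the Student model (continuing the illustrative StagePredictor definition above; the index-to-stage mapping is assumed) might be:

```python
import torch

@torch.no_grad()
def predict_stage(student, fundus_image):
    # fundus_image: a preprocessed (C, H, W) tensor of a single ROP fundus image
    student.eval()
    logits, _ = student(fundus_image.unsqueeze(0))  # add the batch dimension
    return int(logits.argmax(dim=1))                # predicted class index among the six stages
```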
According to the model training method provided by the embodiment of the application, a fundus image sample is acquired, and the fundus image sample comprises: a first number of stage-labeled ROP fundus images and a second number of stage-label-free ROP fundus images, the first number being less than the second number. And inputting the first number of fundus image samples into a fundus image stage prediction model to obtain stage prediction output of the first number of fundus images, and calculating a classification loss value. And processing the first number of fundus image samples and the second number of fundus image samples based on the fundus image stage prediction model to obtain prediction output, and calculating to obtain a prediction consistency loss value of the fundus image stage prediction model. And calculating a semantic association consistency loss value of the fundus image stage prediction model according to the characteristics extracted from the fundus image sample based on the fundus image stage prediction model. And calculating the target loss value of the fundus image stage prediction model based on the classification loss value, the prediction consistency loss value and the semantic association consistency loss value. And under the condition that the target loss value is in a preset range, taking the converged fundus image stage prediction model as a final stage prediction model. According to the embodiment of the application, the ROP automatic classification can be carried out by utilizing a small quantity of ROP fundus images with labels and a large quantity of ROP fundus images without labels, so that the labeling burden and cost of doctors can be relieved. Meanwhile, useful discrimination information is fully mined from the label-free data through predicting consistency loss, so that the classification performance of the deep learning model is greatly improved. Through the disease semantic association consistency loss, the evolution relation of the disease features in different periods of ROP is additionally considered, and the recognition accuracy of the model is improved.
Referring to fig. 10, a schematic structural diagram of a model training apparatus provided in an embodiment of the present application is shown, and as shown in fig. 10, the model training apparatus 1000 may include the following modules:
an image sample acquisition module 1001 for acquiring a fundus image sample; the fundus image sample includes: a first number of stage-tagged ROP fundus images and a second number of stage-untagged ROP fundus images, the first number being less than the second number;
a classification loss calculation module 1002, configured to input a first number of fundus image samples into a fundus image stage prediction model, obtain a stage prediction output of the first number of fundus images, and calculate a classification loss value;
a prediction consistency loss calculation module 1003, configured to process the first number of fundus image samples and the second number of fundus image samples based on the fundus image stage prediction model to obtain prediction output, and calculate a prediction consistency loss value between the first stage prediction sub-model and the second stage prediction sub-model;
a semantic consistency loss calculation module 1004, configured to calculate, based on the fundus image stage prediction model, a semantic association consistency loss value between the first stage prediction sub-model and the second stage prediction sub-model according to features extracted from the fundus image sample;
A target loss value calculation module 1005, configured to calculate a target loss value of the fundus image stage prediction model based on the classification loss value, the prediction consistency loss value, and the semantic association consistency loss value;
and a stage prediction model obtaining module 1006, configured to take the converged fundus image stage prediction model as a final stage prediction model when the target loss value is within a preset range.
Optionally, the image sample acquisition module includes:
an original image sample acquisition unit for acquiring an original fundus image sample;
a processed image sample acquisition unit for adjusting the size of the original fundus image sample to a set size to obtain a processed fundus image sample;
a fundus image sample acquiring unit configured to perform a data enhancement operation on the processed fundus image sample to obtain the fundus image sample.
Optionally, the fundus image stage prediction model includes: a first stage prediction sub-model and a second stage prediction sub-model, the first stage prediction sub-model comprising: a first feature extraction layer and a first fully connected layer having N output units, and the second stage prediction sub-model comprising: a second feature extraction layer and a second fully connected layer having N output units, where N is a positive integer,
The classification loss calculation module includes:
a first image sample acquisition unit, configured to apply a first random noise to the ROP fundus image without the stage label, to obtain a first fundus image sample;
a second image sample acquisition unit, configured to apply a second random noise to the ROP fundus image without the stage label, to obtain a second fundus image sample;
a fundus image sample input unit configured to input the first fundus image sample to the first phase predictor model and input the second fundus image sample to the second phase predictor model;
the first prediction tag output unit is used for calling the first feature extraction layer to extract high-level semantic features from the first fundus image sample, calling the first full-connection layer to carry out stage prediction on the first fundus image sample based on the high-level semantic features, and outputting a first stage prediction tag of the first fundus image sample;
the second prediction label output unit is used for calling the second feature extraction layer to extract high-level semantic features from the second fundus image sample, calling the second full-connection layer to conduct stage prediction on the second fundus image sample based on the high-level semantic features, and outputting a second stage prediction label of the second fundus image sample.
Optionally, the predictive consistency loss calculation module includes:
a model parameter obtaining unit, configured to obtain a first model parameter of the first stage predictor model and a second model parameter of the second stage predictor model;
and the prediction consistency loss calculation unit is used for calculating the prediction consistency loss value between the first stage prediction sub-model and the second stage prediction sub-model based on the first model parameter, the second model parameter, the first quantity, the second quantity, the first random noise and the second random noise.
Optionally, the semantic consistency loss calculation module includes:
a semantic similarity obtaining unit configured to obtain a similarity between high-level semantic features of the first fundus image sample and the second fundus image sample;
the semantic consistency loss calculation unit is used for calculating the semantic association consistency loss value between the first stage prediction sub-model and the second stage prediction sub-model based on the first model parameter, the second model parameter, the first quantity, the second quantity, the first random noise, the second random noise and the similarity.
Optionally, the target loss value calculation module includes:
and the target loss value calculation unit is used for calculating the target loss value based on the classification loss value, the prediction consistency loss value, the semantic association consistency loss value and the loss balance coefficient.
The embodiment of the application provides a model training device. A fundus image sample is acquired, the fundus image sample including: a first number of stage-labeled ROP fundus images and a second number of ROP fundus images without stage labels, the first number being less than the second number. The first number of fundus image samples are input into the fundus image stage prediction model to obtain the stage prediction output of the first number of fundus images, and a classification loss value is calculated. The prediction outputs of the fundus image samples are calculated based on the first and second stage prediction sub-models to obtain a prediction consistency loss value between the first stage prediction sub-model and the second stage prediction sub-model. A semantic association consistency loss value between the first stage prediction sub-model and the second stage prediction sub-model is calculated based on the features extracted from the fundus image samples by the two sub-models. The target loss value of the fundus image stage prediction model is calculated based on the classification loss value, the prediction consistency loss value and the semantic association consistency loss value. When the target loss value is within a preset range, the converged fundus image stage prediction model is taken as the final stage prediction model. Meanwhile, useful discriminative information is fully mined from the unlabeled data through the prediction consistency loss, so that the classification performance of the deep learning model is greatly improved. Through the disease semantic association consistency loss, the evolution relationship of disease features across different stages of ROP is additionally considered, which improves the recognition accuracy of the model.
Additionally, an embodiment of the application further provides an electronic device, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the above model training method.
Fig. 11 shows a schematic structural diagram of an electronic device 1100 according to an embodiment of the present invention. As shown in Fig. 11, the electronic device 1100 includes a central processing unit (CPU) 1101 that can perform various suitable actions and processes according to computer program instructions stored in a read-only memory (ROM) 1102 or loaded from a storage unit 1108 into a random access memory (RAM) 1103. The RAM 1103 can also store various programs and data required for the operation of the electronic device 1100. The CPU 1101, the ROM 1102 and the RAM 1103 are connected to one another by a bus 1104. An input/output (I/O) interface 1105 is also connected to the bus 1104.
A number of components in the electronic device 1100 are connected to the I/O interface 1105, including: an input unit 1106 such as a keyboard, mouse, microphone, etc.; an output unit 1107 such as various types of displays, speakers, and the like; a storage unit 1108, such as a magnetic disk, optical disk, etc.; and a communication unit 1109 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 1109 allows the electronic device 1100 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunications networks.
The various processes and procedures described above may be performed by the CPU 1101. For example, the method of any of the embodiments described above may be implemented as a computer software program tangibly embodied on a computer-readable medium, such as the storage unit 1108. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 1100 via the ROM 1102 and/or the communication unit 1109. When the computer program is loaded into the RAM 1103 and executed by the CPU 1101, one or more steps of the methods described above may be performed.
The embodiment of the application also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, it implements the processes of the above model training method embodiment and achieves the same technical effects, and details are not repeated here to avoid repetition. The computer-readable storage medium may be, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
It should be noted that, in this document, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or by means of hardware, although in many cases the former is the preferred implementation. Based on such an understanding, the technical solution of the present application, or the part thereof contributing to the prior art, may essentially be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disk) and including several instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive. Many variations may be made by those of ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, and all such variations fall within the protection of the present application.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein can be implemented as electronic hardware, or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application, or the part thereof contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, an optical disk, or the like.
The foregoing describes merely specific embodiments of the present application, but the protection scope of the present application is not limited thereto; any changes or substitutions that a person skilled in the art could readily conceive within the technical scope disclosed by the present application shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method of model training, the method comprising:
acquiring a fundus image sample; the fundus image sample includes: a first number of stage-tagged ROP fundus images and a second number of stage-untagged ROP fundus images, the first number being less than the second number;
inputting a first number of fundus image samples into a fundus image stage prediction model to obtain stage prediction output of the first number of fundus images, and calculating a classification loss value;
processing the first number of fundus image samples and the second number of fundus image samples based on the fundus image stage prediction model to obtain prediction output, and calculating to obtain a prediction consistency loss value of the fundus image stage prediction model;
calculating a semantic association consistency loss value of the fundus image stage prediction model according to the characteristics extracted from the fundus image sample based on the fundus image stage prediction model;
calculating a target loss value of the fundus image stage prediction model based on the classification loss value, the prediction consistency loss value and the semantic association consistency loss value;
and under the condition that the target loss value is in a preset range, taking the converged fundus image stage prediction model as a final stage prediction model.
2. The method of claim 1, wherein the acquiring a fundus image sample comprises:
acquiring an original fundus image sample;
the size of the original fundus image sample is adjusted to be a set size, and a processed fundus image sample is obtained;
and performing data enhancement operation on the processed fundus image sample to obtain the fundus image sample.
3. The method of claim 1, wherein the fundus image stage prediction model comprises: a first stage predictor model and a second stage predictor model, the first stage predictor model comprising: a first feature extraction layer and a first fully connected layer having N output units, the second stage predictor model comprising: a second feature extraction layer and a second fully connected layer having N output units, N being a positive integer;
wherein the processing the first number of fundus image samples and the second number of fundus image samples based on the fundus image stage prediction model to obtain prediction output comprises the following steps:
applying first random noise to the ROP fundus image without the stage label to obtain a first fundus image sample;
applying second random noise to the ROP fundus image without the stage label to obtain a second fundus image sample;
inputting the first fundus image sample to the first phase predictor model and the second fundus image sample to the second phase predictor model;
invoking the first feature extraction layer to extract high-level semantic features from the first fundus image sample, invoking the first full-connection layer to conduct stage prediction on the first fundus image sample based on the high-level semantic features, and outputting a first stage prediction label of the first fundus image sample;
and calling the second feature extraction layer to extract high-level semantic features from the second fundus image sample, calling the second full-connection layer to carry out stage prediction on the second fundus image sample based on the high-level semantic features, and outputting a second stage prediction label of the second fundus image sample.
4. The method according to claim 3, wherein the calculating to obtain a prediction consistency loss value of the fundus image stage prediction model comprises:
acquiring first model parameters of the first stage predictor model and second model parameters of the second stage predictor model;
and calculating the prediction consistency loss value between the first stage prediction sub-model and the second stage prediction sub-model based on the first model parameter, the second model parameter, the first quantity, the second quantity, the first random noise and the second random noise.
5. The method of claim 4, wherein the calculating, based on the fundus image stage prediction model, a semantic association consistency loss value of the fundus image stage prediction model from features extracted from the fundus image samples comprises:
acquiring similarity between high-level semantic features of the first fundus image sample and the second fundus image sample;
and calculating to obtain the semantic association consistency loss value between the first stage prediction sub-model and the second stage prediction sub-model based on the first model parameter, the second model parameter, the first quantity, the second quantity, the first random noise, the second random noise and the similarity.
6. The method of claim 1, wherein the calculating a target loss value of the fundus image stage prediction model based on the classification loss value, the prediction consistency loss value and the semantic association consistency loss value comprises:
and calculating the target loss value based on the classification loss value, the prediction consistency loss value, the semantic association consistency loss value and a loss balance coefficient.
7. A model training apparatus, the apparatus comprising:
the image sample acquisition module is used for acquiring fundus image samples; the fundus image sample includes: a first number of stage-tagged ROP fundus images and a second number of stage-untagged ROP fundus images, the first number being less than the second number;
the classification loss calculation module is used for inputting a first number of fundus image samples into a fundus image stage prediction model, obtaining stage prediction output of the first number of fundus images and calculating a classification loss value;
the prediction consistency loss calculation module is used for processing the first number of fundus image samples and the second number of fundus image samples based on the fundus image stage prediction model to obtain prediction output, and calculating to obtain a prediction consistency loss value of the fundus image stage prediction model;
the semantic consistency loss calculation module is used for calculating a semantic association consistency loss value of the fundus image stage prediction model according to the characteristics extracted from the fundus image sample based on the fundus image stage prediction model;
the target loss value calculation module is used for calculating the target loss value of the fundus image stage prediction model based on the classification loss value, the prediction consistency loss value and the semantic association consistency loss value;
and the stage prediction model acquisition module is used for taking the converged fundus image stage prediction model as a final stage prediction model under the condition that the target loss value is in a preset range.
8. The apparatus of claim 7, wherein the image sample acquisition module comprises:
an original image sample acquisition unit for acquiring an original fundus image sample;
a processed image sample acquisition unit for adjusting the size of the original fundus image sample to a set size to obtain a processed fundus image sample;
a fundus image sample acquiring unit configured to perform a data enhancement operation on the processed fundus image sample to obtain the fundus image sample.
9. An electronic device, comprising:
a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the model training method according to any one of claims 1 to 6.
10. A readable storage medium, characterized in that instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the model training method of any one of claims 1 to 6.
CN202310111927.1A 2023-02-06 2023-02-06 Model training method and device, electronic equipment and storage medium Pending CN116503684A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310111927.1A CN116503684A (en) 2023-02-06 2023-02-06 Model training method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310111927.1A CN116503684A (en) 2023-02-06 2023-02-06 Model training method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116503684A true CN116503684A (en) 2023-07-28

Family

ID=87319089

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310111927.1A Pending CN116503684A (en) 2023-02-06 2023-02-06 Model training method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116503684A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116994100A (en) * 2023-09-28 2023-11-03 北京鹰瞳科技发展股份有限公司 Model training method and device, electronic equipment and storage medium
CN116994100B (en) * 2023-09-28 2023-12-22 北京鹰瞳科技发展股份有限公司 Model training method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
Wells et al. Artificial intelligence in dermatopathology: Diagnosis, education, and research
US11631175B2 (en) AI-based heat map generating system and methods for use therewith
CN111191791B (en) Picture classification method, device and equipment based on machine learning model
US11854703B2 (en) Simulating abnormalities in medical images with generative adversarial networks
CN110533097B (en) Image definition recognition method and device, electronic equipment and storage medium
CN110689025B (en) Image recognition method, device and system and endoscope image recognition method and device
CN112949786A (en) Data classification identification method, device, equipment and readable storage medium
CN110827236B (en) Brain tissue layering method, device and computer equipment based on neural network
CN107194158A (en) A kind of disease aided diagnosis method based on image recognition
CN106295591A (en) Gender identification method based on facial image and device
Huang et al. Lesion-based contrastive learning for diabetic retinopathy grading from fundus images
US11663722B2 (en) Interactive training of a machine learning model for tissue segmentation
CN114287878A (en) Diabetic retinopathy focus image identification method based on attention model
CN112966792B (en) Blood vessel image classification processing method, device, equipment and storage medium
CN113240655B (en) Method, storage medium and device for automatically detecting type of fundus image
CN112233061A (en) Deep learning-based skin basal cell carcinoma and Babylonia disease identification method
CN116503684A (en) Model training method and device, electronic equipment and storage medium
WO2024074921A1 (en) Distinguishing a disease state from a non-disease state in an image
CN113724878B (en) Medical risk information pushing method and device based on machine learning
CN113707323B (en) Disease prediction method, device, equipment and medium based on machine learning
CN113705595A (en) Method, device and storage medium for predicting degree of abnormal cell metastasis
CN117010971A (en) Intelligent health risk providing method and system based on portrait identification
CN107910066A (en) Case history appraisal procedure, device, electronic equipment and storage medium
CN111582404B (en) Content classification method, device and readable storage medium
CN111651626B (en) Image classification method, device and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination