CN115458174A - Method for constructing intelligent diagnosis model of diabetic retinopathy - Google Patents


Info

Publication number
CN115458174A
Authority
CN
China
Prior art keywords: task, encoder, network, training, model
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211142980.XA
Other languages
Chinese (zh)
Inventor
欧阳继红
逯晨阳
刘思光
郭泽琪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jilin University
Original Assignee
Jilin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Jilin University
Priority to: CN202211142980.XA
Publication of: CN115458174A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50: ICT specially adapted for medical diagnosis, medical simulation or medical data mining, for simulation or modelling of medical disorders
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H30/00: ICT specially adapted for the handling or processing of medical images


Abstract

The invention discloses a method for constructing an intelligent diagnosis model of diabetic retinopathy which, aimed at the staged nature of DR, combines self-supervised contrastive learning with supervised deep learning to improve the training process of a deep neural network (DNN), so that the model can mine prior knowledge without labeled data to assist the intelligent DR diagnosis model and be used for early DR detection. The construction method comprises data preprocessing, a pretext task and a downstream task. The preprocessing applies noise reduction and normalization to the images to improve data quality. The pretext task is an unsupervised model pre-training process used to mine prior knowledge from the unlabeled data set to assist model training. The downstream task is a supervised fine-tuning process for the classification model, which uses the prior knowledge to improve the quality of training and thereby the DR classification performance of the model when trained with only a small amount of labeled data.

Description

Method for constructing intelligent diagnosis model of diabetic retinopathy
Technical Field
The invention relates to the technical field of deep learning, in particular to a method for constructing an intelligent diagnosis model of diabetic retinopathy.
Background
Diabetic retinopathy (DR) is one of the most common ophthalmic complications of diabetes and a preventable disease. In clinical practice, determination of the disease stage relies mainly on experienced ophthalmologists observing and evaluating color retinal images, with staging mostly established according to the international five-stage DR grading standard in order to formulate an optimized diagnosis and treatment plan. Deep learning can effectively extract pathological features from images based on labeled data and obtain good diagnostic results, but in practice it faces many challenges.
In training deep neural networks, most existing methods rely on large amounts of data. Because existing open-source fundus image data sets suffer from problems such as differences between imaging devices and regional differences in disease presentation, models trained on large-scale open-source data can produce inaccurate classification results, and such methods are ill-suited for extending models to medical institutions with insufficient labeled data or to other countries and regions.
Disclosure of Invention
Aiming at the defects of the prior art, the invention aims to provide a method for constructing an intelligent diagnosis model of diabetic retinopathy.
In order to achieve the purpose, the invention adopts the following technical scheme:
A method for constructing an intelligent diagnosis model of diabetic retinopathy comprises the following specific process:
S1, image preprocessing: performing noise reduction and normalization on the fundus image data;
S2, model training: this mainly comprises two stages: a pretext task based on contrastive self-supervised learning, and a downstream task based on a convolutional neural network;
the specific process of the pretext task is as follows:
2.1.1, data enhancement: during training, a batch of input fundus images is given as the set X = {x_k}, k = 1, 2, …, N, where N is the number of images in the input batch; each input fundus image x_k is randomly transformed to generate a positive-sample fundus image pair (x_i, x_j); the set X thus yields, after data enhancement, the sets X_i = {x_i}, i = 1, 3, …, 2N−1 and X_j = {x_j}, j = 2, 4, …, 2N;
2.1.2, encoder: the encoder f(·) uses two weight-sharing convolutional neural networks, which encode the input sets X_i and X_j into the corresponding 2048-dimensional feature representation sequences H_i and H_j, as shown in equation (3):
H_i = f(X_i)
H_j = f(X_j)    (3)
2.1.3, projection network (Projection Net): the projection network p(·) comprises two weight-sharing multilayer perceptron (MLP) modules, each consisting of two fully connected layers F1 and F2 joined by a ReLU activation layer; after the nonlinear transformations in the projection network, the feature representations H_i and H_j are further mapped to Z_i and Z_j; the projection process is shown in equation (4):
Z_i = p(H_i) = W_2 σ(W_1 H_i)
Z_j = p(H_j) = W_2 σ(W_1 H_j)    (4)
in equation (4), W_1 and W_2 denote the parameters of the two fully connected layers F1 and F2 of the MLP, and σ denotes the ReLU nonlinear transformation;
2.1.4, contrastive loss function: the contrastive loss NT-Xent is selected as the loss function; N fundus images are input per batch, and after the data enhancement process the batch size grows to 2N; the NT-Xent loss for each batch is computed as follows:
first, the cosine similarity between every two of the 2N fundus images is calculated, as shown in equation (5):
sim(Z_m, Z_n) = (Z_m · Z_n) / (‖Z_m‖ ‖Z_n‖)    (5)
in equation (5), m ∈ {1, 2, …, 2N} and n ∈ {1, 2, …, 2N}; the cosine similarities sim(Z_m, Z_n) are then normalized with the Softmax function to obtain the probability that two fundus images are similar; the negative logarithm of this probability is then taken as the loss of the image pair, referred to as the noise-contrastive estimation loss; for a positive pair (Z_i, Z_j), the loss is computed as shown in equation (6):
ℓ(i, j) = −log [ exp(sim(Z_i, Z_j)/τ) / Σ_{k≠i} exp(sim(Z_i, Z_k)/τ) ]    (6)
in equation (6), k ≠ i means that all images other than Z_i itself enter the denominator, and τ denotes a tunable temperature parameter that scales the cosine similarities, which lie in the range [−1, 1]; finally, the final loss over all positive pairs, including both (X_i, X_j) and (X_j, X_i), i.e. the final loss of the whole network, is computed as shown in equation (7):
L = (1 / 2N) Σ_{k=1}^{N} [ℓ(2k−1, 2k) + ℓ(2k, 2k−1)]    (7)
through the pretext-task training, a self-supervised contrastive learning method requiring no labeled data, the learned prior knowledge is transferred across networks: the parameters learned by the encoder in the pretext task are migrated into the feature extraction network of the downstream task as initialization network parameters;
the downstream task comprises three components: (1) encoder; (2) classifier; (3) objective function:
2.2.1, encoder: the CNN encoder in the downstream task has the same structure as the pretext task encoder; the parameters learned by the encoder in the pretext task are loaded into the encoder of the downstream task; the fundus image data x is input into the encoder to obtain a high-dimensional feature map representation sequence, which is then reduced in dimension by the output layer of the encoder to obtain a feature vector h; h is then input into the classifier for classification; the calculation process is as follows:
h = GAP(f(x))    (8)
in equation (8), f(·) is the convolution-pooling process, GAP denotes the global average pooling layer, and h denotes the feature vector finally output by the encoder;
2.2.2, classifier: the output layer of the encoder is followed by a Softmax classifier, consisting of an FC layer and a Softmax function, which computes the probability ŷ that a sample is positive; the calculation process is as follows:
{z_1, z_2} = FC(h)    (9)
ŷ = exp(z_2) / (exp(z_1) + exp(z_2))    (10)
in equation (9), FC denotes a fully connected network with two output nodes, z_1 and z_2 denoting the two output values; in equation (10), the outputs are normalized with the Softmax function (taking z_2 as the positive-class output) to obtain the probability ŷ that the sample is positive;
2.2.3, objective function: based on the Softmax classifier, a binary cross-entropy loss function is adopted as the objective function of this task; for a given batch input sequence {(x_1, y_1), (x_2, y_2), …, (x_N, y_N)}, the loss function is computed as follows:
L = −(1/N) Σ_{i=1}^{N} [ y_i log ŷ_i + (1 − y_i) log(1 − ŷ_i) ]    (11)
in equation (11), y_i is the true label of the sample; in this task the positive label is set to 1 and the negative label to 0, so y_i ∈ {0, 1}, and ŷ_i denotes the predicted probability that the sample is positive.
Further, the specific process of step S1 is as follows:
firstly, the redundant black boundary pixels in the fundus image are cropped to remove redundant-information noise;
then, the sizes of the fundus images are normalized, uniformly adjusting the resolution of all fundus images to 256 × 256;
finally, the brightness and contrast of the fundus image are improved as shown in equations (1) and (2):
I_gaussian = G(x, y; σ) ∗ I(x, y)    (1)
I_enhance = α·I(x, y) + β·I_gaussian + γ    (2)
In equation (1), the size-normalized input image I(x, y) is convolved with a Gaussian filter G of standard deviation σ to obtain I_gaussian; the enhanced fundus image I_enhance is then obtained by the weighted summation of equation (2), where the values of α, β, σ, γ are set to 4, −4, 10, 128 respectively.
Further, in step 2.1.1, the data enhancement includes: random cropping of the fundus image, random flipping or rotation by a certain angle, random conversion to a grayscale image, and random modification of the brightness, contrast or saturation of the fundus image.
Further, in step 2.1.2, a ResNet50 network pre-trained on ImageNet is selected as the basic network structure of the encoder.
The beneficial effects of the invention are as follows: the invention combines self-supervised contrastive learning with supervised deep learning, improves the training process of the DNN, and provides a method for constructing an intelligent diagnosis model of diabetic retinopathy. The construction method comprises data preprocessing, a pretext task and a downstream task. The preprocessing applies noise reduction and normalization to the images to improve data quality. The pretext task is an unsupervised model pre-training process used to mine prior knowledge from the unlabeled data set to assist model training. The downstream task is a supervised fine-tuning process for the classification model, which uses the prior knowledge to improve the quality of training and thereby the DR classification performance of the model when trained with only a small amount of labeled data.
Drawings
FIG. 1 is a schematic diagram illustrating data enhancement effect in a method according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating a method according to an embodiment of the present invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings. It should be noted that this embodiment is based on the above technical solution and provides a detailed implementation and specific operation process, but the protection scope of the present invention is not limited to this embodiment.
This embodiment provides a method for constructing an intelligent diagnosis model of diabetic retinopathy, in which a key question is how to train the model with the unlabeled data and the small amount of labeled image data in a database.
The convolutional neural network (CNN), one of the best techniques for extracting image features, has become the most effective deep learning model in this field. Such models are usually trained with labeled data under supervision, but their ability to mine features from unlabeled data and small amounts of labeled data is insufficient.
Aimed at the staged nature of DR, and in order to let the model mine prior knowledge without labeled data to assist the intelligent diagnosis model and serve early detection of diabetic retinopathy, this embodiment combines self-supervised contrastive learning with supervised deep learning, improves the training process of the DNN, and provides a method for constructing an intelligent diagnosis model of diabetic retinopathy (DR model). The construction method comprises data preprocessing, a Pretext Task and a Downstream Task. The preprocessing applies noise reduction and normalization to the images to improve data quality. The pretext task is an unsupervised model pre-training process used to mine prior knowledge from the unlabeled data set to assist model training. The downstream task is a supervised fine-tuning process for the classification model, which uses the prior knowledge to improve the quality of training and thereby the DR classification performance of the model when trained with only a small amount of labeled data.
The technical route of the method of this embodiment is as follows: the original images in the database are preprocessed, the processed data are input into the model, and after pretext-task and downstream-task training and optimization, the intelligent diagnosis model for diabetic retinopathy is obtained. The specific process is as follows:
S1, image preprocessing:
the difference between individuals is large because the fundus image data sources and the imaging devices are different. Several phenomena can occur in the data set: large size color differences, overexposure, information redundancy, etc. In order to improve the quality of data, the fundus images in the database need to be subjected to noise reduction and normalization processing. Firstly, cutting redundant black boundary pixels in a fundus image to remove information redundant noise; then, normalizing the size of the fundus images, and uniformly adjusting the resolution of all the fundus images to 256 multiplied by 256; finally, the brightness and contrast of the fundus image are improved as shown in the formula (1) and the formula (2):
I_gaussian = G(x, y; σ) ∗ I(x, y)    (1)
I_enhance = α·I(x, y) + β·I_gaussian + γ    (2)
In equation (1), the size-normalized input image I(x, y) is convolved with a Gaussian filter G of standard deviation σ to obtain I_gaussian; the enhanced fundus image I_enhance is then obtained by the weighted summation of equation (2), where the values of α, β, σ and γ are 4, −4, 10 and 128 respectively. The original images and the preprocessing results are shown in FIG. 1, where (a) and (c) are original images and (b) and (d) are the corresponding preprocessing results.
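The preprocessing described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions (the border threshold, the function names, and the 3σ kernel truncation are our own choices, not part of the patent); it crops the black border and applies the enhancement of equations (1) and (2) with α = 4, β = −4, σ = 10, γ = 128, implementing the Gaussian convolution as a separable 1-D filter:

```python
import numpy as np

def crop_black_border(img, thresh=10):
    """Remove (near-)black boundary rows/columns of a grayscale fundus image."""
    mask = img > thresh
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    return img[rows.min():rows.max() + 1, cols.min():cols.max() + 1]

def gaussian_blur(img, sigma):
    """Eq. (1): I_gaussian = G(x, y; sigma) * I, as a separable convolution."""
    radius = int(3 * sigma)                       # truncate kernel at 3 sigma
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    kernel /= kernel.sum()                        # normalize to unit mass
    # filter along columns (axis 0), then rows (axis 1), with reflect padding
    pad = np.pad(img, ((radius, radius), (0, 0)), mode="reflect")
    tmp = np.stack([np.convolve(pad[:, j], kernel, mode="valid")
                    for j in range(img.shape[1])], axis=1)
    pad = np.pad(tmp, ((0, 0), (radius, radius)), mode="reflect")
    return np.stack([np.convolve(pad[i, :], kernel, mode="valid")
                     for i in range(img.shape[0])], axis=0)

def enhance(img, alpha=4.0, beta=-4.0, sigma=10, gamma=128.0):
    """Eq. (2): I_enhance = alpha*I + beta*I_gaussian + gamma, clipped to [0, 255]."""
    out = alpha * img + beta * gaussian_blur(img, sigma) + gamma
    return np.clip(out, 0, 255)
```

On a constant image the blur returns the image unchanged, so the enhancement maps every pixel to γ = 128; this illustrates how the transform suppresses the low-frequency background while centering the output.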
S2, model training:
the clinical medical image database stores a large number of fundus images, which are not fully utilized because of no labeling. The comparison learning plays an important role in mining unmarked data samples to improve the generalization capability of the model.
As shown in fig. 2, the model training of this embodiment mainly comprises two stages: a pretext task based on contrastive self-supervised learning, and a downstream task based on a convolutional neural network. The pretext task is an unsupervised model pre-training process used to mine prior knowledge from the unlabeled data set to assist model training. The downstream task is a supervised fine-tuning process for the classification model, which uses the prior knowledge to improve the quality of training and thereby the DR classification performance of the model when trained with only a small amount of labeled data.
In this method, the key to model training is how to mine prior knowledge from the unlabeled data with the pretext task and transfer it to the downstream task through knowledge transfer, thereby improving the classification performance of the model.
2.1, the pretext task comprises four stages: data enhancement, encoder, projection network, and contrastive loss. The encoder f(·) uses two weight-sharing convolutional neural networks, which after training are transferred to the feature extraction network of the downstream task.
The specific process of the pretext task is as follows:
2.1.1, data enhancement: during training, a batch of input fundus images is given as the set X = {x_k}, k = 1, 2, …, N, where N is the number of images in the input batch; each input fundus image x_k is randomly transformed twice to generate a positive-sample fundus image pair (x_i, x_j); the set X thus yields, after data enhancement, the sets X_i = {x_i}, i = 1, 3, …, 2N−1 and X_j = {x_j}, j = 2, 4, …, 2N. Finally, the input to the encoder for one batch is the set of pairs {(x_i, x_j)}, i = 1, 3, …, 2N−1, j = i + 1, where each pair is generated from one original image x ∈ X. Several data enhancement methods are selected in this embodiment, including: random cropping of the fundus image, random flipping or rotation by a certain angle, random conversion to a grayscale map, and random modification of the brightness, contrast or saturation of the fundus image.
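The pair-generation step above can be sketched as follows. This is a hypothetical NumPy illustration in which the crop size (224), the jitter ranges, and the transform probabilities are illustrative assumptions rather than values fixed by the patent:

```python
import numpy as np

def augment(img, rng, crop=224):
    """One random view of an H x W x 3 fundus image with values in [0, 255]."""
    h, w, _ = img.shape
    top = rng.integers(0, h - crop + 1)      # random crop offsets
    left = rng.integers(0, w - crop + 1)
    out = img[top:top + crop, left:left + crop].astype(np.float64)
    if rng.random() < 0.5:                   # random horizontal flip
        out = out[:, ::-1]
    # random brightness/contrast jitter
    out = out * rng.uniform(0.8, 1.2) + rng.uniform(-20.0, 20.0)
    if rng.random() < 0.2:                   # random conversion to grayscale
        out[:] = out.mean(axis=2, keepdims=True)
    return np.clip(out, 0, 255)

def positive_pair(img, rng):
    """Two independent random views (x_i, x_j) of the same image."""
    return augment(img, rng), augment(img, rng)
```

Applying `positive_pair` to every image of a batch of N thus yields the 2N views that enter the encoder.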
2.1.2, encoder (Base Encoder): the encoder f(·) uses two weight-sharing convolutional neural networks, which encode the input sets X_i and X_j into the corresponding 2048-dimensional feature representation sequences H_i and H_j, as shown in equation (3):
H_i = f(X_i)
H_j = f(X_j)    (3)
The ResNet50 network has proven very effective at extracting high-level semantic information, which is critical for medical image analysis. Therefore, in this embodiment, a ResNet50 network pre-trained on ImageNet is selected as the basic network structure of the encoder, giving the encoder network better initialization parameters and speeding up its convergence during training. In addition, the encoder position in the network architecture is generic, i.e. the ResNet50 can be replaced with other types of encoders.
2.1.3, projection network (Projection Net): the projection network p(·) comprises two weight-sharing multilayer perceptron (MLP) modules, each consisting of two fully connected layers F1 and F2 joined by a ReLU activation layer, as shown in fig. 2. After the nonlinear transformations in the projection network, the feature representation sequences H_i and H_j obtained in step 2.1.2 are further mapped to Z_i and Z_j. The projection process is shown in equation (4):
Z_i = p(H_i) = W_2 σ(W_1 H_i)
Z_j = p(H_j) = W_2 σ(W_1 H_j)    (4)
In equation (4), W_1 and W_2 denote the parameters of the two fully connected layers F1 and F2 of the MLP, and σ denotes the ReLU nonlinear transformation.
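The projection of equation (4) amounts to two matrix multiplications with a ReLU between them. The NumPy sketch below is illustrative: the hidden and output dimensions (2048 and 128) are assumptions, since the patent does not fix the projection output size; the same weight-shared p(·) is applied to both branches:

```python
import numpy as np

def project(H, W1, W2):
    """Projection network p(.) of Eq. (4): Z = sigma(H W1) W2, sigma = ReLU.
    H: (batch, 2048) encoder feature representations."""
    return np.maximum(H @ W1, 0.0) @ W2

rng = np.random.default_rng(0)
W1 = rng.normal(0, 0.01, size=(2048, 2048))  # fully connected layer F1
W2 = rng.normal(0, 0.01, size=(2048, 128))   # fully connected layer F2

H_i = rng.normal(size=(8, 2048))             # features of branch i
H_j = rng.normal(size=(8, 2048))             # features of branch j
Z_i = project(H_i, W1, W2)                   # identical weights on both
Z_j = project(H_j, W1, W2)                   # branches (weight sharing)
```

Weight sharing here is simply the reuse of the same (W1, W2) for both branches, which is what makes the two MLP modules a single p(·).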
2.1.4, contrastive loss function: unlike a supervised learning task, no manual labels are involved in self-supervised learning. Therefore, in the pretext task, the key to model training is to construct pseudo-labels from the unlabeled training data and design a corresponding loss function with which to pseudo-supervise the network. The loss function selected in this embodiment is the contrastive loss NT-Xent, whose principle is to compute the loss from the distances between positive and negative examples; optimizing this loss pulls positive examples closer together and pushes negative examples farther apart.
In this task, N fundus images are input per batch when training the network model, and after the data enhancement process the batch size grows to 2N. The NT-Xent loss for each batch is computed as follows. First, the cosine similarity between every two of the 2N fundus images is calculated, as shown in equation (5):
sim(Z_m, Z_n) = (Z_m · Z_n) / (‖Z_m‖ ‖Z_n‖)    (5)
In equation (5), m ∈ {1, 2, …, 2N} and n ∈ {1, 2, …, 2N}. The cosine similarities sim(Z_m, Z_n) are then normalized with the Softmax function to obtain the probability that two fundus images are similar. The negative logarithm of this probability is then taken as the loss of the image pair, referred to as the noise-contrastive estimation loss (NCE loss). For a positive pair (Z_i, Z_j), the loss is computed as shown in equation (6):
ℓ(i, j) = −log [ exp(sim(Z_i, Z_j)/τ) / Σ_{k≠i} exp(sim(Z_i, Z_k)/τ) ]    (6)
In equation (6), k ≠ i means that all images other than Z_i itself enter the denominator, and τ denotes a tunable temperature parameter that scales the cosine similarities, which lie in the range [−1, 1]. Finally, the final loss over all positive pairs, including both (X_i, X_j) and (X_j, X_i), i.e. the final loss of the whole network, is computed as shown in equation (7):
L = (1 / 2N) Σ_{k=1}^{N} [ℓ(2k−1, 2k) + ℓ(2k, 2k−1)]    (7)
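Equations (5) to (7) can be sketched compactly as follows. This is a hypothetical NumPy reference implementation in which consecutive row pairs (1, 2), (3, 4), … of the projection matrix form the positive pairs, matching the index convention above:

```python
import numpy as np

def nt_xent(Z, tau=0.5):
    """NT-Xent loss of Eqs. (5)-(7).
    Z: (2N, d) projections; rows (0, 1), (2, 3), ... are positive pairs."""
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    sim = Zn @ Zn.T                      # Eq. (5): pairwise cosine similarity
    logits = sim / tau                   # temperature scaling
    np.fill_diagonal(logits, -np.inf)    # exclude k = i from the denominator
    # row-wise log-softmax; Eq. (6) is its value at the positive partner
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    partner = np.arange(Z.shape[0]) ^ 1  # partner index: 0<->1, 2<->3, ...
    # Eq. (7): average the pair losses over both orderings (i, j) and (j, i)
    return -log_prob[np.arange(Z.shape[0]), partner].mean()
```

With well-separated pairs the loss approaches zero, and it grows as positive and negative examples become harder to distinguish, which is exactly the pressure that pulls the two views of one fundus image together.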
through the training of the preposed task, the learned priori knowledge is migrated through a network by using a self-supervision contrast learning training method without marked data, and parameters learned by an encoder in the preposed task are migrated into a feature extraction network of a downstream task to be used as initialization network parameters, as shown in fig. 2.
2.2, because the model has already learned some prior knowledge through the pretext task, good classification performance can be obtained by fine-tuning the encoder and the classifier with only a small amount of labeled data; the downstream task comprises three components: (1) encoder; (2) classifier; (3) objective function.
2.2.1, encoder: the CNN encoder in the downstream task has the same structure as the pretext task encoder, i.e. it is also a ResNet50. The parameters learned by the encoder in the pretext task are loaded into the encoder of the downstream task. The fundus image data x is input into the encoder to obtain a high-dimensional feature map representation sequence, which is reduced in dimension by the output layer of the encoder (a GAP layer) to obtain a feature vector h; h is then input into the classifier for classification. The calculation process is as follows:
h = GAP(f(x))    (8)
In equation (8), f(·) is the convolution-pooling process, GAP denotes the global average pooling layer, and h denotes the feature vector finally output by the encoder.
2.2.2, classifier: the output layer of the encoder is followed by a Softmax classifier, consisting of an FC layer and a Softmax function, which computes the probability that a sample belongs to RDR (referable DR), i.e. the probability ŷ that the sample is positive. The calculation process is as follows:
{z_1, z_2} = FC(h)    (9)
ŷ = exp(z_2) / (exp(z_1) + exp(z_2))    (10)
In equation (9), FC denotes a fully connected network with two output nodes, z_1 and z_2 denoting the two output values. In equation (10), the outputs are normalized with the Softmax function (taking z_2 as the positive-class output) to obtain the probability ŷ that the sample is positive.
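The FC-plus-Softmax head of equations (9) and (10) reduces to a few lines of NumPy. In this illustrative sketch the weights W and b are hypothetical (untrained) parameters, and taking the second output node as the positive class is our assumption, since the patent does not state which node is positive:

```python
import numpy as np

def classify(h, W, b):
    """Softmax classifier of Eqs. (9)-(10).
    h: (2048,) encoder feature vector; returns P(sample is positive)."""
    z = h @ W + b                 # Eq. (9): FC layer with two output nodes
    z = z - z.max()               # numerically stable Softmax
    e = np.exp(z)
    p = e / e.sum()               # Eq. (10): normalize the two outputs
    return p[1]                   # probability of the positive class

rng = np.random.default_rng(0)
h = rng.normal(size=(2048,))
W = np.zeros((2048, 2))           # zero-initialized head: both classes equal
b = np.zeros(2)
```

With the zero-initialized head above, `classify(h, W, b)` returns 0.5; fine-tuning on labeled data is what moves this probability toward the true label.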
2.2.3, objective function: based on the Softmax classifier, a binary cross-entropy loss function is adopted as the objective function of this task. For a given batch input sequence {(x_1, y_1), (x_2, y_2), …, (x_N, y_N)}, the loss function is computed as follows:
L = −(1/N) Σ_{i=1}^{N} [ y_i log ŷ_i + (1 − y_i) log(1 − ŷ_i) ]    (11)
In equation (11), y_i is the true label of the sample; in this task the positive label is set to 1 and the negative label to 0, so y_i ∈ {0, 1}, and ŷ_i denotes the predicted probability that the sample is positive.
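The objective of equation (11) is the standard binary cross-entropy; the NumPy sketch below adds ε-clipping for numerical safety, a detail the patent does not mention:

```python
import numpy as np

def bce_loss(y, y_hat, eps=1e-12):
    """Binary cross-entropy of Eq. (11).
    y: (N,) true labels in {0, 1}; y_hat: (N,) predicted P(positive)."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)   # avoid log(0)
    return -np.mean(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))
```

A completely uncertain classifier (ŷ = 0.5 everywhere) incurs a loss of log 2 ≈ 0.693 for any labels, while perfect predictions drive the loss toward zero; minimizing this loss is the fine-tuning step of the downstream task.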
Various corresponding changes and modifications can be made by those skilled in the art based on the above technical solutions and concepts, and all such changes and modifications should be included in the protection scope of the present invention.

Claims (4)

1. A method for constructing an intelligent diagnosis model of diabetic retinopathy, characterized by comprising the following specific process:
S1, image preprocessing: performing noise reduction and normalization on the fundus image data;
S2, model training: this mainly comprises two stages: a pretext task based on contrastive self-supervised learning, and a downstream task based on a convolutional neural network;
the specific process of the pretext task is as follows:
2.1.1, data enhancement: during training, a batch of input fundus images is given as the set X = {x_k}, k = 1, 2, …, N, where N is the number of images in the input batch; each input fundus image x_k is randomly transformed to generate a positive-sample fundus image pair (x_i, x_j); the set X thus yields, after data enhancement, the sets X_i = {x_i}, i = 1, 3, …, 2N−1 and X_j = {x_j}, j = 2, 4, …, 2N;
2.1.2, encoder: the encoder f(·) uses two weight-sharing convolutional neural networks, which encode the input sets X_i and X_j into the corresponding 2048-dimensional feature representation sequences H_i and H_j, as shown in equation (3):
H_i = f(X_i)
H_j = f(X_j)    (3)
2.1.3, projection network (Projection Net): the projection network p(·) comprises two weight-sharing multilayer perceptron (MLP) modules, each consisting of two fully connected layers F1 and F2 joined by a ReLU activation layer; after the nonlinear transformations in the projection network, the feature representations H_i and H_j are further mapped to Z_i and Z_j; the projection process is shown in equation (4):
Z_i = p(H_i) = W_2 σ(W_1 H_i)
Z_j = p(H_j) = W_2 σ(W_1 H_j)    (4)
in equation (4), W_1 and W_2 denote the parameters of the two fully connected layers F1 and F2 of the MLP, and σ denotes the ReLU nonlinear transformation;
2.1.4, contrastive loss function: the contrastive loss NT-Xent is selected as the loss function; N fundus images are input per batch, and after the data enhancement process the batch size grows to 2N; the NT-Xent loss for each batch is computed as follows:
first, the cosine similarity between every two of the 2N fundus images is calculated, as shown in equation (5):
sim(Z_m, Z_n) = (Z_m · Z_n) / (‖Z_m‖ ‖Z_n‖)    (5)
in equation (5), m ∈ {1, 2, …, 2N} and n ∈ {1, 2, …, 2N}; the cosine similarities sim(Z_m, Z_n) are then normalized with the Softmax function to obtain the probability that two fundus images are similar; the negative logarithm of this probability is then taken as the loss of the image pair, referred to as the noise-contrastive estimation loss; for a positive pair (Z_i, Z_j), the loss is computed as shown in equation (6):
ℓ(i, j) = −log [ exp(sim(Z_i, Z_j)/τ) / Σ_{k≠i} exp(sim(Z_i, Z_k)/τ) ]    (6)
in equation (6), k ≠ i means that all images other than Z_i itself enter the denominator, and τ denotes a tunable temperature parameter that scales the cosine similarities, which lie in the range [−1, 1]; finally, the final loss over all positive pairs, including both (X_i, X_j) and (X_j, X_i), i.e. the final loss of the whole network, is computed as shown in equation (7):
L = (1 / 2N) Σ_{k=1}^{N} [ℓ(2k−1, 2k) + ℓ(2k, 2k−1)]    (7)
through the pretext-task training, a self-supervised contrastive learning method requiring no labeled data, the learned prior knowledge is transferred across networks: the parameters learned by the encoder in the pretext task are migrated into the feature extraction network of the downstream task as initialization network parameters;
the downstream task comprises three components: (1) encoder; (2) classifier; (3) objective function:
2.2.1, encoder: the structure of a CNN encoder in a downstream task is the same as that of a preposed task encoder; loading parameters learned by an encoder in a preposed task into an encoder of a downstream task; inputting fundus image data x into an encoder to obtain a high-dimensional characteristic map representation sequence, then performing dimensionality reduction on the fundus image data x through an output layer of the encoder to obtain a characteristic vector h, and then inputting the characteristic vector h into a classifier for classification; the calculation process is as follows:
h=GAP(f(x)) (8)
in formula (8), f(·) is the convolution-pooling process, GAP denotes the global average pooling layer, and h is the feature vector finally output by the encoder;
2.2.2, classifier: the output layer of the encoder is followed by a Softmax classifier, consisting of an FC layer and a Softmax function, used to calculate the probability ŷ that a sample is positive.
The calculation process is as follows:
{z_1, z_2} = FC(h)   (9)
ŷ = exp(z_2) / (exp(z_1) + exp(z_2))   (10)
in equation (9), FC denotes a fully connected network with two output nodes, z_1 and z_2 denoting the two output values; in equation (10), the outputs are normalized using the Softmax function to obtain the probability ŷ that the sample is positive;
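Equations (8)-(10) chain global average pooling, a two-node FC layer, and Softmax normalization. A minimal NumPy sketch follows; the function names and the randomly initialized weights W and bias b are illustrative assumptions, and z_2 is taken as the positive-class logit:

```python
import numpy as np

def encoder_head(feature_map):
    """Global average pooling over the spatial dims: (C, H, W) -> (C,), eq. (8)."""
    return feature_map.mean(axis=(1, 2))

def softmax_classifier(h, W, b):
    """Two-node FC layer followed by Softmax, eqs. (9)-(10); returns the
    normalized probability that the sample is positive. W (2, C) and b (2,)
    are illustrative parameters, not trained values."""
    z = W @ h + b                       # {z_1, z_2} = FC(h)
    z = z - z.max()                     # subtract max for numerical stability
    p = np.exp(z) / np.exp(z).sum()     # Softmax normalization
    return p[1]                         # probability of the positive class
```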
2.2.3, objective function: based on the Softmax classifier, a two-class cross-entropy loss function is adopted as the objective function in this task; for a given batch of inputs {(x_1, y_1), (x_2, y_2), …, (x_N, y_N)}, the loss function is computed as follows:
L = −(1/N) Σ_{i=1}^{N} [ y_i · log ŷ_i + (1 − y_i) · log(1 − ŷ_i) ]   (11)
in equation (11), y_i is the true label of sample i; in this task the positive-sample label is set to 1 and the negative-sample label to 0, so y_i ∈ {0, 1}, and ŷ_i denotes the probability that sample i is predicted to be positive.
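The two-class cross-entropy of equation (11) is a one-line computation. A minimal NumPy sketch, with an added clipping term (an implementation detail not stated in the claim) to avoid log(0):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Two-class cross-entropy of equation (11), averaged over the batch.
    y_true holds labels in {0, 1}; y_pred holds predicted positive
    probabilities in [0, 1]."""
    y_pred = np.clip(y_pred, eps, 1 - eps)   # guard against log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
```

For a maximally uncertain prediction of 0.5 on every sample the loss equals log 2 ≈ 0.693, and it approaches 0 as predictions become correct and confident.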
2. The construction method according to claim 1, wherein the specific process of step S1 is as follows:
firstly, the redundant black border pixels in the fundus image are cropped away to remove redundant-information noise;
then, the fundus image sizes are normalized, uniformly adjusting the resolution of all fundus images to 256 × 256;
finally, the brightness and contrast of the fundus image are enhanced as shown in formulas (1) and (2):
I_gaussian = G_σ(x, y) * I(x, y)   (1)
I_enhance = α·I(x, y) + β·I_gaussian + γ   (2)
in formula (1), the size-normalized input image I(x, y) is convolved with a Gaussian filter of standard deviation σ to obtain I_gaussian; the enhanced fundus image I_enhance is then obtained by the weighted summation of formula (2), where the values of α, β, σ, γ are set to 4, −4, 10, 128 respectively.
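Formulas (1) and (2) amount to subtracting a blurred copy of the image from a scaled original and re-centering the intensities. A minimal NumPy sketch on a single-channel image, assuming reflective border padding (an implementation detail not stated in the claim):

```python
import numpy as np

def gaussian_blur(img, sigma=10):
    """Separable Gaussian convolution with standard deviation sigma, eq. (1)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()                                     # normalized kernel
    pad = np.pad(img, radius, mode='reflect')        # assumed border handling
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode='valid'), 1, pad)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode='valid'), 0, rows)

def enhance_fundus(img, alpha=4, beta=-4, sigma=10, gamma=128):
    """Brightness/contrast enhancement of eq. (2):
    I_enhance = alpha*I + beta*I_gaussian + gamma, clipped to [0, 255]."""
    blurred = gaussian_blur(img.astype(float), sigma)
    return np.clip(alpha * img + beta * blurred + gamma, 0, 255)
```

On a region of uniform intensity the scaled original and the blurred term cancel exactly, leaving the constant γ = 128, which is what pushes local structure (vessels, lesions) toward high contrast against a neutral background.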
3. The construction method according to claim 1, wherein in step 2.1.1 the data enhancement comprises: randomly cropping the fundus image, randomly rotating it by a certain angle, randomly converting it to a grayscale image, and randomly modifying its brightness, contrast or saturation.
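The augmentations listed in claim 3 can be illustrated with plain NumPy on an (H, W, 3) array. This is a hypothetical sketch: the crop size, flip in place of an arbitrary rotation, per-transform probabilities, and brightness-jitter range are all assumptions, not the patent's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_augment(img, out_size=224):
    """Illustrative random crop, horizontal flip, grayscale conversion, and
    brightness change on an (H, W, 3) image; parameters are assumptions."""
    h, w, _ = img.shape
    top = rng.integers(0, h - out_size + 1)
    left = rng.integers(0, w - out_size + 1)
    img = img[top:top + out_size, left:left + out_size]     # random crop
    if rng.random() < 0.5:
        img = img[:, ::-1]                                  # random horizontal flip
    if rng.random() < 0.2:                                  # random grayscale
        gray = img @ np.array([0.299, 0.587, 0.114])
        img = np.repeat(gray[..., None], 3, axis=2)
    factor = rng.uniform(0.8, 1.2)                          # random brightness jitter
    return np.clip(img * factor, 0, 255)
```

In the contrastive pretext task, two independent calls on the same fundus image would yield the positive pair (X_i, X_j).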
4. The construction method according to claim 1, wherein in step 2.1.2 a ResNet50 network with ImageNet pre-training parameters is selected as the base network structure of the encoder.
CN202211142980.XA 2022-09-20 2022-09-20 Method for constructing intelligent diagnosis model of diabetic retinopathy Pending CN115458174A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211142980.XA CN115458174A (en) 2022-09-20 2022-09-20 Method for constructing intelligent diagnosis model of diabetic retinopathy

Publications (1)

Publication Number Publication Date
CN115458174A true CN115458174A (en) 2022-12-09

Family

ID=84305332

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211142980.XA Pending CN115458174A (en) 2022-09-20 2022-09-20 Method for constructing intelligent diagnosis model of diabetic retinopathy

Country Status (1)

Country Link
CN (1) CN115458174A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117373656A (en) * 2023-10-30 2024-01-09 Beijing Institute of Technology Diabetes weak supervision classification method based on heterogeneous data
CN117557840A (en) * 2023-11-10 2024-02-13 China University of Mining and Technology Fundus lesion grading method based on small sample learning
CN117557840B (en) * 2023-11-10 2024-05-24 China University of Mining and Technology Fundus lesion grading method based on small sample learning

Similar Documents

Publication Publication Date Title
CN112308158B (en) Multi-source field self-adaptive model and method based on partial feature alignment
CN109886121B (en) Human face key point positioning method for shielding robustness
CN110287849B (en) Lightweight depth network image target detection method suitable for raspberry pi
CN108154118B (en) A kind of target detection system and method based on adaptive combined filter and multistage detection
CN109345508B (en) Bone age evaluation method based on two-stage neural network
CN109993100B (en) Method for realizing facial expression recognition based on deep feature clustering
CN111401384B (en) Transformer equipment defect image matching method
CN111242288B (en) Multi-scale parallel deep neural network model construction method for lesion image segmentation
CN113052210B (en) Rapid low-light target detection method based on convolutional neural network
CN112307958A (en) Micro-expression identification method based on spatiotemporal appearance movement attention network
CN115458174A (en) Method for constructing intelligent diagnosis model of diabetic retinopathy
Hsueh et al. Human behavior recognition from multiview videos
JP2022551683A (en) Methods and systems for non-invasive genetic testing using artificial intelligence (AI) models
CN111062329B (en) Unsupervised pedestrian re-identification method based on augmented network
CN111582044A (en) Face recognition method based on convolutional neural network and attention model
CN108596044B (en) Pedestrian detection method based on deep convolutional neural network
CN116758397A (en) Single-mode induced multi-mode pre-training method and system based on deep learning
CN114565628A (en) Image segmentation method and system based on boundary perception attention
CN114463340A (en) Edge information guided agile remote sensing image semantic segmentation method
CN113763417A (en) Target tracking method based on twin network and residual error structure
Sekmen et al. Unsupervised deep learning for subspace clustering
CN116310335A (en) Method for segmenting pterygium focus area based on Vision Transformer
CN115830401A (en) Small sample image classification method
CN113192076B (en) MRI brain tumor image segmentation method combining classification prediction and multi-scale feature extraction
CN115995040A (en) SAR image small sample target recognition method based on multi-scale network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination