CN110084252B - Deep learning-based diabetic retinopathy image labeling method - Google Patents


Info

Publication number
CN110084252B
CN110084252B (application CN201910357674.XA)
Authority
CN
China
Prior art keywords
image
deep
neural network
network
images
Prior art date
Legal status
Active
Application number
CN201910357674.XA
Other languages
Chinese (zh)
Other versions
CN110084252A (en
Inventor
万程
叶辉
周鹏
陈志强
吴陆辉
华骁
Current Assignee
Shanghai Keruike Pharmaceutical Technology Co ltd
Original Assignee
Shanghai Keruike Pharmaceutical Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Keruike Pharmaceutical Technology Co ltd filed Critical Shanghai Keruike Pharmaceutical Technology Co ltd
Priority to CN201910357674.XA priority Critical patent/CN110084252B/en
Publication of CN110084252A publication Critical patent/CN110084252A/en
Application granted granted Critical
Publication of CN110084252B publication Critical patent/CN110084252B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention discloses a deep learning-based diabetic retinopathy image labeling method, which comprises the following steps: selecting training samples and test samples; sample preprocessing, which comprises cropping, flipping and normalizing the sample images; constructing a deep convolutional neural network as an image encoder, initializing the network parameters with a transfer learning method, and then feeding the preprocessed sample image data into the network to obtain feature vectors; and constructing a deep recurrent neural network (LSTM) as a vector decoder, and feeding the obtained feature vectors into the LSTM structure for decoding to obtain the labeling information of the sample image. The labeling information is a text description of the lesion point information in the sample image; it can help doctors and patients understand the image content more deeply and improves diagnosis efficiency and accuracy. The invention uses the deep convolutional neural network and the deep recurrent neural network to automatically label diabetic retinopathy images, improving the comprehensibility of the lesion point information in the images.

Description

Deep learning-based diabetic retinopathy image labeling method
Technical Field
The invention relates to the technical field of medical image processing, in particular to a diabetic retinopathy image labeling method based on deep learning.
Background
Currently, diabetes is an endocrine disease that severely affects human health; the number of patients is expected to grow to 380 million by 2025, and its morbidity and mortality are second only to those of cardiovascular and cerebrovascular diseases and cancer. Diabetic retinopathy (DR) is a common diabetes-related retinal complication; it has a severe impact on vision and has the highest incidence among such complications. In addition, a well-known difficulty of DR is that it lacks clear clinical signs at an early stage, which makes diagnosis a great challenge; patients often discover they suffer from DR only late in its course. Statistically, DR is the leading cause of blindness in adults aged 20-74, particularly in developed countries. At present there is no good treatment for diabetes other than insulin injection, so diabetic patients and the complications caused by diabetes keep increasing, and many patients suffer from DR. This places a great economic burden on society. However, timely diagnosis and treatment of DR can effectively alleviate patients' suffering, significantly reduce the blindness rate, and improve quality of life. It is therefore very important to develop an inexpensive, widely applicable diagnostic algorithm to diagnose DR accurately and effectively.
In most current automatic diagnosis algorithms, diagnosis of diabetic retinopathy fundus images relies mainly on traditionally hand-designed extracted features followed by classifier construction. Detection is performed using manual features including shape, color, brightness, and prior knowledge. Because manually extracting features is a complicated process, these methods only obtain good results on small data sets; on large data sets their efficiency is low and their robustness poor. In recent years, with the advent of large-scale data sets, deep learning has developed vigorously, and many researchers have applied it to medical image processing with good results. Many have also obtained good results in DR diagnosis with deep learning methods, but these algorithms merely classify DR lesions and do not provide diagnostic results in a way that is easy for patients and doctors to understand. Image annotation can solve this problem well: the generated text annotations can indicate the lesion information in the image, help doctors and patients perceive the image content more intuitively, and reduce the probability of misdiagnosis and missed diagnosis.
Disclosure of Invention
The invention aims to: aiming at the defects of the prior art, the invention provides a deep learning-based diabetic retinopathy image labeling method, which feeds a diabetic retinopathy image into a deep neural network, generates a diagnostic text description for the image, and intuitively displays the lesion point information in the diabetic retinopathy image.
The technical scheme is as follows: the invention relates to a deep learning-based diabetic retinopathy image labeling method which is characterized by comprising the following steps of:
(1) Selecting all fundus images in the DIARETDB0 data set and the DIARETDB1 data set and part of the normal fundus images in the Messidor data set as original data samples;
(2) Image preprocessing, namely expanding, cropping and normalizing the fundus images;
(3) Making text labels, wherein for each fundus image an intuitive text description is generated from the ground-truth lesion point label data;
(4) Constructing a deep neural network, wherein the deep neural network comprises a deep convolutional neural network for image data encoding and a deep recurrent neural network for decoding, so that the deep neural network can generate text labels for a given sample image;
(5) Initializing the deep convolutional neural network parameters by a transfer learning method, and randomly initializing the deep recurrent neural network parameters;
(6) Sending the preprocessed data into a deep neural network, and repeatedly and iteratively updating network parameters to enable the generated label to be as close to a real label as possible;
(7) Testing on a test set with the trained diabetic retinopathy image labeling model, verifying the model effect, and taking the model with the best result as the final model.
Further refining the technical scheme, the preprocessing of the image in step (2) specifically comprises: first scaling the original high-resolution image (1152×1500) to a size suitable for the network (299×299) to extract the global features of the image; then scaling the image to obtain a sub-image of size 768×1000, randomly cropping an image of size 598×897 from that sub-image, and cutting it into 6 images of size 299×299 in a 2×3 grid to extract the local features of the image; and finally normalizing the resulting 7 images of size 299×299, so that the network fully extracts the features of the image.
Further, in step (3), 5 sentences with similar meaning are generated from the ground-truth lesion point labels to describe each image.
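As an illustration of this step, a template-based caption generator can produce 5 paraphrased sentences from the ground-truth lesion flags. The templates, lesion names and function below are hypothetical, a minimal sketch rather than the patent's actual wording:

```python
# Hypothetical sketch of step (3): build 5 paraphrased captions from the
# ground-truth lesion flags. Templates and phrasing are assumptions.
LESION_NAMES = ["microaneurysms", "bleeding points", "hard exudates", "soft exudates"]

TEMPLATES = [
    "This fundus image contains {}.",
    "The image shows signs of {}.",
    "Lesions present: {}.",
    "Findings in this retina include {}.",
    "The fundus photograph exhibits {}.",
]

def make_captions(flags):
    """flags: list of 4 booleans, one per lesion type, in LESION_NAMES order."""
    present = [name for name, f in zip(LESION_NAMES, flags) if f]
    body = ", ".join(present) if present else "no visible lesions"
    return [t.format(body) for t in TEMPLATES]

# Example: an image with microaneurysms and hard exudates.
captions = make_captions([True, False, True, False])
```

Each image thus gets 5 reference descriptions with the same lesion content but varied wording, which is the usual setup for training caption generators.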
Further, the deep neural network constructed in step (4) comprises a deep convolutional network and a deep recurrent neural network, wherein the deep convolutional network is Inception-V3 and the deep recurrent neural network is an RNN. The deep convolutional neural network comprises convolutional layers, pooling layers, activation layers and Batch Normalization layers; 7 parallel fully connected layers are connected behind it, the 7 outputs of the deep convolutional network are processed respectively to obtain 7 512-dimensional vectors, and a weighted sum then yields the feature vector representing the image. Long short-term memory (LSTM) recurrent neural network cells are cascaded in the time dimension to obtain the deep recurrent neural network. For a specific input, one word is generated at each time step, finally yielding a coherent text description.
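The fusion step described above (7 parallel fully connected layers followed by a weighted sum) can be sketched numerically. The shapes follow the text (4096-dimensional encoder outputs, 512-dimensional features, as stated in the embodiment); the random weights and the uniform fusion coefficients are assumptions, since in practice all of these are learned:

```python
import numpy as np

# Sketch of the fusion step: each of the 7 encoder outputs passes through its
# own fully connected layer to give a 512-d vector; a weighted sum produces the
# single image feature vector. Weights here are random stand-ins for learned ones.
rng = np.random.default_rng(0)

enc_dim, feat_dim, n_views = 4096, 512, 7
W = [rng.standard_normal((feat_dim, enc_dim)) * 0.01 for _ in range(n_views)]
b = [np.zeros(feat_dim) for _ in range(n_views)]
alpha = np.full(n_views, 1.0 / n_views)  # fusion weights (assumed uniform)

def fuse(encoder_outputs):
    """encoder_outputs: list of 7 arrays, each of shape (4096,)."""
    views = [W[i] @ encoder_outputs[i] + b[i] for i in range(n_views)]
    return sum(a * v for a, v in zip(alpha, views))  # shape (512,)

feat = fuse([rng.standard_normal(enc_dim) for _ in range(n_views)])
```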
Further, in step (5) the network parameters are initialized with a transfer learning method, using weights obtained by training on the ImageNet image data set. During the training process of step (6), these pre-trained parameters remain unchanged; only the randomly initialized parameters, namely the parameters of the 7 fully connected layers and the parameters of the deep recurrent neural network, are updated.
Further, in order to verify the effectiveness of the diabetic retinopathy image labeling model and obtain the best-performing model, step (5) is repeated so that the model achieves its best effect.
The beneficial effects are that: 1. In image preprocessing, the diabetic retinopathy image is processed into 7 sub-images containing global and local information respectively; the convolutional network can then extract the image features more fully and provide more complete lesion features for the text labeling process, ensuring the accuracy of the features.
2. Transfer learning is used to initialize the weights; since the initial weights come from the large public data set ImageNet, the network can fully extract the features of the image without overfitting.
3. The generated text labels can indicate the lesion information in the diabetic retinopathy image, give doctors and patients an intuitive impression, and make the information in the image easier to understand.
Drawings
Fig. 1: flow chart of the implementation of the invention.
Fig. 2: examples of the lesion point illustrations.
Detailed Description
The technical scheme of the invention is described in detail below with reference to the accompanying drawings; the technical solutions in the embodiments of the invention are described clearly and completely. It is apparent that the described embodiments are only some, not all, of the embodiments of the invention. Thus, the following detailed description of the embodiments, as presented in the figures, is not intended to limit the scope of the claimed invention but merely represents selected embodiments. All other embodiments obtained by a person skilled in the art without inventive effort fall within the protection scope of the present invention.
Example 1: the deep learning-based diabetic retinopathy image labeling method provided by the invention automatically generates text labels for four lesion points (microaneurysms, bleeding points, hard exudates and soft exudates, as shown in figure 2) in a diabetic retinopathy image. The specific operation of this embodiment is carried out according to the following steps:
1. selecting a data set
(1) DIARETDB0 and DIARETDB1 datasets
DIARETDB0 and DIARETDB1 are two public databases of color fundus images for DR detection, collected at Kuopio University Hospital. The main purpose of the two data sets is to define a unified evaluation strategy for assessing the performance of different DR lesion diagnosis or detection algorithms. DIARETDB0 includes 130 color fundus images; 20 are normal and contain no lesion points, while 110 contain at least one of microaneurysms, bleeding points, hard exudates, soft exudates and neovascularization; each image is 1500×1152 in size. One image, the only such one, contains only neovascularization; as it is irrelevant to our study, we removed it. DIARETDB1 consists of 89 color fundus images, 5 normal and 84 containing at least one of the four lesion points, each also 1500×1152 in size. All images in both data sets were taken with a digital fundus camera with a 50-degree field of view. The two data sets thus provide a total of 218 images, 25 normal and 193 diseased, an uneven sample distribution.
(2) Messidor dataset
The data set has 1200 color fundus images, of which 540 are normal and 660 diseased, at three resolutions: 1440×960, 2240×1488 and 2304×1536, in TIF format. To alleviate the uneven sample distribution, all 152 normal images among the images of resolution 2240×1488 are selected and added to the experimental data. Finally we obtain 177 normal images and 193 diseased images.
2. Data preprocessing
To enable the deep convolutional neural network to fully extract image features, each image is processed into 7 sub-images before being fed into the network. One sub-image is scaled directly from the original image and captures its global features; the other 6 are obtained as follows: the original image is first scaled to 768×1000, an image of size 598×897 is then randomly cropped from it, and this crop is cut into 6 sub-images in a 2×3 grid. The 6 sub-images augment the data to some extent and thus reduce the risk of network overfitting. Finally, the 7 sub-images are normalized, mapping their values into the range -1 to 1 to accelerate network convergence. For each input color fundus image of size 1152×1500, this stage outputs 7 normalized color images of size 299×299.
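The preprocessing pipeline above can be sketched as follows. Nearest-neighbour resizing is used here only so the example needs nothing beyond NumPy; a real implementation would use a proper image library for resizing:

```python
import numpy as np

# Sketch of the 7-sub-image preprocessing: one global 299x299 view, plus a
# 2x3 grid of 299x299 tiles from a random 598x897 crop of a 768x1000 rescale.
rng = np.random.default_rng(0)

def nn_resize(img, h, w):
    """Nearest-neighbour resize (stand-in for PIL/OpenCV interpolation)."""
    ys = np.arange(h) * img.shape[0] // h
    xs = np.arange(w) * img.shape[1] // w
    return img[ys][:, xs]

def preprocess(img):
    """img: uint8 array (1152, 1500, 3) -> 7 float arrays (299, 299, 3) in [-1, 1]."""
    subs = [nn_resize(img, 299, 299)]        # global view
    mid = nn_resize(img, 768, 1000)          # intermediate rescale
    y = rng.integers(0, 768 - 598 + 1)       # random 598 x 897 crop position
    x = rng.integers(0, 1000 - 897 + 1)
    crop = mid[y:y + 598, x:x + 897]
    for r in range(2):                       # 2 x 3 grid of 299 x 299 tiles
        for c in range(3):
            subs.append(crop[r * 299:(r + 1) * 299, c * 299:(c + 1) * 299])
    return [s.astype(np.float32) / 127.5 - 1.0 for s in subs]

views = preprocess(rng.integers(0, 256, (1152, 1500, 3), dtype=np.uint8))
```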
3. Building a model
The model of the invention comprises two parts, an encoder and a decoder. The encoder is an Inception-V3 network; the decoder is a recurrent neural network, an LSTM, which performs well in sequence generation tasks; the encoder and decoder are connected through fully connected layers. The Inception architecture used by the encoder won the 2014 ImageNet competition; since the feature vector dimension required by this method differs from that of the competition network, the last fully connected layer of the original network is replaced to meet the requirement of the method. The 7 sub-images obtained by preprocessing pass through the Inception-V3 network to give 7 4096-dimensional vectors; these pass through 7 fully connected layers respectively to give 7 512-dimensional vectors, and a weighted sum finally yields a single 512-dimensional feature vector, which is the output of the encoder and the input of the decoder. The decoder is implemented with a recurrent neural network; it receives the feature vector output by the encoder as its only input, then continually outputs word vectors by feeding each generated word back into itself, until a complete sentence is generated or the loop upper limit is reached.
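The decoder's autoregressive generation loop can be sketched as below. The `lstm_step` function is a deterministic toy stand-in for a trained LSTM cell (an assumption made so the example is self-contained); only the loop structure, the feedback of each generated word, and the two stop conditions mirror the text:

```python
# Sketch of the decoder loop: the image feature initialises the state, then
# each step feeds the previous word back in, stopping at "<end>" or max_len.
VOCAB = ["<start>", "<end>", "the", "image", "shows", "microaneurysms"]
WORD2ID = {w: i for i, w in enumerate(VOCAB)}

def lstm_step(state, word_id):
    # Toy deterministic transition table standing in for a trained LSTM cell.
    nxt = {WORD2ID["<start>"]: WORD2ID["the"],
           WORD2ID["the"]: WORD2ID["image"],
           WORD2ID["image"]: WORD2ID["shows"],
           WORD2ID["shows"]: WORD2ID["microaneurysms"],
           WORD2ID["microaneurysms"]: WORD2ID["<end>"]}
    return state, nxt[word_id]

def greedy_decode(feature, max_len=20):
    state, word = feature, WORD2ID["<start>"]
    out = []
    for _ in range(max_len):                 # loop upper limit, as in the text
        state, word = lstm_step(state, word)
        if word == WORD2ID["<end>"]:         # complete sentence generated
            break
        out.append(VOCAB[word])
    return " ".join(out)

caption = greedy_decode([0.0] * 512)  # the 512-d encoder feature
```

A real decoder would replace the transition table with an LSTM cell over word embeddings and pick each word via argmax or beam search over its softmax output.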
4. Training model
When training the model, the 370 samples are divided into a training set and a test set at a ratio of 4:1. The training data are fed into the network and the network parameters are updated iteratively until the best model performance is reached; the pre-trained network parameters are kept fixed, and parameter updates are driven by a predefined loss function. During training, each input produces a corresponding sequence of word vectors; the cross-entropy loss between the generated word vectors and the real word vectors is used as the loss function for updating the model parameters, and the parameters of each layer are updated by gradient descent until the model converges. To verify model performance, the experiments use 5-fold cross-validation: in each fold, test samples are selected at equal intervals, one out of every 5 samples, with the remainder used for training; this is repeated 5 times, once for each of the 5 offsets. Finally, the experimental results are averaged to verify the performance of the model.
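The equal-interval 5-fold split described above can be written directly; with 370 samples, each fold has 74 test and 296 training samples:

```python
# Equal-interval 5-fold split: fold k takes every 5th sample starting at
# offset k as the test set, and the remaining samples as the training set.
def interval_folds(n_samples, n_folds=5):
    folds = []
    for k in range(n_folds):
        test = list(range(k, n_samples, n_folds))
        train = [i for i in range(n_samples) if i % n_folds != k]
        folds.append((train, test))
    return folds

folds = interval_folds(370)
```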
5. Experimental result processing
The method can automatically generate a text description of the lesion points for an input diabetic retinopathy image. The accuracy of the generated description is evaluated by keyword matching, i.e. whether the generated text description contains the lesion points present in the real text description. The corresponding evaluation indices are obtained by computing the matching accuracy, specificity and sensitivity for each lesion point and for the whole. After cross-validation, each performance index yields 5 results, which are averaged to obtain the final result. The experimental results show that the diagnostic accuracy of the method can exceed 90%.
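The keyword-matching evaluation can be sketched as follows; the example captions and the per-lesion confusion-count computation are illustrative assumptions:

```python
# Keyword-matching evaluation: a lesion counts as detected when its keyword
# appears in the generated caption; accuracy, sensitivity and specificity are
# then computed per lesion from the resulting confusion counts.
def evaluate(pairs, keyword):
    """pairs: list of (generated_caption, reference_caption) strings."""
    tp = fp = tn = fn = 0
    for gen, ref in pairs:
        pred, truth = keyword in gen, keyword in ref
        tp += pred and truth
        fp += pred and not truth
        tn += not pred and not truth
        fn += not pred and truth
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return accuracy, sensitivity, specificity

metrics = evaluate(
    [("image shows hard exudates", "hard exudates present"),
     ("no visible lesions", "no visible lesions"),
     ("image shows bleeding points", "hard exudates present")],
    "hard exudates",
)
```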
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (1)

1. A deep learning-based diabetic retinopathy image labeling method, characterized by comprising the following steps:
(1) Selecting all fundus images in the DIARETDB0 data set and the DIARETDB1 data set and part of the normal fundus images in the Messidor data set as original data samples;
(2) Image preprocessing, namely expanding, cropping and normalizing the fundus images;
(3) Making text labels, wherein for each fundus image an intuitive text description is generated from the ground-truth lesion point label data;
(4) Constructing a deep neural network, wherein the deep neural network comprises a deep convolutional neural network for image data encoding and a deep recurrent neural network for decoding, so that the deep neural network can generate text labels for a given sample image;
(5) Initializing the deep convolutional neural network parameters by a transfer learning method, and randomly initializing the deep recurrent neural network parameters;
(6) Sending the preprocessed sample data into a deep neural network, and repeatedly and iteratively updating network parameters to enable the generated label to be as close to a real label as possible;
(7) Testing on the test set with the trained diabetic retinopathy image labeling model, verifying the model effect, and taking the model with the best result as the final model,
the preprocessing of the image in the step (2) specifically comprises: first scaling the original high-resolution 1152×1500 image to a network-suitable size of 299×299 to extract the global features of the image; scaling the image to obtain a sub-image of size 768×1000, randomly cropping an image of size 598×897 from the sub-image, and cutting it into 6 images of size 299×299 in a 2×3 grid to extract the local features of the image; and finally normalizing the obtained 7 images of size 299×299 so that the network fully extracts the features of the image,
in the step (3), 5 sentences with similar meaning are generated from the image's ground-truth lesion point labels to describe the image,
the deep neural network constructed in the step (4) comprises a deep convolutional network and a deep recurrent neural network, wherein the deep convolutional network is Inception-V3 and the deep recurrent neural network is an RNN; the deep convolutional neural network comprises convolutional layers, pooling layers, activation layers and Batch Normalization layers; 7 parallel fully connected layers are connected behind the deep convolutional network, the 7 outputs of the deep convolutional network are processed respectively to obtain 7 512-dimensional vectors, and these are then weighted and summed to obtain a feature vector representing the image; long short-term memory (LSTM) recurrent neural network cells are cascaded in the time dimension to obtain the deep recurrent neural network; for a specific input, one word is generated at each time step, finally yielding a coherent text description.
CN201910357674.XA 2019-04-29 2019-04-29 Deep learning-based diabetic retinopathy image labeling method Active CN110084252B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910357674.XA CN110084252B (en) 2019-04-29 2019-04-29 Deep learning-based diabetic retinopathy image labeling method


Publications (2)

Publication Number Publication Date
CN110084252A CN110084252A (en) 2019-08-02
CN110084252B true CN110084252B (en) 2023-09-29

Family

ID=67417844

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910357674.XA Active CN110084252B (en) 2019-04-29 2019-04-29 Deep learning-based diabetic retinopathy image labeling method

Country Status (1)

Country Link
CN (1) CN110084252B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110648344B (en) * 2019-09-12 2023-01-17 电子科技大学 Diabetes retinopathy classification device based on local focus characteristics
CN110827258B (en) * 2019-10-31 2022-02-11 西安交通大学 Device and method for screening diabetic retinopathy based on counterstudy
CN111815608B (en) * 2020-07-13 2023-08-25 北京小白世纪网络科技有限公司 New coronatine pneumonia patient rehabilitation time prediction method and system based on deep learning
CN112967227B (en) * 2021-01-29 2022-09-09 中国科学技术大学 Automatic diabetic retinopathy evaluation system based on focus perception modeling
CN114091507B (en) * 2021-09-02 2022-07-29 北京医准智能科技有限公司 Ultrasonic focus region detection method, device, electronic equipment and storage medium
CN114022725A (en) * 2021-10-09 2022-02-08 北京鹰瞳科技发展股份有限公司 Method for training multi-disease referral system, multi-disease referral system and method
CN115985472B (en) * 2022-12-01 2023-09-22 珠海全一科技有限公司 Fundus image labeling method and fundus image labeling system based on neural network

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108021916A (en) * 2017-12-31 2018-05-11 南京航空航天大学 Deep learning diabetic retinopathy sorting technique based on notice mechanism
CN108470359A (en) * 2018-02-11 2018-08-31 艾视医疗科技成都有限公司 A kind of diabetic retinal eye fundus image lesion detection method
CN108520522A (en) * 2017-12-31 2018-09-11 南京航空航天大学 Retinal fundus images dividing method based on the full convolutional neural networks of depth


Also Published As

Publication number Publication date
CN110084252A (en) 2019-08-02

Similar Documents

Publication Publication Date Title
CN110084252B (en) Deep learning-based diabetic retinopathy image labeling method
CN109376636B (en) Capsule network-based eye fundus retina image classification method
Salido et al. Using deep learning to detect melanoma in dermoscopy images
CN110097559B (en) Fundus image focus region labeling method based on deep learning
Mishra et al. Diabetic retinopathy detection using deep learning
Wang et al. Simultaneous diagnosis of severity and features of diabetic retinopathy in fundus photography using deep learning
CN110084318B (en) Image identification method combining convolutional neural network and gradient lifting tree
CN109165692B (en) User character prediction device and method based on weak supervised learning
El-Shafai et al. Efficient Deep-Learning-Based Autoencoder Denoising Approach for Medical Image Diagnosis.
CN111798464A (en) Lymphoma pathological image intelligent identification method based on deep learning
CN112258486B (en) Retinal vessel segmentation method for fundus image based on evolutionary neural architecture search
CN116071292B (en) Ophthalmoscope retina image blood vessel identification method based on contrast generation learning
CN110543916B (en) Method and system for classifying missing multi-view data
Srihari et al. Role of automation in the examination of handwritten items
Mualla et al. Dental Age Estimation Based on X-ray Images.
Wu et al. Generative caption for diabetic retinopathy images
CN111000553A (en) Intelligent classification method for electrocardiogram data based on voting ensemble learning
CN111444829B (en) Fusion reasoning and learning decision classification method for liquid-based cytology examination
CN111951246A (en) Multidirectional X-ray chest radiography pneumonia diagnosis method based on deep learning
CN111028232A (en) Diabetes classification method and equipment based on fundus images
CN115294075A (en) OCTA image retinal vessel segmentation method based on attention mechanism
Bosowski et al. Evolving deep ensembles for detecting covid-19 in chest X-rays
Das et al. Automated classification of retinal OCT images using a deep multi-scale fusion CNN
Sufian et al. Deep learning feature extraction for COVID19 detection algorithm using computerized tomography scan
CN113627483A (en) Cervical OCT image classification method and device based on self-supervision texture contrast learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230829

Address after: 201208 Unit 01, 7th Floor, Building 4, No. 666 Shengxia Road and No. 122 Yindong Road, China (Shanghai) Pilot Free Trade Zone, Pudong New Area, Shanghai

Applicant after: Shanghai Keruike Pharmaceutical Technology Co.,Ltd.

Address before: 210046 20th Floor, Building B, Xingzhi Science and Technology Park, Qixia District, Nanjing City, Jiangsu Province

Applicant before: Nanjing Starway Intelligent Technology Co.,Ltd.

GR01 Patent grant