CN110720888A - Method for predicting macular edema lesion of fundus image based on deep learning - Google Patents
Method for predicting macular edema lesion of fundus image based on deep learning
- Publication number
- CN110720888A (application CN201910968469.7A)
- Authority
- CN
- China
- Prior art keywords
- image
- layer
- network
- function
- size
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B3/00—Apparatus for testing the eyes; Instruments for examining the eyes
- A61B3/10—Objective types, i.e. instruments for examining the eyes independent of the patients' perceptions or reactions
- A61B3/12—Objective types, i.e. instruments for examining the eyes independent of the patients' perceptions or reactions for looking at the eye fundus, e.g. ophthalmoscopes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/60—Rotation of whole images or parts thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0012—Biomedical image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20112—Image segmentation details
- G06T2207/20132—Image cropping
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30041—Eye; Retina; Ophthalmic
Abstract
The invention discloses a method for predicting retinal macular edema lesions in fundus images based on deep learning. The dataset used in the invention comes from the Messidor research programme, funded within the French TECHNO-VISION programme in 2004, which uses hard exudates in the fundus image to assess the risk of macular edema. After preprocessing the fundus images in the dataset, the photographed fundus images only need to be cropped to a uniform size and then Gaussian-blurred to extract detail features; these simply processed images serve as the input to a convolutional neural network for training. The invention achieves high accuracy in predicting the lesion while greatly shortening detection time; both classification accuracy and detection time are greatly improved over traditional detection methods.
Description
Technical Field
The invention relates to the field of deep learning and computer vision, and in particular to a method for predicting retinal macular edema lesions in fundus images.
Background Art
Macular edema refers to inflammation and fluid infiltration in the macula, the part of the fundus retina most sensitive to light, causing severe vision loss; it is one of the important causes of reduced vision in central retinal vein occlusion and diabetic retinopathy. Macular edema is generally treated by addressing the primary disease, with anti-inflammatory treatment given for intraocular inflammation. If the patient's fundus image is examined at the onset of illness and the risk of macular edema is judged and predicted by extracting certain features from the fundus image, the patient can be treated with medication in time before onset, greatly reducing morbidity and the rate of blindness.
The degradation problem at least shows that deep networks are not easy to train. But consider the following: given a shallow network, a deeper network can be built by stacking new layers on top of it. In the extreme case, the added layers learn nothing and simply copy the features of the shallow network, i.e. the new layers form an identity mapping. In this case the deep network should perform at least as well as the shallow one and should not degrade. This observation led Kaiming He to propose residual learning to solve the degradation problem.
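The identity-mapping argument above can be made concrete with a toy sketch: if the learnable residual branch outputs zero, the block collapses to an identity mapping (up to the final activation), so stacking such blocks cannot make the network worse than its shallower counterpart. A minimal numpy illustration — the dense transforms and shapes here are illustrative stand-ins, not the patent's convolutional modules:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    # Main branch computes a residual F(x); the skip branch passes x
    # through unchanged. Output is activation(F(x) + x).
    f = relu(x @ w1) @ w2
    return relu(x + f)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))

# If the learned residual is zero, the block reduces to relu(x):
# stacked layers that "learn nothing" cannot degrade performance.
w_zero = np.zeros((8, 8))
out = residual_block(x, w_zero, w_zero)
assert np.allclose(out, relu(x))
```

This is exactly the intuition the paragraph describes: the network only needs to learn the residual on top of an identity, which is easier to optimize than learning the full mapping.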
In 2015, Kaiming He's team proposed the deep residual network (ResNet), which upon release won first place in the ImageNet image classification, detection and localization tasks. The residual network is easier to optimize than other deep networks and can gain accuracy from considerably increased depth. Its core idea is to use a residual learning structure to eliminate the degradation side effect caused by increasing depth. With the residual learning structure in place, network performance can be improved simply by increasing network depth, i.e. both training and testing accuracy improve.
Disclosure of Invention
The invention provides a method for predicting macular edema lesions in fundus images based on deep learning. The method can be used to detect the risk of macular edema lesions in the fundus.
The dataset used in the invention comes from the Messidor research programme, funded within the French TECHNO-VISION programme in 2004, which uses hard exudates in the fundus image to assess the risk of macular edema. After preprocessing the fundus images in the dataset, a fully connected network is appended to a 50-layer ResNet for classification; classification accuracy and detection time are greatly improved compared with traditional detection methods.
Traditional prediction of macular edema lesions in fundus images relies on hand-crafted image feature extraction: the patient's fundus image is processed to detect hard exudates, the presence and number of exudates are observed, and a multi-layer perceptron then classifies the result. The drawback of this traditional detection method is that the accuracy of detecting hard exudates is generally low. The invention therefore achieves higher accuracy in predicting the lesion while greatly shortening detection time.
The deep-learning-based method for predicting retinal macular edema lesions in fundus images only needs to crop the photographed fundus images to a uniform size and then apply Gaussian blur to extract detail features; the simply processed images are used as the input to a convolutional neural network for training. The method specifically comprises the following steps:
step 1: screening of data sets
The dataset comes from the Messidor study, funded within the French TECHNO-VISION project of 2004; in this dataset, researchers use hard exudates in the fundus image to assess the risk of macular edema, i.e., the observed presence of exudates indicates a risk of disease.
Step 2: image preprocessing, comprising the following steps:
(2-1) loading the fundus image and estimating the radius of the eyeball;
(2-2) taking the larger of the two estimated radii as the target radius, and cropping the original fundus image according to the target radius;
(2-3) blurring the cropped fundus image, then subtracting the blurred image from the original to obtain a simple feature-extraction image of the fundus;
(2-4) eliminating the boundary effect caused by blurring in the feature-extraction image by removing the outer 10% of the fundus circle;
(2-5) further cropping the processed feature-extraction image; the cropped image is an RGB image of size 256 × 256.
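As a rough sketch of steps (2-1) through (2-5), the numpy code below crops the image to the estimated radius, subtracts a blurred copy to keep high-frequency detail, masks away the outer 10% of the fundus circle, and resizes to 256 × 256. A separable box blur and nearest-neighbour resize stand in for the Gaussian blur and a proper interpolating resize; all function names and parameters are illustrative, not from the patent:

```python
import numpy as np

def box_blur(img, k=5):
    # Crude stand-in for Gaussian blur: a k x k mean filter.
    pad = k // 2
    p = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def preprocess(img, radius, out_size=256):
    h, w, _ = img.shape
    cy, cx = h // 2, w // 2
    r = min(radius, cy, cx)
    crop = img[cy - r:cy + r, cx - r:cx + r].astype(float)
    # Subtract the blurred image to keep detail features (step 2-3).
    detail = crop - box_blur(crop)
    # Remove the outer 10% of the fundus circle to suppress the
    # boundary artefacts introduced by blurring (step 2-4).
    yy, xx = np.ogrid[:2 * r, :2 * r]
    mask = (yy - r) ** 2 + (xx - r) ** 2 <= (0.9 * r) ** 2
    detail *= mask[..., None]
    # Nearest-neighbour resize to out_size x out_size (step 2-5).
    idx = np.arange(out_size) * (2 * r) // out_size
    return detail[np.ix_(idx, idx)]

img = np.random.default_rng(1).uniform(0, 255, size=(300, 320, 3))
out = preprocess(img, radius=140)
assert out.shape == (256, 256, 3)
```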
The loading of the fundus image and the estimation of the eyeball radius are implemented as follows:
(1) to estimate the horizontal eyeball radius, assuming the fundus image has size M × N, extract a horizontal (row) vector of the fundus image;
(2) average the pixel values along the horizontal vector, divide the mean by 10, and compare each original pixel value against this threshold;
(3) assign the value 1 wherever the original pixel value exceeds the threshold; the count of 1s divided by 2 is the estimate of the horizontal eyeball radius;
(4) compute the vertical eyeball radius estimate by the same method, and select the larger of the horizontal and vertical values as the final eyeball radius estimate.
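The radius estimation in steps (1)–(4) can be sketched as follows, using the middle row and middle column as the horizontal and vertical vectors (an assumption for illustration — the patent does not state exactly which vectors are used):

```python
import numpy as np

def axis_radius(line):
    # Threshold at one tenth of the line's mean intensity; pixels
    # above it are assumed to lie inside the bright fundus disc, so
    # half the above-threshold count approximates the radius.
    thresh = line.mean() / 10.0
    return int((line > thresh).sum() // 2)

def estimate_radius(gray):
    h, w = gray.shape
    horizontal = axis_radius(gray[h // 2, :])  # middle row
    vertical = axis_radius(gray[:, w // 2])    # middle column
    # The larger of the two estimates becomes the target radius.
    return max(horizontal, vertical)

# Synthetic fundus: a bright disc of radius 60 on a dark background.
yy, xx = np.ogrid[:200, :220]
gray = ((yy - 100) ** 2 + (xx - 110) ** 2 <= 60 ** 2) * 200.0
r = estimate_radius(gray)
assert 55 <= r <= 65
```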
Step 3: dataset expansion
Since most of the dataset consists of samples of healthy fundus, samples at risk of disease are only a minority. To balance the training samples, the images containing hard exudates are expanded; the specific operations are image mirroring and rotation.
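The mirror-and-rotation expansion can be sketched with numpy (the rotation angles of 90°, 180° and 270° follow the description given later for Fig. 4):

```python
import numpy as np

def augment(img):
    # One original plus a horizontal mirror and three rotations,
    # turning each at-risk sample into five training samples.
    views = [img, np.fliplr(img)]
    for k in (1, 2, 3):          # 90, 180, 270 degrees
        views.append(np.rot90(img, k))
    return views

img = np.arange(2 * 2 * 3).reshape(2, 2, 3)
views = augment(img)
assert len(views) == 5
```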
Step 4: production of dataset labels
Because the convolutional neural network uses supervised learning, each image in the expanded dataset is assigned a category: the label 0 corresponds to a normal image, and the label 1 to an image at risk of disease. During neural network training, the label is one-hot encoded: 0 is encoded as 01 and 1 as 10.
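The label encoding can be written out directly; the helper name is illustrative:

```python
def one_hot(label):
    # Label 0 (normal) -> [0, 1]; label 1 (at risk) -> [1, 0],
    # matching the patent's encoding of 0 as "01" and 1 as "10".
    return [1, 0] if label == 1 else [0, 1]

assert one_hot(0) == [0, 1]
assert one_hot(1) == [1, 0]
```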
Step 5: construction of training and test sets
The dataset is divided into a training set and a test set using the train_test_split() function in sklearn, with the training set taking 80% of the dataset and the test set 20%.
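The 80/20 split can be sketched without sklearn as follows — a stand-in for the train_test_split(samples, labels, test_size=0.2) call named above; the seed and helper name are arbitrary:

```python
import numpy as np

def split_80_20(samples, labels, seed=42):
    # Shuffle indices, then take the first 80% for training and the
    # remaining 20% for testing.
    n = len(samples)
    idx = np.random.default_rng(seed).permutation(n)
    cut = int(0.8 * n)
    tr, te = idx[:cut], idx[cut:]
    return samples[tr], samples[te], labels[tr], labels[te]

x = np.arange(100).reshape(100, 1)
y = np.arange(100) % 2
x_tr, x_te, y_tr, y_te = split_80_20(x, y)
assert len(x_tr) == 80 and len(x_te) == 20
```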
Step 6: construction of the convolutional neural network, with the following specific steps:
(6-1) construction of the basic residual module: one basic residual module consists of 3 convolution layers, 2 activation function layers, 3 BN (batch normalization) layers and 1 skip connection, followed by a final activation function layer;
(6-2) construction of the residual module with the dimension-raising function: one such module consists of 2 convolution layers, 3 activation function layers, 3 BN (batch normalization) layers and 1 skip connection that itself contains 1 convolution layer and 1 BN layer, followed by a final activation function layer;
(6-3) construction of the front end of the convolutional neural network, which extracts image features using multiple residual modules. The network input size is 256 × 256 × 3, followed by 1 zero-padding layer (parameter 3 × 3), one convolution layer (64 filters, kernel size 7 × 7, stride 2 × 2), a BN layer (axis 3, momentum 0.99, epsilon 0.001), one activation function layer (relu) and one max-pooling layer (kernel size 3 × 3, stride 2 × 2, valid padding, output size 128 × 128 × 64); then follow 1 residual module with the dimension-raising function (output size 63 × 63 × 256), 2 basic residual modules, 1 residual module with the dimension-raising function (output size 32 × 32 × 512), 3 basic residual modules, 1 residual module with the dimension-raising function (output size 16 × 16 × 1024), 5 basic residual modules, 1 residual module with the dimension-raising function (output size 8 × 8 × 2048) and 2 basic residual modules; the front end ends with an activation function layer (relu) and an average-pooling layer (kernel size 7 × 7, stride 7 × 7, valid padding), and the size of the front-end output is 1 × 1 × 2048;
(6-4) construction of the back end of the convolutional neural network, which classifies the image using several fully connected layers: the front-end output is first flattened by one Flatten layer, followed by 1 fully connected layer (36 nodes, relu activation), 1 Dropout layer (rate 0.25), 1 fully connected layer (26 nodes, relu activation) and 1 fully connected layer (2 nodes, softmax activation); the final network output size is 2 × 1.
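At inference time, the back end of step (6-4) amounts to the following forward pass. The weights below are random stand-ins, and Dropout is omitted because it is only active during training:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def head_forward(features, w1, w2, w3):
    # Flatten -> Dense(36, relu) -> Dense(26, relu) -> Dense(2, softmax).
    x = features.ravel()
    x = relu(x @ w1)
    x = relu(x @ w2)
    return softmax(x @ w3)

rng = np.random.default_rng(0)
feats = rng.normal(size=(1, 1, 2048))      # front-end output 1 x 1 x 2048
w1 = rng.normal(size=(2048, 36)) * 0.02
w2 = rng.normal(size=(36, 26)) * 0.1
w3 = rng.normal(size=(26, 2)) * 0.1
probs = head_forward(feats, w1, w2, w3)
assert probs.shape == (2,)                 # two-class output, sums to 1
assert abs(probs.sum() - 1.0) < 1e-9
```

The two softmax outputs correspond to the one-hot labels of step 4: index 0 for "at risk" (10) and index 1 for "normal" (01).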
Step 7: network training. The loss function used by the network is cross entropy, and the gradient optimization algorithm is stochastic gradient descent with a learning rate of 0.025. The network is trained on the training set for 2000 iterations with a batch size of 4;
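Step 7's combination of softmax cross-entropy loss and stochastic gradient descent can be illustrated on a linear stand-in for the full network, reusing the stated learning rate of 0.025 and batch size of 4:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def sgd_step(w, x_batch, y_batch, lr=0.025):
    # One SGD step on softmax cross-entropy: the gradient of the
    # loss with respect to the logits is (p - y), averaged over the
    # batch. A linear classifier stands in for the full ResNet.
    p = softmax(x_batch @ w)
    grad = x_batch.T @ (p - y_batch) / len(x_batch)
    return w - lr * grad

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                 # one batch of 4 samples
y = np.eye(2)[rng.integers(0, 2, size=4)]   # one-hot targets

w = np.zeros((8, 2))
loss0 = -np.mean(np.sum(y * np.log(softmax(x @ w)), axis=1))
for _ in range(200):
    w = sgd_step(w, x, y)
loss1 = -np.mean(np.sum(y * np.log(softmax(x @ w)), axis=1))
assert loss1 < loss0   # training reduces the cross-entropy loss
```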
Step 8: network testing. The trained model is saved with the model.save() function, producing an .h5 model weight file, and the network is tested on the test set.
The invention has the following beneficial effects:
The invention achieves high accuracy in predicting the lesion while greatly shortening detection time; both classification accuracy and detection time are greatly improved compared with traditional detection methods;
the method has the characteristics of high prediction speed and high precision, the best prediction precision can be achieved by using the Gaussian blur with the mean value of 0 and the standard deviation of 15, and the maximum precision value of the algorithm test is 0.9933.
Drawings
FIG. 1 is a schematic diagram of basic residual modules;
FIG. 2 is a schematic diagram of a residual module with a dimension-raising function;
FIG. 3 is a diagram of image preprocessing steps and effects;
FIG. 4 is a flow chart of a lesion detection algorithm;
FIG. 5 is an iterative graph of algorithm training accuracy and test accuracy.
Detailed Description
The objects and effects of the invention will become more apparent from the following detailed description with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of the basic residual module. The module consists of 3 convolution layers, 2 activation function layers, 3 BN (batch normalization) layers and 1 skip connection, followed by a final activation function layer. The specific parameters are: the image is input; the skip connection on the left passes the input through unchanged; the main branch applies, in order, a convolution layer (kernel size 1 × 1, stride 1 × 1, valid padding), a BN layer (axis 3, momentum 0.99, epsilon 0.001), an activation function layer (relu), a convolution layer (kernel size 3 × 3, stride 1 × 1, same padding), a BN layer (axis 3, momentum 0.99, epsilon 0.001), an activation function layer (relu), a convolution layer (kernel size 1 × 1, stride 1 × 1, valid padding) and a BN layer (axis 3, momentum 0.99, epsilon 0.001); the outputs of the main branch and the skip connection are then added element-wise, the sum passes through a final activation function layer (relu), and the result is output.
Fig. 2 is a schematic diagram of the residual module with the dimension-raising function. The module consists of 2 convolution layers, 3 activation function layers, 3 BN (batch normalization) layers and 1 skip connection that itself contains 1 convolution layer and 1 BN layer, followed by a final activation function layer. The specific parameters are: the image is input; the main branch applies, in order, a convolution layer (kernel size 1 × 1, stride 1 × 1, valid padding), a BN layer (axis 3, momentum 0.99, epsilon 0.001), an activation function layer (relu), a convolution layer (kernel size 3 × 3, stride 1 × 1, same padding), a BN layer (axis 3, momentum 0.99, epsilon 0.001), an activation function layer (relu), a convolution layer (kernel size 1 × 1, stride 1 × 1, valid padding) and a BN layer (axis 3, momentum 0.99, epsilon 0.001); the skip connection on the right consists of 1 convolution layer (kernel size 1 × 1, valid padding) and a BN layer (axis 3, momentum 0.99, epsilon 0.001); the outputs of the two branches are added element-wise, the sum passes through a final activation function layer (relu), and the result is output.
FIG. 3 shows the image preprocessing steps and their effect. The processing steps are:
(1) input a fundus image;
(2) estimate the eyeball radius, take the larger radius as the target radius, and crop the original image accordingly;
(3) blur the cropped image;
(4) subtract the blurred image from the original image to obtain a simple feature-extraction image of the fundus;
(5) eliminate the boundary effect caused by blurring by removing the outer 10% of the fundus circle;
(6) further crop the image; the cropped image is an RGB image of size 256 × 256.
Fig. 4 is a flowchart of the lesion detection algorithm. The processing steps are:
(1) initial screening of the data;
(2) preprocessing of the sample images, with the preprocessing flow and effect shown in Fig. 3;
(3) data expansion, using mirroring and rotation, with rotation angles of 90, 180 and 270 degrees;
(4) division of the dataset into a training set and a test set at a ratio of 4:1;
(5) construction of the lesion detection network, mainly using the basic residual module and the residual module with the dimension-raising function shown in Figs. 1 and 2;
(6) network training, where the loss function is cross entropy and the gradient optimization algorithm is stochastic gradient descent with a learning rate of 0.025; the network is trained on the training set for 2000 iterations with a batch size of 4;
(7) network testing.
FIG. 5 shows the iteration curves of training accuracy and test accuracy. The algorithm achieves its best prediction accuracy using a Gaussian blur with mean 0 and standard deviation 15; the maximum training accuracy is 1 and the maximum test accuracy is 0.9933.
Claims (2)
1. A method for predicting macular edema lesions of fundus images based on deep learning, characterized by comprising the following steps:
step 1: screening of data sets
The dataset comes from the Messidor study, funded within the French TECHNO-VISION project of 2004; in this dataset, researchers assess the risk of macular edema based on hard exudates in the fundus image;
step 2: image preprocessing, comprising the following steps:
(2-1) loading the fundus image and estimating the radius of the eyeball;
(2-2) taking the larger estimated radius as the target radius, and cropping the original fundus image according to the target radius;
(2-3) blurring the cropped fundus image, then subtracting the blurred image from the original to obtain a simple feature-extraction image of the fundus;
(2-4) eliminating the boundary effect caused by blurring by removing the outer 10% of the fundus circle;
(2-5) further cropping the processed feature-extraction image; the cropped image is an RGB image of size 256 × 256;
step 3: dataset expansion
Most of the dataset consists of samples of healthy fundus, and samples at risk of disease are only a minority; the samples containing hard exudates are therefore expanded, with the specific operations being image mirroring and rotation;
step 4: production of dataset labels
Because the convolutional neural network uses supervised learning, each image in the expanded dataset is assigned a category: label 0 corresponds to a normal image and label 1 to an image at risk of disease; during neural network training, the label is one-hot encoded, i.e. 0 is encoded as 01 and 1 as 10;
step 5: construction of training and test sets
The dataset is divided into a training set and a test set using the train_test_split() function in sklearn, with the training set taking 80% of the dataset and the test set 20%;
step 6: building a convolutional neural network;
step 7: network training, where the loss function used by the network is cross entropy and the gradient optimization algorithm is stochastic gradient descent with a learning rate of 0.025; the network is trained on the training set for 2000 iterations with a batch size of 4;
step 8: network testing, where the trained model is saved with the model.save() function, producing an .h5 model weight file, and the network is tested on the test set.
2. The method for predicting macular edema lesions of fundus images based on deep learning according to claim 1, wherein the construction of the convolutional neural network in step 6 comprises the following specific steps:
(6-1) construction of the basic residual module: one basic residual module consists of 3 convolution layers, 2 activation function layers, 3 BN layers and 1 skip connection, followed by a final activation function layer;
(6-2) construction of the residual module with the dimension-raising function: one such module consists of 2 convolution layers, 3 activation function layers, 3 BN layers and 1 skip connection that itself contains 1 convolution layer and 1 BN layer, followed by a final activation function layer;
(6-3) construction of the front end of the convolutional neural network, which extracts image features using multiple residual modules. The network input size is 256 × 256 × 3, followed by 1 zero-padding layer (parameter 3 × 3), one convolution layer (64 filters, kernel size 7 × 7, stride 2 × 2), a BN layer (axis 3, momentum 0.99, epsilon 0.001), one activation function layer (relu) and one max-pooling layer (kernel size 3 × 3, stride 2 × 2, valid padding, output size 128 × 128 × 64); then follow 1 residual module with the dimension-raising function (output size 63 × 63 × 256), 2 basic residual modules, 1 residual module with the dimension-raising function (output size 32 × 32 × 512), 3 basic residual modules, 1 residual module with the dimension-raising function (output size 16 × 16 × 1024), 5 basic residual modules, 1 residual module with the dimension-raising function (output size 8 × 8 × 2048) and 2 basic residual modules; the front end ends with an activation function layer (relu) and an average-pooling layer (kernel size 7 × 7, stride 7 × 7, valid padding), and the size of the front-end output is 1 × 1 × 2048;
(6-4) construction of the back end of the convolutional neural network, which classifies the image using several fully connected layers: the front-end output is first flattened by one Flatten layer, followed by 1 fully connected layer (36 nodes, relu activation), 1 Dropout layer (rate 0.25), 1 fully connected layer (26 nodes, relu activation) and 1 fully connected layer (2 nodes, softmax activation); the final network output size is 2 × 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910968469.7A CN110720888A (en) | 2019-10-12 | 2019-10-12 | Method for predicting macular edema lesion of fundus image based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910968469.7A CN110720888A (en) | 2019-10-12 | 2019-10-12 | Method for predicting macular edema lesion of fundus image based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110720888A true CN110720888A (en) | 2020-01-24 |
Family
ID=69219972
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910968469.7A Pending CN110720888A (en) | 2019-10-12 | 2019-10-12 | Method for predicting macular edema lesion of fundus image based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110720888A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107203778A (en) * | 2017-05-05 | 2017-09-26 | Ping An Technology (Shenzhen) Co., Ltd. | PVR severity grade detection system and method |
CN107729929A (en) * | 2017-09-30 | 2018-02-23 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and apparatus for obtaining information |
CN109934823A (en) * | 2019-03-25 | 2019-06-25 | Tianjin Polytechnic University | Deep-learning-based macular edema grading method for DR fundus images |
CN109961848A (en) * | 2019-04-02 | 2019-07-02 | 上海鹰瞳医疗科技有限公司 | Macula image classification method and device |
CN110236483A (en) * | 2019-06-17 | 2019-09-17 | Hangzhou Dianzi University | Method for diabetic retinopathy detection based on a deep residual network |
2019-10-12: CN application CN201910968469.7A filed (publication CN110720888A), status: Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110236483B (en) | Method for detecting diabetic retinopathy based on a deep residual network | |
AU2020104006A4 (en) | Radar target recognition method based on feature pyramid lightweight convolutional neural network | |
CN109635618B (en) | Visible light image vein imaging method based on convolutional neural network | |
CN110837803B (en) | Diabetic retinopathy grading method based on depth map network | |
CN111259982A (en) | Premature infant retina image classification method and device based on attention mechanism | |
CN112883962B (en) | Fundus image recognition method, apparatus, device, and program | |
CN112132817A (en) | Retina blood vessel segmentation method for fundus image based on mixed attention mechanism | |
CN109376767A (en) | Retina OCT image classification method based on deep learning | |
CN112150476A (en) | Coronary artery sequence vessel segmentation method based on space-time discriminant feature learning | |
Islam et al. | Transfer learning based diabetic retinopathy detection with a novel preprocessed layer | |
Harshitha et al. | Predicting the stages of diabetic retinopathy using deep learning | |
CN110689526A (en) | Retinal blood vessel segmentation method and system based on retinal fundus image | |
CN117058676B (en) | Blood vessel segmentation method, device and system based on fundus examination image | |
Yang et al. | AMF-NET: Attention-aware multi-scale fusion network for retinal vessel segmentation | |
Eladawi et al. | Early signs detection of diabetic retinopathy using optical coherence tomography angiography scans based on 3D multi-path convolutional neural network | |
Prabhakar et al. | Exponential gannet firefly optimization algorithm enabled deep learning for diabetic retinopathy detection | |
CN110720888A (en) | Method for predicting macular edema lesion of fundus image based on deep learning | |
Nurrahmadayeni et al. | Analysis of deep learning methods in diabetic retinopathy disease identification based on retinal fundus image | |
CN114998300A (en) | Corneal ulcer classification method based on multi-scale information fusion network | |
Wahid et al. | Classification of Diabetic Retinopathy from OCT Images using Deep Convolutional Neural Network with BiLSTM and SVM | |
Giroti et al. | Diabetic Retinopathy Detection & Classification using Efficient Net Model | |
Fu et al. | Agc-unet: a global context feature fusion method based on u-net for retinal vessel segmentation | |
Hatode et al. | Evolution and Testimony of Deep Learning Algorithm for Diabetic Retinopathy Detection | |
CN114881927A (en) | Method, device and equipment for detecting retinopathy of prematurity | |
Savu et al. | Blood vessel segmentation in eye fundus images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200124 |