CN111753820A - Color fundus image optic cup segmentation method based on deep learning - Google Patents


Info

Publication number
CN111753820A
Authority
CN
China
Prior art keywords
channel
cup
convolution
segmentation
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910236304.0A
Other languages
Chinese (zh)
Inventor
肖志涛
耿磊
张新新
吴骏
张芳
刘彦北
王雯
王曼迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Polytechnic University
Original Assignee
Tianjin Polytechnic University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin Polytechnic University
Priority to CN201910236304.0A
Publication of CN111753820A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Eye Examination Apparatus (AREA)

Abstract

The invention relates to a color fundus image optic cup segmentation method based on deep learning. The method comprises the following steps: 1) inputting a fundus image; 2) using a Seg-ResNet network to first segment the optic disc, taking the segmented optic disc region as the region of interest for optic cup segmentation, and then using the Seg-ResNet network to segment the optic cup within the optic disc region; the network is built on the residual basic structure, performs channel weighting by considering the relations among feature channels, models the dependencies among channels, adaptively adjusts the feature response value of each channel, and fuses features from multiple layers, so that the position information of pixels is located while the semantic information of the image is captured; 3) outputting the optic cup segmentation result with the Seg-ResNet network. Optic cup segmentation tests on the public data sets Glaucomarepo and Drishti-GS show that the method improves segmentation accuracy and algorithm robustness.

Description

Color fundus image optic cup segmentation method based on deep learning
Technical Field
The invention relates to a color fundus image optic cup segmentation method based on deep learning, which involves image processing, deep learning, and residual networks, and can segment the optic cup in color fundus images.
Background
Glaucoma is an optic nerve disease in which elevated intraocular pressure damages the optic nerve fibers; the damage is irreversible, and its blinding rate is very high, second only to cataract. Clinically, glaucoma can be preliminarily diagnosed by measuring parameters such as the cup-to-disc ratio, and detecting the optic cup helps establish a retinal coordinate system, from which retinal abnormalities such as drusen, exudates, and hemorrhages, together with their positions, can be further judged. Automatic detection of the optic cup in color fundus images provides a stable, accurate, and efficient solution for the intelligent diagnosis of ophthalmic diseases. In recent years, diagnostic studies of retinal optic nerve diseases have therefore placed importance on the accuracy of optic cup segmentation in fundus images. Although advances in medical image processing have brought great breakthroughs to fundus image processing, and some research results have been obtained in fundus disease diagnosis, the accuracy and stability of traditional medical image processing algorithms cannot be guaranteed when problems are complex and highly nonlinear. Because the optic cup area is small, its contrast with the optic disc is low, its edge is indistinct, and it is heavily occluded by blood vessels in fundus images, segmenting the optic cup with traditional algorithms struggles to meet the requirements of a medical image aided diagnosis system, and there is still considerable room for improvement.
Deep-learning-based feature extraction is performed on large numbers of data sets from different imaging environments; abstract image features are learned adaptively, segmentation performance is stable across data sets, and the complex operation and poor robustness of traditional methods are avoided. Deep learning segmentation frameworks address image segmentation at the semantic level, so that each pixel in an image can be classified into target or background and the prediction for each pixel is produced end to end. Compared with traditional techniques, deep-learning-based optic cup segmentation has made great breakthroughs, but owing to the particularity of medical images it is difficult to meet the large-scale data requirements of deep network training, so current deep-learning-based optic cup segmentation accuracy remains limited.
Disclosure of Invention
The invention aims to improve the segmentation precision of the optic cup in color fundus images. For the existing small-scale data sets, it provides an optic cup segmentation method based on deep learning combined with a transfer learning strategy, and adopts the following technical scheme:
1. inputting a fundus image, and extracting the optic disc region as the region of interest for optic cup segmentation;
2. performing optic cup segmentation on the optic disc region;
3. optimizing the residual structure;
4. adjusting the response value of each feature channel by adopting a channel weighting structure;
5. fusing high-level features with low-level features;
6. outputting the optic cup segmentation result using the Seg-ResNet network.
In step 1: because the optic cup region is small and its brightness characteristics are close to those of the optic disc, the optic disc must be segmented first, before the optic cup, to avoid the disc interfering with optic cup segmentation; the segmented optic disc region then serves as the region of interest for optic cup segmentation.
In step 2: optic cup segmentation is performed with the segmented optic disc region as the region of interest.
In step 3: the residual block computation is optimized by replacing two layers of 3 × 3 convolutions with three layers of 1 × 1 + 3 × 3 + 1 × 1 convolutions: the first 1 × 1 convolution filter reduces the dimensionality to cut computation, the middle 3 × 3 convolution layer in the structure then operates on the reduced representation, and another 1 × 1 convolution filter finally restores the dimensionality.
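As an illustration of this bottleneck design, the sketch below shows a residual block in which the first 1 × 1 convolution reduces the channel dimensionality, the 3 × 3 convolution works on the reduced representation, and a second 1 × 1 convolution restores it before the identity mapping adds the input back. PyTorch, the channel counts, and the BatchNorm placement are illustrative assumptions, not details fixed by the patent:

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """1x1 -> 3x3 -> 1x1 bottleneck residual block, a minimal sketch."""

    def __init__(self, channels: int, reduced: int):
        super().__init__()
        self.body = nn.Sequential(
            # first 1x1 convolution: reduce dimensionality to cut computation
            nn.Conv2d(channels, reduced, kernel_size=1, bias=False),
            nn.BatchNorm2d(reduced),
            nn.ReLU(inplace=True),
            # middle 3x3 convolution operates on the reduced representation
            nn.Conv2d(reduced, reduced, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(reduced),
            nn.ReLU(inplace=True),
            # second 1x1 convolution: restore the original dimensionality
            nn.Conv2d(reduced, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # identity mapping: superimpose the input onto the convolution output
        return self.relu(self.body(x) + x)
```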
In step 4: in the compression stage, the two-dimensional feature map output by each layer is compressed to one dimension using global average pooling, and the compressed one-dimensional features reflect the global receptive field information of the input feature channels; in the excitation stage, learnable weights are generated for the compressed feature channels, and the correlations among the feature channels are explicitly modeled in a nonlinear way through excitation; the output weights obtained after the above compression and excitation represent the importance of each feature channel. In the weighting stage, a group of features is output directly after the network convolution, and the excitation weights are then applied to these features channel by channel through multiplication, re-weighting the original feature channels.
In step 5: a 1 × 1 convolution reduces the high-dimensional feature maps of the encoding stage to the same dimensionality as the feature maps of the decoding stage for the fusion operation; after fusion, a 1 × 1 convolution kernel again reduces the dimensionality of the feature map, and feature fusion with the low-level feature maps of the same size is then performed in the same manner.
In step 6: the Seg-ResNet network is evaluated on the data sets Glaucomarepo and Drishti-GS, and the segmentation results are output.
Compared with the prior art, the invention has the following beneficial effects:
1. Vanishing or exploding gradients are avoided. The identity mapping superimposes the input onto the convolution output, forming skip connections across layers.
2. Computation is optimized. Three layers of 1 × 1 + 3 × 3 + 1 × 1 convolutions replace two layers of 3 × 3 convolutions.
3. The structure is simple and easy to implement. By considering the relations among the channels of the convolved feature maps, the correlations among channels are explicitly modeled in a nonlinear way.
4. The segmentation precision and robustness of the optic cup are improved.
Drawings
FIG. 1 is a diagram of a deep learning network structure Seg-ResNet;
FIG. 2 is a diagram of an optimized residual structure;
FIG. 3 is a schematic diagram of an SE module;
FIG. 4 is a diagram of high-level and low-level feature fusion;
FIGS. 5(a1)-(a3) are graphs of segmentation results for Glaucomarepo database images severely occluded by blood vessels;
FIGS. 5(b1)-(b3) are graphs of segmentation results for Glaucomarepo database images with low cup-to-disc contrast;
FIGS. 5(c1)-(c3) are graphs of segmentation results for Glaucomarepo database images with optic cup areas of different sizes;
FIGS. 5(d1)-(d3) are graphs of segmentation results for Drishti-GS database images severely occluded by blood vessels;
FIGS. 5(e1)-(e3) are graphs of segmentation results for Drishti-GS database images with low cup-to-disc contrast;
FIGS. 5(f1)-(f3) are graphs of segmentation results for Drishti-GS database images with optic cup areas of different sizes.
Detailed Description
The deep learning network structure of the invention is shown in fig. 1. Seg-ResNet comprises 10 SERes modules, 5 max pooling layers, 4 upsampling layers, and a loss function layer. The convolution kernels in the network have two sizes, 1 × 1 and 3 × 3, both with a stride of 1, and each convolution layer is followed by a ReLU activation to increase the nonlinear capacity of the network. The max pooling layers use a 2 × 2 window with a stride of 2. The SERes modules in the network structure combine a residual structure with a channel weighting structure; each SERes module contains three convolution layers, and BatchNorm layers batch-normalize the data within the convolutions. ResNet is taken as the base network, a channel weighting structure adjusts the response value of each feature channel, and high-level features are fused with low-level features, forming the new network Seg-ResNet. The following describes a specific implementation of the technical solution with reference to the accompanying drawings.
1. Inputting the fundus image, extracting the region of interest
The optic disc region serves as the region for optic cup segmentation. Data from the public data sets Glaucomarepo and Drishti-GS are used as the input fundus images, and the optic disc region of interest is extracted from them.
2. Performing optic cup segmentation on the optic disc region: the obtained optic disc region is taken as the region of interest, and the optic cup is segmented within it. A minimal sketch of one way to extract this region of interest is given below.
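The patent does not spell out how the segmented disc becomes the input window for the second pass; the sketch below shows one plausible approach, cropping the bounding box of the disc mask with a small margin. The margin value and the bounding-box strategy are assumptions:

```python
import numpy as np

def crop_disc_roi(image, disc_mask, margin=10):
    """Crop the optic disc region of interest from a fundus image.

    disc_mask is the binary mask produced by the first-stage disc
    segmentation; the margin and bounding-box cropping are illustrative.
    """
    ys, xs = np.nonzero(disc_mask)
    if ys.size == 0:
        raise ValueError("empty disc mask: no region of interest to crop")
    h, w = disc_mask.shape
    # expand the disc bounding box by a small margin, clipped to the image
    y0, y1 = max(ys.min() - margin, 0), min(ys.max() + margin + 1, h)
    x0, x1 = max(xs.min() - margin, 0), min(xs.max() + margin + 1, w)
    return image[y0:y1, x0:x1]
```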
3. Optimized residual structure
The residual structure allows a deep network to degenerate into a shallower one through its identity mappings, which makes very deep networks trainable. The optimized residual structure is shown in fig. 2.
4. Adjusting the response value of each feature channel with the channel weighting structure
Channel weighting considers the channel dimension of the feature map, looks for the dependencies among channels, models the importance of different feature channels in a learnable way, selectively enhances the principal features, and suppresses the unimportant ones. The output of a conventional neural network ignores the dependencies among channels; Seg-ResNet is designed around a channel weighting unit module so that the useful features in the feature maps produced by the network convolutions are exploited more fully and the useless features are suppressed. In the compression stage, the two-dimensional feature map output by each layer is compressed to one dimension with global average pooling, and the compressed one-dimensional features reflect the global receptive field information of the input feature channels. The excitation stage generates learnable weights for the compressed feature channels, similar to the gating mechanism in a recurrent neural network; through excitation, the correlations among the feature channels are explicitly modeled in a nonlinear way. The weighting stage follows the compression and excitation stages: the output weights obtained represent the importance of each feature channel; a group of features is output directly after the network convolution, and the excitation weights are then applied to these features channel by channel through multiplication, completing the re-weighting of the original feature channels. A schematic diagram of the SE module is shown in fig. 3.
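A minimal sketch of this compression-excitation-weighting pipeline follows; the reduction ratio r in the excitation bottleneck is a convention borrowed from the SE literature, not a value stated in the patent:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel weighting, a minimal sketch."""

    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        # compression: global average pooling squeezes each 2-D map to one value
        self.squeeze = nn.AdaptiveAvgPool2d(1)
        # excitation: learnable weights modeling the channel correlations
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // r),
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.squeeze(x).view(b, c)        # (B, C): global receptive field info
        w = self.excite(w).view(b, c, 1, 1)   # per-channel importance in (0, 1)
        return x * w                          # re-weight the original channels
```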
5. Fusing high-level features with low-level features
(1) Restoring the feature map size in the decoding stage. In the decoding stage, the feature maps compressed in the encoding stage are restored through successive upsampling so that they match the size of the low-level feature maps to be fused;
(2) Reducing the dimensionality of the encoding-stage feature maps. The feature maps of the encoding stage have a higher dimensionality than those of the decoding stage, so a 1 × 1 convolution kernel reduces the dimensionality of the feature maps output by the lower layers of the encoding stage for convenient fusion with the feature maps restored by the higher layers;
(3) Fusing high-level and low-level features. Once the sizes and dimensionalities of the high-level and low-level feature maps agree, they are fused through a Concat layer. The high-level and low-level feature fusion diagram is shown in fig. 4.
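These three steps can be sketched as follows. The patent specifies the upsampling, the 1 × 1 dimensionality reduction, and the Concat fusion; the bilinear interpolation mode and the channel-count arguments are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FuseBlock(nn.Module):
    """Fuse a high-level decoder feature map with a low-level encoder one."""

    def __init__(self, enc_channels: int, dec_channels: int):
        super().__init__()
        # (2) 1x1 convolution reduces the encoder map to the decoder's dimensionality
        self.reduce = nn.Conv2d(enc_channels, dec_channels, kernel_size=1)

    def forward(self, enc_feat: torch.Tensor, dec_feat: torch.Tensor) -> torch.Tensor:
        # (1) restore the decoder map to the encoder map's spatial size
        dec_feat = F.interpolate(dec_feat, size=enc_feat.shape[2:],
                                 mode="bilinear", align_corners=False)
        # (3) fuse the size- and dimension-matched maps through concatenation
        return torch.cat([self.reduce(enc_feat), dec_feat], dim=1)
```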
6. Outputting the optic cup segmentation result with the Seg-ResNet network
The parameters of the optic cup segmentation network are set as follows: training is optimized with a stochastic gradient descent algorithm with momentum, where the momentum value is 0.95, the learning rate is set to 0.001, and the number of epochs is 12.
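Under these settings a training loop might look like the sketch below. The momentum, learning rate, and epoch count come from the patent; the model, the data loader, and the binary cross-entropy loss are placeholders, since the patent does not name the loss used in its loss function layer:

```python
import torch
import torch.nn as nn

def train_cup_segmenter(model: nn.Module, train_loader, epochs: int = 12):
    """Optimize the segmentation network with the patent's hyperparameters."""
    # stochastic gradient descent with momentum 0.95 and learning rate 0.001
    optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.95)
    criterion = nn.BCEWithLogitsLoss()  # assumption: binary cup/background loss
    model.train()
    for _ in range(epochs):
        for images, masks in train_loader:  # ROI crops and ground-truth cup masks
            optimizer.zero_grad()
            loss = criterion(model(images), masks)
            loss.backward()
            optimizer.step()
    return model
```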
The test images are input into the network, and the final optic cup segmentation results are obtained as shown in fig. 5.
This deep-learning-based image recognition method is applied to optic cup segmentation: a deep learning algorithm trains on and analyzes color fundus image data, achieving optic cup segmentation that is as accurate as possible. Applied in medicine, the technique greatly facilitates the intelligent diagnosis of ophthalmic diseases.
Finally, it should be noted that although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will appreciate that modifications may be made to the embodiments, or equivalents substituted for elements thereof, without departing from the spirit and scope of the invention. Any modifications, equivalent substitutions, and the like made within the spirit and principles of the present invention are intended to be included within its scope.

Claims (4)

1. A color fundus image optic cup segmentation method based on deep learning, comprising the following steps:
(1) inputting a fundus image, and extracting the optic disc region as the region of interest for optic cup segmentation;
(2) performing optic cup segmentation on the optic disc region;
(3) optimizing the residual structure;
(4) adjusting the response value of each feature channel by adopting a channel weighting structure;
(5) fusing high-level features with low-level features;
(6) outputting the optic cup segmentation result using the Seg-ResNet network.
2. The method of claim 1, wherein in step (3), based on the ResNet network, the residual block computation is optimized by replacing two layers of 3 × 3 convolutions with three layers of 1 × 1 + 3 × 3 + 1 × 1 convolutions: the first 1 × 1 convolution filter reduces the dimensionality to cut computation, the middle 3 × 3 convolution layer in the structure operates on the reduced representation, and another 1 × 1 convolution filter finally restores the dimensionality.
3. The method according to claim 1, wherein in step (4), a channel weighting structure is adopted to adjust the response value of each feature channel: the compression stage compresses the two-dimensional feature map output by each layer to one dimension with global average pooling, and the compressed one-dimensional features reflect the global receptive field information of the input feature channels; the excitation stage generates learnable weights for the compressed feature channels, and the correlations among the feature channels are explicitly modeled in a nonlinear way through excitation; the output weights obtained after the above compression and excitation represent the importance of each feature channel, and in the weighting stage a group of features output directly after the network convolution is multiplied channel by channel by the excitation weights, re-weighting the original feature channels.
4. The method of claim 1, wherein in step (5), high-level features are fused with low-level features: a 1 × 1 convolution reduces the high-dimensional feature maps of the encoding stage to the same dimensionality as the feature maps of the decoding stage for the fusion operation; after fusion, a 1 × 1 convolution kernel again reduces the dimensionality of the feature map, and feature fusion with the low-level feature maps of the same size then proceeds in the same manner.
CN201910236304.0A 2019-03-27 2019-03-27 Color fundus image optic cup segmentation method based on deep learning Pending CN111753820A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910236304.0A CN111753820A (en) 2019-03-27 2019-03-27 Color fundus image optic cup segmentation method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910236304.0A CN111753820A (en) 2019-03-27 2019-03-27 Color fundus image optic cup segmentation method based on deep learning

Publications (1)

Publication Number Publication Date
CN111753820A (en) 2020-10-09

Family

ID=72671883

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910236304.0A Pending CN111753820A (en) 2019-03-27 2019-03-27 Color fundus image optic cup segmentation method based on deep learning

Country Status (1)

Country Link
CN (1) CN111753820A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108510473A (en) * 2018-03-09 2018-09-07 天津工业大学 The FCN retinal images blood vessel segmentations of convolution and channel weighting are separated in conjunction with depth
CN109389667A (en) * 2018-08-23 2019-02-26 北京大学 A kind of efficient global illumination method for drafting based on deep learning
CN109410129A (en) * 2018-09-28 2019-03-01 大连理工大学 A kind of method of low light image scene understanding
CN109447962A (en) * 2018-10-22 2019-03-08 天津工业大学 A kind of eye fundus image hard exudate lesion detection method based on convolutional neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhitao Xiao et al.: "Optic cup segmentation method by a modified VGG-16 network", Journal of Medical Imaging and Health Informatics, pages 97-101 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111986202A (en) * 2020-10-26 2020-11-24 平安科技(深圳)有限公司 Glaucoma auxiliary diagnosis device, method and storage medium
CN112686297A (en) * 2020-12-29 2021-04-20 中国人民解放军海军航空大学 Radar target motion state classification method and system
CN112686297B (en) * 2020-12-29 2023-04-14 中国人民解放军海军航空大学 Radar target motion state classification method and system
CN115100187A (en) * 2022-07-27 2022-09-23 浙江大学 Glaucoma image detection method based on federal learning
CN115100187B (en) * 2022-07-27 2024-06-11 浙江大学 Glaucoma image detection method based on federal learning
CN115546217A (en) * 2022-12-02 2022-12-30 中南大学 Multi-level fusion skin disease diagnosis system based on multi-mode image data

Similar Documents

Publication Publication Date Title
CN111145170B (en) Medical image segmentation method based on deep learning
CN111753820A (en) Color fundus image optic cup segmentation method based on deep learning
WO2022036777A1 (en) Method and device for intelligent estimation of human body movement posture based on convolutional neural network
CN111950649B (en) Attention mechanism and capsule network-based low-illumination image classification method
CN106920227B (en) The Segmentation Method of Retinal Blood Vessels combined based on deep learning with conventional method
CN112508864B (en) Retinal vessel image segmentation method based on improved UNet +
CN110503063B (en) Falling detection method based on hourglass convolution automatic coding neural network
CN114998210B (en) Retinopathy of prematurity detecting system based on deep learning target detection
CN112132817A (en) Retina blood vessel segmentation method for fundus image based on mixed attention mechanism
CN113888412B (en) Image super-resolution reconstruction method for diabetic retinopathy classification
Harshitha et al. Predicting the stages of diabetic retinopathy using deep learning
Xu et al. Dual-channel asymmetric convolutional neural network for an efficient retinal blood vessel segmentation in eye fundus images
CN114494195A (en) Small sample attention mechanism parallel twinning method for fundus image classification
CN113012163A (en) Retina blood vessel segmentation method, equipment and storage medium based on multi-scale attention network
CN112580545A (en) Crowd counting method and system based on multi-scale self-adaptive context network
CN110610480B (en) MCASPP neural network eyeground image optic cup optic disc segmentation model based on Attention mechanism
Radha et al. Modified Depthwise Parallel Attention UNet for Retinal Vessel Segmentation
Pradhan et al. Diabetic retinopathy detection on retinal fundus images using convolutional neural network
Haider et al. Modified Anam-Net Based Lightweight Deep Learning Model for Retinal Vessel Segmentation.
CN117409358A (en) BiFPN-fused light flame detection method
CN116884036A (en) Live pig posture detection method, device, equipment and medium based on YOLOv5DA
Khan et al. Screening fundus images to extract multiple ocular features: A unified modeling approach
CN110992320A (en) Medical image segmentation network based on double interleaving
Zijian et al. AFFD-Net: A Dual-Decoder Network Based on Attention-Enhancing and Feature Fusion for Retinal Vessel Segmentation
Joans Identification and classification of eye disease using deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20201009