Disclosure of Invention
The invention aims to provide a fundus retinal image segmentation method based on a deep convolutional neural network, which reduces the number of model parameters, enlarges the receptive field, improves the model's understanding of context, and increases the accuracy and speed of the retinal vessel, optic disc and optic cup segmentation models.
In order to solve the technical problems, the invention provides the following technical scheme:
The invention relates to a method for preprocessing a fundus retinal image, which comprises the following steps:
A. acquiring a retinal image dataset;
B. preprocessing: sequentially perform grayscale conversion, normalization, contrast-limited adaptive histogram equalization and gamma correction on the retinal image to obtain a preprocessed image.
Preferably, the grayscale processing includes the following steps:
decomposing the retinal image into its red (R), green (G) and blue (B) channels, and fusing the three channel images proportionally to convert them into a grayscale image;
the conversion formula is as follows:
I_gray = 0.299×r + 0.587×g + 0.114×b
in the formula, r, g and b represent the values of the R channel, G channel and B channel, respectively;
the normalization comprises the following steps:
Let X = {x_1, x_2, ..., x_n} be the image dataset; dimension normalization is performed on the dataset X using the Z-score normalization method; the Z-score normalization formula is as follows:
X_norm = (X − μ) / σ
where μ is the mean of X and σ is the standard deviation of X; at this point X contains both positive and negative values, with mean 0 and standard deviation 1;
then min-max normalization is performed on each image in the dataset X, remapping the values of X into the range 0-255; the min-max normalization formula is as follows:
x_i′ = 255 × (x_i − min(X_norm)) / (max(X_norm) − min(X_norm))
in the formula, x_i ∈ X_norm, i ∈ [1, 2, ..., n].
The contrast-limited adaptive histogram equalization uses the CLAHE algorithm to enhance the contrast between vessels and background over the whole dataset; gamma correction then applies a nonlinear operation to the luminance or tristimulus values in the image to obtain the preprocessed image, with the gamma value set to 1.2.
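As an illustration of the grayscale fusion and two-stage normalization described above, a minimal NumPy sketch follows; the function names and the choice of taking the Z-score statistics over the whole dataset are our own assumptions, not part of the claims:

```python
import numpy as np

def to_gray(rgb):
    """Fuse the R, G, B channels proportionally into a grayscale image."""
    r = rgb[..., 0].astype(np.float64)
    g = rgb[..., 1].astype(np.float64)
    b = rgb[..., 2].astype(np.float64)
    return 0.299 * r + 0.587 * g + 0.114 * b

def normalize(images):
    """Z-score normalization of the dataset, then min-max remapping to 0-255."""
    x = images.astype(np.float64)
    x_norm = (x - x.mean()) / x.std()         # mean 0, standard deviation 1
    lo, hi = x_norm.min(), x_norm.max()
    return 255.0 * (x_norm - lo) / (hi - lo)  # remap into the range 0-255
```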
The invention relates to a method for segmenting optic disc and optic cup images based on a deep convolutional neural network, which comprises the following steps: A1, preprocessing the optic disc and optic cup images using the image preprocessing method described above;
B1, generator: performing cascaded convolution on the preprocessed image using convolutional layers with different dilation rates and outputting a feature map, then periodically rearranging the information in the feature map through scale reconstruction to generate a segmented optic disc and optic cup image of the same size as the target;
C1, discriminator: constructing a discriminator with the same convolution unit form as the generator; the discriminator and the generator form a generative adversarial network model whose tuning parameters are adjusted automatically, and the final image of the generator is output.
Preferably, the generator comprises an encoder and a decoder. The training data samples of the generator are pairs (x_i, y_i), where x_i, of size H × W × C_1, is the fundus image and y_i, of size H × W × C_2, is the label map of the optic disc and optic cup; H is the picture height, W is the picture width, C_1 is the number of channels of x_i, and C_2 is the number of channels of y_i. The encoder performs down-sampling with cascaded dilated convolutions to obtain the output feature map u; the decoder performs up-sampling by scale reconstruction, converting the feature map u to the same size as y_i and obtaining a feature map of the same size; the generator outputs the segmentation result C(x_i).
The discriminator discriminates between the training data sample image set {(x_i, y_i)} and the image set {(x_i, C(x_i))} generated by the generator. If the discriminator classifies the training data sample image set and the generator's image set into one class, the tuning parameters of the discriminator are adjusted; if the discriminator classifies them into two classes, the tuning parameters of the generator are adjusted and the generator regenerates the segmented image. The objective function of the model is:
min_{θ_C} max_{θ_D} E_{x,y}[log D(x, y)] + E_x[log(1 − D(x, C(x)))]
in the formula, θ_C represents the parameters of the generator C; θ_D represents the parameters of the discriminator D; E_{x,y}[·] represents the expected value.
Preferably, each cascade of the cascaded convolution concatenates N dilated convolutional layers, the convolution kernel size is k × k, and the dilation rates are [r_1, ..., r_i, ..., r_N]; after the N convolutional layers, all feature information in the receptive field region can be completely covered;
define the maximum permissible dilation rate of the i-th layer in the dilated convolution kernel as R_i, with dilation rate r_i ≤ R_i; the calculation formula of R_i is:
R_i = max[R_{i+1} − 2r_i, 2r_i − R_{i+1}, r_i]
with (R_1 − 1) + (R_2 − 1) < k when R_i > 0 and R_i ≠ R_{i+1}; the dilation rates r_i within each group must not have a common-factor relationship; r_i represents the dilation rate of the i-th layer in the dilated convolution kernel, and k is the convolution kernel size.
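For illustration only, a small helper (hypothetical, not part of the claims) that tests the common-factor constraint on a group of dilation rates, reading the rule above as: the rates must not all share a factor greater than 1:

```python
from functools import reduce
from math import gcd

def violates_common_factor_rule(rates):
    """True if all dilation rates in the group share a factor > 1,
    which the text above identifies as a cause of gridding."""
    return reduce(gcd, rates) > 1

print(violates_common_factor_rule([2, 4, 8]))  # True  (gridding)
print(violates_common_factor_rule([1, 2, 5]))  # False (valid group)
```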
The invention relates to a retinal vessel image segmentation method based on a deep convolutional neural network, which comprises the following steps:
1. preprocessing the retinal vessel images using the image preprocessing method described above;
2. the preprocessed image is down-sampled by an average pooling layer to form down-sampled images at different levels; an ordinary convolutional layer then expands the channels of each down-sampled image, and the channel-expanded down-sampled image of a level is fused with the output image of the previous level's multi-path FCN (fully convolutional network) model encoder into the input image of that level, which is input into the multi-path FCN model encoder;
3. the multi-path FCN model encoder encodes the input image to obtain vessel feature images in the retinal image, forming vessel feature images at different levels; the vessel feature image of a level is fused with the output image of the next level's multi-path FCN model decoder into the input image of that level and input into the multi-path FCN model decoder; the multi-path FCN model decoder performs the inverse operation of the encoder, decoding the input images to form output feature images of different depths;
4. the output feature images of different depths are expanded to the same size as the input image by up-sampling, then input into separate convolutional layers for channel compression and classified with a Softmax function; the resulting probability maps are fused into a two-channel probability map, and threshold segmentation then yields the final vessel segmentation map.
Preferably, the multi-path FCN model consists of two encoder paths and two decoder paths, each comprising convolutional layers, residual blocks, batch normalization layers and ReLU activation layers. The encoder paths generate a set of encoder feature maps using residual blocks and convolutional layers, normalize each layer's feature maps with batch normalization, and then activate them with the ReLU activation function; the decoder paths decode the images generated by the encoder using deconvolution layers and residual blocks, normalize each layer's feature maps with batch normalization, and then activate them with the ReLU activation function.
Compared with the prior art, the invention has the following beneficial effects:
1) Various deep neural networks are combined to solve different problems in fundus image segmentation. Several novel deep models are provided, based on a deep convolutional network combined with models such as denoising convolutional autoencoders, convolutional networks and attention mechanisms. Together with the proposed data enhancement method, the new models can denoise, enhance, feature-map and segment the original retinal image.
2) The medical image segmentation scheme has practical and reference value. Multiple segmentation methods are constructed for different medical images. On the basis of the segmented tissues, the overall characteristics of the whole image can be combined to assist in diagnosing the patient, which has important practical value. For medical staff, the results of each stage of the segmentation method have great reference value, and the deep learning algorithms and data in the invention are also a valuable reference for later scientific research.
3) A novel retinal image preprocessing model is provided, which effectively improves the model's ability to extract and recognize image features; stacked dilated convolutions improve the model's understanding of local context information while preserving the relevance of the feature information within the receptive field; and the scale reconstruction layer replaces up-sampling modes such as deconvolution and bilinear interpolation, introducing no extra parameters or computation while retaining the learning capability needed to restore detail information.
Detailed Description
The preferred embodiments of the present invention are described below in conjunction with the accompanying drawings; it should be understood that they serve for illustration and explanation only and do not limit the invention.
In addition, detailed descriptions of known art are omitted where they are not necessary to show the features of the present invention.
Example 1
The segmentation method is based on a deep convolutional neural network, and relates to a data preprocessing model of a fundus retina image, an end-to-end multi-label deep convolutional network model and application of the model to segmentation of fundus retina blood vessels, optic discs and optic cups.
The segmentation method adopts a deep convolutional neural network to map the features of vascular tissue, optic disc and cup tissue, and lesion tissue in medical images, and uses the convolutional network to segment the images. In addition, to increase segmentation accuracy, the images are enhanced with the new fundus retinal image preprocessing method; an end-to-end deep convolutional network solves the problem of fine vessel segmentation and obtains and visualizes deep salient features of lesion areas; and a combination of several deep neural networks solves the series of problems caused by the large pixel counts of medical images. The segmentation method specifically comprises the following steps:
1) Selecting the datasets
Retinal vessel segmentation datasets: Digital Retinal Images for Vessel Extraction (DRIVE), Structured Analysis of the Retina (STARE), and CHASE_DB1 (CHASE). The retinal images of the three datasets were acquired with different devices, lighting conditions, etc., which verifies the robustness and generalization of the framework proposed herein. The DRIVE dataset contains 40 images with a resolution of 565 × 584 px; the STARE dataset has 20 images of 700 × 605 px, half of which show lesions of different conditions; the CHASE dataset consists of 28 images with a resolution of 999 × 960 px;
Optic disc and cup segmentation dataset: the Drishti-GS1 dataset contains 101 fundus images in total, 31 normal and 70 diseased; the training set comprises 50 images and the test set 51 images; the optic disc and optic cup regions of all images are annotated by 4 ophthalmologists with different levels of clinical experience;
2) Novel data preprocessing method for fundus retinal images
After appropriate preprocessing of the images, the model can learn their implied data distribution more efficiently. The segmentation method of the invention adopts a four-step preprocessing strategy, applied in sequence. Figure 1 shows images under the different treatments. The first preprocessing strategy converts the color image into a grayscale image. The color original retinal image shown in Fig. 1(a) is decomposed into the R channel shown in Fig. 1(b), the G channel shown in Fig. 1(c), and the B channel shown in Fig. 1(d). It can be seen that the G channel discriminates vessels from background best, while the R and B channels are noisier and have lower contrast. Although the R and B channels contain more noise, they also carry corresponding vessel feature information. Therefore, to minimize the loss of feature information while keeping the discrimination between vessels and background high, the R and B channels are fused proportionally into the G channel to convert the image to grayscale. The conversion formula is as follows:
I_gray = 0.299×r + 0.587×g + 0.114×b    (1)
In formula (1), r, g and b represent the values of the R channel, G channel and B channel, respectively. According to formula (1), in the converted grayscale image shown in Fig. 1(e), the R, G and B channels of the original image account for 29.9%, 58.7% and 11.4% respectively, which retains part of the feature information of the R and B channels while making maximal use of the information in the G channel.
The second preprocessing strategy is data normalization. Let X = {x_1, x_2, ..., x_n} be the image dataset; the dataset X is first dimension-normalized using the Z-score normalization method. The Z-score normalization formula is as follows:
X_norm = (X − μ) / σ    (2)
In formula (2), μ is the mean of X and σ is the standard deviation of X. At this point X contains both positive and negative values, with mean 0 and standard deviation 1. Next, each image in the dataset X is min-max normalized, remapping the values of X to the range 0-255. The min-max normalization formula is as follows:
x_i′ = 255 × (x_i − min(X_norm)) / (max(X_norm) − min(X_norm))    (3)
In formula (3), x_i ∈ X_norm, i ∈ [1, 2, ..., n]. The effect of data normalization is shown in Fig. 1(f): normalization reduces the interference caused by uneven illumination in the fundus retinal image, makes the picture robust to geometric transformations, and exposes the invariants in the picture, which benefits segmentation of retinal vessel details.
The third preprocessing strategy uses CLAHE to enhance the vessel-to-background contrast of the entire dataset. The CLAHE result is shown in Fig. 1(g): based on Fig. 1(f), CLAHE enhances the contrast of retinal vessels against the background and highlights fine vessels. The last preprocessing strategy is a nonlinear operation, using gamma correction on the luminance or tristimulus values of the image shown in Fig. 1(g) to widen the difference between pixel-value classes by balancing the color range distorted by uneven illumination. In the segmentation method of the invention, the gamma value is set to 1.2, and the processed image is shown in Fig. 1(h). The invention uses OpenCV to implement the CLAHE and gamma correction strategies.
To address the differences in acquisition devices, shooting angles, illumination, etc. across datasets, the four steps of grayscale conversion, normalization, contrast-limited adaptive histogram equalization and gamma correction are applied to the retinal images, which enhances the discrimination between vessels and background and reduces the influence of other interference factors in the acquisition process. Comparison experiments verify the effect of each preprocessing step, showing that the four-step strategy has a positive, effective influence and improves the model's ability to extract and identify image features.
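The CLAHE and gamma steps map directly onto OpenCV; the sketch below is our illustration, with the clip limit and tile size as assumed settings (the text fixes only the gamma value 1.2) and the exponent convention left as a noted choice:

```python
import cv2
import numpy as np

def clahe_then_gamma(gray_u8, gamma=1.2):
    """Apply CLAHE, then gamma correction, to an 8-bit grayscale image."""
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))  # assumed settings
    enhanced = clahe.apply(gray_u8)
    # Gamma correction via a lookup table. Whether the exponent is gamma or
    # 1/gamma is a convention choice; the patent only specifies the value 1.2.
    lut = np.array([255.0 * (i / 255.0) ** gamma for i in range(256)],
                   dtype=np.uint8)
    return cv2.LUT(enhanced, lut)
```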
3) New segmentation model for the fundus optic disc and optic cup
The segmentation of optic discs and cups is a very time-consuming task that is currently performed only by professionals. Using a computer to segment discs and cups automatically is attractive, because a computer segments more objectively and faster than a human. The segmentation method of the invention uses an end-to-end multi-label deep learning model to segment the optic disc and optic cup in the fundus image simultaneously. The network structure of the multi-label deep learning model is shown in Fig. 2 and mainly comprises two networks: a generator (C) and a discriminator (D). C is a fully convolutional segmentation network, mainly composed of an encoder and a decoder. The training data samples of C are pairs (x_i, y_i), where x_i, of size H × W × C_1, is the fundus image and y_i, of size H × W × C_2, is the label map of the optic disc and optic cup; H is the picture height, W is the picture width, C_1 is the number of channels of x_i, and C_2 is the number of channels of y_i. The encoder extracts features from the fundus image x_i; to improve the feature extraction capability, the segmentation method of the invention reduces the down-sampling factor for dense feature sampling and introduces dilated convolution to enlarge the receptive field, cascading dilated convolutions with different dilation rates to avoid the 'gridding' problem of dilated convolution. The encoder finally outputs the feature map u. The decoder module takes the feature map u as input and, using scale reconstruction instead of up-sampling, converts it to the same size as y_i, outputting a feature map of the same size after scale reconstruction. Finally, C outputs the segmentation result C(x_i). D judges 'true/false' (1/0): it is a binary classification model whose inputs pair x with either y_i or C(x_i), and its output guides the training of C. During training, the input sample images of D are divided into two classes, (x, y) and (x, C(x)), so the output is also divided into two classes: when (x, y) is input, the output D(x, y) is 1; when (x, C(x)) is input, the output D(x, C(x)) is 0. The final objective function of the model is as follows:
min_{θ_C} max_{θ_D} E_{x,y}[log D(x, y)] + E_x[log(1 − D(x, C(x)))]    (4)
In formula (4), θ_C represents the parameters of the generator C; θ_D represents the parameters of the discriminator D; E_{x,y}[·] represents the expected value.
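A hedged sketch of how objective (4) could be optimized in PyTorch with binary cross-entropy; C and D stand for the generator and discriminator networks (D is assumed to end in a sigmoid), and all names are illustrative:

```python
import torch
import torch.nn.functional as F

def adversarial_step(C, D, x, y, opt_C, opt_D):
    """One update of objective (4): D maximizes log D(x,y) + log(1 - D(x,C(x))),
    C minimizes the second term (equivalently, drives D(x,C(x)) toward 1)."""
    # Discriminator update: real pairs -> 1, generated pairs -> 0.
    opt_D.zero_grad()
    real = D(x, y)
    fake = D(x, C(x).detach())            # detach: do not update C here
    loss_D = F.binary_cross_entropy(real, torch.ones_like(real)) + \
             F.binary_cross_entropy(fake, torch.zeros_like(fake))
    loss_D.backward()
    opt_D.step()
    # Generator update: fool D into outputting 1 on generated pairs.
    opt_C.zero_grad()
    fake = D(x, C(x))
    loss_C = F.binary_cross_entropy(fake, torch.ones_like(fake))
    loss_C.backward()
    opt_C.step()
    return loss_C.item(), loss_D.item()
```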
Cascaded dilated convolution: a larger receptive field improves the model's understanding of the global context of the fundus image, so the optic disc/cup region and the lesion region can be distinguished more accurately. Typically, the receptive field of a standard FCN is expanded by down-sampling with pooling layers (Pooling) or strided convolutional layers. However, excessive down-sampling operations cause loss of feature information, hinder model learning, and increase the complexity of up-sampling.
To alleviate the above problem, the model reduces down-sampling operations while keeping its receptive field unchanged or larger. The segmentation method of the invention cascades convolutional layers with different dilation rates, as shown in Fig. 3(a). Assume N dilated convolutional layers are cascaded each time, the convolution kernel size is k × k, and the dilation rates are [r_1, ..., r_i, ..., r_N]; after the N convolutional layers, all feature information in the receptive field region can be completely covered.
Define the maximum permissible dilation rate of the i-th layer in the dilated convolution kernel as:
R_i = max[R_{i+1} − 2r_i, 2r_i − R_{i+1}, r_i]
with (R_1 − 1) + (R_2 − 1) < k when R_i > 0 and R_i ≠ R_{i+1}; therefore the dilation rate r_i ≤ R_i. Moreover, the dilation rates r_i within each group must not have a common-factor relationship (e.g., 2, 4, 8); r_i represents the dilation rate of the i-th layer in the dilated convolution kernel, and k is the convolution kernel size.
In Fig. 3(b), with the dilation rates set to (r_1, r_2, r_3) = (1, 2, 5), the effective sizes of the three convolution kernels are (k_{r_1}, k_{r_2}, k_{r_3}) = (3, 5, 11). After this series of convolution operations, the feature value p_4 in layer L_4 has a receptive field R_b = 17, and the features contributing to p_4 come from all features within the R_b region, without the hole portions of Fig. 3(a). The effective ratio of the feature values is 100%, which guarantees the relevance among feature values. Thus, by setting reasonable r_1, r_2, r_3 according to the definition, the final convolution kernel can relate to all feature values within the receptive field region, effectively alleviating the 'gridding' problem. In Fig. 3, F_a represents the contribution area of the feature values at dilation rates (r_1, r_2, r_3) = (2, 2, 4), and F_b represents the contribution area at dilation rates (r_1, r_2, r_3) = (1, 2, 5).
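The numbers above can be reproduced with the standard identities k_eff = k + (k − 1)(r − 1) and RF = 1 + Σ(k_eff − 1); the helper below is our illustration, not part of the patent:

```python
def effective_kernel(k, r):
    """Effective kernel size of a dilated convolution: kernel k, dilation r."""
    return k + (k - 1) * (r - 1)

def receptive_field(k, rates):
    """Receptive field of a stack of stride-1 dilated convolutions."""
    return 1 + sum(effective_kernel(k, r) - 1 for r in rates)

rates = (1, 2, 5)
print([effective_kernel(3, r) for r in rates])  # [3, 5, 11], i.e. (k_r1, k_r2, k_r3)
print(receptive_field(3, rates))                # 17, i.e. R_b in Fig. 3(b)
```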
Scale reconstruction layer: to improve model performance while reducing parameters and computation, the segmentation method of the invention performs up-sampling with a scale reconstruction layer, which periodically rearranges the information in the feature map and compresses the number of channels in exchange for expanded height and width, achieving the effect of up-sampling, as shown in Fig. 4. Assume the dimension of the input feature map is W × H × (d × d × C), where d is the down-sampling factor; the dimension of the feature map produced by the scale reconstruction layer is then (W × d) × (H × d) × C. As an up-sampling mode, the scale reconstruction layer has the following advantages: 1. compared with deconvolution, it adds no extra parameters or computational overhead, which improves the speed of the model, and it remains learnable, so detail information lost in down-sampling can be captured and recovered; 2. compared with bilinear interpolation, which likewise introduces no extra parameters or computation but cannot learn and cannot accurately restore lost feature information, the scale reconstruction layer is learnable. It thus combines the advantages of both.
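The periodic rearrangement described above corresponds to the pixel-shuffle (sub-pixel) operation; a minimal PyTorch illustration with our own shapes (the learnable part is the convolution that produces the d×d×C channels before the shuffle, since the rearrangement itself is parameter-free):

```python
import torch
import torch.nn as nn

d, C = 2, 3                                # down-sampling factor d, output channels C
x = torch.randn(1, d * d * C, 64, 64)      # input feature map: H x W x (d*d*C)
shuffle = nn.PixelShuffle(d)               # parameter-free periodic rearrangement
y = shuffle(x)
print(y.shape)                             # torch.Size([1, 3, 128, 128]): (H*d) x (W*d) x C
```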
Accurate segmentation of the optic disc and cup has great practical significance in assisting physicians to screen for glaucoma. The multi-label deep convolutional neural network model provided by the segmentation method of the invention segments the optic disc and optic cup in the fundus image. The concept of GAN is used: GAN is combined into the model, the generator is divided into an encoder and a decoder, and the down-sampling factor is reduced in the encoder to avoid losing feature information. Dilated convolution replaces traditional convolution to enlarge the model's receptive field, and cascading dilated convolutions with different dilation rates avoids the gridding problem while increasing the receptive field. In the decoder, a scale reconstruction layer replaces up-sampling modes such as deconvolution, reducing parameters while retaining learning capability and improving model performance. Finally, the network model was verified on the Drishti-GS1 dataset; the segmentation result is shown in Fig. 6, where Fig. 6(a) is the fundus retinal image before segmentation, Fig. 6(b) shows the optic disc and optic cup regions manually annotated by an expert (the small circle is the optic cup, the large circle the optic disc), and Fig. 6(c) is the segmentation result of the proposed multi-label deep learning model. Comparing Fig. 6(b) and 6(c), the model's segmentation is very close to the expert's manual annotation of the disc and cup regions, indicating that the proposed method has very good segmentation performance.
4) Fundus retinal vessel segmentation
The fundus is the only part of the human body where blood vessels can be observed directly, and changes in the fundus, such as vessel width, angle and branching shape, provide a basis for early diagnosis of diseases. Fundus vessel analysis is currently the main mode of diagnosing fundus diseases, and vessel segmentation in fundus images is a necessary step for quantitative disease analysis. Inspired by the FCN, the invention designs a new structure for fundus retinal vessel segmentation, named the baseline fully convolutional neural network, shown in Fig. 5. The network comprises a structurally symmetric encoding path and decoding path, composed of convolutional layers, residual modules, batch normalization layers, ReLU activation layers, etc. The encoder uses rich convolutional layers to encode the low-dimensional input image to extract contextual semantic information, reduce the influence of background noise, and acquire the vessel features in the retinal image. The decoder performs the inverse of the encoding and recovers spatial information by up-sampling and fusing low-dimensional features so that the retinal vessels can be located accurately. The network mainly comprises three parts: first, a multi-scale input layer, which constructs an image pyramid input to achieve multi-level reuse of image feature information; second, the multi-path FCN, which serves as the backbone structure to learn rich hierarchical representations; and finally, a multi-output fusion module, which combines low-level and high-level features to make full use of feature information at different depths and achieve better results through feature fusion. The three parts are described in detail below.
Multi-path FCN: the network consists of two encoder paths and two decoder paths. Each encoder path generates a set of encoder feature maps using residual modules and convolutional layers, normalizes each layer's feature maps with a batch normalization layer, and then activates them with the ReLU activation function. The decoder paths decode the features from the encoder paths using deconvolution layers and residual modules, normalize each layer's feature maps with a batch normalization layer, and then activate them with the ReLU activation function. Skip connections fuse the encoder feature maps with the decoder feature maps to reduce the loss of feature information and to fuse the feature information;
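One plausible reading of a single encoder stage (convolution plus residual module, each with batch normalization and ReLU), sketched in PyTorch; the channel counts, strides and class names are our assumptions:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual module: two 3x3 convolutions with BN/ReLU and a skip connection."""
    def __init__(self, ch):
        super().__init__()
        self.conv1, self.bn1 = nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch)
        self.conv2, self.bn2 = nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)            # skip connection inside the block

class EncoderStage(nn.Module):
    """One encoder-path stage: strided convolution, BN, ReLU, residual module."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1)  # downsample
        self.bn, self.relu = nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True)
        self.res = ResidualBlock(out_ch)

    def forward(self, x):
        return self.res(self.relu(self.bn(self.conv(x))))
```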
Multi-scale input layer: the multi-scale input is integrated into the encoder path to ensure feature transfer of the original image and effectively improve segmentation quality. In the multi-scale input layer, the segmentation method down-samples the image with an average pooling layer, then expands the channels of the down-sampled image with an ordinary convolutional layer, and then fuses the expanded down-sampled image with the outputs of different layers in the encoder path as the input of the next layer;
Multi-output fusion layer: in the decoder path, the output features of residual modules at different depths are extracted, expanded by up-sampling to the same size as the input image, then input into separate convolutional layers for channel compression and classified with a Softmax function; the resulting probability maps are fused into a two-channel probability map, where channel 0 is the probability of being segmented as background and channel 1 the probability of being segmented as retinal vessel. Finally, threshold segmentation of the channel-0 feature map yields the final vessel segmentation map.
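A sketch of the multi-output fusion under our own naming; the mean as the fusion operator and the 0.5 threshold are assumptions (for a two-class Softmax, thresholding the background channel and thresholding the vessel channel are equivalent):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiOutputFusion(nn.Module):
    def __init__(self, depth_channels, out_size):
        super().__init__()
        self.out_size = out_size                     # (H, W) of the input image
        # one 1x1 convolution per depth, compressing channels to 2 classes
        self.heads = nn.ModuleList([nn.Conv2d(c, 2, 1) for c in depth_channels])

    def forward(self, feats):
        probs = []
        for f, head in zip(feats, self.heads):
            f = F.interpolate(f, size=self.out_size, mode='bilinear',
                              align_corners=False)   # expand to input size
            probs.append(torch.softmax(head(f), dim=1))
        fused = torch.stack(probs).mean(dim=0)       # two-channel probability map
        # channel 0: background, channel 1: vessel; threshold to a binary map
        return (fused[:, 1:2] > 0.5).float()
```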
The three modules improve the network's feature identification and reuse capability; by constructing the multi-scale input layer, the multi-path FCN and the multi-output fusion module, a structure superior to the FCN is obtained. The model has a deeper network structure and larger capacity; skip connections of different lengths fuse low-level and high-level features, greatly reducing the difficulty of training; and the convolution kernel parameters adapt during training to the shape features of the vessels, improving the network's ability to distinguish vessels from background and the segmentation accuracy of the model. Finally, the invention trains and tests the framework on the DRIVE, STARE and CHASE datasets. The experiments show that the proposed network structure achieves competitive results on all three datasets and that the improved model outperforms the FCN; segmentation results on the DRIVE dataset are shown as an example in Fig. 7. Fig. 7(a) is the fundus retinal image before segmentation; Fig. 7(b) shows retinal vessels manually segmented by an expert; Fig. 7(c) is the segmentation result of the proposed baseline fully convolutional neural network model. Comparing Fig. 7(b) and 7(c), the model's result is very close to the expert's manual segmentation and is very complete for small vessels, indicating that the proposed method has very good segmentation performance.
Analysis of processing results
All experiments of the invention were run on the same hardware platform and software environment, detailed in Table 1. The data preprocessing code runs on the CPU and is based on the open-source vision library OpenCV. The optic disc/cup segmentation network and the retinal vessel segmentation network are implemented with the PyTorch framework, and network training is performed on a GPU. The proposed method has low hardware and software requirements, and in practical application it can be trained and used without upgrading existing equipment, or with only simple upgrades.
Table 1 Experimental hardware platform and software environment
The setting of the hyper-parameters is important for the experimental results of training and reproducing the model. The experimental hyper-parameter settings of the invention are detailed in Table 2. Network weights and biases are assigned with the default initialization method of the PyTorch convolutional layer, and the network is then trained end-to-end by back-propagation with the Adam optimizer.
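A minimal sketch of that setup; the placeholder network and the learning rate are ours (the actual hyper-parameters are those of Table 2):

```python
import torch
import torch.nn as nn

model = nn.Sequential(                    # placeholder network; PyTorch's default
    nn.Conv2d(1, 32, 3, padding=1),       # Conv2d initialization is used as-is
    nn.BatchNorm2d(32),
    nn.ReLU(inplace=True),
    nn.Conv2d(32, 2, 1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr is illustrative

def train_step(x, y, criterion=nn.CrossEntropyLoss()):
    """One end-to-end back-propagation step."""
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```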
Table 2 Experimental parameter settings
The invention uses several common evaluation indices to quantitatively evaluate the performance of the proposed method, including Accuracy, Sensitivity, Specificity and the F1 score (F1 Scores). In comprehensive evaluation, the retinal vessel segmentation method achieves very competitive results on all indices on the DRIVE, STARE and CHASE datasets, with segmentation accuracy above 97% and F1 scores above 80%; the optic disc and cup segmentation method performs well on the Drishti-GS1 dataset, with an F1 score of 97% for the optic disc and 92% for the optic cup.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.