CN117078697A - Fundus disease type detection method based on cascade model fusion - Google Patents

Fundus disease type detection method based on cascade model fusion

Info

Publication number
CN117078697A
Authority
CN
China
Prior art keywords
fundus
image
module
segmentation
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311053124.1A
Other languages
Chinese (zh)
Other versions
CN117078697B (en)
Inventor
庹恩涛
万程
沈烨宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202311053124.1A priority Critical patent/CN117078697B/en
Publication of CN117078697A publication Critical patent/CN117078697A/en
Application granted granted Critical
Publication of CN117078697B publication Critical patent/CN117078697B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/90Determination of colour characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/765Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30041Eye; Retina; Ophthalmic
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention provides a fundus disease detection method based on cascade model fusion, comprising the following steps: collect an original fundus image; preprocess the image and input it into a segmentation network; use the segmentation network to extract images of the optic cup and disc, the blood vessels and the macular region; input the preprocessed image and the segmentation result into a healthy/unhealthy fundus image classifier to judge whether the image shows a healthy fundus; if not, input the preprocessed image and the segmentation result into a fundus disease type classifier to judge the disease type. The invention performs accurate structure segmentation with an automatic fundus image segmentation network and uses the segmentation together with the original image for diagnosis, which improves the algorithm's interpretability; the segmentation results aid doctors' clinical diagnosis and rapidly localize lesions. Using only a cascade of one segmentation model and two classification models, the invention matches the accuracy of combining one single-disease model per disease type, avoids the model count growing with the number of disease types, improves diagnostic efficiency and reduces operating cost.

Description

Fundus disease type detection method based on cascade model fusion
Technical Field
The invention belongs to the technical field of image detection, and particularly relates to a fundus disease detection method based on cascade model fusion.
Background
In recent years, fundus diseases have shown high incidence and cause serious harm. Color fundus photography is the simplest and most effective way to detect them: a doctor typically collects fundus images with a professional fundus camera and then makes a manual diagnosis. Fundus examination requires specialist ophthalmologists and corresponding examination equipment; such specialists are difficult to train and the equipment is expensive, while primary hospitals and physical examination centers face a large population needing fundus screening, so the shortage is severe. Moreover, even where instruments and doctors are sufficient, fundus diagnosis depends heavily on a doctor's personal condition and experience. With the continued development of deep learning, artificial intelligence is therefore gradually being introduced to assist medical diagnosis, providing doctors with diagnostic suggestions for multiple fundus diseases and improving diagnostic efficiency and accuracy.
However, because fundus color images differ greatly from natural images, existing artificial-intelligence diagnosis methods have the following shortcomings: (1) current deep-learning fundus diagnosis methods pay no particular attention to important structures and lack interpretability and specificity, so they offer little practical help for clinical diagnosis; (2) current multi-disease diagnosis methods mostly use a single model, which cannot cope well with the extremely unbalanced distribution of fundus disease labels found in practice, resulting in low detection accuracy; (3) the common prior-art alternative of combining one single-disease model per disease can raise detection accuracy, but it is costly, slow and cumbersome, which hinders wide adoption.
Disclosure of Invention
To address these defects in the prior art, the invention provides a fundus disease detection method based on cascade model fusion, which automatically extracts important fundus structures such as the blood vessels and the optic cup and disc; the segmented structures can be provided to doctors and other operators to improve diagnostic efficiency, and also serve as features for further intelligent disease diagnosis.
The present invention achieves the above technical object by the following means.
A fundus disease detection method based on cascade model fusion comprises the following steps:
step 1: collecting an original fundus image and inputting the original fundus image into image processing equipment;
step 2: the image processing device preprocesses the original fundus image through a preprocessing algorithm, and inputs the preprocessed image into a segmentation network;
step 3: extracting images of the optic cup and disc, blood vessels and macular region by means of the segmentation network, and inputting the segmentation results into a data receiving device; the segmentation network comprises an encoder module, a dilated convolution module, an attention module and a decoder module;
step 4: inputting the preprocessed image and the segmentation result received by the data receiving device into a healthy/unhealthy fundus image classifier together, judging whether the image is a healthy fundus, if so, entering a step 6, otherwise, entering a step 5;
step 5: the preprocessed images and the segmentation result received by the data receiving equipment are input into a fundus disease type classifier together, and fundus disease types corresponding to the fundus images are judged;
step 6: and storing the specific fundus disease diagnosis conclusion corresponding to the fundus image into a data server, and returning the specific fundus disease diagnosis conclusion to an interface calling party for viewing.
Further, the preprocessing method in step 2 is as follows:
step 2.1: converting the original fundus image into a grayscale image, in which each pixel's gray value lies in 0-255;
step 2.2: creating a mask image of the same size as the grayscale image and setting a threshold, denoted X; pixels of the grayscale image whose gray value is below the threshold are set to false, and pixels at or above the threshold are set to true;
step 2.3: performing an OR reduction over each row (column) of the mask; if the result is false, every pixel in that row (column) is below the threshold, and the row (column) is cropped out; if the result is true, at least one pixel in that row (column) is at or above the threshold, and the row (column) is kept;
step 2.4: applying the reduced mask to the original fundus image to obtain the region of interest with the black borders removed;
step 2.5: normalizing the image colors: applying Gaussian blur to the cropped image, inversely superposing the blurred image on the original fundus image, and shifting the mean pixel value to 128.
Further, in step 3, the specific method by which the segmentation network extracts the optic cup and disc, blood vessel and macular region images is as follows:
step 3.1: the encoder module uses the feature extraction part of an EfficientNetV2 convolutional network to extract features from the preprocessed image, obtaining core coding features and multi-stage features, which are input into the dilated convolution module and the attention module respectively;
step 3.2: the dilated convolution module is built into the core of the segmentation network; it uses three dilated convolution layers with different dilation coefficients to extract and combine features from the input feature maps, finally obtaining multi-receptive-field features that are input to the decoder module;
the attention module comprises a spatial attention module and a channel attention module: the spatial attention module weights the different pixels of the same feature map to raise the feature response of lesion regions, while the channel attention module gathers all feature maps and weights each one to selectively emphasize certain feature maps; finally, the outputs of the spatial and channel attention modules are fused to obtain the final weight-redistributed features, which are input to the decoder module;
step 3.3: the decoder module up-samples the extracted features in multiple stages to generate a segmentation map of corresponding size; the segmentation map is a binarized result, in which 1 denotes the target region and 0 the background; the decoder module finally outputs a three-channel segmentation map representing the segmentation results for three key fundus structures, namely the optic cup and disc, the blood vessels and the macular region, and the segmentation results are further input into the data receiving device.
Further, the encoder module applies channel compression and activation at each node to raise the response of key channels; its feature extraction is split into 5 stages, each of which applies a 3×3 convolution with stride 2 to the output feature map, padded to the same size as the input, so each stage's final output feature map is half the size of the previous stage's; the shallow layers of the dilated convolution module gather neighborhood information from the feature map to improve recognition of small targets, while its deep layers obtain receptive fields close to the size of the feature map to improve recognition of large targets.
Further, in step 4 the healthy/unhealthy fundus image classifier uses an EfficientNetV2-M network structure with an added attention module; the core MBConv module of EfficientNet has a channel attention mechanism but lacks a spatial attention mechanism, so an attention module integrated with the base model is placed at the end of the network: the model's pooling layer and fully connected layer are removed and replaced by an AttentionNet, the features extracted by the model are input directly into the attention module, and the mask map generated by the attention module is multiplied with the original feature map, providing spatial attention and suppressing the expression of spatially useless features.
Further, the classification algorithm process of the healthy/unhealthy fundus image classifier is as follows:
step 4.1: the preprocessed image and the segmentation maps of the three important fundus structures are input together into a Transformer model for feature extraction;
step 4.2: a spatial attention module redistributes spatial weights over the feature maps extracted by the Transformer to strengthen the feature expression of key regions;
step 4.3: the features extracted by the Transformer are pooled by generalized mean pooling, whose exponent factor p is set to Y; Y is adjusted dynamically according to the judged accuracy within the range 2-4 and is initialized to 3;
step 4.4: the pooled result is output through a fully connected layer; the fully connected layer of the healthy/unhealthy fundus image classifier outputs 2 nodes, representing the probabilities that the fundus image is healthy or unhealthy respectively.
Further, in step 5 the fundus disease type classifier uses an EfficientNetV2-S with an added attention module; the design of the attention module and the way it is added are the same as in the healthy/unhealthy fundus image classifier;
the fundus disease type classifier outputs a six-way classification result judging which specific fundus disease the fundus image shows; that is, its fully connected layer outputs 6 nodes, representing the probabilities that the fundus image shows high myopia, maculopathy, glaucoma, venous obstruction, diabetic retinopathy or other lesions respectively.
The invention has the following beneficial effects:
(1) The fundus disease detection method of the invention performs accurate structure segmentation with an automatic fundus image segmentation network and uses the segmentation together with the original image for diagnosis, improving the algorithm's interpretability; the segmentation results aid doctors' clinical diagnosis and localize lesions rapidly.
(2) Through long-term data collection and analysis across multiple hospitals, the invention obtained the real-world distribution of fundus data, in which healthy fundi account for nearly half of the samples; to exploit the fact that neural networks perform better on balanced data sets, a cascade of two classification models is adopted, so that both the healthy/unhealthy fundus image classifier and the fundus disease type classifier are trained and make predictions on more balanced sample distributions, effectively improving the accuracy of classifying specific disease types.
(3) Using only a cascade of one segmentation model and two classification models, the invention matches the accuracy of combining one single-disease model per disease type, avoids the model count growing as the number of disease types increases, effectively improves diagnostic efficiency and reduces operating cost.
Drawings
FIG. 1 is a flow chart of the fundus disease detection method based on cascade model fusion;
FIG. 2 is a diagram of the segmentation network framework;
FIG. 3 is an algorithmic schematic of the segmentation network;
FIG. 4 is a block diagram of the dilated convolution module;
FIG. 5 is a block diagram of an attention module;
FIG. 6 is a block diagram of a spatial attention module;
FIG. 7 is a block diagram of a channel attention module;
FIG. 8 is a flowchart of a preprocessing algorithm;
FIG. 9 is a flowchart of a classification algorithm;
FIG. 10 is a schematic diagram of a reference model in combination with an attention module;
FIG. 11 is a schematic diagram of the AttentionNet structure.
Detailed Description
The invention will be further described with reference to the drawings and the specific embodiments, but the scope of the invention is not limited thereto.
A fundus disease detection method based on cascade model fusion is shown in figure 1, and comprises the following steps:
step 1: acquiring an original fundus image by using fundus photographing equipment, and inputting the original fundus image into image processing equipment, wherein the image processing equipment is generally a desktop computer;
step 2: the image processing device preprocesses the original fundus image through a preprocessing algorithm, and inputs the preprocessed image into a segmentation network;
Referring to FIG. 8, the specific preprocessing method is as follows:
step 2.1: converting the original fundus image into a grayscale image, in which each pixel's gray value lies in 0-255;
step 2.2: creating a mask image of the same size as the grayscale image and setting a threshold, denoted X (X = 7 in this embodiment); pixels of the grayscale image whose gray value is below the threshold are set to false, and pixels at or above the threshold are set to true;
step 2.3: performing an OR reduction over each row (column) of the mask; if the result is false, every pixel in that row (column) is below the threshold, and the row (column) is cropped out; in order to fully preserve boundary detail, if the result is true (at least one pixel in that row (column) is at or above the threshold), the row (column) is kept;
step 2.4: applying the reduced mask to the original fundus image to obtain the region of interest with the black borders removed;
step 2.5: normalizing the image colors: applying Gaussian blur to the cropped image, inversely superposing the blurred image on the original fundus image, and shifting the mean pixel value to 128.
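For illustration, steps 2.1 to 2.5 can be sketched in Python with OpenCV and NumPy as below; the threshold value 7 follows this embodiment, while the blur sigma and the weights of the inverse superposition are assumptions chosen to match the description (they follow the widely used blur-and-subtract fundus normalization), and the function name is illustrative:

```python
import cv2
import numpy as np

def preprocess_fundus(img_bgr: np.ndarray, threshold: int = 7) -> np.ndarray:
    """Sketch of preprocessing steps 2.1-2.5 (constants are assumptions)."""
    # Step 2.1: convert to a grayscale image (gray values 0-255).
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    # Step 2.2: boolean mask; True where the gray value reaches the threshold.
    mask = gray >= threshold
    # Step 2.3: OR-reduce each row and column; all-False rows/columns are cropped.
    rows = mask.any(axis=1)
    cols = mask.any(axis=0)
    # Step 2.4: apply the reduced mask to the original image (black borders removed).
    roi = img_bgr[np.ix_(rows, cols)]
    # Step 2.5: Gaussian blur, inverse superposition on the original,
    # and shift of the mean pixel value toward 128.
    blurred = cv2.GaussianBlur(roi, (0, 0), sigmaX=10)
    return cv2.addWeighted(roi, 4, blurred, -4, 128)
```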
Step 3: referring to FIGS. 2 and 3, a segmentation network is used to extract images of the optic cup and disc, blood vessels and macular region; the segmentation network comprises an encoder module, a dilated convolution module, an attention module and a decoder module;
step 3.1: the encoder module uses the feature extraction part of an EfficientNetV2 convolutional network to extract features from the preprocessed image, obtaining core coding features and multi-stage features, which are input into the dilated convolution module and the attention module respectively;
the encoder module applies channel compression and activation at each node to raise the response of key channels, effectively handling the complex features of color fundus images; its feature extraction is split into 5 stages, each of which applies a 3×3 convolution with stride 2 to the output feature map, padded to the same size as the input, so each stage's final output feature map is half the size of the previous stage's.
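A minimal PyTorch sketch of one encoder stage is given below; reading the channel "compression and activation" as a squeeze-and-excitation gate is an interpretation of the description, and the channel widths, reduction ratio and activation function are assumptions:

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Channel compression and activation: global pooling, bottleneck, sigmoid gate."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # compress: per-channel statistics
            nn.Conv2d(channels, channels // reduction, 1),
            nn.SiLU(),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                   # activate: per-channel weights
        )

    def forward(self, x):
        return x * self.gate(x)                             # raise key channel responses

class EncoderStage(nn.Module):
    """One of the 5 stages: 3x3 convolution, stride 2, 'same' padding, so the
    output feature map is half the size of the previous stage's."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)
        self.act = nn.SiLU()
        self.se = SqueezeExcite(out_ch)

    def forward(self, x):
        return self.se(self.act(self.conv(x)))
```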
step 3.2: to improve the segmentation network's recognition accuracy for targets of different sizes and enrich the network with features of different receptive fields, a dilated convolution module as shown in FIG. 4 is built into the core of the segmentation network; the module uses three dilated convolution layers with different dilation coefficients to extract and combine features from the input feature maps, finally obtaining multi-receptive-field features that are input to the decoder module; the shallow layers of the dilated convolution module gather neighborhood information from the feature map, improving recognition of small targets, while its deep layers obtain receptive fields close to the size of the feature map, improving recognition of large targets.
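The dilation coefficients are not specified by the patent; the sketch below assumes rates 1, 2 and 4 and a stack of three dilated convolutions whose intermediate outputs are concatenated and fused, so shallow layers see local neighborhoods and deeper layers see a receptive field close to the whole feature map:

```python
import torch
import torch.nn as nn

class DilatedConvModule(nn.Module):
    """Three stacked 3x3 dilated convolutions with increasing dilation rates;
    the outputs of all three layers are combined into multi-receptive-field
    features (rates and fusion by concatenation are assumptions)."""
    def __init__(self, channels: int, rates=(1, 2, 4)):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r) for r in rates
        ])
        self.fuse = nn.Conv2d(channels * len(rates), channels, 1)

    def forward(self, x):
        feats = []
        for layer in self.layers:          # shallow -> deep: growing receptive field
            x = torch.relu(layer(x))
            feats.append(x)
        return self.fuse(torch.cat(feats, dim=1))
```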
the attention module strengthens the expression of the key features to be segmented by weighting the input feature map over space and channels; referring to FIGS. 5, 6 and 7, the attention module comprises a spatial attention module and a channel attention module: the spatial attention module weights the different pixels of the same feature map to raise the feature response of effective locations (lesion regions), while the channel attention module gathers all feature maps and weights each one to selectively emphasize certain feature maps; finally, the outputs of the spatial and channel attention modules are fused to obtain a better feature representation, namely the final weight-redistributed features, which are input to the decoder module.
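A minimal sketch of the two attention branches and their fusion; the kernel sizes, reduction ratio and fusion by summation are assumptions, since the patent only specifies per-pixel weighting in the spatial branch and per-feature-map weighting in the channel branch:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Weights the different pixels of a feature map to raise lesion responses."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=7, padding=3)

    def forward(self, x):
        return x * torch.sigmoid(self.conv(x))      # (B,1,H,W) per-pixel weights

class ChannelAttention(nn.Module):
    """Gathers all feature maps and weights each one to emphasize useful channels."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x)                      # (B,C,1,1) per-channel weights

class DualAttention(nn.Module):
    """Fuses the two branches into the final weight-redistributed features."""
    def __init__(self, channels: int):
        super().__init__()
        self.spatial = SpatialAttention(channels)
        self.channel = ChannelAttention(channels)

    def forward(self, x):
        return self.spatial(x) + self.channel(x)     # fusion by summation (assumed)
```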
step 3.3: the decoder module up-samples the extracted features in multiple stages to generate a segmentation map of corresponding size; the segmentation map is a binarized result, in which 1 denotes the target region and 0 the background; in this embodiment the decoder module outputs a three-channel segmentation map representing the segmentation results for the three important fundus structures, namely the optic cup and disc, the blood vessels and the macular region, and the segmentation results are further input into the data receiving device.
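A decoder sketch is given below; the number of upsampling stages, the channel halving and the 0.5 binarization threshold are assumptions (during training the raw logits would be supervised instead of the binarized map):

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Multi-stage upsampling to a 3-channel map (optic cup and disc, vessels,
    macular region), binarized so 1 marks the target and 0 the background."""
    def __init__(self, in_ch: int, num_stages: int = 5):
        super().__init__()
        blocks, ch = [], in_ch
        for _ in range(num_stages):                 # undo the 5 encoder halvings
            out = max(ch // 2, 16)
            blocks += [
                nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
                nn.Conv2d(ch, out, 3, padding=1),
                nn.ReLU(),
            ]
            ch = out
        self.blocks = nn.Sequential(*blocks)
        self.head = nn.Conv2d(ch, 3, 1)             # three fundus structures

    def forward(self, x):
        logits = self.head(self.blocks(x))
        return (torch.sigmoid(logits) > 0.5).float()  # binarized segmentation map
```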
Step 4: the image preprocessed by the image processing device and the segmentation result received by the data receiving device are input together into the healthy/unhealthy fundus image classifier. For this classifier an EfficientNetV2-M network structure with an added attention module is adopted: the core MBConv module of EfficientNet has a channel attention mechanism but lacks a spatial attention mechanism, so to further improve the network's classification performance an attention module integrated with the base model is placed at the end of the network; the model's pooling layer and fully connected layer are removed and replaced by an AttentionNet, the features extracted by the model are input directly into the attention module, and the mask map generated by the attention module is multiplied with the original feature map, providing spatial attention and suppressing the expression of spatially useless features; this attention module can be applied to any model without redesign;
the combination of the base model and the attention module is shown in FIG. 10, where Baseline Model is any model with its pooling and fully connected layers removed, F_E is the feature map generated by the model, and F_A is the attention gate generated by the AttentionNet. GeM Pooling can be regarded as a generalized pooling operation with an exponent factor p: with p = 1 it is average pooling, and as p approaches infinity it becomes max pooling; intermediate values of p give a more comprehensive effect, and experiments show that p = 3 works better on this data set. The AttentionNet structure can be expressed as C256-C128-C64-C1-C256, where Ck denotes a convolution layer with k kernels, each of size 1×1; the five convolution layers all have stride 1 and each is followed by a ReLU activation, and the final number of output channels is 1, yielding an attention gate F_A of the same size as the original feature map; the process is shown in FIG. 11.
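The two operations described here can be sketched as follows; because the listed layer sequence C256-C128-C64-C1-C256 is inconsistent with the stated single-channel output, the sketch assumes a monotone five-layer stack ending in one channel, and keeping p learnable is likewise an assumption:

```python
import torch
import torch.nn as nn

class GeMPooling(nn.Module):
    """Generalized mean pooling: p = 1 gives average pooling and large p
    approaches max pooling; initialized here to p = 3 as in the experiments."""
    def __init__(self, p: float = 3.0, eps: float = 1e-6):
        super().__init__()
        self.p = nn.Parameter(torch.tensor(p))
        self.eps = eps

    def forward(self, x):                            # x: (B, C, H, W)
        x = x.clamp(min=self.eps).pow(self.p)
        return x.mean(dim=(-2, -1)).pow(1.0 / self.p)  # (B, C)

class AttentionNet(nn.Module):
    """Five 1x1 convolutions, stride 1, each followed by ReLU, producing a
    1-channel attention gate F_A the same height and width as F_E."""
    def __init__(self, in_ch: int, widths=(256, 128, 64, 32, 1)):
        super().__init__()
        layers, ch = [], in_ch
        for w in widths:                             # widths partly assumed (see text)
            layers += [nn.Conv2d(ch, w, kernel_size=1, stride=1), nn.ReLU()]
            ch = w
        self.net = nn.Sequential(*layers)

    def forward(self, f_e):
        f_a = self.net(f_e)                          # (B, 1, H, W) attention gate
        return f_e * f_a                             # multiply gate with feature map
```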
After processing by the healthy/unhealthy fundus image classifier, a classification result is output indicating whether the fundus image is judged to be a healthy fundus; if so, proceed to step 6, otherwise proceed to step 5;
referring to FIG. 9, the classification algorithm of the healthy/unhealthy fundus image classifier proceeds as follows:
step 4.1: the preprocessed image and the segmentation maps of the three important fundus structures are input together into a Transformer model for feature extraction;
step 4.2: a spatial attention module redistributes spatial weights over the feature maps extracted by the Transformer to strengthen the feature expression of key regions;
step 4.3: the features extracted by the Transformer are pooled by generalized mean pooling, whose exponent factor p is set to Y; Y is adjusted dynamically according to the judged accuracy within the range 2-4 and is initialized to 3;
step 4.4: the pooled result is output through a fully connected layer; the fully connected layer of the healthy/unhealthy fundus image classifier outputs 2 nodes, representing the probabilities that the fundus image is healthy or unhealthy respectively;
in this process, the Transformer's global self-attention mechanism excels at capturing global features, effectively handling the complex relations among global features in medical images and capturing the relations among different features after serialization.
Step 5: the image preprocessed by the image processing device and the segmentation result received by the data receiving device are input together into the fundus disease type classifier, which uses an EfficientNetV2-S with an added attention module; the design of the attention module and the way it is added are the same as in the healthy/unhealthy fundus image classifier. The fundus disease type classifier outputs a six-way classification result judging which specific fundus disease the fundus image shows; its classification algorithm is similar to that of the healthy/unhealthy fundus image classifier, except that its fully connected layer outputs 6 nodes, representing the probabilities that the fundus image shows high myopia, maculopathy, glaucoma, venous obstruction, diabetic retinopathy or other lesions respectively, so the specific algorithm steps are not repeated here.
Step 6: and storing the specific fundus disease diagnosis conclusion corresponding to the fundus image into a data server, and returning the specific fundus disease diagnosis conclusion to an interface calling party for viewing.
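Putting steps 1 to 6 together, the cascade inference can be sketched as below; the model objects, the 6-channel fusion by concatenation, the ordering of the output nodes and the label strings are illustrative assumptions, and preprocess_fundus refers to the preprocessing sketch above:

```python
import torch

@torch.no_grad()
def detect_fundus_disease(raw_image, seg_net, health_clf, disease_clf):
    """One segmentation model cascaded with two classifiers (steps 1-6)."""
    x = preprocess_fundus(raw_image)                 # steps 1-2: acquire and preprocess
    x = torch.from_numpy(x).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    seg = seg_net(x)                                 # step 3: 3-channel segmentation map
    clf_input = torch.cat([x, seg], dim=1)           # image + segmentation result
    # Step 4: 2-node healthy/unhealthy decision (node 0 = healthy is assumed).
    if health_clf(clf_input).softmax(dim=1)[0, 0] > 0.5:
        return "healthy fundus"                      # step 6: store and return result
    # Step 5: 6-node disease-type decision.
    labels = ["high myopia", "maculopathy", "glaucoma", "venous obstruction",
              "diabetic retinopathy", "other lesions"]
    return labels[disease_clf(clf_input).argmax(dim=1).item()]
```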
The examples are preferred embodiments of the present invention, but the present invention is not limited to the above-described embodiments, and any obvious modifications, substitutions or variations that can be made by one skilled in the art without departing from the spirit of the present invention are within the scope of the present invention.

Claims (7)

1. A fundus disease detection method based on cascade model fusion, characterized by comprising the following steps:
step 1: collecting an original fundus image and inputting the original fundus image into image processing equipment;
step 2: the image processing device preprocesses the original fundus image through a preprocessing algorithm, and inputs the preprocessed image into a segmentation network;
step 3: extracting images of the optic cup and disc, blood vessels and macular region by means of the segmentation network, and inputting the segmentation results into a data receiving device; the segmentation network comprises an encoder module, a dilated convolution module, an attention module and a decoder module;
step 4: inputting the preprocessed image and the segmentation result received by the data receiving device into a healthy/unhealthy fundus image classifier together, judging whether the image is a healthy fundus, if so, entering a step 6, otherwise, entering a step 5;
step 5: the preprocessed images and the segmentation result received by the data receiving equipment are input into a fundus disease type classifier together, and fundus disease types corresponding to the fundus images are judged;
step 6: and storing the specific fundus disease diagnosis conclusion corresponding to the fundus image into a data server, and returning the specific fundus disease diagnosis conclusion to an interface calling party for viewing.
2. The fundus disease detection method based on cascade model fusion according to claim 1, characterized in that the preprocessing method in step 2 is as follows:
step 2.1: converting the original fundus image into a grayscale image, in which each pixel's gray value lies in 0-255;
step 2.2: creating a mask image of the same size as the grayscale image and setting a threshold, denoted X; pixels of the grayscale image whose gray value is below the threshold are set to false, and pixels at or above the threshold are set to true;
step 2.3: performing an OR reduction over each row (column) of the mask; if the result is false, every pixel in that row (column) is below the threshold, and the row (column) is cropped out; if the result is true, at least one pixel in that row (column) is at or above the threshold, and the row (column) is kept;
step 2.4: applying the reduced mask to the original fundus image to obtain the region of interest with the black borders removed;
step 2.5: normalizing the image colors: applying Gaussian blur to the cropped image, inversely superposing the blurred image on the original fundus image, and shifting the mean pixel value to 128.
3. The fundus disease detection method based on cascade model fusion according to claim 1, characterized in that in step 3 the specific method by which the segmentation network extracts the optic cup and disc, blood vessel and macular region images is as follows:
step 3.1: the encoder module uses the feature extraction part of an EfficientNetV2 convolutional network to extract features from the preprocessed image, obtaining core coding features and multi-stage features, which are input into the dilated convolution module and the attention module respectively;
step 3.2: the dilated convolution module is built into the core of the segmentation network; it uses three dilated convolution layers with different dilation coefficients to extract and combine features from the input feature maps, finally obtaining multi-receptive-field features that are input to the decoder module;
the attention module comprises a spatial attention module and a channel attention module: the spatial attention module weights the different pixels of the same feature map to raise the feature response of lesion regions, while the channel attention module gathers all feature maps and weights each one to selectively emphasize certain feature maps; finally, the outputs of the spatial and channel attention modules are fused to obtain the final weight-redistributed features, which are input to the decoder module;
step 3.3: the decoder module up-samples the extracted features in multiple stages to generate a segmentation map of corresponding size; the segmentation map is a binarized result, in which 1 denotes the target region and 0 the background; the decoder module finally outputs a three-channel segmentation map representing the segmentation results for three key fundus structures, namely the optic cup and disc, the blood vessels and the macular region, and the segmentation results are further input into the data receiving device.
4. The fundus disease detection method based on cascade model fusion according to claim 3, characterized in that the encoder module applies channel compression and activation at each node to raise the response of key channels; its feature extraction is split into 5 stages, each of which applies a 3×3 convolution with stride 2 to the output feature map, padded to the same size as the input, so each stage's output feature map is half the size of the previous stage's; the shallow layers of the dilated convolution module gather neighborhood information from the feature map to improve recognition of small targets, while its deep layers obtain receptive fields close to the size of the feature map to improve recognition of large targets.
5. The method according to claim 1, characterized in that in step 4 the healthy/unhealthy fundus image classifier uses an EfficientNetV2-M network structure with an added attention module; the core MBConv module of EfficientNet has a channel attention mechanism but lacks a spatial attention mechanism, so an attention module integrated with the base model is placed at the end of the network: the model's pooling layer and fully connected layer are removed and replaced by an AttentionNet, the features extracted by the model are input directly into the attention module, and the mask map generated by the attention module is multiplied with the original feature map, providing spatial attention and suppressing the expression of spatially useless features.
6. The fundus disease detection method based on cascade model fusion according to claim 1, wherein the classification algorithm of the healthy/unhealthy fundus image classifier proceeds as follows:
step 4.1: the preprocessed image and the segmentation maps of the three important fundus structures are input together into a Transformer model for feature extraction;
step 4.2: a spatial attention module redistributes spatial weights over the feature maps extracted by the Transformer to strengthen the feature expression of key regions;
step 4.3: the features extracted by the Transformer are pooled by generalized mean pooling, whose exponent factor p is set to Y; Y is adjusted dynamically according to the judged accuracy within the range 2-4 and is initialized to 3;
step 4.4: the pooled result is output through a fully connected layer; the fully connected layer of the healthy/unhealthy fundus image classifier outputs 2 nodes, representing the probabilities that the fundus image is healthy or unhealthy respectively.
7. The fundus disease detection method based on cascade model fusion according to claim 5, characterized in that in step 5 the fundus disease type classifier uses an EfficientNetV2-S with an added attention module, and the design of the attention module and the way it is added are the same as in the healthy/unhealthy fundus image classifier;
the fundus disease type classifier outputs a six-way classification result judging which specific fundus disease the fundus image shows; that is, its fully connected layer outputs 6 nodes, representing the probabilities that the fundus image shows high myopia, maculopathy, glaucoma, venous obstruction, diabetic retinopathy or other lesions respectively.
CN202311053124.1A 2023-08-21 2023-08-21 Fundus disease type detection method based on cascade model fusion Active CN117078697B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311053124.1A CN117078697B (en) 2023-08-21 2023-08-21 Fundus disease type detection method based on cascade model fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311053124.1A CN117078697B (en) 2023-08-21 2023-08-21 Fundus disease type detection method based on cascade model fusion

Publications (2)

Publication Number Publication Date
CN117078697A (en) 2023-11-17
CN117078697B (en) 2024-04-09

Family

ID=88707578

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311053124.1A Active CN117078697B (en) 2023-08-21 2023-08-21 Fundus disease seed detection method based on cascade model fusion

Country Status (1)

Country Link
CN (1) CN117078697B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018222755A1 (en) * 2017-05-30 2018-12-06 Arterys Inc. Automated lesion detection, segmentation, and longitudinal identification
CN111046835A (en) * 2019-12-24 2020-04-21 杭州求是创新健康科技有限公司 Eyeground illumination multiple disease detection system based on regional feature set neural network
US20210374955A1 (en) * 2020-06-02 2021-12-02 Zasti Inc. Retinal color fundus image analysis for detection of age-related macular degeneration
CN111938569A (en) * 2020-09-17 2020-11-17 南京航空航天大学 Eye ground multi-disease classification detection method based on deep learning
CN113256641A (en) * 2021-07-08 2021-08-13 湖南大学 Skin lesion image segmentation method based on deep learning
CN114287878A (en) * 2021-10-18 2022-04-08 江西财经大学 Diabetic retinopathy focus image identification method based on attention model
CN115205300A (en) * 2022-09-19 2022-10-18 华东交通大学 Fundus blood vessel image segmentation method and system based on cavity convolution and semantic fusion
CN115661066A (en) * 2022-10-21 2023-01-31 东北林业大学 Diabetic retinopathy detection method based on segmentation and classification fusion
CN116309631A (en) * 2023-02-08 2023-06-23 大连交通大学 Automatic segmentation method for fundus image of sugar net disease based on deep learning
CN116579982A (en) * 2023-03-30 2023-08-11 苏州大学 Pneumonia CT image segmentation method, device and equipment

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
CHENG WAN et al.: "EAD-Net: A Novel Lesion Segmentation Method in Diabetic Retinopathy Using Neural Networks", DISEASE MARKERS, 2 September 2021 (2021-09-02)
DONGJIN HUANG et al.: "ADD-Net: Attention U-Net with Dilated Skip Connection and Dense Connected Decoder for Retinal Vessel Segmentation", CGI 2021: ADVANCES IN COMPUTER GRAPHICS, 11 October 2021 (2021-10-11)
YINGHUA FU et al.: "Automatic grading of Diabetic macular edema based on end-to-end network", EXPERT SYSTEMS WITH APPLICATIONS, vol. 213, 16 September 2022 (2022-09-16)
WAN CHENG (万程) et al.: "High myopia diagnosis method based on artificial-intelligence ResNeXt" (基于人工智能ResNeXt的高度近视诊断方法), PRACTICAL GERIATRICS (实用老年医学), vol. 36, no. 3, 31 March 2022 (2022-03-31)
CHEN RUI (陈瑞) et al.: "Optic disc segmentation method based on superpixel classification and dynamic optimization of radial ray scanning" (基于超像素分类及径向射线扫描动态优化的视盘分割方法), FOREIGN ELECTRONIC MEASUREMENT TECHNOLOGY (国外电子测量技术), vol. 42, no. 2, 28 February 2023 (2023-02-28)

Also Published As

Publication number Publication date
CN117078697B (en) 2024-04-09

Similar Documents

Publication Publication Date Title
CN108021916B (en) Deep learning diabetic retinopathy sorting technique based on attention mechanism
JP7104691B2 (en) Bioparticle classification system and method
CN110428432B (en) Deep neural network algorithm for automatically segmenting colon gland image
CN111815574B (en) Fundus retina blood vessel image segmentation method based on rough set neural network
CN108806792B (en) Deep learning face diagnosis system
CN110930418B (en) Retina blood vessel segmentation method fusing W-net and conditional generation confrontation network
CN110766643A (en) Microaneurysm detection method facing fundus images
CN109300121A (en) A kind of construction method of cardiovascular disease diagnosis model, system and the diagnostic model
CN111046835A (en) Eyeground illumination multiple disease detection system based on regional feature set neural network
CN109544518B (en) Method and system applied to bone maturity assessment
Liu et al. A framework of wound segmentation based on deep convolutional networks
CN110751636B (en) Fundus image retinal arteriosclerosis detection method based on improved coding and decoding network
CN108764342B (en) Semantic segmentation method for optic discs and optic cups in fundus image
CN112508864A (en) Retinal vessel image segmentation method based on improved UNet +
CN107909008A (en) Video target tracking method based on multichannel convolutive neutral net and particle filter
CN113012163A (en) Retina blood vessel segmentation method, equipment and storage medium based on multi-scale attention network
CN111161287A (en) Retinal vessel segmentation method based on symmetric bidirectional cascade network deep learning
CN114283158A (en) Retinal blood vessel image segmentation method and device and computer equipment
CN115294075A (en) OCTA image retinal vessel segmentation method based on attention mechanism
CN112884788A (en) Cup optic disk segmentation method and imaging method based on rich context network
CN114565620A (en) Fundus image blood vessel segmentation method based on skeleton prior and contrast loss
CN114140437A (en) Fundus hard exudate segmentation method based on deep learning
Miao et al. Classification of Diabetic Retinopathy Based on Multiscale Hybrid Attention Mechanism and Residual Algorithm
CN117078697B (en) Fundus disease type detection method based on cascade model fusion
CN115035127A (en) Retinal vessel segmentation method based on generative confrontation network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant