CN115587979B - Three-stage attention network-based diabetic retinopathy grading method - Google Patents

Publication number: CN115587979B
Authority: CN (China)
Legal status: Active
Application number: CN202211233514.2A
Other languages: Chinese (zh)
Other versions: CN115587979A
Inventor
蹇木伟
陈鸿瑜
王芮
举雅琨
杨成东
武玉增
Current Assignee: Shandong Jiude Intelligent Technology Co ltd; Linyi University; Shandong University of Finance and Economics
Application filed by Shandong Jiude Intelligent Technology Co ltd, Linyi University, Shandong University of Finance and Economics
Priority to CN202211233514.2A
Application granted; legal status: Active

Classifications

    • G06T 7/0002, 7/0012 — image analysis; inspection of images; biomedical image inspection
    • G06N 3/02, 3/08 — neural networks; learning methods
    • G06V 10/764, 10/82 — image/video recognition using classification; using neural networks
    • G06T 2207/20081, 2207/20084 — training/learning; artificial neural networks [ANN]
    • G06T 2207/30041 — biomedical image processing; eye; retina; ophthalmic
    • Y02A 90/10 — ICT supporting adaptation to climate change

Abstract

The application provides a three-stage attention network-based diabetic retinopathy grading method comprising two parts. First, a three-stage network model is constructed for the diabetic retinopathy grading task; the model splits the complex five-class diabetic retinopathy grading task into specific two-class and three-class sub-tasks, so that each specific lesion category is classified by a dedicated model, improving the accuracy of the overall grading. Second, attention modules are added to each of the three stages, including a newly designed non-local spatial attention module that gives the model both contextual information and sensitivity to the spatial position of lesions; it can classify easily confused categories effectively and accurately, improving the accuracy of each stage model and the classification efficiency of the whole model.

Description

Three-stage attention network-based diabetic retinopathy grading method
Technical Field
The application relates to the fields of computer vision and medical technology, and in particular to a method for grading diabetic retinopathy based on a three-stage attention network.
Background
Diabetic retinopathy (DR) is a complication of diabetes mellitus and the most common cause of blindness and visual disability. According to the 2003 international clinical DR classification system, DR can be divided into five stages according to fundus lesions: no DR, mild, moderate and severe non-proliferative DR (NPDR), and proliferative DR (PDR), of which PDR is the most severe. The main treatments in current medicine are anti-vascular endothelial growth factor (VEGF) injection and laser photocoagulation therapy. However, although treatments exist, the probability of blindness depends largely on early diagnosis.
Deep-learning-based disease classification has achieved great success in solving clinical problems; predicting the disease category can assist doctors in completing early diagnosis. Models for grading the severity of diabetic retinopathy have been proposed, but their grading accuracy still needs to be improved, possibly for the following reasons: 1) the convolution kernel in a convolutional neural network is a local feature-extraction operation that pays insufficient attention to context information, which is a limitation in DR grading because similar lesions, such as microaneurysms, can appear far apart in a fundus image; 2) different DR lesions have different shapes, for example microaneurysms appear punctate in fundus images while haemorrhages and exudates tend to form patches, so the model must properly extract not only texture features but also shape features; 3) most existing models perform well on with/without-DR grading but poorly on the multi-class task of mild/moderate/severe and proliferative DR.
In summary, for the DR severity grading task, efforts to improve model performance may focus on model design, on combining the contextual information in fundus images, and on extracting more lesion shape features.
Disclosure of Invention
To remedy the defects of the prior art, improve the efficiency of grading diabetic retinopathy severity, and better assist doctors in the early diagnosis of patients, the application designs a three-stage network model based on non-local attention and provides a method for grading diabetic retinopathy based on a three-stage attention network.
The application is realized by the following technical scheme: a method for grading diabetic retinopathy based on a three-stage attention network, comprising the following steps:
S1, constructing a data set: firstly, remapping the original five-class labels of the data set into corresponding two-class labels and three-class labels; the data set is then divided into a training set and a test set, and the original training set is divided into three specific training sets D1, D2 and D3 according to the three types of labels;
S2, designing a non-local attention module: including a non-local channel attention module NLCA and a non-local spatial attention module NLSA; each module comprises two parts, feature extraction and non-local attention calculation, designed in parallel; after image I enters the module, the output is a non-local channel attention map Fc or a non-local spatial attention map Fs;
S3: constructing a multi-label-based multi-classification network model and training it; according to the three types of labels redefined in S1, the model divides the network into three stages, Stage1, Stage2 and Stage3, where Stage1 and Stage2 are two-class tasks and Stage3 is a three-class task; the three stages run independently during training and in series during testing, i.e. the training sets D1, D2 and D3 described in S1 are input into the three models respectively for training, yielding three trained models;
S4: inputting the test set into the three-stage network model in turn for prediction to obtain the final prediction result y_pred;
S5: model evaluation: and comprehensively evaluating the network grading effect by using an Accuracy index (Accuracy).
Preferably, in step S1 the data set is the public APTOS 2019 Blindness Detection data set on Kaggle, whose original labels are y ∈ {0, 1, 2, 3, 4}, where 0 represents no DR, 1 represents mild DR, 2 represents moderate DR, 3 represents severe DR, and 4 represents proliferative DR; the step comprises the following sub-steps:
S1-1, dividing the data set into an original training set and a test set in a ratio of 8:2;
S1-2, reconstructing the divided original training set into multiple labels according to the five-class labels of the data set, namely y1 ∈ {0, 1}, y2 ∈ {0, 1} and y3 ∈ {1, 2, 3}, where in y1, 0 represents no DR and 1 represents DR; in y2, 0 represents non-proliferative DR and 1 represents proliferative DR; and in y3, 1 represents mild DR, 2 represents moderate DR, and 3 represents severe DR;
S1-3, constructing the original training set into three specific training sets D1, D2 and D3 according to the new labels, where D1 comprises all samples of the original training set, D2 contains only samples with DR, and D3 contains only non-proliferative DR samples;
Preferably, step S2 specifically includes the following steps:
S2-1, constructing the non-local channel attention module NLCA;
S2-1-1, feature extraction adopts three convolution layers with a residual shortcut in the style of the residual block of the ResNet network; extracting features from image I gives the feature map F:

F = δ( f1( δ( f3( δ( f1(I) ) ) ) ) ) ⊕ I   (1);

where f1 and f3 represent the 1x1 and 3x3 convolutions respectively, δ represents the ReLU activation function, and ⊕ I is the residual shortcut;
S2-1-2, the non-local channel attention part is divided into two sub-parts, global perception and channel attention; first, global information modeling is performed on image I, with the mathematical representation:

G_i = (1/C(I)) · Σ_j f(I_i, I_j) · g(I_j)   (2);

where i is the index of the output position whose response is to be calculated, j is the index of all possible positions, g represents a 1x1 convolution, and C(I) is a normalization parameter; f(I_i, I_j) is represented by formula (3):

f(I_i, I_j) = Softmax( θ(I_i)ᵀ φ(I_j) )   (3);

where θ and φ respectively represent two different 1x1 convolutions;
S2-1-3, the globally modeled feature G with context information further enters the channel attention part to generate the channel attention vector Mc; the mathematical expression is:

Mc = σ( W1( W0( AvgPool(G) ) ) ⊕ W1( W0( MaxPool(G) ) ) )   (4);

where AvgPool and MaxPool represent the average pooling and maximum pooling operations respectively, the fully connected layers W0 and W1 are used to learn the dependency between channels, σ represents the Sigmoid activation function, and ⊕ represents element-wise addition;
S2-1-4, the channel attention vector Mc is fused with the feature map F to obtain a feature map with channel attention, and the final output Fc of the non-local channel attention module NLCA is then obtained through a shortcut operation; the mathematical expression is:

Fc = ( Mc ⊗ F ) ⊕ F   (5);

where ⊗ represents element-wise multiplication and ⊕ represents element-wise addition;
S2-2, constructing the non-local spatial attention module NLSA;
S2-2-1, the feature extraction operation is similar to that of S2-1-1, except that the 3x3 convolution is replaced by a 13x13 large-kernel convolution, which increases the effective receptive field and gives the feature map more shape bias; extracting features from image I gives the feature map F':

F' = δ( f1( δ( f13( δ( f1(I) ) ) ) ) ) ⊕ I   (6)

where f13 represents a 13x13 convolution;
S2-2-2, the non-local spatial attention part is divided into two sub-parts, global modeling and spatial attention generation; the global modeling operation is the same as formulas (2) and (3) in S2-1, yielding the globally modeled feature G with context information;
S2-2-3, G further enters the spatial attention part to generate the spatial attention vector Ms; the mathematical expression is:

Ms = σ( f7( [ AvgPool(G) ; MaxPool(G) ] ) )   (7)

where f7 represents a 7x7 convolution, σ represents the Sigmoid activation function, and AvgPool and MaxPool represent the average pooling and maximum pooling operations respectively;
S2-2-4, the same operation as formula (5) in S2-1 gives the final output Fs of the non-local spatial attention module NLSA: Fs = ( Ms ⊗ F' ) ⊕ F'.
Preferably, in step S3 the three-stage network models Stage1, Stage2 and Stage3 are constructed, specifically comprising the following steps:
S3-1, constructing Stage1; Stage1 takes the NLCA module as its basic structure: by stacking NLCA_1, NLCA_2, ..., NLCA_n and adding two fully connected layers FC1 and FC2 as a classification head, the Stage1 network model is obtained. Image I entering Stage1 yields a two-class result, expressed mathematically as follows:

F_1 = NLCA_n( ... NLCA_2( NLCA_1(I) ) ... )   (8)

[ p0, p1 ] = Softmax( FC2( FC1( F_1 ) ) )   (9)

where p0 and p1 respectively represent the probability values of the model predicting sample categories y1 = 0 and y1 = 1;
S3-2, constructing Stage2 and Stage3; both take the NLSA module as their basic structure: by stacking NLSA_1, NLSA_2, ..., NLSA_n and adding two fully connected layers FC1 and FC2 as a classification head, the Stage2 and Stage3 network models are obtained. Image I entering Stage2 yields a two-class result, expressed mathematically as follows:

F_2 = NLSA_n( ... NLSA_2( NLSA_1(I) ) ... )   (10)

[ q0, q1 ] = Softmax( FC2( FC1( F_2 ) ) )   (11)

where q0 and q1 respectively represent the probability values of the model predicting sample categories y2 = 0 and y2 = 1.
Image I entering Stage3 yields a three-class result, expressed mathematically as follows:

F_3 = NLSA_n( ... NLSA_2( NLSA_1(I) ) ... )   (12)

[ r1, r2, r3 ] = Softmax( FC2( FC1( F_3 ) ) )   (13)

where r1, r2 and r3 respectively represent the probability values of the model predicting sample categories y3 = 1, 2 and 3.
S3-3, constructing the model optimization targets of the different stages; Stage1 and Stage2 are two-class networks and adopt the cross-entropy loss function:

L_CE = −(1/N) · Σ_{i=1..N} [ y_i · log(p_i) + (1 − y_i) · log(1 − p_i) ]   (14)

where N represents the number of samples, y_i represents the label of the ith sample, and p_i represents the probability that the ith sample is predicted positive; in Stage1 y_i is the label y1, and in Stage2 y_i is the label y2.
The training of Stage3 uses Focal Loss as its loss function:

L_FL = −(1/N) · Σ_{i=1..N} α · (1 − p_i)^γ · log(p_i)   (15)

where α and γ are two hyper-parameters, here set to 0.25 and 2; α represents the weight of the ith sample, the factor (1 − p_i)^γ controls the weighting of easy and hard samples, and p_i represents the predicted probability of the true class of the ith sample, p_i ∈ (0, 1);
S3-4, the training sets D1, D2 and D3 described in S1 are input into the three models respectively for training, yielding three trained models.
Preferably, the step S4 specifically includes the following steps:
S4-1, the test set sample is input into the trained Stage1 network to obtain a two-class result c1. If c1 = 0, it is taken as the final prediction result, y_pred = 0; if c1 = 1, a DR lesion exists in the image, and the image enters Stage2 for further prediction;
S4-2, the image enters the Stage2 network to obtain a two-class result c2; if c2 = 1, the final prediction result is y_pred = 4; if c2 = 0, the DR lesion in the image is non-proliferative, and the image enters Stage3 for further prediction;
S4-3, the image enters the Stage3 network to obtain a three-class result c3; the Stage3 prediction is taken as the final prediction result, y_pred = c3.
Preferably, the step S5 specifically includes the following steps:
S5-1, the predicted value y_pred and the label y of every sample in the test set are compared to calculate the number of correctly classified samples T:

T = Σ_{i=1..N} 1( y_pred,i = y_i )   (16);

where N is the total number of samples in the test set, i represents the ith sample in the test set, and 1(·) is the indicator function that equals 1 when the prediction is correct and 0 otherwise.
S5-2, the accuracy is calculated according to the formula to judge model performance:

Accuracy = T / N   (17).
The application adopts the above technical scheme and, compared with the prior art, has the following beneficial effects. The application mainly comprises two parts. First, a three-stage network model is constructed for the diabetic retinopathy grading task; the model splits the complex five-class diabetic retinopathy grading task into specific two-class and three-class sub-tasks, so that each specific lesion category is classified by a dedicated model, improving the accuracy of the overall grading. Second, attention modules are added to each of the three stage models, including a newly designed non-local spatial attention module that gives the model both contextual information and sensitivity to the spatial position of lesions; it can classify easily confused categories effectively and accurately, improving the accuracy of each stage model and the classification efficiency of the whole model.
Additional aspects and advantages of the application will be set forth in part in the description which follows, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the application will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
FIG. 1 is a general flow chart of the present application;
FIG. 2 is a schematic diagram of a non-local channel attention module;
FIG. 3 is a schematic diagram of a non-local spatial attention module;
fig. 4 is a network training-testing flow chart.
Detailed Description
In order that the above-recited objects, features and advantages of the present application will be more clearly understood, a more particular description of the application will be rendered by reference to the appended drawings and appended detailed description. It should be noted that, without conflict, the embodiments of the present application and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, however, the present application may be practiced otherwise than as described herein, and therefore the scope of the present application is not limited to the specific embodiments disclosed below.
In order to improve the efficiency of grading the severity of diabetic retinopathy and better assist doctors in the early diagnosis of patients, the application designs a three-stage network model based on non-local attention. In a specific implementation, the technical scheme of the application can use computer software to realize an automatic operation flow. A method of three-stage attention network-based diabetic retinopathy grading according to an embodiment of the present application is described in detail below with reference to figs. 1-3.
The application provides a method for grading diabetic retinopathy based on a three-stage attention network, which specifically comprises the following steps:
S1, constructing a data set: firstly, remapping the original five-class labels of the data set into corresponding two-class labels and three-class labels; the data set is then divided into a training set and a test set, and the original training set is divided into three specific training sets D1, D2 and D3 according to the three types of labels. The data set is the public APTOS 2019 Blindness Detection data set on Kaggle, whose original labels are y ∈ {0, 1, 2, 3, 4}, where 0 represents no DR, 1 represents mild DR, 2 represents moderate DR, 3 represents severe DR, and 4 represents proliferative DR; as shown in fig. 1, the step specifically comprises:
S1-1, dividing the data set into an original training set and a test set in a ratio of 8:2;
S1-2, reconstructing the divided original training set into multiple labels according to the five-class labels of the data set, namely y1 ∈ {0, 1}, y2 ∈ {0, 1} and y3 ∈ {1, 2, 3}, where in y1, 0 represents no DR and 1 represents DR; in y2, 0 represents non-proliferative DR and 1 represents proliferative DR; and in y3, 1 represents mild DR, 2 represents moderate DR, and 3 represents severe DR;
S1-3, constructing the original training set into three specific training sets D1, D2 and D3 according to the new labels, where D1 comprises all samples of the original training set, D2 contains only samples with DR, and D3 contains only non-proliferative DR samples.
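As an illustration of the label remapping and training-set construction in S1-2 and S1-3, the following Python sketch maps each original APTOS label to the three stage-specific labels and builds the three training sets; the function names and the D1/D2/D3 list layout are illustrative, not part of the patent text.

```python
# Sketch of the label remapping in S1-2/S1-3.
# Original APTOS labels: 0 no DR, 1 mild, 2 moderate, 3 severe, 4 proliferative.

def remap_labels(y):
    """Map one five-class label to the three stage-specific labels.

    Returns (y1, y2, y3); y2/y3 are None when the sample does not belong
    to that stage's training set.
    """
    y1 = 0 if y == 0 else 1                          # Stage1: DR present?
    y2 = (1 if y == 4 else 0) if y != 0 else None    # Stage2: proliferative?
    y3 = y if y in (1, 2, 3) else None               # Stage3: NPDR severity
    return y1, y2, y3

def build_stage_sets(samples):
    """Split (image, label) pairs into the stage training sets D1, D2, D3."""
    d1, d2, d3 = [], [], []
    for img, y in samples:
        y1, y2, y3 = remap_labels(y)
        d1.append((img, y1))          # D1: all samples
        if y2 is not None:
            d2.append((img, y2))      # D2: samples with DR only
        if y3 is not None:
            d3.append((img, y3))      # D3: non-proliferative DR only
    return d1, d2, d3
```

Applied to a mixed batch, D1 keeps every sample while D2 and D3 shrink to the subsets described in S1-3.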
S2, designing a non-local attention module: including a non-local channel attention module NLCA and a non-local spatial attention module NLSA; each module comprises two parts, feature extraction and non-local attention calculation, designed in parallel; after image I enters the module, the output is a non-local channel attention map Fc or a non-local spatial attention map Fs. As shown in figs. 2 and 3, step S2 specifically includes the following steps:
S2-1, constructing the non-local channel attention module NLCA;
S2-1-1, feature extraction adopts three convolution layers with a residual shortcut in the style of the residual block of the ResNet network; extracting features from image I gives the feature map F:

F = δ( f1( δ( f3( δ( f1(I) ) ) ) ) ) ⊕ I   (1);

where f1 and f3 represent the 1x1 and 3x3 convolutions respectively, δ represents the ReLU activation function, and ⊕ I is the residual shortcut.
S2-1-2, the non-local channel attention part is divided into two sub-parts of global perception and channel attention; first, for an imageGlobal information modeling is performed, and mathematical representation is as follows:
(2);
where i is the index of the output location for which the response is to be calculated, j is the index of all possible locations,representing a 1x1 convolution,/->Is a normalization parameter; />Expressed by formula (3), the objective is to calculate the similarity between the pixels of the image I by means of dot multiplication and activate it with the Softmax function:
(3);
wherein and />Respectively representing two different 1x1 convolutions;
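The global modeling of formulas (2) and (3) can be sketched in NumPy as dot-product similarity followed by Softmax normalization and aggregation of a value projection over all positions; since the 1x1 convolutions θ, φ and g act per position, they are modeled here as plain linear maps with random illustrative weights.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable Softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def non_local(x, w_theta, w_phi, w_g):
    """x: (HW, C) flattened feature map; w_*: (C, C') per-position projections."""
    theta, phi, g = x @ w_theta, x @ w_phi, x @ w_g
    attn = softmax(theta @ phi.T, axis=-1)   # f(x_i, x_j): rows sum to 1
    return attn @ g                          # y_i = sum_j f(x_i, x_j) g(x_j)

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))             # a 4x4 map with 8 channels, flattened
w = [rng.standard_normal((8, 8)) for _ in range(3)]
y = non_local(x, *w)                         # globally modeled feature, (16, 8)
```

Each output position is a similarity-weighted mixture of every input position, which is what gives the module its context sensitivity to long-distance lesions.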
S2-1-3, the globally modeled feature G with context information further enters the channel attention part to generate the channel attention vector Mc; the mathematical expression is:

Mc = σ( W1( W0( AvgPool(G) ) ) ⊕ W1( W0( MaxPool(G) ) ) )   (4);

where AvgPool and MaxPool represent the average pooling and maximum pooling operations respectively, the fully connected layers W0 and W1 are used to learn the dependency between channels, σ represents the Sigmoid activation function, whose purpose is to limit the weight value of each channel to (0, 1), and ⊕ represents element-wise addition;
S2-1-4, the channel attention vector Mc is fused with the feature map F to obtain a feature map with channel attention, and the final output Fc of the non-local channel attention module NLCA is then obtained through a shortcut operation; the mathematical expression is:

Fc = ( Mc ⊗ F ) ⊕ F   (5);

where ⊗ represents element-wise multiplication and ⊕ represents element-wise addition;
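A minimal NumPy sketch of the channel-attention generation and fusion in formulas (4) and (5), assuming a shared two-layer bottleneck W0, W1 with a ReLU hidden layer (a common design; the hidden activation is an assumption, not stated in the text) followed by element-wise rescaling and the shortcut addition:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(g, w0, w1):
    """g: (C, H, W) feature; w0: (C, C//r); w1: (C//r, C). Returns Mc, shape (C,)."""
    avg = g.mean(axis=(1, 2))                     # AvgPool over spatial positions
    mx = g.max(axis=(1, 2))                       # MaxPool over spatial positions
    mc = sigmoid(np.maximum(avg @ w0, 0) @ w1 +   # shared bottleneck, ReLU hidden
                 np.maximum(mx @ w0, 0) @ w1)     # element-wise add, formula (4)
    return mc

def fuse(f, mc):
    """Formula (5): per-channel rescale, then shortcut add."""
    return f * mc[:, None, None] + f

rng = np.random.default_rng(1)
f = rng.standard_normal((8, 4, 4))                # feature map F
w0, w1 = rng.standard_normal((8, 2)), rng.standard_normal((2, 8))
fc = fuse(f, channel_attention(f, w0, w1))        # module output Fc
```

The Sigmoid keeps every channel weight strictly inside (0, 1), so the shortcut guarantees the output never suppresses a channel entirely.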
S2-2, constructing the non-local spatial attention module NLSA;
S2-2-1, the feature extraction operation is similar to that of S2-1-1, except that the 3x3 convolution is replaced by a 13x13 large-kernel convolution, which increases the effective receptive field and gives the feature map more shape bias; extracting features from image I gives the feature map F':

F' = δ( f1( δ( f13( δ( f1(I) ) ) ) ) ) ⊕ I   (6)

where f13 represents a 13x13 convolution;
S2-2-2, the non-local spatial attention part is divided into two sub-parts, global modeling and spatial attention generation; the global modeling operation is the same as formulas (2) and (3) in S2-1, yielding the globally modeled feature G with context information;
S2-2-3, G further enters the spatial attention part to generate the spatial attention vector Ms; the mathematical expression is:

Ms = σ( f7( [ AvgPool(G) ; MaxPool(G) ] ) )   (7)

where f7 represents a 7x7 convolution, σ represents the Sigmoid activation function, and AvgPool and MaxPool represent the average pooling and maximum pooling operations respectively;
S2-2-4, the same operation as formula (5) in S2-1 gives the final output Fs of the non-local spatial attention module NLSA: Fs = ( Ms ⊗ F' ) ⊕ F'.
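The spatial-attention generation of formula (7) can be sketched as follows; a simple box filter stands in for the learned 7x7 convolution and the concatenation of the two pooled maps is approximated by their sum, so the sketch illustrates only the shapes and the (0, 1) squashing, not the learned kernel:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_attention(g, k=7):
    """g: (C, H, W) -> per-position weights Ms of shape (H, W) in (0, 1)."""
    avg = g.mean(axis=0)                 # channel-wise average pooling
    mx = g.max(axis=0)                   # channel-wise max pooling
    pooled = avg + mx                    # stand-in for conv over [avg ; mx]
    h, w = pooled.shape
    pad = k // 2
    p = np.pad(pooled, pad)              # zero padding keeps the output size
    # naive k x k box filter as a stand-in for the learned f^{7x7}
    mixed = np.array([[p[i:i + k, j:j + k].mean()
                       for j in range(w)] for i in range(h)])
    return sigmoid(mixed)

rng = np.random.default_rng(2)
g = rng.standard_normal((8, 10, 10))     # globally modeled feature G
ms = spatial_attention(g)                # spatial attention vector Ms
```

Every spatial position gets its own weight in (0, 1), which is what makes the module sensitive to the spatial location of lesions.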
S3: constructing a multi-label-based multi-classification network model and training it; according to the three types of labels redefined in S1, the model divides the network into three stages, Stage1, Stage2 and Stage3, where Stage1 and Stage2 are two-class tasks and Stage3 is a three-class task; the three stages run independently during training and in series during testing, i.e. the training sets D1, D2 and D3 described in S1 are input into the three models respectively for training, yielding three trained models. The three-stage network models Stage1, Stage2 and Stage3 are constructed so that Stage1 classifies whether DR exists in the fundus image, Stage2 classifies whether the fundus image shows proliferative DR, and Stage3 further subdivides the severity of NPDR fundus images; the step comprises the following sub-steps:
S3-1, constructing Stage1; Stage1 takes the NLCA module as its basic structure: by stacking NLCA_1, NLCA_2, ..., NLCA_n and adding two fully connected layers FC1 and FC2 as a classification head, the Stage1 network model is obtained. Image I entering Stage1 yields a two-class result, expressed mathematically as follows:

F_1 = NLCA_n( ... NLCA_2( NLCA_1(I) ) ... )   (8)

[ p0, p1 ] = Softmax( FC2( FC1( F_1 ) ) )   (9)

where p0 and p1 respectively represent the probability values of the model predicting sample categories y1 = 0 and y1 = 1;
S3-2, constructing Stage2 and Stage3; both take the NLSA module as their basic structure: by stacking NLSA_1, NLSA_2, ..., NLSA_n and adding two fully connected layers FC1 and FC2 as a classification head, the Stage2 and Stage3 network models are obtained. Image I entering Stage2 yields a two-class result, expressed mathematically as follows:

F_2 = NLSA_n( ... NLSA_2( NLSA_1(I) ) ... )   (10)

[ q0, q1 ] = Softmax( FC2( FC1( F_2 ) ) )   (11)

where q0 and q1 respectively represent the probability values of the model predicting sample categories y2 = 0 and y2 = 1.
Image I entering Stage3 yields a three-class result, expressed mathematically as follows:

F_3 = NLSA_n( ... NLSA_2( NLSA_1(I) ) ... )   (12)

[ r1, r2, r3 ] = Softmax( FC2( FC1( F_3 ) ) )   (13)

where r1, r2 and r3 respectively represent the probability values of the model predicting sample categories y3 = 1, 2 and 3.
S3-3, constructing the model optimization targets of the different stages; Stage1 and Stage2 are two-class networks and adopt the cross-entropy loss function:

L_CE = −(1/N) · Σ_{i=1..N} [ y_i · log(p_i) + (1 − y_i) · log(1 − p_i) ]   (14)

where N represents the number of samples, y_i represents the label of the ith sample, and p_i represents the probability that the ith sample is predicted positive; in Stage1 y_i is the label y1, and in Stage2 y_i is the label y2.
The training set of Stage3 suffers from sample imbalance; to alleviate this problem, Focal Loss is used as its loss function:

L_FL = −(1/N) · Σ_{i=1..N} α · (1 − p_i)^γ · log(p_i)   (15)

where α and γ are two hyper-parameters, here set to 0.25 and 2; α represents the weight of the ith sample, the factor (1 − p_i)^γ controls the weighting of easy and hard samples, and p_i represents the predicted probability of the true class of the ith sample, p_i ∈ (0, 1);
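A NumPy sketch of the Focal Loss of formula (15) with α = 0.25 and γ = 2; it shows how the factor (1 − p_i)^γ down-weights easy samples relative to hard ones, which is why it mitigates the class imbalance of the Stage3 training set:

```python
import numpy as np

def focal_loss(probs, labels, alpha=0.25, gamma=2.0):
    """probs: (N, K) softmax outputs; labels: (N,) integer class indices."""
    p_t = probs[np.arange(len(labels)), labels]        # probability of true class
    return float(np.mean(-alpha * (1.0 - p_t) ** gamma * np.log(p_t)))

probs = np.array([[0.90, 0.05, 0.05],   # easy sample: confident and correct
                  [0.34, 0.33, 0.33]])  # hard sample: near-uniform prediction
easy = focal_loss(probs[:1], np.array([0]))
hard = focal_loss(probs[1:], np.array([0]))
```

With γ = 2 the easy sample's loss is scaled by (1 − 0.9)² = 0.01, so the gradient signal is dominated by hard, ambiguous samples.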
S3-4, the training sets D1, D2 and D3 described in S1 are input into the three models respectively for training, yielding three trained models.
S4: inputting the test set into the three-stage network model in turn for prediction to obtain the final prediction result y_pred; the step specifically comprises the following sub-steps:
S4-1, the test set sample is input into the trained Stage1 network to obtain a two-class result c1. If c1 = 0, it is taken as the final prediction result, y_pred = 0; if c1 = 1, a DR lesion exists in the image, and the image enters Stage2 for further prediction;
S4-2, the image enters the Stage2 network to obtain a two-class result c2; if c2 = 1, the final prediction result is y_pred = 4; if c2 = 0, the DR lesion in the image is non-proliferative, and the image enters Stage3 for further prediction;
S4-3, the image enters the Stage3 network to obtain a three-class result c3; the Stage3 prediction is taken as the final prediction result, y_pred = c3.
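The serial test-time cascade of S4-1 to S4-3 can be sketched as follows; the three stage predictors are stand-in callables here, whereas in the method they are the trained Stage1/Stage2/Stage3 networks:

```python
# Sketch of the serial test-time cascade in S4-1..S4-3.

def cascade_predict(image, stage1, stage2, stage3):
    """Return the final five-class prediction y_pred in {0, 1, 2, 3, 4}."""
    if stage1(image) == 0:       # Stage1: no DR detected
        return 0
    if stage2(image) == 1:       # Stage2: proliferative DR
        return 4
    return stage3(image)         # Stage3: mild/moderate/severe -> 1/2/3

# toy stand-ins that classify by a string tag instead of a fundus image
s1 = lambda img: 0 if img == "healthy" else 1
s2 = lambda img: 1 if img == "pdr" else 0
s3 = lambda img: {"mild": 1, "moderate": 2, "severe": 3}[img]
```

Only uncertain images reach the later stages, which is how the cascade lets each specialised model handle the categories it was trained on.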
S5: model evaluation: comprehensively evaluating the network grading effect with the Accuracy metric; the step specifically comprises the following sub-steps:
S5-1, the predicted value y_pred and the label y of every sample in the test set are compared to calculate the number of correctly classified samples T:

T = Σ_{i=1..N} 1( y_pred,i = y_i )   (16);

where N is the total number of samples in the test set, i represents the ith sample in the test set, and 1(·) is the indicator function that equals 1 when the prediction is correct and 0 otherwise.
S5-2, the accuracy is calculated according to the formula to judge model performance:

Accuracy = T / N   (17).
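Formulas (16) and (17) amount to counting correct predictions and dividing by the test-set size; a minimal sketch:

```python
# Sketch of the evaluation in formulas (16)-(17).

def accuracy(preds, labels):
    """preds, labels: equal-length sequences of class indices."""
    t = sum(1 for p, y in zip(preds, labels) if p == y)   # formula (16)
    return t / len(labels)                                # formula (17)
```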
in the description of the present application, the term "plurality" means two or more, unless explicitly defined otherwise, the orientation or positional relationship indicated by the terms "upper", "lower", etc. are based on the orientation or positional relationship shown in the drawings, merely for convenience of description of the present application and to simplify the description, and do not indicate or imply that the apparatus or elements referred to must have a specific orientation, be constructed and operated in a specific orientation, and therefore should not be construed as limiting the present application; the terms "coupled," "mounted," "secured," and the like are to be construed broadly, and may be fixedly coupled, detachably coupled, or integrally connected, for example; can be directly connected or indirectly connected through an intermediate medium. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art according to the specific circumstances.
In the description of the present specification, the terms "one embodiment," "some embodiments," "particular embodiments," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above is only a preferred embodiment of the present application, and is not intended to limit the present application, but various modifications and variations can be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (3)

1. A method for grading diabetic retinopathy based on a three-stage attention network, comprising the steps of:
s1, constructing a data set: firstly, remapping the original five-class labels of the data set into corresponding two-class labels and three-class labels; the data set is then divided into a training set and a test set, and the original training set is divided into three specific training sets D₁, D₂, D₃ according to the three types of labels;
The data set is the public APTOS 2019 Blindness Detection data set on Kaggle; its original label is y ∈ {0, 1, 2, 3, 4}, wherein 0 represents no DR, 1 represents mild DR, 2 represents moderate DR, 3 represents severe DR, and 4 represents proliferative DR; the construction comprises the steps of:
s1-1, dividing the data set into an original training set and a test set at a ratio of 8:2;
s1-2, reconstructing the divided original training set into multiple labels according to the five-class labels of the data set, namely y₁ ∈ {0, 1}, y₂ ∈ {0, 1}, y₃ ∈ {1, 2, 3}, wherein in y₁, 0 represents no DR and 1 represents DR; in y₂, 0 represents non-proliferative DR and 1 represents proliferative DR; in y₃, 1 represents mild DR, 2 represents moderate DR, and 3 represents severe DR;
s1-3, constructing three specific training sets D₁, D₂, D₃ from the original training set according to the new labels, wherein D₁ comprises all samples of the original training set, D₂ contains only DR samples, and D₃ contains only non-proliferative DR samples;
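As an editorial illustration of steps S1-2 and S1-3 (the function name, the `None` convention for undefined labels, and the toy label list are mine, not part of the patent), the label remapping and subset construction can be sketched as:

```python
def remap(y):
    """Map an original APTOS label y in {0,1,2,3,4} to (y1, y2, y3).

    y1: 0 = no DR, 1 = DR                          (defined for all samples)
    y2: 0 = non-proliferative, 1 = proliferative   (defined when y1 == 1)
    y3: 1/2/3 = mild/moderate/severe               (defined for non-proliferative DR)
    """
    y1 = 0 if y == 0 else 1
    y2 = None if y == 0 else (1 if y == 4 else 0)
    y3 = y if y in (1, 2, 3) else None
    return y1, y2, y3

train = [0, 1, 2, 3, 4, 0, 2]  # toy "original training set" of labels
D1 = [(y, remap(y)[0]) for y in train]                    # all samples, label y1
D2 = [(y, remap(y)[1]) for y in train if y != 0]          # DR samples only, label y2
D3 = [(y, remap(y)[2]) for y in train if y in (1, 2, 3)]  # non-proliferative DR, label y3
print(len(D1), len(D2), len(D3))  # 7 5 4
```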
s2, designing a non-local attention module: including a non-local channel attention module NCA and a non-local spatial attention module NSA; each non-local attention module comprises two parts, feature extraction and non-local attention calculation, which are designed in a parallel manner; after the image I enters the module, the output is a non-local channel attention map F_nc or a non-local spatial attention map F_ns;
The method specifically comprises the following steps:
s2-1, constructing the non-local channel attention module NCA;
S2-1-1, performing feature extraction with three convolution layers, adding a residual block as in the ResNet network; extracting features from the image I yields the feature map F:

F = δ(W_1×1(δ(W_3×3(δ(W_1×1(I))))))   (1);

wherein W_1×1 and W_3×3 represent the 1×1 convolution and the 3×3 convolution, respectively, and δ represents the ReLU activation function;
s2-1-2, the non-local channel attention part is divided into two sub-parts, global perception and channel attention; first, global information modeling is performed on the image, with the mathematical representation:

G_i = (1 / C(x)) Σ_{∀j} f(x_i, x_j) W_g(x_j)   (2);

where i is the index of the output location whose response is to be calculated, j is the index over all possible locations, W_g represents a 1×1 convolution, and C(x) is a normalization parameter; f(x_i, x_j) is represented by formula (3):

f(x_i, x_j) = θ(x_i)ᵀ φ(x_j)   (3);

wherein θ and φ respectively represent two different 1×1 convolutions;
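A numpy sketch of the global-perception step of formulas (2) and (3). This is my own toy re-implementation, not the patent's code: the feature map is flattened to (HW, C), the 1×1 convolutions θ, φ, W_g are reduced to per-position linear maps, and C(x) is taken to be the number of positions:

```python
import numpy as np

def nonlocal_block(x, theta_w, phi_w, g_w):
    """x: (HW, C) flattened feature map; *_w: (C, C) 1x1-conv weight matrices.

    Implements G_i = (1/C(x)) * sum_j f(x_i, x_j) * g(x_j),
    with f(x_i, x_j) = theta(x_i)^T phi(x_j)  (Eqs. 2-3).
    """
    theta = x @ theta_w          # (HW, C)
    phi = x @ phi_w              # (HW, C)
    g = x @ g_w                  # (HW, C)
    f = theta @ phi.T            # (HW, HW) pairwise similarities
    return (f @ g) / x.shape[0]  # normalize by number of positions C(x) = HW

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))  # a 4x4 map with 8 channels, flattened
out = nonlocal_block(x, np.eye(8), np.eye(8), np.eye(8))
print(out.shape)  # (16, 8)
```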
s2-1-3, the feature map G with context information thus obtained further enters the channel attention part to generate the channel attention vector V_c, with the mathematical expression:

V_c = σ(W₁(W₀(AvgPool(G))) ⊕ W₁(W₀(MaxPool(G))))   (4);

wherein AvgPool and MaxPool represent the mean pooling and maximum pooling operations, respectively, followed by the fully connected layers W₀ and W₁ used to learn the dependency between channels; σ represents the Sigmoid activation function, and ⊕ represents element-wise addition;
S2-1-4, the channel attention vector V_c is fused with the feature map F to obtain a feature map with channel attention, and the final output F_nc of the non-local channel attention module NCA is obtained through a shortcut operation, with the mathematical expression:

F_nc = (F ⊗ V_c) ⊕ I   (5);

wherein ⊗ represents element-wise multiplication and ⊕ represents element-wise addition;
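The channel-attention step of formulas (4) and (5) can be sketched in numpy as follows. This is a toy illustration under my own shape conventions (channel-first (C, H, W) arrays, a reduction ratio r for the shared layers W₀/W₁, scalar shortcut to the module input I), not the patent's implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(G, F, I, W0, W1):
    """G, F, I: (C, H, W); W0: (C, C//r); W1: (C//r, C).

    V_c  = sigmoid(W1(W0(AvgPool(G))) + W1(W0(MaxPool(G))))   (Eq. 4)
    F_nc = F * V_c (per channel) + I                          (Eq. 5)
    """
    avg = G.mean(axis=(1, 2))  # (C,) mean pooling over spatial positions
    mx = G.max(axis=(1, 2))    # (C,) max pooling over spatial positions
    v_c = sigmoid((avg @ W0) @ W1 + (mx @ W0) @ W1)
    return F * v_c[:, None, None] + I

rng = np.random.default_rng(1)
C, H, W, r = 8, 4, 4, 2
G = rng.standard_normal((C, H, W))
F = rng.standard_normal((C, H, W))
I = rng.standard_normal((C, H, W))
out = channel_attention(G, F, I,
                        rng.standard_normal((C, C // r)),
                        rng.standard_normal((C // r, C)))
print(out.shape)  # (8, 4, 4)
```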
s2-2, constructing the non-local spatial attention module NSA;
S2-2-1, the feature extraction operation is similar to that of S2-1-1, except that the 3×3 convolution is replaced by a large-kernel 13×13 convolution, which increases the effective receptive field and gives the feature map more shape bias; extracting features from the image I yields the feature map F:

F = δ(W_1×1(δ(W_13×13(δ(W_1×1(I))))))   (6)
S2-2-2, dividing the non-local spatial attention part into two sub-parts of global modeling and spatial attention generation; the global modeling operation is the same as equations (2) and (3) in S2-1 to obtain an image with context information
S2-2-3、Further enter the spatial attention part to generate a spatial attention vector +.>The mathematical expression is:
(7)
wherein Represents a 7x7 convolution, ">Representing Sigmoid activation functions, avgPool and MaxPool represent average pooling operations and maximum pooling operations, respectively;
s2-2-4, and operating with the formula (5) in S2-1 to obtain the final non-local spatial attention moduleOutput of +.>
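The spatial-attention step of formula (7) can be sketched in numpy as follows. Again a toy illustration under my own assumptions, not the patent's code: channel-wise average/max pooling produces a 2-channel map, the k×k convolution (the patent specifies 7×7) is written as a naive same-padded loop, and the fusion with F and the shortcut to I is assumed to mirror formula (5):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_attention(G, F, I, kernel):
    """G, F, I: (C, H, W); kernel: (2, k, k) weights of a single k x k conv filter.

    V_s  = sigmoid(Conv_kxk([AvgPool(G); MaxPool(G)]))   (Eq. 7)
    F_ns = F * V_s + I                                   (fusion assumed as in Eq. 5)
    """
    pooled = np.stack([G.mean(axis=0), G.max(axis=0)])  # (2, H, W), channel-wise pooling
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    H, W = pooled.shape[1:]
    v_s = np.empty((H, W))
    for i in range(H):          # naive same-padded 2-D convolution
        for j in range(W):
            v_s[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return F * sigmoid(v_s) + I

rng = np.random.default_rng(2)
G = rng.standard_normal((8, 6, 6))
F = rng.standard_normal((8, 6, 6))
I = rng.standard_normal((8, 6, 6))
out = spatial_attention(G, F, I, rng.standard_normal((2, 7, 7)) * 0.1)
print(out.shape)  # (8, 6, 6)
```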
S3: constructing a multi-label-based multi-classification network model and training; the model divides the network into three stages of Stage1, stage2 and Stage3 according to the redefined three types of labels in the S1, wherein Stage1 and Stage2 are classified tasks, stage3 is a three-classified task, the three stages are independently operated in a training Stage of the network model, and the testing stages are serial; i.e. the training set described in S1、/>、/>Respectively inputting the three models to train to obtain three models after training;
the method specifically comprises the following steps:
s3-1, constructing Stage1; Stage1 takes the non-local attention module designed in S2 as its basic structure: by stacking attention modules M₁, M₂, ..., Mₙ and adding the two layers GAP (global average pooling) and FC (fully connected) as a classification head, the Stage1 network model is obtained; an image I entering Stage1 yields a binary classification result, expressed mathematically as follows:

F₁ = Mₙ(Mₙ₋₁(...M₁(I)))   (8)
[p₀, p₁] = Softmax(FC(GAP(F₁)))   (9)

wherein p₀ and p₁ respectively represent the probability values with which the model predicts the sample category y₁ = 0 or y₁ = 1;
s3-2, constructing Stage2 and Stage3; Stage2 and Stage3 likewise take the non-local attention module as the basic structure: by stacking M₁, M₂, ..., Mₙ and adding the two layers GAP and FC as a classification head, the Stage2 and Stage3 network models are obtained; an image I entering Stage2 yields a binary classification result, expressed mathematically as follows:

F₂ = Mₙ(Mₙ₋₁(...M₁(I)))   (10)
[q₀, q₁] = Softmax(FC(GAP(F₂)))   (11)

wherein q₀ and q₁ respectively represent the probability values with which the model predicts the sample category y₂ = 0 or y₂ = 1;
An image I entering Stage3 yields a three-class classification result, expressed mathematically as follows:

F₃ = Mₙ(Mₙ₋₁(...M₁(I)))   (12)
[r₁, r₂, r₃] = Softmax(FC(GAP(F₃)))   (13)

wherein r₁, r₂, and r₃ respectively represent the probability values with which the model predicts the sample category y₃ = 1, 2, or 3;
s3-3, constructing the model optimization targets of the different stages; Stage1 and Stage2 are binary classification networks,
so the cross entropy loss function is adopted:

L_CE = -(1/N) Σᵢ₌₁ᴺ [yᵢ log(pᵢ) + (1 - yᵢ) log(1 - pᵢ)]   (14)

where N represents the number of samples, yᵢ represents the label of the i-th sample, and pᵢ represents the probability that the i-th sample is predicted as positive; in Stage1, yᵢ = y₁; in Stage2, yᵢ = y₂;
The training of Stage3 uses Focal Loss as its loss function:

L_FL = -(1/N) Σᵢ₌₁ᴺ αᵢ (1 - pᵢ)^γ log(pᵢ)   (15)

wherein α and γ are two hyperparameters, here set to 0.25 and 2; αᵢ represents the weight of the i-th sample, γ is used to control the contribution of hard and easy samples, and pᵢ represents the predicted probability of the true class of the i-th sample, pᵢ ∈ (0, 1);
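The two loss functions of formulas (14) and (15) can be sketched as follows. A minimal illustration with my own conventions (a constant per-sample weight α, a small eps for numerical stability; neither is specified by the patent beyond α = 0.25, γ = 2):

```python
import math

def cross_entropy(y, p, eps=1e-12):
    """Binary cross entropy, Eq. (14): y are 0/1 labels, p are positive-class probabilities."""
    n = len(y)
    return -sum(yi * math.log(pi + eps) + (1 - yi) * math.log(1 - pi + eps)
                for yi, pi in zip(y, p)) / n

def focal_loss(p_true, alpha=0.25, gamma=2.0, eps=1e-12):
    """Focal loss, Eq. (15): p_true[i] is the predicted probability of sample i's
    true class; the factor (1 - p)^gamma down-weights easy, well-classified samples."""
    n = len(p_true)
    return -sum(alpha * (1 - pi) ** gamma * math.log(pi + eps) for pi in p_true) / n

# An easy sample (p = 0.9) contributes far less to the focal loss than a hard one (p = 0.1):
print(focal_loss([0.9]) < focal_loss([0.1]))  # True
```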
S3-4, training set described in S1、/>、/>Respectively inputting the three models to train to obtain three models after training;
s4: inputting the test set into a three-stage network model in turn for prediction to obtain a final prediction result
S5: model evaluation: and comprehensively evaluating the network grading effect by using the accuracy index.
2. The method for grading diabetic retinopathy based on a three-stage attention network according to claim 1, wherein step S4 specifically comprises the steps of:
S4-1, inputting a test set sample into the trained Stage1 network to obtain the binary classification result ŷ₁; if ŷ₁ = 0, it is taken as the final prediction result ŷ = 0; if ŷ₁ = 1, a DR lesion exists in the image, and the image enters Stage2 for further prediction;
S4-2, the image enters the Stage2 network to obtain the binary classification result ŷ₂; if ŷ₂ = 1, the final prediction result is ŷ = 4; if ŷ₂ = 0, the DR lesion in the image is non-proliferative, and the image enters Stage3 for further prediction;
S4-3, the image enters the Stage3 network to obtain the three-class classification result ŷ₃; the result predicted by the Stage3 network is taken as the final prediction result ŷ = ŷ₃.
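The serial test-time logic of steps S4-1 to S4-3 can be sketched with stub per-stage predictors (the lambda predictors below are placeholders standing in for the trained Stage1/2/3 networks, not part of the patent):

```python
def grade(image, stage1, stage2, stage3):
    """Three-stage cascaded prediction (steps S4-1 to S4-3).

    stage1(image) -> 0/1   (no DR / DR)
    stage2(image) -> 0/1   (non-proliferative / proliferative)
    stage3(image) -> 1/2/3 (mild / moderate / severe)
    """
    if stage1(image) == 0:
        return 0              # S4-1: no DR, final grade 0
    if stage2(image) == 1:
        return 4              # S4-2: proliferative DR, final grade 4
    return stage3(image)      # S4-3: mild/moderate/severe, final grade 1-3

# Stub predictors illustrating the three paths through the cascade:
print(grade("img", lambda x: 0, lambda x: 1, lambda x: 3))  # 0
print(grade("img", lambda x: 1, lambda x: 1, lambda x: 3))  # 4
print(grade("img", lambda x: 1, lambda x: 0, lambda x: 2))  # 2
```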
3. The method for grading diabetic retinopathy based on a three-stage attention network according to claim 1, wherein step S5 specifically comprises the steps of:
S5-1, comparing the predicted value ŷᵢ of every sample in the test set with its label yᵢ, and counting the number of correctly classified samples N_c;
S5-2, calculating the accuracy according to the formula Accuracy = N_c / N (17), where N is the total number of samples in the test set, to judge model performance.
CN202211233514.2A 2022-10-10 2022-10-10 Three-stage attention network-based diabetic retinopathy grading method Active CN115587979B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211233514.2A CN115587979B (en) 2022-10-10 2022-10-10 Three-stage attention network-based diabetic retinopathy grading method


Publications (2)

Publication Number Publication Date
CN115587979A CN115587979A (en) 2023-01-10
CN115587979B true CN115587979B (en) 2023-08-15

Family

ID=84780924

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211233514.2A Active CN115587979B (en) 2022-10-10 2022-10-10 Three-stage attention network-based diabetic retinopathy grading method

Country Status (1)

Country Link
CN (1) CN115587979B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116384448B (en) * 2023-04-10 2023-09-12 中国人民解放军陆军军医大学 CD severity grading system based on hybrid high-order asymmetric convolution network

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108021916A (en) * 2017-12-31 2018-05-11 南京航空航天大学 Deep learning diabetic retinopathy sorting technique based on notice mechanism
US10430946B1 (en) * 2019-03-14 2019-10-01 Inception Institute of Artificial Intelligence, Ltd. Medical image segmentation and severity grading using neural network architectures with semi-supervised learning techniques
CN110837803A (en) * 2019-11-07 2020-02-25 复旦大学 Diabetic retinopathy grading method based on depth map network
CN111259982A (en) * 2020-02-13 2020-06-09 苏州大学 Premature infant retina image classification method and device based on attention mechanism
CN111461218A (en) * 2020-04-01 2020-07-28 复旦大学 Sample data labeling system for fundus image of diabetes mellitus
CN111639564A (en) * 2020-05-18 2020-09-08 华中科技大学 Video pedestrian re-identification method based on multi-attention heterogeneous network
AU2020103938A4 (en) * 2020-12-07 2021-02-11 Capital Medical University A classification method of diabetic retinopathy grade based on deep learning
CN112733961A (en) * 2021-01-26 2021-04-30 苏州大学 Method and system for classifying diabetic retinopathy based on attention mechanism
CN112819797A (en) * 2021-02-06 2021-05-18 国药集团基因科技有限公司 Diabetic retinopathy analysis method, device, system and storage medium
CN113537375A (en) * 2021-07-26 2021-10-22 深圳大学 Diabetic retinopathy grading method based on multi-scale cascade
CN113723451A (en) * 2021-07-20 2021-11-30 山东师范大学 Retinal image classification model training method, system, storage medium and device
CN113888412A (en) * 2021-11-23 2022-01-04 钟家兴 Image super-resolution reconstruction method for diabetic retinopathy classification
CN114219807A (en) * 2022-02-22 2022-03-22 成都爱迦飞诗特科技有限公司 Mammary gland ultrasonic examination image grading method, device, equipment and storage medium
CN114266757A (en) * 2021-12-25 2022-04-01 北京工业大学 Diabetic retinopathy classification method based on multi-scale fusion attention mechanism
CN114287878A (en) * 2021-10-18 2022-04-08 江西财经大学 Diabetic retinopathy focus image identification method based on attention model
CN115049898A (en) * 2022-07-05 2022-09-13 西安电子科技大学 Automatic grading method for lumbar intervertebral disc degeneration based on region block characteristic enhancement and inhibition

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10719936B2 (en) * 2017-04-27 2020-07-21 Retinascan Limited System and method for automated funduscopic image analysis
CN109753978B (en) * 2017-11-01 2023-02-17 腾讯科技(深圳)有限公司 Image classification method, device and computer readable storage medium
CN109686444A (en) * 2018-12-27 2019-04-26 上海联影智能医疗科技有限公司 System and method for medical image classification
EP3937753A4 (en) * 2019-03-13 2023-03-29 The Board Of Trustees Of The University Of Illinois Supervised machine learning based multi-task artificial intelligence classification of retinopathies
CN112906623A (en) * 2021-03-11 2021-06-04 同济大学 Reverse attention model based on multi-scale depth supervision


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Video saliency detection based on a 3D full-temporal-sequence convolutional neural network; Wang Jiaojin et al.; Computer Science; Vol. 47, No. 8, pp. 195-201 *

Also Published As

Publication number Publication date
CN115587979A (en) 2023-01-10

Similar Documents

Publication Publication Date Title
Li et al. CANet: cross-disease attention network for joint diabetic retinopathy and diabetic macular edema grading
Wang et al. Automatic image-based plant disease severity estimation using deep learning
Li et al. Lesion-attention pyramid network for diabetic retinopathy grading
Jiang et al. White blood cells classification with deep convolutional neural networks
CN113421652A (en) Method for analyzing medical data, method for training model and analyzer
Lim et al. The adoption of deep learning interpretability techniques on diabetic retinopathy analysis: a review
CN115587979B (en) Three-stage attention network-based diabetic retinopathy grading method
Sikder et al. Supervised learning-based cancer detection
CN113380413A (en) Method and device for constructing invalid re-circulation (FR) prediction model
CN108765374B (en) Method for screening abnormal nuclear area in cervical smear image
Agarwal et al. Mobile application based cataract detection system
CN114822823B (en) Tumor fine classification system based on cloud computing and artificial intelligence fusion multi-dimensional medical data
Pakzad et al. CIRCLe: Color invariant representation learning for unbiased classification of skin lesions
Chhabra et al. An advanced VGG16 architecture-based deep learning model to detect pneumonia from medical images
Orchi et al. Real-time detection of crop leaf diseases using enhanced YOLOv8 algorithm
Singh et al. A novel hybridized feature selection strategy for the effective prediction of glaucoma in retinal fundus images
Anandhakrishnan et al. Identification of tomato leaf disease detection using pretrained deep convolutional neural network models
Tariq et al. Towards counterfactual and contrastive explainability and transparency of DCNN image classifiers
Ali et al. COVID-19 pneumonia level detection using deep learning algorithm and transfer learning
CN112488996A (en) Inhomogeneous three-dimensional esophageal cancer energy spectrum CT (computed tomography) weak supervision automatic labeling method and system
Mercy Bai et al. Optimized deep neuro-fuzzy network with MapPeduce architecture for acute lymphoblastic leukemia classification and severity analysis
Al-khuzaie et al. Developing an efficient VGG19-based model and transfer learning for detecting acute lymphoblastic leukemia (ALL)
Gong et al. Evolutionary neural network and visualization for CNN-based pulmonary textures classification
Wang et al. Diabetic retinopathy detection based on weakly supervised object localization and knowledge driven attribute mining
KR102636461B1 (en) Automated labeling method, device, and system for learning artificial intelligence models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant