CN114170089B - Method for classifying diabetic retinopathy and electronic equipment - Google Patents

- Publication number: CN114170089B
- Application number: CN202111540274.6A
- Authority: CN (China)
- Prior art keywords: super-resolution, feature map, image, module
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/0012: Biomedical image inspection
- G06F18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/24: Classification techniques
- G06F18/253: Fusion techniques of extracted features
- G06N3/08: Neural networks; learning methods
- G06T3/4038: Scaling of whole images or parts thereof; image mosaicing
- G06T3/4046: Scaling using neural networks
- G06T3/4053: Scaling based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
- G06T2207/20081: Training; learning
- G06T2207/30041: Eye; retina; ophthalmic
- Y02A90/10: Information and communication technologies supporting adaptation to climate change
Abstract
The invention discloses a method and electronic equipment for classifying diabetic retinopathy. The method comprises super-resolution reconstruction followed by classification of the super-resolution images with an image classification network. The super-resolution reconstruction network comprises a preliminary convolution layer, a deep feature extraction unit, a feature fusion unit and an image amplification module: the preliminary convolution layer performs shallow feature extraction on fundus images; the deep feature extraction unit comprises a jump connection and a plurality of residual attention modules; and the image amplification module performs super-resolution reconstruction on the second feature map. By increasing the resolution of fundus images through super-resolution, the invention improves the accuracy of deep-learning-based diabetic retinopathy classification under limited hardware conditions, meets the needs of actual diagnosis, and relieves the shortage of medical resources. The super-resolution reconstruction network extracts features efficiently and reconstructs images well.
Description
Technical Field
The invention belongs to the technical field of medical treatment and image processing, and particularly relates to a method and electronic equipment for classifying diabetic retinopathy.
Background
Diabetes causes a number of complications, of which diabetic retinopathy (DR) is among the more severe; it can impair a patient's vision and even cause blindness. As living standards improve and the average age of the population rises, the number of people with diabetes in China is growing rapidly, and with it the number of patients requiring retinopathy diagnosis. Conventional manual diagnosis, on the other hand, depends on doctors' clinical experience; qualified medical staff take a long time to train, and medical resources are increasingly scarce relative to the rapidly growing patient population, so many patients cannot be diagnosed and treated in time and their conditions worsen.
Automatically classifying fundus retinal images with artificial intelligence, particularly deep learning, to aid diagnosis of diabetic retinopathy is an effective way to address this shortage of medical resources. Accurate diagnosis, however, depends on high-quality input images. In many regions of China, economic conditions are modest and medical equipment lags behind, so captured retinal images generally have low resolution; as a result, deep-learning-based image classification suffers a high misdiagnosis rate and struggles to meet practical requirements.
Disclosure of Invention
To address these deficiencies of the prior art, the invention provides a method and electronic equipment for classifying diabetic retinopathy, so as to improve the accuracy of fundus retinal image classification.
In order to achieve the above object, the present invention adopts the following solutions: a method for diabetic retinopathy classification, comprising the steps of:
s1, acquiring a fundus image, acquiring a trained super-resolution reconstruction network, and performing super-resolution reconstruction on the fundus image by using the super-resolution reconstruction network to obtain a super-resolution image with resolution greater than that of the fundus image;
s2, acquiring a trained image classification network, inputting the super-resolution image into the image classification network, and classifying the super-resolution image by using the image classification network;
the super-resolution reconstruction network includes:
the preliminary convolution layer is used for carrying out shallow layer feature extraction on the fundus image to obtain a first feature map;
the deep feature extraction unit is connected with the output end of the preliminary convolution layer, the deep feature extraction unit comprises jump connection and a plurality of residual attention modules, the residual attention modules are sequentially connected end to end, and the first feature map is input into each residual attention module through the jump connection;
the feature fusion unit is used for fusing the feature graphs output by the residual attention modules to obtain a second feature graph;
the image amplification module is used for carrying out super-resolution reconstruction on the second feature map to obtain the super-resolution image;
the mathematical model of the residual attention module is as follows:
F1 = σ_1(f_31(K_n))
F2 = σ_2(f_32(K_n))
F3 = σ_3(f_33(F2))
F4 = f_ca(f_T(F1, F2, K_0))
F5 = f_m(F4, [F1, F3])
K_{n+1} = σ_4(f_1(F5)) + K_n
wherein K_0 represents the first feature map and K_n the feature map output upstream; K_0 and K_n are input to the residual attention module simultaneously; f_31, f_32 and f_33 each represent a convolution with a 3×3 kernel; f_1 represents a convolution with a 1×1 kernel; σ_1, σ_2, σ_3 and σ_4 each represent the nonlinear activation function ReLU; f_T represents the ternary fusion module, which fuses F1, F2 and K_0; f_ca represents the channel attention module; [·] represents concatenation of feature maps; f_m(F4, [F1, F3]) represents fusing F4 with the concatenation of F1 and F3; and K_{n+1} represents the feature map output by the residual attention module.
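To illustrate the data flow these formulas describe, the following is a minimal PyTorch sketch of one residual attention module. The class and attribute names are mine; the ternary fusion module f_T and channel attention module f_ca are collapsed into simple stand-ins (a 1×1 convolution and one SE-style gate) so the residual skeleton stays readable, rather than the patent's full submodule definitions.

```python
import torch
import torch.nn as nn

class ResidualAttentionBlock(nn.Module):
    """Sketch of one residual attention module (channel count c=48 per the
    embodiment). f_t and gate are simplified stand-ins, not the patent's
    full ternary fusion / three-branch channel attention."""
    def __init__(self, c=48):
        super().__init__()
        self.f31 = nn.Conv2d(c, c, 3, padding=1)   # f_31
        self.f32 = nn.Conv2d(c, c, 3, padding=1)   # f_32
        self.f33 = nn.Conv2d(c, c, 3, padding=1)   # f_33 (stacked after f_32: ~5x5 field)
        self.relu = nn.ReLU(inplace=True)
        self.f_t = nn.Conv2d(3 * c, 2 * c, 1)      # stand-in for the ternary fusion module
        self.gate = nn.Sequential(                  # stand-in for the channel attention module
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(2 * c, 2 * c, 1), nn.Sigmoid())
        self.f1 = nn.Conv2d(2 * c, c, 1)            # f_1

    def forward(self, k_n, k_0):
        f1 = self.relu(self.f31(k_n))               # F1
        f2 = self.relu(self.f32(k_n))               # F2
        f3 = self.relu(self.f33(f2))                # F3
        f4 = self.gate(self.relu(self.f_t(torch.cat([f1, f2, k_0], 1))))  # F4
        f5 = torch.cat([f1, f3], 1) * f4            # F5: modulate [F1, F3]
        return self.relu(self.f1(f5)) + k_n         # K_{n+1}: residual connection

block = ResidualAttentionBlock()
k = torch.rand(1, 48, 16, 16)
out = block(k, k)   # for the first module, K_n and K_0 coincide
```

Note how the output keeps the input's 48 channels and spatial size, so sixteen such modules can be chained end to end.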
Further, the ternary fusion module may be expressed as the following formula:
F_T1 = σ_T1(f_T1([F1, F2, K_0]))
F_T2 = σ_T2(F1 + F2)
F_T = σ_T3(f_T3([F_T1, F_T2]))
wherein F1, F2 and K_0 are input to the ternary fusion module simultaneously; f_T1 represents a convolution with a 1×1 kernel; f_T3 represents a convolution with a 3×3 kernel; σ_T1, σ_T2 and σ_T3 each represent the nonlinear activation function ReLU; [·] represents concatenation of feature maps; and F_T is the output of the ternary fusion module.
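Using the channel counts of the embodiment described later (48-channel inputs, 96-channel output), these three formulas can be sketched in PyTorch as follows; the class name and layer attributes are my own.

```python
import torch
import torch.nn as nn

class TernaryFusion(nn.Module):
    """Sketch of the ternary fusion module f_T."""
    def __init__(self, c=48):
        super().__init__()
        self.f_t1 = nn.Conv2d(3 * c, c, 1)                 # 1x1 conv over [F1, F2, K_0]
        self.f_t3 = nn.Conv2d(2 * c, 2 * c, 3, padding=1)  # 3x3 conv over [F_T1, F_T2]
        self.relu = nn.ReLU(inplace=True)

    def forward(self, f1, f2, k0):
        f_t1 = self.relu(self.f_t1(torch.cat([f1, f2, k0], 1)))  # F_T1
        f_t2 = self.relu(f1 + f2)                                # F_T2
        return self.relu(self.f_t3(torch.cat([f_t1, f_t2], 1)))  # F_T (2c channels)

ft = TernaryFusion()(torch.rand(1, 48, 8, 8), torch.rand(1, 48, 8, 8),
                     torch.rand(1, 48, 8, 8))   # three 48-channel inputs
```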
Further, the mathematical model of the channel attention module is:
Fa1 = δ_a1(f_a12(σ_a1(f_a11(p_v(F_T)))))
Fa2 = δ_a2(f_a22(σ_a2(f_a21(p_M(F_T)))))
Fa3 = δ_a3(f_a32(σ_a3(f_a31(p_Max(F_T)))))
wherein F_T, the output of the ternary fusion module, serves as the input to the channel attention module; p_v represents a global differential pooling operation, p_M a global average pooling operation and p_Max a global max pooling operation; f_a11, f_a12, f_a21, f_a22, f_a31 and f_a32 each represent a fully connected operation; σ_a1, σ_a2 and σ_a3 each represent the nonlinear activation function ReLU; δ_a1, δ_a2 and δ_a3 are sigmoid activation functions; and Fa1, Fa2 and Fa3 are the channel attention maps output by the channel attention module.
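A PyTorch sketch of this three-branch channel attention module under the embodiment's sizes (96 input channels, fully connected layers of 96 to 24 to 96 nodes). The names are mine, and p_v, whose translation reads "global differential pooling", is implemented here as global variance pooling (an assumption on my part).

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Sketch of the channel attention module f_ca. Each branch:
    global pooling -> FC(96->24) -> ReLU -> FC(24->96) -> sigmoid."""
    def __init__(self, c=96, r=4):
        super().__init__()
        def branch():
            return nn.Sequential(nn.Linear(c, c // r), nn.ReLU(inplace=True),
                                 nn.Linear(c // r, c), nn.Sigmoid())
        self.b1, self.b2, self.b3 = branch(), branch(), branch()

    def forward(self, f_t):
        flat = f_t.flatten(2)            # (B, C, H*W)
        p_v = flat.var(dim=2)            # "differential" pooling, read as variance (assumption)
        p_m = flat.mean(dim=2)           # global average pooling
        p_max = flat.max(dim=2).values   # global max pooling
        b, c = f_t.shape[:2]
        return tuple(br(p).view(b, c, 1, 1)
                     for br, p in ((self.b1, p_v), (self.b2, p_m), (self.b3, p_max)))

fa1, fa2, fa3 = ChannelAttention()(torch.rand(2, 96, 8, 8))  # three attention maps
```

Each returned map has shape (B, 96, 1, 1) with values in [0, 1], so it broadcasts over the spatial dimensions when multiplied with a feature map.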
Further, the process of fusing the channel attention maps with the feature map obtained by splicing feature map F1 and feature map F3 comprises the following steps:
A1, multiplying the feature map obtained by splicing F1 and F3 by each of the three channel attention maps output by the channel attention module, to obtain a first enhancement feature map, a second enhancement feature map and a third enhancement feature map;
A2, splicing the first, second and third enhancement feature maps in the channel direction.
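Steps A1 and A2 amount to a broadcast multiply followed by a channel-wise concatenation. A sketch with random stand-in tensors, using the embodiment's channel counts:

```python
import torch

# [F1, F3] after splicing has 96 channels; each attention map is (B, 96, 1, 1).
cat13 = torch.rand(1, 96, 8, 8)                     # spliced F1 and F3 (stand-in)
atts = [torch.rand(1, 96, 1, 1) for _ in range(3)]  # Fa1, Fa2, Fa3 (stand-ins)

enhanced = [cat13 * a for a in atts]                # A1: broadcast multiply per map
fused = torch.cat(enhanced, dim=1)                  # A2: splice in channel direction
assert fused.shape == (1, 288, 8, 8)                # 3 x 96 channels
```

In the embodiment, a 1×1 convolution then reduces these 288 channels back to 48.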
Further, the image classification network is MobileNet V3.
The invention also provides an electronic device for diabetic retinopathy classification, comprising a processor and a memory, the memory storing a computer program, the processor being adapted to execute the method for diabetic retinopathy classification as described above by loading the computer program.
The beneficial effects of the invention are as follows:
(1) The super-resolution technique increases the resolution of the fundus image and improves the accuracy of deep-learning-based diabetic retinopathy classification under limited hardware conditions, thereby meeting actual diagnostic needs, reducing the burden on medical staff, relieving the shortage of medical resources, and preventing patients' conditions from worsening because they are not seen in time;
(2) In the residual attention module, the input feature map undergoes feature extraction through two branches: one applies a single 3×3 convolution, the other applies two 3×3 convolutions in series, whose receptive field is equivalent to a 5×5 convolution. Because super-resolution reconstruction is a more appearance-oriented task than classification, the feature maps output by the front-end convolution layers of the residual attention module are fed into a ternary fusion module to generate the channel attention maps; since these feature maps carry relatively more appearance information, the generated channel attention maps modulate the features better, which helps reconstruct higher-quality super-resolution images;
(3) To make full use of the information in the low-resolution image, the first feature map is input into each residual attention module through the jump connection; once it has been fused with the feature maps inside the residual attention module, the channel attention maps generated by the channel attention module are better able, through modulation, to raise the proportion of high-frequency information extracted and to reduce the redundancy of useless information;
(4) Because different residual attention modules sit at different depths in the network, the scales of the information in their F1 and F2 maps also differ. So that the first feature map can fuse well with the F1 and F2 maps of different residual attention modules, the inventors designed a ternary fusion module: F1, F2 and the first feature map are fused by splicing and dimension reduction, the resulting map is spliced with the map obtained by summing and activating F1 and F2, and features are finally extracted through a 3×3 convolution layer. This balances the concrete feature information in the first feature map against the abstract feature information in F1 and F2, so that the first feature map input into different residual attention modules can adaptively fuse with information at different scales, improving the efficiency of extracting useful features;
(5) In attention mechanisms comprising multiple modules (such as CBAM), the attention maps generated by the different modules are multiplied with the feature map sequentially, in series; once an upstream attention map has modulated the feature map, the importance and proportions of the different information change, interfering with the attention modules that follow. In the channel attention module of the invention, by contrast, the three attention maps are generated in parallel branches and applied to the same spliced feature map, avoiding such interference.
Drawings
FIG. 1 is a schematic diagram of a super-resolution reconstruction network structure according to the present invention;
FIG. 2 is a schematic diagram of a residual attention module structure according to the present invention;
FIG. 3 is a schematic diagram of a ternary fusion module according to the present invention;
FIG. 4 is a schematic diagram of a first branch structure of the channel attention module according to the present invention;
FIG. 5 is a schematic diagram of a second branch structure of the channel attention module according to the present invention;
FIG. 6 is a schematic diagram of a third branch structure of the channel attention module according to the present invention;
FIG. 7 is a schematic diagram of a feature fusion unit according to the present invention;
FIG. 8 is a schematic diagram of an image magnification module according to the present invention;
in the accompanying drawings:
the system comprises a 1-fundus image, a 2-preliminary convolution layer, a 3-deep feature extraction unit, a 4-jump connection, a 5-residual attention module, a 51-ternary fusion module, a 52-channel attention module, a 53-first branch, a 54-second branch, a 55-third branch, a 6-feature fusion unit, a 7-image amplification module and an 8-super-resolution image.
Detailed Description
The invention is further described below with reference to the accompanying drawings:
examples:
Fundus images 1 are collected and made into a fundus image dataset, which is divided into three parts: a first training set, a second training set and a test set. All three parts contain lesion information labeled by human experts, covering five categories: no lesions, mild lesions, moderate lesions, severe lesions and proliferative lesions.
Super-resolution reconstruction network training:
The structure of the super-resolution reconstruction network is shown in fig. 1; in this embodiment it contains sixteen residual attention modules 5. The first-training-set images are downsampled to obtain low-resolution fundus images 1, and the super-resolution reconstruction network is then trained on pairs of low-resolution fundus images 1 and first-training-set images. The model is optimized with an L2 loss function during training.
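A minimal sketch of this training setup, with random tensors standing in for the first-training-set images and a toy upsampling network standing in for the full reconstruction network (both stand-ins are mine):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-in for the super-resolution network: any module mapping LR -> HR at 4x.
sr_net = nn.Sequential(nn.Conv2d(3, 48, 3, padding=1), nn.ReLU(),
                       nn.Conv2d(48, 3 * 16, 3, padding=1), nn.PixelShuffle(4))
opt = torch.optim.Adam(sr_net.parameters(), lr=1e-4)
mse = nn.MSELoss()                       # the L2 loss used during training

hr = torch.rand(2, 3, 64, 64)            # first-training-set images (random stand-ins)
lr = F.interpolate(hr, scale_factor=0.25, mode="bicubic",
                   align_corners=False)  # downsampled low-resolution fundus images

for _ in range(2):                       # a couple of steps for illustration
    opt.zero_grad()
    loss = mse(sr_net(lr), hr)           # reconstruction vs. the original HR image
    loss.backward()
    opt.step()
```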
The preliminary convolution layer 2 is implemented as a convolution layer with a 3×3 kernel; the first feature map it outputs has 48 channels, and its output end is connected to the input end of the deep feature extraction unit 3. For the first residual attention module 5, the first feature map enters both from the upstream end and through the jump connection 4; each subsequent residual attention module 5 receives as input the feature map output by the previous residual attention module 5.
As shown in fig. 2, in the residual attention module 5 the feature map input from upstream undergoes feature extraction through two branches; the feature maps output by the two branches (F1, F3) each have 48 channels, and after they are spliced in the channel direction the channel count becomes 96.
The structure of the ternary fusion module 51 is shown in fig. 3. The three feature maps input into it (the first feature map, F1 and F2) each have 48 channels; after splicing, 1×1 convolution and activation they yield a first intermediate feature map with 48 channels. In parallel, F1 and F2 are summed and activated to yield a second intermediate feature map with 48 channels. The first and second intermediate feature maps are then spliced, passed through a 3×3 convolution and an activation function, and output as a 96-channel feature map that serves as the input to the channel attention module 52.
In this embodiment, the channel attention module 52 comprises a first branch 53 (fig. 4), a second branch 54 (fig. 5) and a third branch 55 (fig. 6), each consisting of a global pooling operation followed by two fully connected operations. In each branch the first fully connected layer has 96 input nodes and 24 output nodes, and the second has 24 input nodes and 96 output nodes. After the three attention maps have each modulated the feature map obtained by splicing F1 and F3, the results are spliced in the channel direction, passed through a 1×1 convolution layer and a ReLU activation function, and output as a 48-channel feature map.
The structure of the feature fusion unit 6 is shown in fig. 7: all feature maps output by the residual attention modules 5 are spliced in the channel direction, a 1×1 convolution layer reduces the channel count to 48, and the second feature map is finally output through a ReLU activation function.
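The feature fusion unit can be sketched as follows, assuming the embodiment's sixteen 48-channel module outputs (the class and variable names are mine):

```python
import torch
import torch.nn as nn

class FeatureFusionUnit(nn.Module):
    """Sketch of the feature fusion unit: splice all residual attention module
    outputs in the channel direction, reduce to 48 channels with a 1x1 conv,
    then apply ReLU to produce the second feature map."""
    def __init__(self, c=48, n_modules=16):
        super().__init__()
        self.reduce = nn.Conv2d(n_modules * c, c, 1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, module_outputs):
        return self.relu(self.reduce(torch.cat(module_outputs, dim=1)))

ffu = FeatureFusionUnit()
maps = [torch.rand(1, 48, 8, 8) for _ in range(16)]  # one map per module (stand-ins)
second_feature_map = ffu(maps)
```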
The image amplification module 7 may be implemented with existing deconvolution or sub-pixel convolution operations. In this embodiment its structure is shown in fig. 8: it comprises two amplification components connected end to end, followed by a tail-end convolution layer; each amplification component comprises a deconvolution layer and a ReLU activation function connected in sequence. The kernels of the tail-end convolution layer and the deconvolution layers are all 3×3. Each amplification component doubles the length and width of the feature map while leaving the channel count unchanged. The tail-end convolution layer has 48 input channels and 3 output channels, yielding the super-resolution image 8.
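A sketch of the image amplification module as described: two deconvolution-plus-ReLU amplification components followed by a 3×3 tail-end convolution mapping 48 channels to 3. With kernel 3, stride 2, padding 1 and output padding 1, each `ConvTranspose2d` exactly doubles the spatial size (the names and these particular padding choices are my own).

```python
import torch
import torch.nn as nn

class ImageAmplificationModule(nn.Module):
    """Sketch of the image amplification module: 2x upscale, twice, then a
    tail-end 3x3 conv from 48 feature channels to 3 image channels."""
    def __init__(self, c=48):
        super().__init__()
        def amplify():  # one amplification component: deconvolution + ReLU
            return nn.Sequential(
                nn.ConvTranspose2d(c, c, 3, stride=2, padding=1, output_padding=1),
                nn.ReLU(inplace=True))
        self.body = nn.Sequential(amplify(), amplify(),
                                  nn.Conv2d(c, 3, 3, padding=1))  # tail-end conv

    def forward(self, x):
        return self.body(x)

sr = ImageAmplificationModule()(torch.rand(1, 48, 32, 32))  # 4x total upscale
```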
Image classification network training:
Super-resolution reconstruction is performed on the images in the second training set using the trained super-resolution reconstruction network, and the resulting super-resolution images 8 are used to train the image classification network. The image classification network is MobileNetV3, and cross entropy is used as the loss function during training.
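A minimal sketch of one classifier training step. A tiny CNN stands in for MobileNetV3 here so the snippet is self-contained (in practice one could load, e.g., torchvision's MobileNetV3 and replace its head with a 5-class output layer); the tensors are random stand-ins for super-resolution images and expert labels.

```python
import torch
import torch.nn as nn

# Tiny stand-in classifier; the patent's embodiment uses MobileNetV3.
clf = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 5))
ce = nn.CrossEntropyLoss()              # cross-entropy loss, as in the embodiment
opt = torch.optim.SGD(clf.parameters(), lr=0.01)

sr_images = torch.rand(4, 3, 32, 32)    # super-resolution images (random stand-ins)
labels = torch.randint(0, 5, (4,))      # 5 DR grades: none/mild/moderate/severe/proliferative

opt.zero_grad()
loss = ce(clf(sr_images), labels)       # logits vs. expert labels
loss.backward()
opt.step()
```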
Classification of diabetic retinopathy:
The images in the test set are input into the previously trained super-resolution reconstruction network, the reconstructed output images are input into the previously trained image classification network, and the classification accuracy of the model is computed against the labeling information.
As a comparison experiment, the super-resolution reconstruction network of the invention was replaced by a SAN model; the SAN model and the image classification network were retrained under identical conditions (including training set, framework, number of epochs, batch size, magnification, etc.) following the method above and tested on the same test set. The final classification accuracy obtained with the super-resolution reconstruction network of the invention was 7.4% higher than that obtained with the SAN network.
The foregoing examples merely illustrate specific embodiments of the invention; they are described in detail but are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the invention.
Claims (6)
1. A method for classifying diabetic retinopathy, characterized by: the method comprises the following steps:
s1, acquiring a fundus image, acquiring a trained super-resolution reconstruction network, and performing super-resolution reconstruction on the fundus image by using the super-resolution reconstruction network to obtain a super-resolution image with resolution greater than that of the fundus image;
s2, acquiring a trained image classification network, inputting the super-resolution image into the image classification network, and classifying the super-resolution image by using the image classification network;
the super-resolution reconstruction network includes:
the preliminary convolution layer is used for carrying out shallow layer feature extraction on the fundus image to obtain a first feature map;
the deep feature extraction unit is connected with the output end of the preliminary convolution layer, the deep feature extraction unit comprises jump connection and a plurality of residual attention modules, the residual attention modules are sequentially connected end to end, and the first feature map is input into each residual attention module through the jump connection;
the feature fusion unit is used for fusing the feature graphs output by the residual attention modules to obtain a second feature graph;
the image amplification module is used for carrying out super-resolution reconstruction on the second feature map to obtain the super-resolution image;
the mathematical model of the residual attention module is as follows:
F1 = σ_1(f_31(K_n))
F2 = σ_2(f_32(K_n))
F3 = σ_3(f_33(F2))
F4 = f_ca(f_T(F1, F2, K_0))
F5 = f_m(F4, [F1, F3])
K_{n+1} = σ_4(f_1(F5)) + K_n
wherein K_0 represents the first feature map and K_n the feature map output upstream; K_0 and K_n are input to the residual attention module simultaneously; f_31, f_32 and f_33 each represent a convolution with a 3×3 kernel; f_1 represents a convolution with a 1×1 kernel; σ_1, σ_2, σ_3 and σ_4 each represent the nonlinear activation function ReLU; f_T represents the ternary fusion module, which fuses F1, F2 and K_0; f_ca represents the channel attention module; [·] represents concatenation of feature maps; f_m(F4, [F1, F3]) represents fusing F4 with the concatenation of F1 and F3; and K_{n+1} represents the feature map output by the residual attention module.
2. The method for diabetic retinopathy classification as claimed in claim 1, wherein: the ternary fusion module may be expressed as the following formula:
F_T1 = σ_T1(f_T1([F1, F2, K_0]))
F_T2 = σ_T2(F1 + F2)
F_T = σ_T3(f_T3([F_T1, F_T2]))
wherein F1, F2 and K_0 are input to the ternary fusion module simultaneously; f_T1 represents a convolution with a 1×1 kernel; f_T3 represents a convolution with a 3×3 kernel; σ_T1, σ_T2 and σ_T3 each represent the nonlinear activation function ReLU; [·] represents concatenation of feature maps; and F_T is the output of the ternary fusion module.
3. The method for diabetic retinopathy classification as claimed in claim 1, wherein: the mathematical model of the channel attention module is as follows:
Fa1 = δ_a1(f_a12(σ_a1(f_a11(p_v(F_T)))))
Fa2 = δ_a2(f_a22(σ_a2(f_a21(p_M(F_T)))))
Fa3 = δ_a3(f_a32(σ_a3(f_a31(p_Max(F_T)))))
wherein F_T, the output of the ternary fusion module, serves as the input to the channel attention module; p_v represents a global differential pooling operation, p_M a global average pooling operation and p_Max a global max pooling operation; f_a11, f_a12, f_a21, f_a22, f_a31 and f_a32 each represent a fully connected operation; σ_a1, σ_a2 and σ_a3 each represent the nonlinear activation function ReLU; δ_a1, δ_a2 and δ_a3 are sigmoid activation functions; and Fa1, Fa2 and Fa3 are the channel attention maps output by the channel attention module.
4. A method for diabetic retinopathy classification as claimed in claim 3, wherein: the fusion process between the feature map obtained by splicing feature map F1 with feature map F3 and the channel attention maps comprises the following steps:

A1, the feature map obtained by splicing feature map F1 with feature map F3 is multiplied by each of the three channel attention maps output by the channel attention module, respectively, to obtain a first enhanced feature map, a second enhanced feature map and a third enhanced feature map;

A2, the first enhanced feature map, the second enhanced feature map and the third enhanced feature map are spliced in the channel direction.
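Steps A1 and A2 can be sketched as follows; the shapes are illustrative placeholders, and the attention maps are taken to be one weight per channel as in claim 3:

```python
import numpy as np

rng = np.random.default_rng(2)
C, H, W = 6, 8, 8  # illustrative sizes
# feature map obtained by splicing F1 and F3 along the channel axis (shape assumed)
F13 = rng.standard_normal((C, H, W))
# three channel attention maps from the channel attention module, one weight per channel
Fa1, Fa2, Fa3 = (rng.uniform(0.0, 1.0, size=C) for _ in range(3))

# A1: channel-wise multiplication yields three enhanced feature maps
enhanced = [F13 * fa[:, None, None] for fa in (Fa1, Fa2, Fa3)]

# A2: splice the three enhanced maps in the channel direction
fused = np.concatenate(enhanced, axis=0)
print(fused.shape)  # three C-channel maps stacked channel-wise
```

Note the fused result has three times the channel count of the spliced input, so any downstream layer (here, the classification network) must expect that widened channel dimension.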
5. The method for diabetic retinopathy classification as claimed in claim 1, wherein: the image classification network is MobileNet V3.
6. An electronic device for diabetic retinopathy classification, characterized by: comprising a processor and a memory, said memory storing a computer program for executing the method for diabetic retinopathy classification according to any of claims 1-5 by loading said computer program.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111160828X | 2021-09-30 | ||
CN202111160828 | 2021-09-30 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114170089A CN114170089A (en) | 2022-03-11 |
CN114170089B true CN114170089B (en) | 2023-07-07 |
Family
ID=80486882
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111540274.6A Active CN114170089B (en) | 2021-09-30 | 2021-12-16 | Method for classifying diabetic retinopathy and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114170089B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114882203B (en) * | 2022-05-20 | 2024-05-28 | 江阴萃合智能装备有限公司 | Image super-resolution reconstruction method for power grid inspection robot |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200034948A1 (en) * | 2018-07-27 | 2020-01-30 | Washington University | Ml-based methods for pseudo-ct and hr mr image estimation |
CN111192200A (en) * | 2020-01-02 | 2020-05-22 | 南京邮电大学 | Image super-resolution reconstruction method based on fusion attention mechanism residual error network |
CN111461983A (en) * | 2020-03-31 | 2020-07-28 | 华中科技大学鄂州工业技术研究院 | Image super-resolution reconstruction model and method based on different frequency information |
CN111626300A (en) * | 2020-05-07 | 2020-09-04 | 南京邮电大学 | Image semantic segmentation model and modeling method based on context perception |
US20210133925A1 (en) * | 2019-11-05 | 2021-05-06 | Moxa Inc. | Device and Method of Handling Image Super-Resolution |
CN113128583A (en) * | 2021-04-15 | 2021-07-16 | 重庆邮电大学 | Medical image fusion method and medium based on multi-scale mechanism and residual attention |
CN113139907A (en) * | 2021-05-18 | 2021-07-20 | 广东奥普特科技股份有限公司 | Generation method, system, device and storage medium for visual resolution enhancement |
CN113298717A (en) * | 2021-06-08 | 2021-08-24 | 浙江工业大学 | Medical image super-resolution reconstruction method based on multi-attention residual error feature fusion |
Non-Patent Citations (3)
Title |
---|
"Image Super-Resolution Using Very Deep Residual Channel Attention Networks";Yulun Zhang et ai.;arXiv;第1-16页 * |
"胰岛素抵抗在2 型糖尿病时血清维生素D3 和骨密度 变化中的意义";吕燕等;《中国骨质疏松杂志》;第20卷(第4期);第372-374页 * |
"通道注意力与残差级联的图像超分辨率重建";蔡体健等;《光学精密工程》;第29卷(第1期);第142-149页 * |
Also Published As
Publication number | Publication date |
---|---|
CN114170089A (en) | 2022-03-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110232383B (en) | Focus image recognition method and focus image recognition system based on deep learning model | |
CN112132817B (en) | Retina blood vessel segmentation method for fundus image based on mixed attention mechanism | |
CN110399929B (en) | Fundus image classification method, fundus image classification apparatus, and computer-readable storage medium | |
CN111815574B (en) | Fundus retina blood vessel image segmentation method based on rough set neural network | |
CN113888412B (en) | Image super-resolution reconstruction method for diabetic retinopathy classification | |
CN112215755B (en) | Image super-resolution reconstruction method based on back projection attention network | |
WO2020168820A1 (en) | Yolo convolutional neural network-based cholelithiasis ct medical image data enhancement method | |
CN114037624B (en) | Image enhancement method and device for classifying diabetic nephropathy | |
CN111091575B (en) | Medical image segmentation method based on reinforcement learning method | |
CN114170089B (en) | Method for classifying diabetic retinopathy and electronic equipment | |
CN116563533A (en) | Medical image segmentation method and system based on target position priori information | |
Sahu et al. | An application of deep dual convolutional neural network for enhanced medical image denoising | |
CN111462004B (en) | Image enhancement method and device, computer equipment and storage medium | |
CN116958535B (en) | Polyp segmentation system and method based on multi-scale residual error reasoning | |
Yue et al. | Deep pyramid network for low-light endoscopic image enhancement | |
Iqbal et al. | LDMRes-Net: Enabling real-time disease monitoring through efficient image segmentation | |
CN116309507A (en) | AIS focus prediction method for performing feature fusion on CTP under attention mechanism | |
CN114066873B (en) | Method and device for detecting osteoporosis by utilizing CT (computed tomography) image | |
CN115330600A (en) | Lung CT image super-resolution method based on improved SRGAN | |
CN115115736A (en) | Image artifact removing method, device and equipment and storage medium | |
CN114649092A (en) | Auxiliary diagnosis method and device based on semi-supervised learning and multi-scale feature fusion | |
CN113796850A (en) | Parathyroid MIBI image analysis system, computer device, and storage medium | |
CN110969117A (en) | Fundus image segmentation method based on Attention mechanism and full convolution neural network | |
CN116246067B (en) | CoA Unet-based medical image segmentation method | |
CN114004784B (en) | Method for detecting bone condition based on CT image and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | ||
Effective date of registration: 20220622

Address after: 610021 No. 10, Qingyun South Street, Jinjiang District, Chengdu, Sichuan

Applicant after: CHENGDU SECOND PEOPLE'S Hospital

Address before: No. 82, Section 2, 2nd Ring Road, Chengdu, Sichuan 610000

Applicant before: AFFILIATED HOSPITAL OF CHENGDU University
GR01 | Patent grant | ||
GR01 | Patent grant |