CN116758038A - Infant retina disease information identification method and system based on training network - Google Patents
- Publication number
- CN116758038A CN116758038A CN202310747947.8A CN202310747947A CN116758038A CN 116758038 A CN116758038 A CN 116758038A CN 202310747947 A CN202310747947 A CN 202310747947A CN 116758038 A CN116758038 A CN 116758038A
- Authority
- CN
- China
- Prior art keywords
- features
- network
- module
- attention
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0012—Biomedical image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20021—Dividing image into blocks, subimages or windows
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30041—Eye; Retina; Ophthalmic
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/03—Recognition of patterns in medical or anatomical images
Abstract
The application relates to a training network-based infant retinal disease information identification method and system, wherein the method comprises the following steps: extracting local-global features of the retinal image to be detected with a hybrid CNN-Transformer network; fusing the local-global features with a deep attention fusion module; and training ROP data in the fused features with a training network to extract information with deep feature expression and realize ROP severity grading. By combining the complementary strengths of CNN and Transformer networks, the dual tasks of automatically detecting various infant retinopathies and grading ROP severity are realized. The system can detect various common infant fundus lesions while shielding the influence of other lesion features, greatly improving the accuracy of automatic detection of common infant fundus lesions. It reduces, to a certain extent, the discomfort caused to child patients by repeated examinations when diagnosis is difficult, lowers the rates of misdiagnosis and missed diagnosis, and improves doctors' diagnosis and treatment efficiency.
Description
Technical Field
The application relates to the technical field of ophthalmic disease recognition and image recognition, in particular to a training network-based infant retinal disease information recognition method and system.
Background
Deep Learning (DL) is a mature but still rapidly evolving technology, especially in the context of computer-aided diagnosis of human diseases. On the algorithm side, He et al. proposed ResNet, a model with a residual structure, so that network depth can be continuously increased without overfitting, and shallow-to-deep features are extracted to improve network recognition accuracy. Dosovitskiy et al. proposed a multi-scale Transformer framework, i.e., designing input patch sizes at different scales to train on large-scale data and obtain higher classification accuracy. Chen et al. employed a pyramid structure and selected a new region-local attention mechanism instead of global self-attention to obtain more spatial information, thereby improving classification accuracy. Tu et al. described an efficient and scalable attention model called multi-axis attention (MaxViT), which consists of two components: blocked local attention and dilated global attention. These design choices allow global-local spatial interaction at any input resolution with only linear complexity. Valanarasu et al. proposed a gated axial attention model that extends existing architectures by introducing an additional control mechanism in the self-attention module, together with a local-global training strategy that further improves performance when training models efficiently on medical images. Zhang et al. proposed TransFuse, a new parallel-branch network that combines Transformers and CNNs in parallel, can effectively capture global dependencies and low-level spatial detail in a shallower manner, and fuses the features extracted at different levels of the two branches with a bidirectional fusion module.
Deep learning techniques are widely used in the field of medical image analysis. As a representative deep learning framework, the convolutional neural network (Convolutional Neural Network, CNN) is often used as a backbone to extract deep features from medical images by virtue of its strong feature extraction capability. Thanks to its distinctive residual skip connections, the residual network proposed in 2015 can attend to shallow feature information while extracting deep features. The residual network therefore attends to both deep and shallow feature information, so the extracted features are more complete and network performance is better; it is also selected here as one of the branches for feature extraction. However, the features extracted by a pure CNN lack the expression of global feature information, which places a certain limit on network performance. For this reason, Transformer networks were developed, which use a multi-head self-attention mechanism to learn global feature information with long-distance dependencies.
Among the many ocular diseases, congenital abnormalities and early-onset diseases are particularly important. Common fundus diseases in infants include retinopathy of prematurity (ROP), Coats disease, retinoblastoma (RB), retinitis pigmentosa (RP), choroidal defects, congenital retinal folds, and familial exudative vitreoretinopathy. Most of these diseases have a long-term impact on the structure and function of the eye, including refractive error and night blindness, and may also lead to ocular misalignment (strabismus) and neovascular glaucoma. Among them, ROP is a major cause of vision impairment and blindness in children, and as many as 8,000 newborns with RB worldwide may need enucleation surgery to save their lives.
From a clinical point of view, these infant conditions often lead to severe, lifelong vision impairment or even blindness in the child, with long-term consequences for society, especially future employment prospects. Because infant fundus diseases are uncommon in general hospitals, they are often overlooked, and even when encountered an accurate diagnosis may not be made. At the same time, specialized ophthalmologists are in short supply worldwide.
In the field of fundus diseases, several studies on automatic examination methods for retinopathy of prematurity have been carried out, most of which detect a single ROP lesion. To date, we have found little research focused on detecting more than one type of infant fundus disease, let alone a variety of them. In practice, particularly in remote areas lacking specialized ophthalmologists, it is necessary to detect the various types of fundus disease effectively.
Therefore, a more efficient auxiliary detection system is needed to solve this problem.
Disclosure of Invention
Aiming at the defects in the prior art, the application provides a training network-based infant retina disease information identification method and a training network-based infant retina disease information identification system.
The technical scheme adopted for solving the technical problems is as follows:
A training network-based infant retinal disease information identification method is constructed, comprising the following steps:
extracting local-global features of the retinal image to be detected with a hybrid CNN-Transformer network;
fusing the local-global features with a deep attention fusion module to obtain fused features with local-global feature expression capability;
and training ROP data in the fused features with a training network, extracting information with deep feature expression, and realizing ROP severity grading.
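The patent does not disclose code; the following is a minimal PyTorch sketch of the two-stage flow just described, where a stage-1 multi-disease classifier routes images predicted as ROP into a stage-2 severity grader. The class name `TwoStagePipeline`, the stand-in backbones, and the class index are all illustrative assumptions, not the patented networks.

```python
import torch
import torch.nn as nn

class TwoStagePipeline(nn.Module):
    """Stage 1: multi-disease classification; stage 2: ROP severity grading
    applied only to images stage 1 labels as ROP (illustrative sketch)."""

    def __init__(self, stage1: nn.Module, stage2: nn.Module, rop_class: int):
        super().__init__()
        self.stage1, self.stage2 = stage1, stage2
        self.rop_class = rop_class  # index of the ROP class in stage-1 output

    def forward(self, x: torch.Tensor):
        disease_logits = self.stage1(x)              # multi-disease classification
        pred = disease_logits.argmax(dim=1)
        rop_mask = pred == self.rop_class
        # Grade severity only for the images predicted as ROP.
        severity_logits = self.stage2(x[rop_mask]) if rop_mask.any() else None
        return disease_logits, severity_logits
```

Any classifiers with matching input/output shapes can serve as the two stages; in the patent these would be the hybrid CNN-Transformer networks described below.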
In the training network-based infant retinal disease information identification method of the application, extracting the local-global features of the retinal image to be detected with the hybrid CNN-Transformer network comprises the following steps:
extracting deep semantic feature information from the retinal image to be detected with residual network modules of different scales in the CNN branch to obtain local features;
and extracting global features with long-distance dependencies with a four-stage Transformer network module.
In the training network-based infant retinal disease information identification method of the application, fusing the local-global features with the deep attention fusion module comprises the following steps:
taking the features extracted by the residual network and the Transformer network module as the input of the deep attention fusion module, and performing element-level addition after a point (1×1) convolution operation;
inputting the processed features into a ReLU activation function, then performing feature extraction with a point convolution, and then activating with a Sigmoid function to obtain the attention features;
and performing element-level multiplication of the obtained attention features with the features from the Transformer network module to obtain the deep attention fusion features.
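The fusion steps above can be sketched in PyTorch as follows. This is an assumption-laden reconstruction: the channel counts, the projection of the Transformer features to a common width before the final multiplication, and the module name `DeepAttentionFusion` are illustrative choices, not details disclosed by the patent.

```python
import torch
import torch.nn as nn

class DeepAttentionFusion(nn.Module):
    """Illustrative sketch: point-conv each branch, add element-wise,
    ReLU -> point conv -> Sigmoid to form an attention map, then multiply
    element-wise with the Transformer-branch features."""

    def __init__(self, cnn_ch: int, trans_ch: int, out_ch: int):
        super().__init__()
        self.pw_cnn = nn.Conv2d(cnn_ch, out_ch, kernel_size=1)    # point convolution
        self.pw_trans = nn.Conv2d(trans_ch, out_ch, kernel_size=1)
        self.pw_attn = nn.Conv2d(out_ch, out_ch, kernel_size=1)

    def forward(self, f_cnn: torch.Tensor, f_trans: torch.Tensor) -> torch.Tensor:
        t = self.pw_trans(f_trans)
        s = self.pw_cnn(f_cnn) + t                          # element-level addition
        attn = torch.sigmoid(self.pw_attn(torch.relu(s)))   # ReLU -> point conv -> Sigmoid
        return attn * t                                     # multiply with Transformer features
```

Both branch features are assumed to share the same spatial resolution at the fusion point.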
In the training network-based infant retinal disease information identification method of the application, the algorithm flow of the Transformer network module comprises the following steps:
first partitioning the input image into blocks, processing the blocks with a patch embedding operation, applying layer normalization to the resulting patch features, inputting them into a multi-head attention module, and adding the output features to the patch embedding features to obtain the attention features;
and inputting the obtained attention features into a layer normalization module to obtain normalized features, inputting these into a multi-layer perceptron to obtain processed features, and adding these element-wise to the previous attention features to obtain the output features of the Transformer network module.
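A minimal PyTorch sketch of the Transformer block described above (pre-norm attention with two residual additions). The head count, MLP ratio, and GELU activation are common defaults assumed for illustration; patch partitioning and embedding are assumed to happen upstream.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """LayerNorm -> multi-head self-attention -> residual add,
    then LayerNorm -> MLP -> second residual add (illustrative sketch)."""

    def __init__(self, dim: int, heads: int = 4, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio),
            nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_patches, dim) -- patch embedding happens upstream
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)     # multi-head self-attention
        x = x + attn_out                     # add back to the patch-embedding features
        x = x + self.mlp(self.norm2(x))      # LayerNorm -> MLP -> second residual add
        return x
```

The embedding dimension must be divisible by the number of heads, as `nn.MultiheadAttention` requires.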
In the training network-based infant retinal disease information identification method of the application, the residual network module algorithm flow comprises the following steps:
first passing the input image through a 3×3 convolution to obtain convolutional features, then through normalization and an activation function to obtain standardized features, then through another 3×3 convolution followed by normalization to obtain normalized features;
and adding the features obtained by the above operations element-wise to the input features to obtain processed features, which pass through an activation function to give the residual module features.
An infant retinal disease information identification system based on a training network, comprising:
the hybrid network module, composed of a CNN and a Transformer, for extracting the local-global features of the retinal image to be detected;
the depth attention fusion module is used for fusing the local-global characteristics to obtain fused characteristics with the local-global characteristic expression capability;
and the training network module is used for training ROP data in the fused features, extracting information with depth feature expression and realizing ROP severity classification.
In the training network-based infant retinal disease information identification system of the application, the hybrid network module extracts the local-global features of the retinal image to be detected with the following steps:
extracting deep semantic feature information from the retinal image to be detected with residual network modules of different scales in the CNN branch to obtain local features;
and extracting global features with long-distance dependencies with a four-stage Transformer network module.
In the training network-based infant retinal disease information identification system of the application, the deep attention fusion module fuses the local-global features with the following method:
taking the features extracted by the residual network and the Transformer network module as the input of the deep attention fusion module, and performing element-level addition after a point (1×1) convolution operation;
inputting the processed features into a ReLU activation function, then performing feature extraction with a point convolution, and then activating with a Sigmoid function to obtain the attention features;
and performing element-level multiplication of the obtained attention features with the features from the Transformer network module to obtain the deep attention fusion features.
In the training network-based infant retinal disease information identification system of the application, the algorithm flow of the Transformer network module comprises the following steps:
first partitioning the input image into blocks, processing the blocks with a patch embedding operation, applying layer normalization to the resulting patch features, inputting them into a multi-head attention module, and adding the output features to the patch embedding features to obtain the attention features;
and inputting the obtained attention features into a layer normalization module to obtain normalized features, inputting these into a multi-layer perceptron to obtain processed features, and adding these element-wise to the previous attention features to obtain the output features of the Transformer network module.
In the training network-based infant retinal disease information identification system of the application, the residual network module algorithm flow comprises the following steps:
first passing the input image through a 3×3 convolution to obtain convolutional features, then through normalization and an activation function to obtain standardized features, then through another 3×3 convolution followed by normalization to obtain normalized features;
and adding the features obtained by the above operations element-wise to the input features to obtain processed features, which pass through an activation function to give the residual module features.
The application has the following beneficial effects: combining the complementary strengths of CNN and Transformer networks, a two-stage deep learning network combining the two is provided, realizing the dual tasks of automatically detecting various infant retinopathies and grading ROP severity; the system can detect various common infant fundus lesions while shielding the influence of other lesion features, greatly improving the accuracy of automatic detection of common infant fundus lesions; it reduces, to a certain extent, the discomfort caused to child patients by repeated examinations when diagnosis is difficult, lowers the rates of misdiagnosis and missed diagnosis, and improves doctors' diagnosis and treatment efficiency, thereby reducing complications in child patients to a certain extent.
Drawings
In order to more clearly illustrate the embodiments of the present application and the technical solutions in the prior art, the application is further described below with reference to the accompanying drawings and embodiments. The drawings in the following description show only some embodiments of the present application; other drawings may be obtained from them by those skilled in the art without inventive effort:
FIG. 1 is a flowchart of a training network-based infant retinal disease information identification method in accordance with a preferred embodiment of the present application;
FIG. 2 is a block diagram of the two-stage training network-based infant retinal disease identification method in accordance with a preferred embodiment of the present application;
FIG. 3a is a diagram of the residual structure of the two-stage training network-based multiple infant retinal disease identification method according to a preferred embodiment of the present application;
FIG. 3b is a schematic diagram of the Transformer module of the two-stage training network-based multiple infant retinal disease identification method according to the preferred embodiment of the present application;
FIG. 3c shows the deep attention fusion module of the multiple infant retinal disease identification method based on the two-stage training network in accordance with a preferred embodiment of the present application;
FIGS. 4a-d are schematic diagrams of the four confusion matrices for Res-18, MaxViT, Res-18+MaxViT, and Res-18+MaxViT+DA, in accordance with a preferred embodiment of the present application;
fig. 5 is a schematic block diagram of an infant retinal disease information identification system based on a training network in accordance with a preferred embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the present application more apparent, the technical solutions in the embodiments are described in detail below. It is apparent that the described embodiments are some, but not all, embodiments of the present application. All other embodiments obtained by a person skilled in the art without inventive effort, based on the embodiments of the present application, are intended to be within the scope of the present application.
The training network-based infant retinal disease information identification method according to the preferred embodiment of the present application, as shown in fig. 1, with reference to fig. 2, 3a, 3b, 3c and 4a-d, comprises the following steps:
S01: extracting local-global features of the retinal image to be detected with a hybrid CNN-Transformer network;
S02: fusing the local-global features with a deep attention fusion module to obtain fused features with local-global feature expression capability;
S03: training ROP data in the fused features with a training network, extracting information with deep feature expression, and realizing ROP severity grading.
Combining the complementary strengths of CNN and Transformer networks, a two-stage deep learning network combining the two is provided, realizing the dual tasks of automatically detecting various infant retinopathies and grading ROP severity. The system can detect various common infant fundus lesions while shielding the influence of other lesion features, greatly improving the accuracy of automatic detection of common infant fundus lesions. It reduces, to a certain extent, the discomfort caused to child patients by repeated examinations when diagnosis is difficult, lowers the rates of misdiagnosis and missed diagnosis, and improves doctors' diagnosis and treatment efficiency, thereby reducing complications in child patients to a certain extent.
For the dual tasks of automatically detecting various infant retinopathies and grading ROP severity, a two-stage training network is designed, as shown in figure 2. For training network 1, the application uses a hybrid CNN-Transformer network to extract local-global features: a four-stage Transformer network extracts global features with long-distance dependencies, and residual networks of different scales extract deep semantic feature information. The features extracted by the CNN and Transformer branches are fused by the deep attention fusion module, so that the fused features have local-global feature expression capability.
The residual network module, Transformer network module, and deep attention fusion module used in the application are shown in figs. 3a, 3b, and 3c.
Preferably, the residual network module algorithm flow includes:
first passing the input image through a 3×3 convolution to obtain convolutional features, then through normalization and an activation function to obtain standardized features, then through another 3×3 convolution followed by normalization to obtain normalized features;
and adding the features obtained by the above operations element-wise to the input features to obtain processed features, which pass through an activation function to give the residual module features.
Preferably, fusing the local-global features with the deep attention fusion module comprises the following steps:
taking the features extracted by the residual network and the Transformer network module as the input of the deep attention fusion module, and performing element-level addition after a point (1×1) convolution operation;
inputting the processed features into a ReLU activation function, then performing feature extraction with a point convolution, and then activating with a Sigmoid function to obtain the attention features;
and performing element-level multiplication of the obtained attention features with the features from the Transformer network module to obtain the deep attention fusion features.
Preferably, the algorithm flow of the Transformer network module comprises:
first partitioning the input image into blocks, processing the blocks with a patch embedding operation, applying layer normalization to the resulting patch features, inputting them into a multi-head attention module, and adding the output features to the patch embedding features to obtain the attention features;
and inputting the obtained attention features into a layer normalization module to obtain normalized features, inputting these into a multi-layer perceptron to obtain processed features, and adding these element-wise to the previous attention features to obtain the output features of the Transformer network module.
For training network 1 of the application, results of different baseline networks were compared in order to select a suitable baseline framework. The comparison of different baseline networks on retinal multi-disease classification and identification is shown in Table 1:
TABLE 1 Comparison of different baseline networks on retinal multi-disease identification
As can be seen from Table 1, among the CNN networks the ResNet-18 model achieves the best classification performance with the lowest complexity, so the application selects the ResNet-18 model as the backbone network of the CNN branch. Among the Transformer networks, MaxViT achieves the best classification performance because its multi-axis attention mechanism enables interaction of local and global spatial information; MaxViT is therefore selected as the backbone network of the Transformer branch.
In summary, the application selects ResNet-18 and MaxViT as the backbones of a hybrid network that extracts local-global features to realize the classification and identification of multiple retinal diseases.
To verify the effectiveness of this design, the application also conducted ablation experiments under identical experimental settings: first, complete experiments were run with the ResNet-18 network and the MaxViT network individually; the two networks were then combined; finally, the deep attention fusion module was added on top of the combination. The results are shown in Table 2.
TABLE 2 ablation experiment results for different network modules
As can be seen from Table 2, combining the ResNet-18 and MaxViT network models yields the best classification results, and overall performance improves further after the deep attention fusion module is added.
In addition, the application computes confusion matrices of the different methods over the disease categories as supporting evidence that the proposed network framework is effective for retinal multi-disease classification and identification. As shown in figs. 4a-4d, panels (a)-(d) represent the confusion matrices obtained by Res-18, MaxViT, Res-18+MaxViT, and Res-18+MaxViT+DA (the method of the application), respectively.
For training network 2, the retinal multi-disease identification result is first obtained by training network 1, and training network 2 then performs severity grading on the identified ROP category data.
In the application, considering both accuracy and complexity, the ResNet-34 model is selected as the framework to realize the ROP severity classification task; supporting evidence is shown in Table 3.
TABLE 3 comparison of results of different network models on ROP severity classification tasks
Methods | Accuracy | Precision | Recall | F1 | Kappa |
---|---|---|---|---|---|
ResNet-18 | 91.82(0.49) | 93.01(0.26) | 92.83(0.98) | 92.89(0.52) | 86.64(1.41) |
ResNet-34 | 92.41(0.02) | 93.19(0.19) | 93.47(0.02) | 93.21(0.05) | 87.61(0.19) |
ResNet-50 | 93.14(0.13) | 94.23(0.14) | 94.05(0.07) | 94.13(0.09) | 88.78(0.32) |
An infant retinal disease information identification system based on a training network, as shown in fig. 5, includes:
the hybrid network module 100 is composed of a CNN and a Transformer and is used for extracting local-global features of the retinal image to be examined;
the deep attention fusion module 101 is configured to fuse the local-global features to obtain fused features with local-global feature expression capability;
the training network module 102 is configured to train on the ROP data in the fused features, extract information with deep feature expression, and realize ROP severity grading.
For details of the system, refer to the method section above; they are not repeated here.
In summary, the application addresses the following drawbacks of the prior art:
1. Prediction of only a single disease.
2. Use of a single neural network in most cases.
3. Adding modules inside the Transformer and CNN, which increases network complexity.
4. Designs aimed at a single task, which cannot satisfy the dual-task structure of the application.
the method and the system have the following beneficial effects:
1. By combining domain knowledge, the application can detect multiple common infant fundus lesions while shielding the influence of other lesion features, greatly improving the accuracy (including sensitivity, specificity, F1, and other indexes) of automatic screening systems for common infant fundus lesions.
2. Common infant fundus images have complex features and small differences between certain lesions, so the manual judgment relied on in the prior art is prone to misdiagnosis or cannot give an accurate disease judgment. Because the application analyzes all images in the same way, no subjectivity is involved. Although this work is the job of pediatric ophthalmologists, misdiagnosis and missed-diagnosis rates for common infant fundus diseases remain high due to limited specialist expertise. Through fast and efficient learning, the application can accurately and rapidly identify multiple common infant fundus diseases, reducing the infant's discomfort from difficult diagnosis and repeated examination, lowering misdiagnosis and missed-diagnosis rates, improving doctors' diagnosis and treatment efficiency, and thereby reducing complications for the infant.
3. Simple operation and universal applicability. Once properly trained, even a non-ophthalmologist can make a preliminary diagnosis, so that the child patient does not miss the optimal treatment window.
4. Advantages of the algorithm model: a network structure based on a hybrid CNN-Transformer framework that extracts feature information with local-global expression; a deep attention module that fuses the feature information extracted from the CNN and Transformer branches so that the extracted features have complete expressivity; and, for the dual tasks of automatically detecting multiple infant retinopathies and grading ROP severity, a two-stage data feature training network, in which the CNN-Transformer hybrid network realizes identification of multiple retinal diseases and a ResNet-34 model realizes the ROP severity grading task.
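The two-stage design described above can be sketched as a simple dispatch pipeline. The classifier functions here are hypothetical stand-ins for the trained hybrid network (training network 1) and the ResNet-34 grader (training network 2); only the control flow reflects the patent.

```python
from typing import Callable, List, Tuple

def two_stage_pipeline(
    images: List[dict],
    classify_disease: Callable[[dict], str],  # stage 1: hybrid CNN-Transformer network
    grade_severity: Callable[[dict], str],    # stage 2: ResNet-34 severity grader
) -> List[Tuple[str, str]]:
    """Stage 1 identifies the disease; stage 2 grades severity only for ROP cases."""
    results = []
    for img in images:
        disease = classify_disease(img)
        severity = grade_severity(img) if disease == "ROP" else "n/a"
        results.append((disease, severity))
    return results

# Hypothetical stand-in classifiers, for illustration only.
def classify_disease(img):
    return "ROP" if img.get("rop") else "normal"

def grade_severity(img):
    return "severe" if img.get("stage", 0) >= 3 else "mild"

images = [{"rop": True, "stage": 3}, {"rop": False}, {"rop": True, "stage": 1}]
print(two_stage_pipeline(images, classify_disease, grade_severity))
# [('ROP', 'severe'), ('normal', 'n/a'), ('ROP', 'mild')]
```

The key design point is that the severity grader only ever sees data already identified as ROP by stage 1, matching the two-stage training described in the application.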
It will be understood that modifications and variations will be apparent to those skilled in the art from the foregoing description, and it is intended that all such modifications and variations be included within the scope of the following claims.
Claims (10)
1. The infant retina disease information identification method based on the training network is characterized by comprising the following steps of:
extracting local-global characteristics of the retina image to be detected by adopting a CNN and Transformer mixed network;
the deep attention fusion module is adopted to fuse the local-global features to obtain fused features with local-global feature expression capability;
and training ROP data in the fused features by adopting a training network, extracting information with depth feature expression, and realizing ROP severity grading.
2. The training network-based infant retinal disease information identification method according to claim 1, wherein the extracting of local-global features of the retinal image to be measured using the CNN and Transformer mixed network comprises the steps of:
extracting depth semantic feature information in the retina image to be detected by using residual error network modules with different scales of the CNN network to obtain local features;
and extracting global features with long-distance dependency relations by using a 4-stage Transformer network module.
3. The training network-based infant retinal disease information identification method according to claim 2, wherein the fusing of local-global features using the deep attention fusion module comprises the steps of:
the features extracted by the residual network and the Transformer network module serve as the inputs of the deep attention fusion module, and element-wise addition is performed after a pointwise convolution on each branch;
the summed features are passed through a ReLU activation function, then a pointwise convolution for feature extraction, and then a Sigmoid activation function, yielding the attention features;
and the obtained attention features are multiplied element-wise with the features from the Transformer network module to obtain the deep attention fusion features.
4. The training network-based infant retinal disease information identification method according to claim 2, wherein the Transformer network module algorithm flow comprises:
firstly, the input image is partitioned into blocks; the blocks are processed by a patch embedding operation; the embedded patch features are layer-normalized and input into a multi-head attention module, and the output features are added to the patch embedding features to obtain the attention features;
and the obtained attention features are input into a layer normalization module to obtain regularized features, which are input into a multi-layer perceptron; the processed features are added element-wise to the previous attention features to obtain the output features of the Transformer network module.
5. The training network-based infant retinal disease information identification method according to claim 2, wherein the residual network module algorithm flow comprises:
firstly, the input image is passed through a 3×3 convolution to obtain convolution features, which are normalized and passed through an activation function to obtain standardized features; these features are passed through a second 3×3 convolution and a normalization function to obtain regularized features;
and the features obtained by these operations are added element-wise to the input features, and the result is passed through an activation function to obtain the residual module features.
6. An infant retinal disease information identification system based on a training network, comprising:
the hybrid network module is composed of a CNN and a Transformer and is used for extracting local-global features of the retinal image to be examined;
the deep attention fusion module is used for fusing the local-global features to obtain fused features with local-global feature expression capability;
and the training network module is used for training ROP data in the fused features, extracting information with depth feature expression and realizing ROP severity classification.
7. The training network-based infant retinal disease information identification system of claim 6, wherein the hybrid network module extracts local-global features of the retinal image to be tested using the method of:
extracting depth semantic feature information in the retina image to be detected by using residual error network modules with different scales of the CNN network to obtain local features;
and extracting global features with long-distance dependency relations by using a 4-stage Transformer network module.
8. The training network-based infant retinal disease information identification system of claim 7, wherein the deep attention fusion module fuses local-global features using the method of:
the features extracted by the residual network and the Transformer network module serve as the inputs of the deep attention fusion module, and element-wise addition is performed after a pointwise convolution on each branch;
the summed features are passed through a ReLU activation function, then a pointwise convolution for feature extraction, and then a Sigmoid activation function, yielding the attention features;
and the obtained attention features are multiplied element-wise with the features from the Transformer network module to obtain the deep attention fusion features.
9. The training network-based infant retinal disease information identification system of claim 7, wherein the Transformer network module algorithm flow comprises:
firstly, the input image is partitioned into blocks; the blocks are processed by a patch embedding operation; the embedded patch features are layer-normalized and input into a multi-head attention module, and the output features are added to the patch embedding features to obtain the attention features;
and the obtained attention features are input into a layer normalization module to obtain regularized features, which are input into a multi-layer perceptron; the processed features are added element-wise to the previous attention features to obtain the output features of the Transformer network module.
10. The training network-based infant retinal disease information identification system of claim 7, wherein the residual network module algorithm flow comprises:
firstly, the input image is passed through a 3×3 convolution to obtain convolution features, which are normalized and passed through an activation function to obtain standardized features; these features are passed through a second 3×3 convolution and a normalization function to obtain regularized features;
and the features obtained by these operations are added element-wise to the input features, and the result is passed through an activation function to obtain the residual module features.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310747947.8A CN116758038A (en) | 2023-06-25 | 2023-06-25 | Infant retina disease information identification method and system based on training network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310747947.8A CN116758038A (en) | 2023-06-25 | 2023-06-25 | Infant retina disease information identification method and system based on training network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116758038A true CN116758038A (en) | 2023-09-15 |
Family
ID=87951148
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310747947.8A Pending CN116758038A (en) | 2023-06-25 | 2023-06-25 | Infant retina disease information identification method and system based on training network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116758038A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117274228A (en) * | 2023-10-24 | 2023-12-22 | 脉得智能科技(无锡)有限公司 | Ultrasonic image risk classification system based on deep learning of schistosome liver diseases |
CN117789284A (en) * | 2024-02-28 | 2024-03-29 | 中日友好医院(中日友好临床医学研究所) | Identification method and device for ischemic retinal vein occlusion |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180214087A1 (en) * | 2017-01-30 | 2018-08-02 | Cognizant Technology Solutions India Pvt. Ltd. | System and method for detecting retinopathy |
CN111259982A (en) * | 2020-02-13 | 2020-06-09 | 苏州大学 | Premature infant retina image classification method and device based on attention mechanism |
CN114998210A (en) * | 2022-04-29 | 2022-09-02 | 华南理工大学 | Premature infant retinopathy detection system based on deep learning target detection |
CN115690479A (en) * | 2022-05-23 | 2023-02-03 | 安徽理工大学 | Remote sensing image classification method and system based on convolution Transformer |
- 2023-06-25 CN CN202310747947.8A patent/CN116758038A/en active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180214087A1 (en) * | 2017-01-30 | 2018-08-02 | Cognizant Technology Solutions India Pvt. Ltd. | System and method for detecting retinopathy |
CN111259982A (en) * | 2020-02-13 | 2020-06-09 | 苏州大学 | Premature infant retina image classification method and device based on attention mechanism |
CN114998210A (en) * | 2022-04-29 | 2022-09-02 | 华南理工大学 | Premature infant retinopathy detection system based on deep learning target detection |
CN115690479A (en) * | 2022-05-23 | 2023-02-03 | 安徽理工大学 | Remote sensing image classification method and system based on convolution Transformer |
Non-Patent Citations (2)
Title |
---|
WEIMING LI等: "ConvTransNet: A CNN–Transformer Network for Change Detection With Multiscale Global–Local Representations", IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, vol. 61, 3 March 2023 (2023-03-03), pages 1 - 15 * |
HAO Wenqiang et al.: "A Low-Dose CT Image Denoising Network Based on Transformer and CNN", Journal of Hainan Normal University (Natural Science), vol. 36, no. 02, 15 June 2023 (2023-06-15), pages 176 - 182 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117274228A (en) * | 2023-10-24 | 2023-12-22 | 脉得智能科技(无锡)有限公司 | Ultrasonic image risk classification system based on deep learning of schistosome liver diseases |
CN117789284A (en) * | 2024-02-28 | 2024-03-29 | 中日友好医院(中日友好临床医学研究所) | Identification method and device for ischemic retinal vein occlusion |
CN117789284B (en) * | 2024-02-28 | 2024-05-14 | 中日友好医院(中日友好临床医学研究所) | Identification method and device for ischemic retinal vein occlusion |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108771530B (en) | Fundus lesion screening system based on deep neural network | |
CN110570421B (en) | Multitask fundus image classification method and apparatus | |
CN109635862B (en) | Sorting method for retinopathy of prematurity plus lesion | |
CN116758038A (en) | Infant retina disease information identification method and system based on training network | |
CN109948719B (en) | Automatic fundus image quality classification method based on residual dense module network structure | |
CN108553079A (en) | Lesion identifying system based on eye fundus image | |
CN112101424B (en) | Method, device and equipment for generating retinopathy identification model | |
KR102313143B1 (en) | Diabetic retinopathy detection and severity classification apparatus Based on Deep Learning and method thereof | |
Chen et al. | Detection of diabetic retinopathy using deep neural network | |
CN110599480A (en) | Multi-source input fundus image classification method and device | |
CN112957005A (en) | Automatic identification and laser photocoagulation region recommendation algorithm for fundus contrast image non-perfusion region | |
Kajan et al. | Detection of diabetic retinopathy using pretrained deep neural networks | |
Sharma et al. | Harnessing the Strength of ResNet50 to Improve the Ocular Disease Recognition | |
Mohamed et al. | Improved automatic grading of diabetic retinopathy using deep learning and principal component analysis | |
AU2021100684A4 (en) | DEPCADDX - A MATLAB App for Caries Detection and Diagnosis from Dental X-rays | |
CN112741651B (en) | Method and system for processing ultrasonic image of endoscope | |
Tian et al. | Learning discriminative representations for fine-grained diabetic retinopathy grading | |
Venkatalakshmi et al. | Graphical user interface for enhanced retinal image analysis for diagnosing diabetic retinopathy | |
Himami et al. | Deep learning in image classification using dense networks and residual networks for pathologic myopia detection | |
Ou et al. | M 2 LC-Net: A multi-modal multi-disease long-tailed classification network for real clinical scenes | |
Sadhukhan et al. | Optic disc localization in retinal fundus images using faster R-CNN | |
Sengar et al. | An efficient artificial intelligence-based approach for diagnosis of media haze disease | |
Latha et al. | Automated macular disease detection using retinal optical coherence tomography images by fusion of deep learning networks | |
Zhou et al. | Computer aided diagnosis for diabetic retinopathy based on fundus image | |
CN113273959B (en) | Portable diabetic retinopathy diagnosis and treatment instrument |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||