CN117426748A - MCI detection method based on multi-mode retina imaging - Google Patents
- Publication number: CN117426748A (application CN202311151559.XA)
- Authority: CN (China)
- Prior art keywords: modal, MCI, fundus, OCT, cross
- Legal status: Granted (the status listed is an assumption made by Google Patents, not a legal conclusion)
Classifications
- A61B5/4088 — Diagnosing or monitoring cognitive diseases, e.g. Alzheimer, prion diseases or dementia
- A61B3/102 — Objective types, i.e. instruments for examining the eyes independent of the patients' perceptions or reactions, for optical coherence tomography [OCT]
- A61B3/14 — Arrangements specially adapted for eye photography
Abstract
The invention aims to establish a dual-stream attention neural network based on multi-modal retinal images to classify MCI individuals. Our method combines a cross-modal fusion technique, a variable-scale dense residual model and a multi-classifier mechanism in a dual-stream network. The model uses residual modules to extract image features and a multi-level feature aggregation method to capture complex contextual information. Self-attention and cross-attention modules are applied at each convolutional layer to fuse the features of the optical coherence tomography (OCT) and fundus modalities. The neural network was used to classify MCI patients, Alzheimer's disease patients and cognitively normal controls. By fine-tuning the pre-trained model, we split community-resident participants into two groups according to their cognitive impairment test scores. To identify retinal imaging biomarkers associated with accurate predictions, we used gradient-weighted class activation mapping. The classification accuracy of the method for MCI and for positive cognitive impairment test scores reaches 84.96% and 80.90%, respectively.
Description
Technical Field
The invention belongs to the technical field of biotechnology, and particularly relates to an MCI detection method based on multi-modal retinal imaging.
Background
Alzheimer's disease (AD) is the most common form of dementia. After decades of research, the pathophysiological mechanisms of Alzheimer's disease remain elusive, and there is no cure for advanced Alzheimer's disease. Mild cognitive impairment (MCI) reflects a transitional phase between normal cognition and dementia, so early detection and timely intervention are important. Thus, identifying more readily available AD and MCI biomarkers is particularly important for determining the pathology of MCI, facilitating screening and risk prognosis, and developing new therapeutic approaches.
Current clinical practice and research offer several approaches for diagnosing MCI. Neuroimaging examinations such as single-photon emission tomography and fluorodeoxyglucose positron emission tomography (PET) can detect abnormalities in specific brain regions, including hypoperfusion and glucose hypometabolism. Structural magnetic resonance imaging (MRI) and cerebrospinal fluid (CSF) testing may also be used to identify brain changes associated with neuronal damage. In clinical practice, cognitive screening measures such as the Mini-Mental State Examination (MMSE) or the Montreal Cognitive Assessment (MoCA) are used to assess MCI. However, the MMSE lacks sensitivity for detecting MCI, while the MoCA may be insufficient for identifying the specific cognitive domains affected and thus for determining the MCI subtype. A more comprehensive approach is neuropsychological assessment, which comprehensively evaluates cognitive function and the underlying factors affecting performance. Researchers are striving to find less costly, less invasive biomarkers to detect the neuropathology of Alzheimer's disease early in the preclinical phase. A recent study reported the potential of plasma-based composite biomarkers of Aβ deposition, showing promise for early diagnosis and treatment. However, these evaluation methods consume considerable manpower, are costly, and are invasive, especially at an early stage.
Given the value of vision as a cognitive stimulator, it may play an important role in maintaining cognitive health, and previous studies have reported a correlation between AD and underlying ocular diseases. Since the retina is a compartment of the central nervous system, retinal changes may provide valuable markers of cognitive decline and brain pathology in the early stages of disease, helping to identify individuals at risk of MCI. The retina can be easily imaged using optical techniques, and imaging-based retinal changes may reflect early brain pathology. Retinal imaging is generally less costly than other diagnostic methods such as PET scanning or CSF analysis, making it a viable option for large-scale screening and monitoring in clinical settings. Thus, retinal imaging can provide inexpensive, non-invasive markers for early disease detection and monitoring.
Retinal imaging techniques are widely used for the non-invasive diagnosis of ophthalmic diseases. In an ophthalmic clinic, a non-mydriatic fundus camera and a conventional fundus camera can be used to photograph the fundus before and after pupil dilation on an outpatient basis. The fundus image provides an en-face overview that allows overall retinal features to be examined. Optical coherence tomography (OCT) is a relatively new three-dimensional imaging technique that can be used to assess retinopathy, such as pachychoroid spectrum diseases. OCT images provide a cross-sectional view of the retina and are sensitive to subtle changes in retinal thickness and to macular holes. Thus, multimodal imaging is of great clinical value for the diagnosis of ophthalmic diseases.
AD detection frameworks or roadmaps based on retinal imaging have been proposed. Changes in the thickness of the ganglion cell–inner plexiform layer and the retinal nerve fiber layer have been observed in OCT images of AD and MCI patients. Fundus photography has also been used to identify abnormal retinal vascular parameters in AD patients, highlighting the potential of markers such as reduced vessel density, vessel widening, and the complexity and optimality of branching patterns. Fig. 1 shows examples of retinal images of the two modalities from individuals with different degrees of cognitive impairment.
Artificial intelligence (AI) is very widely used in the medical field for target detection, segmentation, prediction of disease stage and protection of patients' private information. In ophthalmology, artificial intelligence has been used to screen various ocular diseases, including retinopathy of prematurity, age-related macular degeneration, diabetic retinopathy and glaucoma. Recent studies have shown the potential of deep learning (DL) algorithms based on retinal image analysis for detecting AD. Zhang et al. developed a DL algorithm that uses four retinal photographs per subject and achieves high accuracy, sensitivity and specificity in distinguishing AD-positive from AD-negative individuals. Wisdom et al. proposed a multi-modal deep learning system that incorporates data from various eye imaging modalities and achieves excellent performance with high area under the receiver operating characteristic curve (AUROC) values. Tian et al. developed a highly modular DL algorithm with automatic image selection, vessel segmentation and AD classification, achieving an accuracy of over 80%. Retinal image analysis based on artificial intelligence shows comparable or even better sensitivity, specificity and accuracy in AD detection than commonly used methods such as the MMSE or neuroimaging, suggesting that deep learning algorithms using retinal images have the potential to improve AD detection and may be a valuable tool for early screening and diagnosis. However, these methods mainly target AD, and methods for detecting MCI remain limited. For subtle changes in retinal images that may not be apparent to the human eye, AI may be able to detect them and enable earlier intervention by identifying early biomarkers of MCI-related retinal changes, potentially leading to better outcomes and disease management.
Various convolutional neural network (CNN) architectures have been used for image-based diagnosis and classification. Li et al. introduced a CNN-based method for predicting glaucoma from fundus images. Other work proposed a dense correlation network (DCNet) that can diagnose eight different diseases by analyzing paired fundus photographs, and CNNs have also been used to generate two-dimensional patches from volumetric OCT scans for retinal function prediction. However, these classification methods are typically based on only one modality (such as fundus photography or OCT), and few studies have taken advantage of data from multiple modalities and effectively combined these different features.
Disclosure of Invention
The invention aims to overcome the defects and shortcomings of the prior art and provides an MCI detection method based on multi-modal retinal imaging.
An MCI prediction method based on multi-modal retinal imaging, comprising the steps of:
S1, constructing a dual-stream attention neural network comprising modal feature extraction modules, cross-modal fusion units and classifiers, wherein the classifiers comprise a fusion classifier;
two modal feature extraction modules are provided, namely a fundus feature extractor and an OCT feature extractor; paired fundus photographs and OCT images are input, and for each pair of images the modal feature extraction modules extract a pair of feature maps; each pair of feature maps is fused by a cross-modal fusion unit and passed to the next convolutional layer for further feature extraction, and the fusion result of each cross-modal fusion unit is transmitted to the fusion classifier;
the cognitive ability of the test subject is classified using the multi-modal features combined and propagated through the multi-step convolutional layers;
S2, determining a prediction model according to actual needs, forming an MCI image prediction network from the dual-stream attention neural network and the prediction model, training the MCI image prediction network with fundus photographs and OCT images as inputs, and extracting the trained dual-stream attention neural network from the trained MCI image prediction network;
S3, inputting the fundus photographs and OCT images of the subject to be predicted into the dual-stream attention neural network trained in step S2, and obtaining a prediction of the category to which the subject's cognitive ability belongs.
Preferably, in step S1, a cross-modal fusion unit is provided after each convolutional layer, the cross-modal fusion unit comprising a self-attention module and a cross-attention module;
the self-attention module includes a position self-attention mechanism (PAM) and a channel self-attention mechanism (CAM);
assuming that the input feature map of the position self-attention mechanism is I_PAM ∈ R^(C×H×W), its shape is identical to that of the output feature map X_PAM ∈ R^(C×H×W); the position self-attention weight X_PAM is obtained from the soft maximum of three feature matrices: X_PAM = PAM(I_PAM), and the channel self-attention weight is obtained from the soft maximum of three feature matrices: X_CAM = CAM(I_CAM); the self-attention module obtains the outputs X_SA-fundus and X_SA-OCT from X_PAM and X_CAM, where ⊙ combines the position and channel attention outputs:
X_SA-fundus = X_PAM-fundus ⊙ X_CAM-fundus,
X_SA-OCT = X_PAM-oct ⊙ X_CAM-oct,
one stream of X_SA-fundus flows as a shortcut branch into the fundus feature extractor, where the original fundus features are added; correspondingly, one stream of X_SA-OCT flows as a shortcut branch into the OCT feature extractor, where the original OCT features are added; the other streams are concatenated along the channel dimension to form the feature matrix X_SA ∈ R^(2C×H×W), where ⊕ joins the feature maps of the two modalities: X_SA = X_SA-fundus ⊕ X_SA-OCT.
Preferably, the input feature map of the cross-attention module is denoted I_CA = X_SA, and the output feature map of the cross-attention module is likewise of size R^(2C×H×W); the cross-modal global descriptor is G = F_gp(I_CA),
where F_gp denotes global average pooling (GAP) and G = (G_1, ..., G_k, ..., G_2C) is a cross-modal descriptor collecting sensitive statistics of the entire input; the cross-attention (CA) vector is W_CA = σ(F_mlp(I)),
where F_mlp is a multi-layer perceptron network and σ is a sigmoid function with value range (0, 1); finally, the weight W_CA is multiplied with the feature matrix X_SA, where ⊗ denotes multiplication, enhancing the feature matrix through cross-modal fusion: X_CA = W_CA ⊗ X_SA.
For convolutional layer l, the l-th cross-modal fusion unit generates an output X_l that refines the original output of the l-th modal feature extraction layer;
this is a bi-directional propagation process, and the refined features are passed to the next cross-modal fusion unit layer for finer feature mapping.
Preferably, the classifiers further comprise an optical coherence tomography classifier and a fundus classifier, and the fusion result of the last cross-modal fusion unit is transmitted to the OCT classifier and the fundus classifier simultaneously.
Preferably, the number of the convolution layers is 4, the filter size of the first convolution layer is 7x7, and the filter size of the remaining convolution layers is 3x3.
Preferably, in step S2, the fundus photographs are preprocessed before input, comprising the following steps: extracting the region of interest using an image mask, normalizing the image size, and enhancing contrast with the contrast-limited adaptive histogram equalization (CLAHE) algorithm.
Preferably, in step S2, the OCT images are preprocessed before input, comprising the following steps: a mask is created for the foreground region of the OCT image using Otsu threshold segmentation, and a hybrid filtering method combining bilateral filtering and inner-side filtering is used to reduce noise.
Preferably, the informative content of the fundus photographs and OCT images includes the arteries and veins distributed along the vasculature, information related to blood vessels and vascular bifurcations, the outer retinal layers and the suprachoroidal layer.
Preferably, in step S2, the prediction model includes two datasets, the first dataset comprising a number of MCI, CN and AD patients and the second dataset comprising a number of nCI and pCI patients, and transfer learning is used to distinguish nCI patients from pCI patients.
Preferably, in step S2, after the modal feature extraction and cross-modal fusion encoding, the fusion features obtained at each stage are sub-sampled to the same size and then concatenated to obtain multi-domain fusion features, and these feature vectors are used to train the MCI image prediction network.
The beneficial effects of the invention are as follows:
the invention aims to establish a dual-flow attention neural network based on multi-mode retina images to classify MCI individuals. Our method combines a cross-modal fusion technique, a variable scale dense residual model, and a multi-classifier mechanism in a dual-flow network. The model utilizes a residual error module to extract image characteristics, and captures complex context information by adopting a multi-level characteristic aggregation method. Self-care and cross-care modules are used at each convolution layer to fuse the features of Optical Coherence Tomography (OCT) and fundus modes. The neural network was used to classify MCI patients, alzheimer's patients and control groups with cognitive normality. By fine tuning the pre-training model, we split the community resident participants into two groups according to the cognitive impairment test score. To identify retinal imaging biomarkers associated with accurate predictions, we use gradient weighted class activation mapping techniques. The classification accuracy of the method for the MCI and cognitive disorder positive test scores reaches 84.96% and 80.90% respectively. The present invention demonstrates the potential of our method in identifying MCI patients and underscores the importance of retinal imaging for early discovery of cognitive impairment.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings required for the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings described below are only some embodiments of the invention, and that a person skilled in the art could obtain other drawings from them without inventive effort.
Fig. 1 is an example of bimodal retinal images from patients with varying degrees of cognitive impairment: (a) a representative retinal photograph showing the optic disc, macula, nerve fiber layer, arteries and veins of a cognitively normal patient, (b) a mild cognitive impairment (MCI) patient and (c) an Alzheimer's disease (AD) dementia patient; (d-f) representative optical coherence tomography images showing the choroidal blood vessels of a cognitively normal patient (d), an MCI patient (e) and an AD dementia patient (f);
FIG. 2 is a diagram of the dual-stream attention neural network framework of the present invention;
FIG. 3 is a cross-modality fusion unit of the present invention: (a) a structure of cross-modality fusion units; (b) specific structure of PAM; (c) details of the CAM;
fig. 4 is a comparison of images before and after the preprocessing of the present invention: (a) fundus photograph at original size; (b) image after ROI extraction and size normalization; (c) contrast-limited adaptive histogram equalization image; (d) raw optical coherence tomography (OCT) image; (e) mask of the OCT image; (f) OCT image after denoising and region-of-interest extraction.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings, for the purpose of making the objects, technical solutions and advantages of the present invention more apparent.
1. Construction of the dual-stream attention neural network
The aim of the invention is to develop an end-to-end framework for MCI detection based on OCT images and fundus photographs. As shown in fig. 2, the network is composed of two subnetworks that use modal feature extraction (MFE) modules and four cross-modal fusion (CMF) units. To be effective, the cross-modal design must identify the advantages of each feature and unify the most informative cross-modal features into an effective representation. To this end, we propose a cross-modal guided fusion approach inspired by cross-attention networks.
The fundus and OCT images are taken as inputs, and modal features are extracted from the fundus photographs and OCT images, respectively, by the MFE modules; these features are then fused into multi-modal features by the CMF units. In the MFE subnetworks, the matched information of the two modalities is recalibrated and combined by the CMF units, and the combined multi-modal features are propagated through the multi-step convolution (Conv) layers. Subsequently, three classifiers are run to accomplish the multi-modal and single-modal classification tasks. The fused feature vectors are passed through fully connected layers after the Conv layers to predict the probability of each class. The CMF outputs also propagate to the OCT and fundus classifiers, allowing flexible use of the network framework.
A. MFE Module
The DuCAN (dual-stream cross-fusion attention network) has two MFE streams of identical structure. Retinal images are very complex, containing a large number of details spanning different scales. At the micron level, retinal neuronal layers can be observed, such as the nerve fiber layer that transmits information from the eye to the brain. At the millimeter level, the macular area is very prominent, where the fine detail of central vision is processed. Furthermore, the blood vessels in retinal images are an important source of information about ocular and systemic health. To efficiently identify and analyze the most relevant features in retinal images, we propose the MFE module. The module accounts for the multi-scale characteristics of the retinal image and can encode features at different scales to better highlight the most discernible regions. As shown in fig. 2, each MFE has four convolutional layers with rectified linear unit activations. Paired fundus and OCT images are input; for each pair of images, the pair of modal feature extraction modules extracts a pair of feature maps, each pair of feature maps is fused by a cross-modal fusion (CMF) unit and propagated to the next layer for further feature extraction, the fusion result of each CMF unit is transmitted to the fusion classifier, and the fusion result of the last CMF unit is also propagated to the optical coherence tomography classifier and the fundus classifier to improve the flexibility of the framework.
Taking into account the spatial relationships between different regions in the image and the contextual information helps make informed classification decisions, and the global context helps in understanding the overall structure and context of the retina. To achieve this, we set the filter size of the first convolutional layer of the architecture to 7x7 so that the network can obtain more spatial information from the input image and thus understand the overall structure more comprehensively, while the subsequent layers use 3x3 filters to learn more complex details and fine-grained patterns. The subsequent layer blocks have 3, 4, 6 and 3 Conv layers, respectively, as shown in fig. 2. The stacked layers of 3x3 filters introduce nonlinear transformations of the input data; these nonlinearities allow the network to learn complex relationships and patterns that may not be captured by linear operations alone. Early layers capture low-level features, and as information propagates through the network, higher-level features are formed by combining and refining the low-level features. The hierarchical features of each layer are split off to the CMF unit for feature enhancement. The batch normalization that follows in each block acts as a regularizer, reducing the need for techniques such as dropout and helping to prevent overfitting.
As the neural network deepens, the limited receptive fields of the early layers may cause important global patterns and structures of the input image to be missed. To address this limitation, the output attention features X_SA-fundus or X_SA-OCT are also shunted back into the network to preserve the information of the earlier layers and propagate it directly to deeper layers. After Conv2, Conv4 and Conv5, a global pooling operation is performed on the convolutional feature matrix, and the data then pass through three fully connected layers (FCLs); the number of neurons is set to 256 and the number of classification categories to 128.
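The following is a minimal PyTorch sketch of one MFE stream. It assumes a ResNet-34-style layout (7x7 first convolution, then residual stages of 3, 4, 6 and 3 blocks of 3x3 convolutions), since the single-stream baselines described later use a ResNet-34 backbone; the class and variable names are illustrative, not the patent's own code.

```python
import torch.nn as nn
from torchvision.models import resnet34

class MFEStream(nn.Module):
    """One modal feature extraction (MFE) stream (fundus or OCT) - a sketch.

    Built on a ResNet-34-style backbone: a 7x7 first convolution for global
    context, then residual stages of 3x3 convolutions. Each stage's feature
    map is returned so it can be handed to a CMF unit.
    """
    def __init__(self):
        super().__init__()
        backbone = resnet34(weights=None)   # pre-trained weight loading omitted here
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        self.stages = nn.ModuleList([backbone.layer1, backbone.layer2,
                                     backbone.layer3, backbone.layer4])

    def forward(self, x):
        x = self.stem(x)
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)                 # one feature map per stage for the CMF units
        return feats
```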
B. CMF Unit
The retinal structural features in the cross-sectional and en-face views provide valuable information for analyzing AD and the retinal structural changes caused by cognitive impairment. However, since these features have different characterizations, their complementarity must be exploited to fully understand the changes in the retina. It is therefore necessary to integrate the features of the two modalities and gather them in one place. This process takes into account the specific features of each modality for each paired construction unit (OCT_i, fundus_i). To further ensure the accuracy of the feature maps, we employ an attention mechanism to control the information flow of the feature maps. As shown in fig. 2, a CMF unit is set after each Conv layer. The CMF unit consists of two sub-modules, self-attention and cross-attention. The self-attention module captures the relative semantic features of the different modality subspaces using channel and position information, and the cross-attention module captures the various aspects of self-attention and cross-attention, as shown in fig. 3.
1) Self-attention component
As shown in fig. 3 (b), the self-attention component consists mainly of a position self-attention mechanism (PAM) and a channel self-attention mechanism (CAM). PAM enhances the discriminative ability of the feature representation so as to detect the most prominent regions of the input. Let the PAM input feature map be I_PAM ∈ R^(C×H×W); its shape is the same as that of the output feature map X_PAM ∈ R^(C×H×W), where C, H and W denote the channels, height and width of the feature map, respectively. The position attention weight X_PAM is obtained from the soft maximum of three feature matrices:

X_PAM = PAM(I_PAM). (1)

The CAM branch focuses on the channel information in a similar fashion:

X_CAM = CAM(I_CAM). (2)

For each modality, the self-attention module obtains the output X_SA from X_PAM and X_CAM, giving X_SA-fundus and X_SA-OCT, where ⊙ combines the position and channel attention outputs:

X_SA-fundus = X_PAM-fundus ⊙ X_CAM-fundus, (3)

X_SA-OCT = X_PAM-oct ⊙ X_CAM-oct. (4)

One stream of X_SA-fundus then flows as a shortcut branch into the fundus feature extractor, where the original fundus features are added; correspondingly, one stream of X_SA-OCT flows as a shortcut branch into the OCT feature extractor, where the original OCT features are added. The other streams are concatenated along the channel dimension to form the feature matrix X_SA ∈ R^(2C×H×W), where ⊕ joins the feature maps of the two modalities:

X_SA = X_SA-fundus ⊕ X_SA-OCT. (5)
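The following is a minimal PyTorch sketch of this self-attention component. It assumes PAM and CAM follow the standard position- and channel-attention formulations (softmax over spatial and channel affinities); the element-wise sum used for ⊙, the reduction ratio and all names are illustrative assumptions rather than the patent's own implementation.

```python
import torch
import torch.nn as nn

class PAM(nn.Module):
    """Position self-attention: softmax over pairwise spatial affinities (Eq. 1)."""
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))          # learnable residual weight

    def forward(self, x):                                  # x: B x C x H x W
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)       # B x HW x C'
        k = self.key(x).flatten(2)                         # B x C' x HW
        attn = torch.softmax(q @ k, dim=-1)                # B x HW x HW
        v = self.value(x).flatten(2)                       # B x C x HW
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x

class CAM(nn.Module):
    """Channel self-attention: softmax over channel affinities (Eq. 2)."""
    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):                                  # x: B x C x H x W
        b, c, h, w = x.shape
        f = x.flatten(2)                                   # B x C x HW
        attn = torch.softmax(f @ f.transpose(1, 2), dim=-1)    # B x C x C
        out = (attn @ f).view(b, c, h, w)
        return self.gamma * out + x

def self_attention(fundus_feat, oct_feat, pam_f, cam_f, pam_o, cam_o):
    """Eqs. (3)-(5): combine PAM and CAM per modality, then concatenate channels.

    Using an element-wise sum for the patent's ⊙ is an assumption.
    """
    x_sa_fundus = pam_f(fundus_feat) + cam_f(fundus_feat)  # X_SA-fundus
    x_sa_oct = pam_o(oct_feat) + cam_o(oct_feat)           # X_SA-OCT
    x_sa = torch.cat([x_sa_fundus, x_sa_oct], dim=1)       # X_SA in R^(2C x H x W)
    return x_sa_fundus, x_sa_oct, x_sa
```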
2) Cross-attention (CA) component
Let I_CA = X_SA denote the input feature map of the CA module; the output feature map of the cross-attention module is likewise of size R^(2C×H×W). To realize cross-modal feature enhancement, a cross-modal global descriptor is first acquired:

G = F_gp(I_CA), (6)

where F_gp denotes global average pooling (GAP) and G = (G_1, ..., G_k, ..., G_2C) is a cross-modal descriptor collecting sensitive statistics of the entire input. The cross-attention (CA) vector is:

W_CA = σ(F_mlp(I)), (7)

where F_mlp is a multi-layer perceptron network and σ is a sigmoid function with value range (0, 1). Finally, the weight W_CA is multiplied with the feature matrix X_SA, where ⊗ denotes multiplication, enhancing the feature matrix through cross-modal fusion:

X_CA = W_CA ⊗ X_SA. (8)
the proposed method has the advantage that it focuses on learning the features of the salient regions while preserving the whole image information of each mode.
3) Multi-step propagation
For Conv layer l, the l-th CMF unit generates an output X_l that refines the original output of MFE layer l.
This is a bi-directional propagation process; the refined features are passed to the next MFE layer for finer feature mapping. Since the CMF units run through the entire MFE stage, Fundus_out and OCT_out follow a pattern similar to residual learning: the main features do not differ significantly from the inputs Fundus_in or OCT_in, so the process does not negatively affect learning or the loading of pre-trained parameters.
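A minimal sketch of one such stage-wise refinement is shown below, assuming the CMF unit returns the per-modality self-attention maps along with the cross-attention-enhanced fused features; the function signature is illustrative.

```python
def cmf_refine(fundus_feat, oct_feat, self_attn, cross_attn):
    """One CMF stage (sketch): self-attention per modality, cross-modal
    enhancement of the concatenated map, and residual shortcuts back into
    each MFE stream so the refined features feed the next stage."""
    x_sa_fundus, x_sa_oct, x_sa = self_attn(fundus_feat, oct_feat)
    x_fused = cross_attn(x_sa)                  # W_CA-weighted bimodal features
    fundus_out = fundus_feat + x_sa_fundus      # shortcut branch into the fundus stream
    oct_out = oct_feat + x_sa_oct               # shortcut branch into the OCT stream
    return fundus_out, oct_out, x_fused         # x_fused also goes to the fusion classifier
```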
C. Image preprocessing
Prior to training, we pre-process the fundus photographs and OCT images separately using several methods, as shown in fig. 4. Fundus photographs mostly sit on a black background that contains a great deal of redundant information, so an image mask can be used to efficiently extract the region of interest (ROI). Because the data come from different sources, the image sizes also differ; we therefore normalize the image size and enhance contrast using the classical contrast-limited adaptive histogram equalization (CLAHE) algorithm. Unlike conventional histogram equalization, CLAHE operates on small regions of the image rather than on the entire image; by processing these small regions separately and combining them with bilinear interpolation, CLAHE avoids creating artifacts in the final image. This allows each region of the image to be adjusted more precisely, making the result more balanced and visually consistent, and the bilinear interpolation ensures that there are no sharp edges or abrupt transitions between different areas of the image, producing a more seamless overall appearance. CLAHE is also widely used for fundus image processing. Fig. 4 shows the image after ROI extraction, size normalization and CLAHE; as shown in fig. 4 (a-c), the enhanced image shows a clearer vessel profile than the original image.
OCT images contain a large background area, while the indicative features appear only in the central retinal layers. It is important to highlight the indicative features of the OCT image while avoiding the display of irrelevant regions. We created a mask for the foreground region using the Otsu thresholding method. OCT can also introduce substantial noise during acquisition, and owing to the specificity of OCT images, the relationships between adjacent OCT layers should remain unchanged when noise is removed. Therefore, we use a hybrid filtering method that combines bilateral filtering with inner-side filtering for noise reduction.
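A small OpenCV sketch of these preprocessing steps is given below. The thresholds, clip limit, tile size and filter parameters are illustrative assumptions, not values stated in the patent, and CLAHE is applied here to the LAB luminance channel purely as an example.

```python
import cv2

def preprocess_fundus(img_bgr, size=(512, 512)):
    """Fundus preprocessing sketch: ROI mask, resize, CLAHE on the luminance channel."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    _, roi_mask = cv2.threshold(gray, 10, 255, cv2.THRESH_BINARY)   # drop black background
    x, y, w, h = cv2.boundingRect(roi_mask)                         # bounding box of the ROI
    img = cv2.resize(img_bgr[y:y + h, x:x + w], size)               # normalize image size
    lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    lab[..., 0] = clahe.apply(lab[..., 0])                          # contrast-limited equalization
    return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)

def preprocess_oct(img_gray):
    """OCT preprocessing sketch: Otsu foreground mask, then bilateral denoising."""
    _, mask = cv2.threshold(img_gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    denoised = cv2.bilateralFilter(img_gray, d=9, sigmaColor=75, sigmaSpace=75)
    return cv2.bitwise_and(denoised, denoised, mask=mask)           # keep only the retinal layers
```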
2. Training model
A. Selecting samples
The dataset used in this application example has two independent sources. The first dataset included 29 MCI patients (58 eyes), 50 age-matched CN participants (100 eyes) and 38 AD patients (76 eyes), aged 47 to 84 years, who were treated at the Daping Hospital of the Army Medical University from December 2020 to April 2022. The study was approved by the hospital ethics committees and conducted in accordance with the Declaration of Helsinki (approval No. 2020-14; Eye Hospital of Wenzhou Medical University approval No. 2020-012-K-10). AD and MCI patients received clinical evaluations, neuropsychological evaluations, structural neuroimaging examinations and laboratory examinations according to the 2011 National Institute on Aging–Alzheimer's Association guidelines. Based on the neuropsychological assessment, the CN group included individuals who were cognitively healthy and free of any neurological disease. The interval between the ophthalmic examination and the clinical evaluation was 1 to 90 days.
The second dataset included 133 nCI patients (distinct from the CN group in the first dataset) and 140 pCI patients. These patients participated in a cross-sectional study that was approved by the ethics committee of the eye hospital (approval No. 2021-086-K-73-01) and complied with the Declaration of Helsinki. Cognitive status was assessed using the Mini-Mental State Examination (MMSE) and Montreal Cognitive Assessment (MoCA) scales.
A fundus photograph centered on the orbit was taken of each eye using a fully automated non-mydriatic fundus camera (ReteCam 3100; SysEye, Chongqing, China). OCT images were acquired using an SS-OCT system (VG200 SS-OCT; Topcon, Tokyo, Japan). Each set of 3D-OCT images contains 18 images centered at the corneal vertex.
B. Training process
Five groups of data are shown in Table 1. During training, we applied data augmentation to the images of each modality by random cropping and random flipping. The mean and standard deviation were used to normalize the fundus and OCT images. Training was performed using a stochastic gradient descent optimizer with an initial decay factor of 0.01 and an initial learning rate of 0.0003. The input data consisted of randomly selected image pairs, the batch size was set to 8, and the network was trained for 400 epochs. Our framework was built on PyTorch (Linux Foundation, San Francisco, CA, USA). The validation experiments were performed on the Windows 10 operating system using two Intel Xeon Gold 6253CL CPUs (base frequency 3.10 GHz), an ASUS TUF-RTX3090TI-O24G-GAMING GPU and 32 GB of memory. Training took 14 hours. To avoid overfitting, early stopping was used: if the model's validation metric did not improve for 50 epochs, the iterations were stopped. During training, the mean and variance of each mini-batch activation were calculated separately for each feature map (channel).
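An illustrative sketch of this training setup is shown below. The helper functions, dataset objects and the ImageNet normalization statistics are assumptions for the example; the interpretation of the "decay factor" as weight decay is also an assumption.

```python
import torch
from torchvision import transforms

# augmentation and normalization as described above (ImageNet statistics assumed)
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

optimizer = torch.optim.SGD(model.parameters(), lr=3e-4, weight_decay=0.01)  # model is hypothetical

best_metric, patience, wait = float("-inf"), 50, 0
for epoch in range(400):                          # up to 400 epochs, batch size 8 in the loaders
    train_one_epoch(model, train_loader, optimizer)   # hypothetical helper
    metric = evaluate(model, val_loader)              # hypothetical helper
    if metric > best_metric:
        best_metric, wait = metric, 0
    else:
        wait += 1
        if wait >= patience:                      # early stopping after 50 epochs without improvement
            break
```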
Table 1. Five groups of data: the AD, MCI, CN, nCI and pCI groups
In Table 1, AD = Alzheimer's disease, MCI = mild cognitive impairment, CN = normal cognition, nCI = negative cognitive impairment test score, pCI = positive cognitive impairment test score.
C. Evaluation index
We calculated widely used evaluation metrics, including accuracy, precision, recall, specificity, F1 score and AUC, to analyze the effectiveness of our framework. The definitions are as follows:
TP, TN, FP and FN denote the true positive, true negative, false positive and false negative values, respectively. ACC denotes accuracy, Pre precision, Rec recall and Spe specificity, and AUC denotes the area under the receiver operating characteristic curve.
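The following sketch computes these metrics from the confusion-matrix counts using their standard definitions; AUC is computed separately from the predicted probabilities.

```python
from sklearn.metrics import roc_auc_score

def classification_metrics(tp, tn, fp, fn):
    """Confusion-matrix metrics used in the text (standard definitions)."""
    acc = (tp + tn) / (tp + tn + fp + fn)        # accuracy (ACC)
    pre = tp / (tp + fp)                         # precision (Pre)
    rec = tp / (tp + fn)                         # recall / sensitivity (Rec)
    spe = tn / (tn + fp)                         # specificity (Spe)
    f1 = 2 * pre * rec / (pre + rec)             # F1 score
    return {"ACC": acc, "Pre": pre, "Rec": rec, "Spe": spe, "F1": f1}

def auc(y_true, y_score):
    """AUC is computed from predicted probabilities rather than the confusion matrix."""
    return roc_auc_score(y_true, y_score)
```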
D. Classifier and loss function
After modal feature extraction and cross-modal fusion encoding, we sub-sample the fusion features obtained at each stage to the same size and then concatenate them to obtain multi-domain fusion features. These feature vectors are then used to train the fully connected softmax classifier and make predictions. When connecting the multi-modal features in the multi-stage fusion process, we compared weighted addition with tensor concatenation and found that weighted addition gave better classification results; in this model, the weights are implemented as trainable network parameters. To increase the flexibility of the network, we designed two auxiliary classifiers that predict the probability of MCI from the fundus and OCT images, respectively; the auxiliary classifiers are also used to enhance the training effect. We train the multi-modal retinal imaging fusion network using the cross-entropy loss function commonly used in classification networks. The cross-entropy loss is calculated as:
L_CE = −Σ_i y_i log(p_i), (12)

where y and p denote the true class label and the predicted probability, respectively. The DuCAN includes three classifiers, for the fundus photographs, the OCT images and the CMF units, and three corresponding loss functions L_OCT, L_fundus and L_fusion are defined. Throughout training, the final loss function L_final of DuCAN is a weighted combination of the three loss functions:

L_final = α·L_OCT + β·L_fundus + L_fusion, (13)

where α and β reflect the weight of each modality in L_final. Here, we set the values of α and β to 0.7.
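A minimal sketch of this combined loss (Eq. 13), using PyTorch's cross-entropy as the per-classifier loss, is shown below; the function name and logits arguments are illustrative.

```python
import torch.nn as nn

ce = nn.CrossEntropyLoss()

def ducan_loss(logits_oct, logits_fundus, logits_fusion, target, alpha=0.7, beta=0.7):
    """Combined loss of Eq. (13): L_final = alpha*L_OCT + beta*L_fundus + L_fusion."""
    l_oct = ce(logits_oct, target)          # OCT classifier loss
    l_fundus = ce(logits_fundus, target)    # fundus classifier loss
    l_fusion = ce(logits_fusion, target)    # fusion classifier loss
    return alpha * l_oct + beta * l_fundus + l_fusion
```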
3. Test results
We performed the following experiments to evaluate the performance of our method. First, DuCAN was trained using the retinal images in the first dataset (the MCI, AD and CN groups). The weights of the trained model were then fine-tuned on the same architecture using the second dataset, comprising the nCI and pCI groups.
A. Classifying MCI, AD and CN patients in the first data set
Table 2 lists the overall classification performance of the proposed method in both dual-stream and single-stream analyses. In the single-stream analysis, we trained the fundus and OCT data separately using the MFE module with the same ResNet-34 backbone. In the dual-stream analysis, we removed the CMF units and fused the OCT and fundus streams by directly concatenating the feature maps before the output classifier. We used 5-fold cross-validation, i.e., 80% of the data for training and the remaining 20% for testing in each of the 5 folds. We used the fundus photographs and OCT images in the first dataset, labeled AD, MCI and CN.
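A small sketch of such a 5-fold split is shown below; `image_pairs` and `labels` are hypothetical arrays of paired fundus-OCT samples and their AD/MCI/CN labels.

```python
from sklearn.model_selection import StratifiedKFold

# illustrative 5-fold split (80% train / 20% test per fold)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(image_pairs, labels)):
    train_pairs, test_pairs = image_pairs[train_idx], image_pairs[test_idx]
    # train DuCAN on train_pairs and evaluate on test_pairs for this fold
```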
Table 2. Results of the proposed method for predicting MCI, AD and CN in single-stream and dual-stream analyses
In Table 2, MCI = mild cognitive impairment, AD = Alzheimer's disease, CN = normal cognition, ACC = accuracy, AUC = area under the curve, OCT = optical coherence tomography, MFE = modal feature extraction, DuCAN = dual-stream cross-fusion attention network.
As shown in Table 2, the proposed DuCAN network achieved an accuracy of 91.64% and an AUC of 96.78%. In the single-stream analysis, the fundus model achieved an accuracy of 78.9% and an AUC of 90.44%, slightly higher than the OCT model, which achieved an accuracy of 78.2% and an AUC of 90.26%. In the subgroup analysis, the network performed best for AD classification, with an accuracy of 86.58%, a recall of 94.84%, a specificity of 94.56% and an F1 score of 90.36%. Of the three groups, the MCI group showed the greatest improvement, indicating that adding fundus photographs to the OCT model can strengthen the proposed network; the recall and F1 score of the MCI group increased substantially to 88.05% and 85.68%, respectively.
B. Classification of nCI and pCI using transfer learning
The DuCAN model was pre-trained with the first dataset (MCI, AD and CN patients) and then applied to the classification task for the pCI and nCI groups. Subtle structural changes in the retina may be subtly linked to MCI, AD and CN. Throughout the fine-tuning process, our goal was to retain the knowledge gained in the pre-training stage, especially in the initial layers that capture general features, while tailoring the later layers to the task attributes necessary for MCI classification.
To optimize the network during fine-tuning, we used stochastic gradient descent (SGD) with momentum as the optimization technique. To determine the optimal hyper-parameter configuration, we performed a grid search, training models with different combinations of learning rate, batch size and number of epochs. In our model, the two MFE subnetworks were initialized with the weights that achieved the best performance in the prediction task on the first dataset, and the Conv layer weights in the MFE were kept as initial weights. The final FCL used for MCI, AD and CN in the first dataset was replaced with a GAP layer and a new FCL. Then, with the Conv layers locked, the FCL was trained for one epoch, because the random initialization of the last FCL would otherwise produce gradients that disrupt the Conv layer weights. Finally, we thawed all layers and trained the network for up to 400 epochs using an SGD optimizer with a learning rate of 0.001, until the validation loss no longer decreased for 10 consecutive epochs. The loss function is a combination of focal loss and cross entropy.
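A minimal sketch of this fine-tuning recipe is shown below; `model`, the `train` helper and the 512-dimensional feature size are illustrative assumptions.

```python
import torch
import torch.nn as nn

for p in model.parameters():
    p.requires_grad = False                        # lock the pre-trained Conv layers
model.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                           nn.Linear(512, 2))      # new GAP + FCL for nCI vs pCI
train(model, epochs=1, params=model.head.parameters())    # warm up only the new head

for p in model.parameters():
    p.requires_grad = True                         # thaw all layers
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
train(model, epochs=400, optimizer=optimizer, early_stop_patience=10)
```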
Table 3 shows the results obtained with our framework for the nCI and pCI groups. The overall accuracy was 82.46%, with an AUC of 84.41%. The classification accuracy for the pCI and nCI groups was 70.9%, the recall was 83.74% and the specificity was 79.78%. Compared with the OCT model, our multi-modal approach reduced the recall in the nCI group from 93.88% to 79.78% and the accuracy from 89.4% to 70.9%. The trends observed in these results are explained fully in the Discussion.
C. Visual results based on attention features
We used Grad-CAM to determine which regions of the image contribute most to our model's predictions. Fig. 4 shows representative OCT and fundus images with saliency heat maps highlighting these regions. As shown in the fundus photographs of the CN group (row 1), the network focused mainly on the entire retina and paid little attention to the optic nerve head (ONH). Compared with the CN group, the network in the MCI group focused mainly on the arteries and veins distributed along the vasculature, whereas the network in the AD group was more concerned with vascular bifurcations. These results indicate that the model uses information related to blood vessels and vessel bifurcations to detect MCI and AD.
The heat maps in fig. 4 (d-f) show that the network concentrated mainly on the entire retinal layer in the CN group, while in the MCI group it focused mainly on the outer retinal and suprachoroidal layers. In the AD group, the model increasingly focused on the outer retinal layer and rarely on the choroid–sclera interface.
During visualization, it is more difficult to distinguish MCI using only OCT images or only fundus images; Table 2 shows that the main evaluation metrics of the single-stream analysis are lower than those of the dual-stream analysis.
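For illustration, a minimal Grad-CAM sketch is given below. It shows the generic procedure (weight the target layer's activations by the spatially averaged gradients of the class score, then ReLU and normalize) for a model that takes a single image tensor; for the dual-stream network the forward pass would take the image pair, and the function and hook names are illustrative.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx):
    """Minimal Grad-CAM sketch for a single-input classification model."""
    activations, gradients = {}, {}
    h1 = target_layer.register_forward_hook(
        lambda m, i, o: activations.update(a=o))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: gradients.update(g=go[0]))

    score = model(image)[0, class_idx]          # logit of the class of interest
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()

    weights = gradients["g"].mean(dim=(2, 3), keepdim=True)    # GAP of the gradients
    cam = F.relu((weights * activations["a"]).sum(dim=1))      # weighted sum of feature maps
    cam = F.interpolate(cam.unsqueeze(1), size=image.shape[-2:],
                        mode="bilinear", align_corners=False)
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # heat map in [0, 1]
```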
D. Ablation study
1) Effectiveness of the dual-stream architecture and CMF unit
We first compared the performance of the single-stream and dual-stream methods using a baseline DuCAN, with CMF units learning modality-specific features from the cross-modal data according to channel and position characteristics at different scales. As shown in Table 2, the single-stream methods using the ResNet-34 backbone and the MFE module were less accurate than the dual-stream methods. The accuracy of the dual-stream method using linear fusion instead of CMF was 80.72%, with an AUC of 89.356%. Compared with the single-stream and multi-modal baselines, DuCAN achieved the best results, with an accuracy of 91.64% and an AUC of 96.78%.
We then compared the single-stream approaches with our proposed approach in distinguishing the nCI and pCI data in the second dataset. The OCT model achieved an accuracy of 81.14% and an AUC of 79.63%, and the fundus model achieved an accuracy of 81.48% and an AUC of 78.18%, whereas our proposed DuCAN achieved an accuracy of 91.46% and an AUC of 94.41%. These results indicate that the dual-stream architecture enables the model to extract more MCI-related features.
The results of the study show that the CMF unit takes into account the attention features associated with both modes and makes efficient use of the complementary information in these features.
2) Effect of the loss function weights
We explored the effect of the weights α and β in the loss function. Table 4 shows the performance for different values of α and β. Comparing the different weights in the experiments, DuCAN achieved its highest accuracy (84.52%) and highest AUC (92.67%) when α and β were both set to 0.7.
E. Comparison with state-of-the-art models
Table 5 compares the performance of DuCAN with three state-of-the-art (SOTA) deep learning methods on the first dataset labeled AD, MCI and CN. In the dual-stream analysis, we removed the CMF units and fused the OCT and fundus streams by directly concatenating the feature maps before the output classifier.
Our DuCAN consistently outperforms the latest SOTA methods, including the Vision Transformer (ViT), EfficientFormerV2 and the Swin Transformer. In the CN group, EfficientFormer achieved the highest accuracy, reaching 92.68%, while ViT achieved the highest recall, reaching 93.1%. These methods model only the retinal surface view or the retinal cross-sectional view and aggregate unimodal features between adjacent layers for feature correlation. The results indicate that freely combining features in a multi-modal network yields greater discriminative power and thus better detection of different individuals. In contrast, our carefully designed cross-modal fusion mechanism (the CMF unit) helps identify modality-independent features, thereby improving discrimination performance.
Table 5. Comparison with state-of-the-art models on the first dataset (%)
In our study, we detected MCI from retinal photographs and OCT images using a dual-stream attention DL framework. To address the limited use of modality-specific features, a multi-stage cross-modal fusion technique was adopted. The network performs excellently in classification, reaching an accuracy of 91.64%. By gradually and effectively integrating the information flow of the bimodal image features, our model outperforms unimodal approaches in detecting MCI and AD. In addition, our model can strongly distinguish individuals with negative and positive cognitive impairment test results, which is especially beneficial for community ophthalmology settings and routine physical examinations where efficient screening is required. By accurately distinguishing cognitively impaired from unimpaired people, our model simplifies the screening process, enabling healthcare workers to allocate limited resources effectively and to provide targeted intervention and support for those who need it.
To our knowledge, this study is the first to use retinal photographs and OCT images to distinguish MCI from normal cognition. Multi-modal images are also critical for the clinical diagnosis of disease. Retinal images from the two modalities capture different aspects of the retina, and their differing sizes, shapes, textures and contrasts add complexity and ambiguity. The results show that the multi-modal feature flow enables the model to exploit complementary information from different sources to predict MCI more accurately, especially in complex and ambiguous cases. The CMF unit proposed in this study aims to address the problems of insufficient feature extraction and insufficient multi-modal information fusion in current OCT–fundus fusion methods. We employ a multi-step propagation learning strategy to enhance the multi-modal representation and recalibrate the network using four CMF units. This approach highlights the specific features of each modality and selectively integrates the distinguishing features of both modalities for the final classification. In addition, the multi-step propagation method combines multi-level progressive learning of multi-domain features. This multi-level approach provides the flexibility to accommodate different convolutional layers and images: by adjusting the number of levels and the complexity of each level, the model can be customized to specific data characteristics and learning requirements. The SA component enables the model to focus on the most relevant features in each modality, facilitating more accurate and meaningful multi-modal information fusion, while the CA component facilitates learning a comprehensive cross-modal representation, providing more informative and more discriminative features for disease decisions.
The attention mechanism emphasizes the discriminative areas in the feature map while suppressing irrelevant information. Grad-CAM can therefore highlight the salient regions of the image that drive the more accurate results, accurately locating the region of the input image that contributes most to the model's prediction. This localization capability is critical in medical image analysis, because determining the exact location of an abnormality or disease-related feature is essential for diagnosis and treatment planning. It should be noted that Grad-CAM provides a visual interpretation of individual predictions but does not take into account the context of the entire image; it may therefore fail to fully capture the global context or the spatial relationships between different objects or structures, which can be important in applications such as medical image analysis.
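For reference, a Grad-CAM map of the kind discussed here can be computed with forward and backward hooks. The sketch below is a generic PyTorch implementation for a single-input convolutional model (for the dual-stream network, the forward call would take both modalities) and is not taken from the patented implementation.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx=None):
    """Generic Grad-CAM sketch: weight the target layer's activations by the
    spatially averaged gradients of the chosen class score."""
    activations, gradients = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: activations.append(o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: gradients.append(go[0]))
    try:
        logits = model(image)                          # image: (1, C, H, W)
        if class_idx is None:
            class_idx = logits.argmax(dim=1).item()
        model.zero_grad()
        logits[0, class_idx].backward()
        acts, grads = activations[0], gradients[0]     # (1, K, h, w)
        weights = grads.mean(dim=(2, 3), keepdim=True)  # channel importance
        cam = F.relu((weights * acts).sum(dim=1, keepdim=True))
        cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
        cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
        return cam.squeeze().detach()
    finally:
        h1.remove()
        h2.remove()
```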
It is worth noting that the dual-stream structure can be converted to a single-stream structure to compensate for a possibly missing modality. Since most screening tests are "symptom-based", diagnosis may rely on single-stream data, especially fundus photographs. We therefore devised a separable model that places a classifier behind each subnetwork and maintains the parameter ratio of the two subnetworks. Our ablation study (see Tables 2 and 3) demonstrates that relatively good results can be obtained with only one modality. As shown in Table 2, in distinguishing MCI, AD and CN, the fundus-only mode achieved 78.9% accuracy with an AUC of 78.2%, while the OCT-only mode achieved 90.44% accuracy with an AUC of 90.26%; on the nCI versus pCI task (Table 3), accuracy was 77.8% using only the fundus mode and highest (89.4%) using only the OCT mode. We believe that, combined with the appropriate infrastructure for community ophthalmic care, the present model can help screen for MCI and AD.
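For illustration only, a minimal sketch of such a separable design is given below, assuming a PyTorch implementation in which each stream keeps its own classification head so that a single available modality can be routed through its own subnetwork; the class name, argument names and feature dimension are hypothetical and not taken from the patent.

```python
import torch
import torch.nn as nn

class SeparableDualStream(nn.Module):
    """Hypothetical sketch: each stream keeps its own classifier so the model
    can fall back to single-modality inference when one modality is missing."""
    def __init__(self, fundus_net: nn.Module, oct_net: nn.Module,
                 feat_dim: int, num_classes: int):
        super().__init__()
        self.fundus_net, self.oct_net = fundus_net, oct_net
        self.fundus_head = nn.Linear(feat_dim, num_classes)     # fundus-only classifier
        self.oct_head = nn.Linear(feat_dim, num_classes)        # OCT-only classifier
        self.fusion_head = nn.Linear(2 * feat_dim, num_classes)  # bimodal classifier

    def forward(self, fundus=None, oct_img=None):
        if fundus is not None and oct_img is not None:
            f, o = self.fundus_net(fundus), self.oct_net(oct_img)
            return self.fusion_head(torch.cat([f, o], dim=1))
        if fundus is not None:                                   # fundus-only screening path
            return self.fundus_head(self.fundus_net(fundus))
        return self.oct_head(self.oct_net(oct_img))              # OCT-only path
```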
The foregoing disclosure is illustrative of the present invention and is not to be construed as limiting the scope of the invention, which is defined by the appended claims.
Claims (10)
1. An MCI prediction method based on multi-modal retinal imaging, comprising the steps of:
S1, constructing a dual-stream attention neural network comprising modal feature extraction modules, cross-modal fusion units and a classifier, wherein the classifier comprises a fusion classifier;
two modal feature extraction modules are provided, namely a fundus feature extractor and an OCT feature extractor; paired fundus photographs and OCT images are input, and for each pair of images the modal feature extraction modules extract a pair of feature maps; each pair of feature maps is fused by a cross-modal fusion unit and passed to the next convolution layer for further feature extraction, and the fusion result of each cross-modal fusion unit is passed to the fusion classifier;
the cognitive ability of the test subject is classified using the multi-modal features combined through multi-step propagation across the convolution layers;
S2, determining a prediction model according to actual needs, forming an MCI image prediction network from the dual-stream attention neural network and the prediction model, training the MCI image prediction network with fundus photographs and OCT images as inputs, and extracting the trained dual-stream attention neural network from the trained MCI image prediction network;
S3, inputting the fundus photograph and OCT image of the subject to be predicted into the dual-stream attention neural network trained in step S2, and obtaining a prediction result of the category to which the subject's cognitive ability belongs.
2. The MCI prediction method based on multi-modal retinal imaging according to claim 1, wherein: in step S1, a cross-modal fusion unit is provided after each convolution layer, the cross-modal fusion unit comprising a self-attention module and a cross-attention module;
the self-attention module comprises a position self-attention mechanism (PAM) and a channel self-attention mechanism (CAM);
assume the input feature map of the position self-attention mechanism is $I_{PAM} \in \mathbb{R}^{C \times H \times W}$, with the same shape as its output feature map $X_{PAM} \in \mathbb{R}^{C \times H \times W}$; the position self-attention weight $X_{PAM}$ is obtained from the softmax of three feature matrices:
$X_{PAM} = \mathrm{PAM}(I_{PAM})$;
the channel self-attention weight is likewise obtained from the softmax of three feature matrices:
$X_{CAM} = \mathrm{CAM}(I_{CAM})$;
the self-attention module combines $X_{PAM}$ and $X_{CAM}$ to obtain the outputs $X_{SA\text{-}fundus}$ and $X_{SA\text{-}OCT}$, where $\oplus$ denotes element-wise addition of the attention feature maps of each modality:
$X_{SA\text{-}fundus} = X_{PAM\text{-}fundus} \oplus X_{CAM\text{-}fundus}$,
$X_{SA\text{-}OCT} = X_{PAM\text{-}OCT} \oplus X_{CAM\text{-}OCT}$;
one split of $X_{SA\text{-}fundus}$ flows as a shortcut branch into the fundus feature extractor and is added to the original fundus features; correspondingly, one split of $X_{SA\text{-}OCT}$ flows as a shortcut branch into the OCT feature extractor and is added to the original OCT features; the other splits are concatenated along the channel dimension to form the feature matrix $X_{SA} \in \mathbb{R}^{2C \times H \times W}$.
3. The MCI prediction method based on multi-modal retinal imaging according to claim 2, wherein: the input feature map of the cross-attention module is denoted $I_{CA} = X_{SA}$, with $I_{CA} \in \mathbb{R}^{2C \times H \times W}$, and its output feature map has the same shape; the cross-modal global descriptor is
$G = F_{gp}(I_{CA})$,
where $F_{gp}$ denotes global average pooling (GAP) and $G = (G_1, \ldots, G_k, \ldots, G_{2C})$ is a cross-modal descriptor collecting sensitive statistics over the entire input; the cross-attention (CA) vector is
$W_{CA} = \sigma(F_{mlp}(G))$,
where $F_{mlp}$ is a multi-layer perceptron network and $\sigma$ is the sigmoid function with value range (0, 1); finally, the weight $W_{CA}$ is multiplied with the feature matrix $X_{SA}$ to enhance the feature matrix through cross-modal fusion, where $\otimes$ denotes element-wise multiplication:
$X_{CA} = W_{CA} \otimes X_{SA}$;
for convolution layer $l_{conv}$, the $l$-th cross-modal fusion unit generates the output $X_{l}$ to refine the original output of the $l$-th layer; this is a bi-directional propagation process, and the refined features are passed to the next cross-modal fusion unit for finer feature mapping.
4. The MCI prediction method based on multi-modal retinal imaging according to claim 1, wherein: the classifier further comprises an optical coherence tomography (OCT) classifier and a fundus classifier, and the fusion result of the last cross-modal fusion unit is transmitted to the OCT classifier and the fundus classifier simultaneously.
5. The MCI prediction method based on multi-modal retinal imaging according to claim 1, wherein: the number of the convolution layers is 4, the filter size of the first convolution layer is 7x7, and the filter size of the remaining convolution layers is 3x3.
6. The MCI prediction method based on multi-modal retinal imaging according to claim 1, wherein: in step S2, the fundus photograph is preprocessed before input, comprising the following steps: extracting the region of interest using an image mask, standardizing the image size, and enhancing contrast with contrast-limited adaptive histogram equalization (CLAHE).
7. The MCI prediction method based on multi-modal retinal imaging according to claim 1, wherein: in step S2, the OCT image is preprocessed before input, comprising the following steps: creating a mask for the foreground region of the OCT image using Otsu threshold segmentation, and reducing noise with a hybrid filtering method that combines bilateral filtering and inner filtering.
8. The MCI prediction method based on multi-modal retinal imaging according to claim 1, wherein: the effective information in the fundus photographs and OCT images includes the arteries and veins distributed along the blood vessels, information related to the blood vessels and vascular bifurcations, the outer retinal layers and the suprachoroidal layer.
9. The MCI prediction method based on multi-modal retinal imaging according to claim 1, wherein: in step S2, the prediction model comprises two data sets, the first data set including a number of MCI, CN and AD subjects, and the second data set including a number of nCI and pCI subjects, with transfer learning used to distinguish nCI subjects from pCI subjects.
10. The MCI prediction method based on multi-modal retinal imaging according to claim 1, wherein: in step S2, after modal feature extraction and cross-modal fusion encoding, the fused features obtained at each stage are sub-sampled to the same size and then concatenated to obtain multi-domain fused features, and the MCI image prediction network is trained using these feature vectors.
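The following worked example, provided for illustration only and not as part of the claims, traces one pass through the cross-modal fusion unit of claims 2 and 3; the channel and spatial dimensions (C = 64, H = W = 56) are assumed rather than specified in the claims:

\[
\begin{aligned}
X_{SA\text{-}fundus} &= X_{PAM\text{-}fundus} \oplus X_{CAM\text{-}fundus} \in \mathbb{R}^{64 \times 56 \times 56},\\
X_{SA\text{-}OCT} &= X_{PAM\text{-}OCT} \oplus X_{CAM\text{-}OCT} \in \mathbb{R}^{64 \times 56 \times 56},\\
X_{SA} &= \operatorname{concat}\left(X_{SA\text{-}fundus},\, X_{SA\text{-}OCT}\right) \in \mathbb{R}^{128 \times 56 \times 56},\\
G &= F_{gp}(X_{SA}) \in \mathbb{R}^{128}, \qquad W_{CA} = \sigma\left(F_{mlp}(G)\right) \in (0,1)^{128},\\
X_{CA} &= W_{CA} \otimes X_{SA} \in \mathbb{R}^{128 \times 56 \times 56}.
\end{aligned}
\]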
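Purely as an illustration of the preprocessing steps recited in claims 6 and 7 (and not part of the claims), one possible OpenCV-based sketch is given below; parameter values such as the CLAHE clip limit, the filter sizes and the use of a median pass as a reading of the "inner" filtering step are assumptions.

```python
import cv2
import numpy as np

def preprocess_fundus(img_bgr: np.ndarray, size: int = 224) -> np.ndarray:
    """Fundus sketch (claim 6): mask the region of interest, standardize the
    size, and apply CLAHE on the luminance channel."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 10, 255, cv2.THRESH_BINARY)      # crude ROI mask
    x, y, w, h = cv2.boundingRect(mask)
    roi = cv2.resize(img_bgr[y:y + h, x:x + w], (size, size))
    lab = cv2.cvtColor(roi, cv2.COLOR_BGR2LAB)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    lab[..., 0] = clahe.apply(lab[..., 0])
    return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)

def preprocess_oct(img_gray: np.ndarray, size: int = 224) -> np.ndarray:
    """OCT sketch (claim 7): Otsu threshold to mask the foreground, then
    bilateral filtering plus a median pass to suppress speckle noise."""
    _, mask = cv2.threshold(img_gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    fg = cv2.bitwise_and(img_gray, img_gray, mask=mask)
    denoised = cv2.bilateralFilter(fg, d=5, sigmaColor=50, sigmaSpace=50)
    denoised = cv2.medianBlur(denoised, 3)
    return cv2.resize(denoised, (size, size))
```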
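Likewise, the sub-sampling and concatenation of stage-wise fused features described in claim 10 could be sketched as follows; the stage channel counts, the common output size and the three-way classifier are illustrative assumptions rather than values from the patent.

```python
from typing import List

import torch
import torch.nn as nn
import torch.nn.functional as F

def multi_domain_fusion(stage_features: List[torch.Tensor], out_size: int = 7) -> torch.Tensor:
    """Sub-sample each stage's fused feature map to a common spatial size,
    flatten, and concatenate into one multi-domain feature vector."""
    pooled = [F.adaptive_avg_pool2d(f, out_size).flatten(start_dim=1)
              for f in stage_features]            # each -> (B, C_i * out_size**2)
    return torch.cat(pooled, dim=1)               # (B, sum_i C_i * out_size**2)

# Illustrative usage with four CMF-stage outputs of increasing depth:
feats = [torch.randn(2, c, s, s) for c, s in [(64, 56), (128, 28), (256, 14), (512, 7)]]
fused = multi_domain_fusion(feats)                # shape (2, (64+128+256+512) * 49)
classifier = nn.Linear(fused.shape[1], 3)         # e.g. MCI / AD / CN fusion classifier
logits = classifier(fused)
```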
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311151559.XA CN117426748B (en) | 2023-09-07 | 2023-09-07 | MCI detection method based on multi-mode retina imaging |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117426748A (en) | 2024-01-23
CN117426748B CN117426748B (en) | 2024-09-24 |
Family
ID=89554305
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311151559.XA Active CN117426748B (en) | 2023-09-07 | 2023-09-07 | MCI detection method based on multi-mode retina imaging |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117426748B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210319420A1 (en) * | 2020-04-12 | 2021-10-14 | Shenzhen Malong Technologies Co., Ltd. | Retail system and methods with visual object tracking |
US20220207729A1 (en) * | 2019-04-18 | 2022-06-30 | Shelley Boyd | Detection, prediction, and classification for ocular disease |
CN115590481A (en) * | 2022-12-15 | 2023-01-13 | 北京鹰瞳科技发展股份有限公司(Cn) | Apparatus and computer-readable storage medium for predicting cognitive impairment |
CN115908789A (en) * | 2022-12-09 | 2023-04-04 | 大连民族大学 | Cross-modal feature fusion and asymptotic decoding saliency target detection method and device |
US20230162359A1 (en) * | 2020-06-29 | 2023-05-25 | Medi Whale Inc. | Diagnostic assistance method and device |
US20230245772A1 (en) * | 2020-05-29 | 2023-08-03 | University Of Florida Research Foundation | A Machine Learning System and Method for Predicting Alzheimer's Disease Based on Retinal Fundus Images |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117854139A (en) * | 2024-03-07 | 2024-04-09 | 中国人民解放军总医院第三医学中心 | Open angle glaucoma recognition method, medium and system based on sparse selection |
CN117854139B (en) * | 2024-03-07 | 2024-05-28 | 中国人民解放军总医院第三医学中心 | Open angle glaucoma recognition method, medium and system based on sparse selection |
Also Published As
Publication number | Publication date |
---|---|
CN117426748B (en) | 2024-09-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Ali et al. | A hybrid convolutional neural network model for automatic diabetic retinopathy classification from fundus images | |
Nasir et al. | Deep DR: detection of diabetic retinopathy using a convolutional neural network | |
CN117426748B (en) | MCI detection method based on multi-mode retina imaging | |
Rahhal et al. | Detection and classification of diabetic retinopathy using artificial intelligence algorithms | |
Aurangzeb et al. | An efficient and light weight deep learning model for accurate retinal vessels segmentation | |
KR102288727B1 (en) | Apparatus and methods for classifying neurodegenerative diseases image of amyloid-positive based on deep-learning | |
Bali et al. | Analysis of deep learning techniques for prediction of eye diseases: A systematic review | |
WO2020219968A1 (en) | Detecting avascular and signal reduction areas in retinas using neural networks | |
Phridviraj et al. | A bi-directional Long Short-Term Memory-based Diabetic Retinopathy detection model using retinal fundus images | |
Gao et al. | Using a dual-stream attention neural network to characterize mild cognitive impairment based on retinal images | |
Haider et al. | Modified Anam-Net Based Lightweight Deep Learning Model for Retinal Vessel Segmentation. | |
Jabbar et al. | A Lesion-Based Diabetic Retinopathy Detection Through Hybrid Deep Learning Model | |
Nage et al. | A survey on automatic diabetic retinopathy screening | |
Alam et al. | Benchmarking deep learning frameworks for automated diagnosis of OCULAR TOXOPLASMOSIS: A comprehensive approach to classification and segmentation | |
Jiwani et al. | Application of transfer learning approach for diabetic retinopathy classification | |
Sesikala et al. | A Study on Diabetic Retinopathy Detection, Segmentation and Classification using Deep and Machine Learning Techniques | |
Latha et al. | Automated macular disease detection using retinal optical coherence tomography images by fusion of deep learning networks | |
Taş et al. | Detection of retinal diseases from ophthalmological images based on convolutional neural network architecture. | |
Selvathi et al. | Deep convolutional neural network-based diabetic eye disease detection and classification using thermal images | |
Daghistani | Using Artificial Intelligence for Analyzing Retinal Images (OCT) in People with Diabetes: Detecting Diabetic Macular Edema Using Deep Learning Approach | |
Muhammed et al. | Diabetic retinopathy diagnosis based on convolutional neural network | |
Khalaf et al. | Identification and Classification of Retinal Diseases by Using Deep Learning Models | |
Narayanan | Deep learning of fundus images and optical coherence tomography images for ocular disease detection–a review | |
Khandolkar et al. | Survey on Techniques for Diabetic Retinopathy Detection & Classification | |
Tiwari et al. | A Comprehensive Literature Review of Skin Lesions Classification Techniques |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||