EP4107290A1 - Methods and systems for predicting rates of progression of age-related macular degeneration - Google Patents

Methods and systems for predicting rates of progression of age-related macular degeneration

Info

Publication number
EP4107290A1
Authority
EP
European Patent Office
Prior art keywords
cfp
image
amd
images
rpd
Prior art date
Legal status
Pending
Application number
EP21711144.2A
Other languages
German (de)
English (en)
Inventor
Emily YING CHEW
Zhiyong Lu
Tiarnan DANIEL LENAGHAN KEENAN
Wai T. Wong
Yifan Peng
Qingyu CHEN
Elvira AGRÓN
Current Assignee
US Department of Health and Human Services
Original Assignee
US Department of Health and Human Services
Priority date
Filing date
Publication date
Application filed by US Department of Health and Human Services filed Critical US Department of Health and Human Services
Publication of EP4107290A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0012 Biomedical image inspection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H30/00 ICT specially adapted for the handling or processing of medical images
    • G16H30/40 ICT specially adapted for the handling or processing of medical images for processing medical images, e.g. editing
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30041 Eye; Retina; Ophthalmic

Definitions

  • the present disclosure relates to systems and methods for predicting rates of progression of age-related macular degeneration (AMD). More specifically, the disclosure relates to predicting rates of progression of AMD and determining the presence of reticular pseudodrusen (RPD) using color fundus photography.
  • This disclosure provides a method of predicting risk of late age-related macular degeneration (AMD).
  • the method may include receiving one or more color fundus photograph (CFP) images from both eyes of a patient and classifying each CFP image.
  • Classifying each CFP image may include extracting one or more deep features in each CFP image, or grading drusen and pigmentary abnormalities.
  • the method may further include predicting the risk of late AMD by estimating a time to late AMD using a Cox proportional hazard model using the one or more deep features or the graded drusen and pigmentary abnormalities.
  • the disclosure further provides a method of predicting risk of late AMD.
  • the method may include receiving one or more images from both eyes of a patient and classifying each image. Classifying each image may include detecting the presence of RPD in each image.
  • the method may further include predicting the risk of late AMD by estimating a time to late AMD using the presence of RPD in each image.
  • a device having at least one non-transitory computer readable medium storing instructions which when executed by at least one processor, cause the at least one processor to: receive one or more color fundus photograph (CFP) images from both eyes of a patient, classify each CFP image, and predict the risk of late AMD.
  • classifying each CFP image may include extracting one or more deep features in each CFP image, grading the drusen and pigmentary abnormalities, and/or detecting the presence of RPD in each CFP image.
  • Predicting the risk of late AMD may include estimating a time to late AMD using a Cox proportional hazard model using the presence of RPD, the one or more deep features, and/or the graded drusen and pigmentary abnormalities.
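The Cox survival step described above can be sketched as a toy partial-likelihood fit. The following is a minimal NumPy illustration on assumed toy data, not the implementation disclosed herein; in practice a survival library (e.g., lifelines' CoxPHFitter) would typically be used.

```python
import numpy as np

def cox_gradient(beta, X, time, event):
    """Gradient of the Breslow partial log-likelihood of a Cox model."""
    order = np.argsort(-time)            # sort by descending survival time
    X, event = X[order], event[order]
    w = np.exp(X @ beta)                 # relative hazards exp(beta . x)
    cum_w = np.cumsum(w)                 # risk-set sums (subjects still at risk)
    cum_wx = np.cumsum(w[:, None] * X, axis=0)
    grad = np.zeros_like(beta)
    for i in np.flatnonzero(event):      # sum over observed events only
        grad += X[i] - cum_wx[i] / cum_w[i]
    return grad

def fit_cox(X, time, event, lr=0.5, iters=200):
    """Fit coefficients by gradient ascent on the partial likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        beta += lr * cox_gradient(beta, X, time, event) / len(time)
    return beta

# Toy data: feature 1 marks a high-risk group that progresses earlier.
X = np.array([[1.0], [1.0], [1.0], [0.0], [0.0], [0.0]])
time = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 8.0])
event = np.array([1, 1, 1, 1, 1, 1], dtype=bool)
beta = fit_cox(X, time, event)           # beta[0] > 0: feature raises the hazard
```

A positive fitted coefficient corresponds to a hazard ratio exp(beta) above 1, i.e., a shorter estimated time to late AMD for patients carrying that feature.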
  • FIG. 1 is a flowchart of the two-step architecture of the method of predicting risk of late AMD.
  • FIG. 2 illustrates the creation of the study data sets. To avoid 'cross-contamination' between the training and test datasets, no participant was in more than one group.
  • FIG. 3 shows prediction error curves of the survival models in predicting risk of progression to late age-related macular degeneration on the combined AREDS/AREDS2 test sets (601 participants), using the Brier score (95% confidence interval).
  • FIG. 4 illustrates example system embodiments.
  • FIG. 5 illustrates an example machine learning environment.
  • FIG. 6A shows a screenshot of an example software for implementing the late AMD risk prediction method.
  • FIG. 6B shows four selected color fundus photographs with highlighted areas used by the deep learning classification network (DeepSeeNet). Saliency maps were used to represent the visually dominant location (drusen or pigmentary changes) in the image by back-projecting the last layer of neural network.
  • FIG. 7 is an overview of training a deep learning method for detecting RPD.
  • FIG. 8A shows receiver operating characteristic curves of five different deep learning convolutional neural networks for the detection of reticular pseudodrusen from fundus autofluorescence images, using the full test set.
  • FIG. 8B shows receiver operating characteristic curves of five different deep learning convolutional neural networks for the detection of reticular pseudodrusen from the corresponding color fundus photographs, using the full test set.
  • FIG. 9A shows receiver operating characteristic curves for the detection of reticular pseudodrusen by the convolutional neural network DenseNet from fundus autofluorescence images.
  • the performance of the four ophthalmologists on the same test sets is shown by four single points. In all cases, the ground truth is the reading center grading of the fundus autofluorescence images.
  • FIG. 9B shows receiver operating characteristic curves for the detection of reticular pseudodrusen by the convolutional neural network DenseNet from the corresponding color fundus photographs.
  • the performance of the four ophthalmologists on the same test sets is shown by four single points and the performance of the reading center grading of the color fundus photographs is also shown as a single point. In all cases, the ground truth is the reading center grading of the fundus autofluorescence images.
  • FIG. 10A, FIG. 10B, and FIG. 10C show deep learning attention maps overlaid on fundus autofluorescence (FAF) images and color fundus photographs (CFP).
  • In these figures, the FAF image is shown in the left column and the corresponding CFP in the right column.
  • the heatmap scale for the attention maps is also shown: signal range from -1.00 (purple) to +1.00 (brown).
  • FIG. 11 shows an example framework for a multi-modal, multi-task, multi-attention (M3) deep learning convolutional neural network (CNN) for the detection of reticular pseudodrusen from color fundus photographs (CFP) alone, their corresponding fundus autofluorescence (FAF) images alone, or the CFP-FAF image pairs.
  • FIG. 12 shows box plots showing the F1 score results of the multi-modal, multi-task, multi-attention (M3) and standard (non-M3) deep learning convolutional neural networks for the detection of reticular pseudodrusen from color fundus photographs (CFP) alone, their corresponding fundus autofluorescence (FAF) images alone, or the CFP-FAF image pairs, using the full test set.
  • the horizontal line represents the median F1 score and the boxes represent the first and third quartiles.
  • the whiskers represent quartile 1 − (1.5 × interquartile range) and quartile 3 + (1.5 × interquartile range).
  • the dots represent the individual F1 scores for each model. ****: P < 0.0001.
  • FIG. 13 is a differential performance analysis: distribution of test set images correctly classified by both models, neither model, the multi-modal, multi-task, multi-attention (M3) model only, or the non-M3 model only, for the detection of reticular pseudodrusen from color fundus photographs (CFP) alone, their corresponding fundus autofluorescence (FAF) images alone, or the CFP-FAF image pairs, using the full test set.
  • FIG. 14A shows deep learning attention maps overlaid on representative image examples for color fundus photographs (CFP) alone;
  • FIG. 14B shows deep learning attention maps overlaid on representative image examples for fundus autofluorescence (FAF) images alone;
  • FIG. 14C shows deep learning attention maps overlaid on representative image examples for the CFP-FAF image pairs for the detection of reticular pseudodrusen (RPD) by the multi-modal, multi-task, multi-attention (M3) model or the non-M3 model: representative examples where the non-M3 model missed RPD presence but the M3 model correctly detected it.
  • the attention maps demonstrate quantitatively the relative contributions made by each pixel to the detection decision.
  • the heatmap scale for the attention maps is also shown: signal range from -1.00 (purple) to +1.00 (brown).
  • RPD are observed on the FAF images as ribbon-like patterns of round and oval hypoautofluorescent lesions with intervening areas of normal and increased autofluorescence. Areas of RPD clearly apparent to human experts are shown (black arrows), as well as areas of RPD possibly apparent to human experts (dotted black arrows).
  • FIG. 15A and FIG. 15B show receiver operating characteristic (ROC) curves of the multi-modal, multi-task, multi-attention (M3) and standard (non-M3) deep learning convolutional neural networks for the detection of reticular pseudodrusen from color fundus photographs (CFP) alone (FIG. 15A) or their corresponding fundus autofluorescence (FAF) images alone (FIG. 15B), using a random subset of the test set.
  • the mean ROC curve is shown (dotted line), together with its standard deviation (shaded area).
  • the performance of the 13 ophthalmologists on the same test sets is shown by 13 single points.
  • PGD projected gradient descent
  • FIG. 17 shows the framework’s flowchart for adversarial training and testing adversarial attack examples. Inter-denoising and intra-denoising layers follow the ReLU layer.
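The adversarial training shown in FIG. 17 relies on attack examples such as those produced by PGD. The following is an illustrative sketch of an L-infinity PGD attack on a toy logistic-regression "model" (chosen so the input gradient has a closed form); the patent applies the same idea to a CNN, and none of the values below come from the disclosure.

```python
import numpy as np

def pgd_attack(x, y, w, b, eps=0.1, alpha=0.02, steps=10):
    """Projected gradient descent (PGD) attack within an L-infinity ball.

    x: input vector, y: true label (0/1), (w, b): logistic-regression weights.
    Each step ascends the loss along the sign of the input gradient, then
    projects back into the eps-ball around the original input.
    """
    x_adv = x.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x_adv @ w + b)))   # model prediction
        grad = (p - y) * w                           # d(cross-entropy)/d(input)
        x_adv = x_adv + alpha * np.sign(grad)        # ascent step on the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)     # project into the eps-ball
    return x_adv

rng = np.random.default_rng(0)
w = rng.normal(size=8)
b = 0.0
x = rng.normal(size=8)
y = 1.0
x_adv = pgd_attack(x, y, w, b)   # perturbed input, ||x_adv - x||_inf <= eps
```

Adversarial training then mixes such perturbed inputs into the training batches so the network (here, with its denoising layers) learns to resist them.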
  • references to “one embodiment”, “an embodiment”, or “an aspect” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure.
  • the appearances of the phrase “in one embodiment” or “in one aspect” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
  • various features are described which may be exhibited by some embodiments and not by others.
  • Late AMD, the stage associated with severe visual loss, occurs in two forms: geographic atrophy (GA) and neovascular AMD (NV).
  • Making accurate time-based predictions of progression to late AMD is clinically critical.
  • Making predictions of late stage AMD will enable improved decision-making regarding: (i) medical treatments, especially oral supplements known to decrease progression risk, (ii) lifestyle interventions, particularly smoking cessation and dietary changes, and (iii) intensity of patient monitoring, e.g., frequent reimaging in clinic and/or tailored home monitoring programs.
  • RPD are also known as subretinal drusenoid deposits.
  • RPD are thought to represent aggregations of material in the subretinal space between the RPE and photoreceptors. Compositional differences have also been found between soft drusen and RPD.
  • the detection of eyes with RPD is important for multiple reasons. Not only is their presence associated with increased risk of late AMD, but the increased risk is weighted towards particular forms of late AMD, including geographic atrophy (GA) and the recently recognized phenotype of outer retinal atrophy (ORA).
  • In the Age-Related Eye Disease Study 2 (AREDS2), the risk of progression to GA was significantly higher with RPD presence, while the risk of neovascular AMD was not.
  • RPD presence may be a powerfully discriminating feature that could be very useful in risk prediction algorithms for the detailed prognosis of AMD progression.
  • the presence of RPD has also been associated with increased speed of GA enlargement, which is a key endpoint in ongoing clinical trials.
  • the presence of RPD appears to be a critical determinant of the efficacy of subthreshold nanosecond laser to slow progression to late AMD.
  • CFP is the most widespread and accessible retinal imaging modality used worldwide; it is the most highly validated imaging modality for AMD classification and prediction of progression to late disease.
  • The Simplified Severity Scale (SSS) is a points-based system whereby an examining physician scores the presence of two AMD features (macular drusen and pigmentary abnormalities) in both eyes of an individual. From the total score of 0-4, a five-year risk of late AMD is then estimated.
  • the other standard is an online risk calculator.
  • the online risk calculator predicts the risk of progression to late AMD, GA, and NV at 1-10 years.
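The SSS scoring logic described above can be sketched as follows. The five-year risk figures are the approximate published values (about 0.5%, 3%, 12%, 25%, and 50% for scores 0-4), and the field names and the intermediate-drusen modification are illustrative, not quoted from the disclosure.

```python
# Approximate published five-year risks of late AMD by SSS score (0-4).
FIVE_YEAR_RISK = {0: 0.005, 1: 0.03, 2: 0.12, 3: 0.25, 4: 0.50}

def sss_score(eyes):
    """Score two eyes, each a dict with booleans
    'large_drusen', 'pigment', and 'intermediate_drusen'."""
    # One point per eye for large drusen, one point per eye for pigmentary changes.
    score = sum(int(e["large_drusen"]) + int(e["pigment"]) for e in eyes)
    # Scale modification: bilateral intermediate drusen without large drusen adds 1 point.
    if not any(e["large_drusen"] for e in eyes) and all(e["intermediate_drusen"] for e in eyes):
        score += 1
    return min(score, 4)

# Hypothetical patient: large drusen in both eyes, pigmentary changes in one.
patient = [
    {"large_drusen": True, "pigment": True, "intermediate_drusen": False},
    {"large_drusen": True, "pigment": False, "intermediate_drusen": False},
]
risk = FIVE_YEAR_RISK[sss_score(patient)]   # score 3 -> ~25% five-year risk
```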
  • Deep learning is a branch of machine learning that allows computers to learn by example; in the case of image analysis, it involves training algorithmic models on images with accompanying labels such that they can perform classification of novel images according to the same labels.
  • the models are typically neural networks that are constructed of an input layer (which receives the image), followed by multiple layers of non-linear transformations, to produce a classifier output (e.g., drusen and pigmentary abnormalities, RPD present or absent, etc.).
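The layered structure just described (input layer, multiple non-linear transformations, classifier output) can be illustrated with a toy NumPy forward pass. The sizes and random weights below are arbitrary stand-ins, not DeepSeeNet's.

```python
import numpy as np

rng = np.random.default_rng(42)

def relu(z):
    """Non-linear transformation applied between layers."""
    return np.maximum(z, 0.0)

def softmax(z):
    """Turn raw scores into class probabilities."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy 'network': input layer -> two non-linear hidden layers -> classifier output.
W1, b1 = rng.normal(size=(16, 64)), np.zeros(16)   # input: flattened 8x8 'image'
W2, b2 = rng.normal(size=(8, 16)), np.zeros(8)
W3, b3 = rng.normal(size=(3, 8)), np.zeros(3)      # 3 output classes, e.g. drusen grades

def classify(image):
    h1 = relu(W1 @ image + b1)
    h2 = relu(W2 @ h1 + b2)          # penultimate layer: the 'deep features'
    return softmax(W3 @ h2 + b3)     # classifier output (probabilities)

probs = classify(rng.normal(size=64))
```

A real CNN replaces the dense matrices with convolutional layers and learns the weights from labeled images, but the layered data flow is the same.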
  • the method of predicting the rate of progression of AMD may include a deep learning (DL) architecture to predict progression of AMD with improved accuracy and transparency in two steps: image classification followed by survival analysis, as seen in FIG. 1.
  • the prediction method performs progression predictions directly from CFP over a wide time interval (1-12 years).
  • training and testing may be based on the ground truth of reading center-graded progression events at the level of individuals. Both training and testing may utilize an expanded dataset with many more progression events.
  • the prediction method may predict the risk not only of late AMD, but also of GA and NV separately. This is important since treatment approaches for the two subtypes of late AMD are very different: NV needs to be diagnosed extremely promptly, since delay in access to intravitreal anti-VEGF injections is usually associated with very poor visual outcomes, while various therapeutic options to slow GA enlargement are under investigation.
  • the two-step approach has important advantages.
  • the final predictions are more explainable and biologically plausible, and error analysis is possible.
  • end-to-end ‘black-box’ DL approaches are less transparent and may be more susceptible to failure.
  • the prediction method delivers autonomous predictions of a higher accuracy than those from retinal specialists using two existing clinical standards. Hence, the predictions are closer to the ground truth of actual time-based progression to late AMD than when retinal specialists are grading the same bilateral CFP and entering these grades into the SSS or the online calculator.
  • deep feature extraction may generally achieve higher accuracy than DL grading of traditional hand-crafted features.
  • the DL prediction methods herein enable ascertainment of risk above 50%. This may be helpful in justifying medical and lifestyle interventions, vigilant home monitoring, and frequent reimaging, and in planning shorter but highly powered clinical trials.
  • the AREDS-style oral supplements decrease the risk of developing late AMD by approximately 25%, but only in individuals at higher risk of disease progression.
  • If subthreshold nanosecond laser treatment is approved to slow progression to late AMD, accurate risk predictions may be very helpful for identifying eyes that may benefit most.
  • the prediction method may include receiving one or more CFP images from one or more eyes of a patient. In some examples, one or more CFP images from both eyes of the patient are received.
  • a deep neural network such as a deep convolutional neural network (CNN) may be adapted to classify each CFP image by either: (i) extracting multiple highly discriminative deep features, and/or (ii) estimating grades for drusen and pigmentary abnormalities.
  • the deep features may include drusen and/or pigmentary abnormalities.
  • the drusen may be soft, large drusen.
  • the classification of each CFP image may be performed using one or more adaptations of ‘DeepSeeNet’.
  • DeepSeeNet is a CNN framework that was created for AMD severity classification. It has achieved state-of-the-art performance for the automated diagnosis and classification of AMD severity from CFP; this includes the grading of macular drusen, pigmentary abnormalities, the SSS, and the AREDS 9-step severity scale. For example, based on the received images, the following information may be automatically generated separately for each eye: drusen size status, pigmentary abnormality presence/absence, late AMD presence/absence, and the Simplified Severity Scale score.
  • the CNN may include embedded denoising operators for improved robustness.
  • the denoising operators in the CNN may defend against unnoticeable adversarial attacks.
  • the method at step 106 may include extracting one or more ‘deep features’.
  • this step may involve using DL to derive and weight predictive image features, including high-dimensional ‘hidden’ features.
  • Deep features may be extracted from the second to last fully-connected layer of DeepSeeNet (the highlighted portions in the classification network in FIG. 1).
  • 512 deep features may be extracted for each patient in this way, comprising 128 deep features for each model (drusen and pigmentary abnormalities) in each of the two images (left and right eyes). After feature extraction, all 512 deep features may be normalized as standard scores.
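The standard-score normalization of the 512 extracted features amounts to column-wise z-scoring. A NumPy sketch on random stand-in data:

```python
import numpy as np

def standard_scores(features, eps=1e-8):
    """Normalize each deep-feature column to zero mean and unit variance."""
    mu = features.mean(axis=0)
    sd = features.std(axis=0)
    return (features - mu) / (sd + eps)   # eps guards against constant columns

rng = np.random.default_rng(1)
# Stand-in matrix: 100 patients x 512 deep features.
deep_features = rng.normal(loc=5.0, scale=2.0, size=(100, 512))
Z = standard_scores(deep_features)
```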
  • the method may further include feature selection to avoid overfitting and to improve the generalizability, because of the multi-dimensional nature of the features.
  • Feature selection may be performed to group correlated features, and one feature may be picked from each group.
  • Features with non-zero coefficients may be selected and applied as input to the survival models described below.
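One simple way to realize the grouping step described above is greedy clustering on the absolute correlation matrix, keeping one representative per group. This is an illustrative sketch, not necessarily the selection procedure used in the disclosure:

```python
import numpy as np

def select_uncorrelated(X, threshold=0.9):
    """Greedy grouping: keep one representative per cluster of highly
    correlated columns (|corr| > threshold)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    n = X.shape[1]
    kept, assigned = [], np.zeros(n, dtype=bool)
    for j in range(n):
        if not assigned[j]:
            group = np.flatnonzero((corr[j] > threshold) & ~assigned)
            assigned[group] = True
            kept.append(j)          # pick the first feature of each group
    return kept

rng = np.random.default_rng(2)
base = rng.normal(size=(200, 3))
# Build 6 columns: each base feature duplicated with tiny noise -> 3 groups of 2.
X = np.hstack([base, base + 0.01 * rng.normal(size=base.shape)])
selected = select_uncorrelated(X)   # one representative per correlated pair
```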
  • the method at step 106 may optionally include a second adaptation of DeepSeeNet (‘DL grading’).
  • the method may include grading of drusen and pigmentary abnormalities, the two macular features human graders consider most predictive of progression to late AMD.
  • the two predicted risk factors may be used directly.
  • One CNN may be provided, where the CNN has been previously trained and validated to estimate drusen status in a single CFP, according to three levels (none/small, medium, or large), using reading center grades as the ground truth.
  • a second CNN may be provided, where the second CNN has been previously trained and validated to predict the presence or absence of pigmentary abnormalities in a single CFP.
  • the probability of progression to late AMD may be automatically calculated, along with separate probabilities of geographic atrophy and neovascular AMD.
  • the method may further include generating a survival model by using a Cox proportional hazards model to predict probability of progression to late AMD (and GA/NV, separately), based on the deep features (‘deep features/survival’) or the DL grading (‘DL grading/survival’).
  • the method may further include optional step 110 in which additional participant information may be added to the survival model, such as demographic and (if available) genotype information, along with a time point for prediction.
  • the patient’s age, smoking status, and/or genetics may be received.
  • the probability of progression to late AMD may estimate time to late AMD.
  • the Cox model may be used to evaluate simultaneously the effect of several factors on the probability of the event, i.e., participant progression to late AMD in either eye.
  • Separate Cox proportional hazards models may analyze time to late AMD and time to subtype of late AMD (i.e., GA and NV).
  • the survival models may receive three additional inputs at step 110: (i) participant age; (ii) smoking status (current/former/never); and (iii) participant AMD genotype (CFH rs1061170, ARMS2 rs10490924, and the AMD genetic risk score (GRS)).
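Assembling these additional survival-model inputs alongside the image-derived features might look as follows; the encodings (age scaling, smoking one-hot, allele counts) are illustrative assumptions, not the disclosed implementation:

```python
import numpy as np

SMOKING_LEVELS = ("never", "former", "current")

def build_covariates(age, smoking, risk_alleles, deep_features):
    """Concatenate demographic, genotype, and image-derived inputs into one row.

    risk_alleles: e.g. risk-allele counts (0-2) at CFH rs1061170 and
    ARMS2 rs10490924. All names and encodings here are illustrative.
    """
    smoking_onehot = [1.0 if smoking == s else 0.0 for s in SMOKING_LEVELS]
    return np.concatenate([[age / 100.0], smoking_onehot, risk_alleles, deep_features])

# Hypothetical 72-year-old former smoker with 8 selected deep features.
row = build_covariates(72, "former", [1, 2], np.zeros(8))
```

The resulting row would serve as one covariate vector for the Cox model.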
  • the one or more images may be CFP images, FAF images, and/or near infrared images.
  • the method of detecting RPD in CFP images may be performed in conjunction with the method of predicting the risk of late AMD in a patient.
  • RPD may be detected in one or more CFP images of a patient and may be included as a deep feature for input into the survival model.
  • FIG. 7 is an overview of training a deep learning method for detecting RPD.
  • reading center experts may grade the presence or absence of reticular pseudodrusen (RPD) on fundus autofluorescence (FAF) images.
  • each FAF image may be assigned a label (e.g. RPD present or absent). These labels may be transferred to the CFP images at step 206.
  • the FAF and CFP images may be each split into training, development, and test sets.
  • one or more deep learning models (e.g., ten) may be trained: about half for the FAF detection task and about half for the CFP detection task.
  • Each model was evaluated on the hold-out test sets (steps 210 and 212).
  • One or more (e.g. four) ophthalmologists also may grade a subset of the test sets. Their grades may be compared with those of the deep learning models, using the reading center grades as the ground truth.
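The label-transfer and participant-level splitting steps above can be sketched as follows. The hash-based split rule is an illustrative choice whose only purpose is to guarantee, deterministically, that no participant appears in more than one set (as in FIG. 2):

```python
import hashlib

def transfer_labels(faf_grades, image_pairs):
    """Copy each FAF image's RPD grade to its paired CFP image."""
    return {cfp: faf_grades[faf] for faf, cfp in image_pairs}

def split_by_participant(participant_ids, fractions=(0.8, 0.1, 0.1)):
    """Deterministic participant-level split so no person spans two sets."""
    splits = {"train": [], "dev": [], "test": []}
    for pid in participant_ids:
        h = int(hashlib.sha256(pid.encode()).hexdigest(), 16) % 100
        if h < fractions[0] * 100:
            splits["train"].append(pid)
        elif h < (fractions[0] + fractions[1]) * 100:
            splits["dev"].append(pid)
        else:
            splits["test"].append(pid)
    return splits

# Hypothetical IDs: reading-center RPD grades on FAF, paired CFP images.
faf_grades = {"faf_001": "RPD_present", "faf_002": "RPD_absent"}
pairs = [("faf_001", "cfp_001"), ("faf_002", "cfp_002")]
cfp_labels = transfer_labels(faf_grades, pairs)
splits = split_by_participant([f"P{i:04d}" for i in range(1000)])
```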
  • RPD may be detected in FAF images, regardless of image quality (e.g. high quality or low quality images).
  • the method may detect RPD with higher accuracy than physicians or other trained specialists.
  • the AUC may be high at about 0.94, driven principally by high specificity of about 0.97 (and lower sensitivity of about 0.70).
  • the method of detecting RPD may include using label transfer from graded corresponding FAF images to serve as a standard for CFP images. In some examples, the method may identify RPD presence from CFP images with an AUC of about 0.83, including a relatively high specificity of about 0.90.
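The sensitivity, specificity, and AUC figures quoted above can all be computed from first principles; a NumPy sketch on made-up toy predictions:

```python
import numpy as np

def sens_spec(labels, preds):
    """Sensitivity and specificity for binary predictions."""
    labels, preds = np.asarray(labels, bool), np.asarray(preds, bool)
    sens = (preds & labels).sum() / labels.sum()        # true-positive rate
    spec = (~preds & ~labels).sum() / (~labels).sum()   # true-negative rate
    return sens, spec

def auc(labels, scores):
    """AUC via the rank (Mann-Whitney U) formulation."""
    labels, scores = np.asarray(labels, bool), np.asarray(scores, float)
    pos, neg = scores[labels], scores[~labels]
    # Fraction of (positive, negative) pairs ranked correctly; ties count half.
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

# Toy example: three RPD-present eyes, three RPD-absent eyes.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
sens, spec = sens_spec(labels, [s >= 0.5 for s in scores])  # sens = spec = 2/3
roc_auc = auc(labels, scores)                               # 8/9
```

A high specificity at the chosen threshold (as reported, about 0.97) trades off against a lower sensitivity, exactly as in the detection results above.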
  • Using automated methods to ascertain RPD presence also has advantages over ascertainment by human graders/clinicians.
  • Deep learning models can perform grading of large numbers of images very quickly, with complete consistency, and provide a confidence value associated with each binary prediction. Attention maps can be generated for each graded image to verify that the models are behaving with face validity.
  • human grading of RPD presence, particularly from CFP, is difficult to perform accurately, even by trained graders at the reading center level. This form of grading is operationally time-consuming and associated with lower rates of inter-grader agreement, relative to other AMD-associated features.
  • Human grading of RPD presence from FAF images has higher accuracy than from CFP imaging, but additional FAF imaging in clinical care involves added time and expense, and grading expertise for FAF images is currently limited to a small number of specialist centers in developed countries.
  • deep learning models may be capable of ascertaining RPD presence from CFP images. This capability may unlock an additional dimension in CFP image datasets from established historical studies of AMD that are unaccompanied by corresponding images captured on other modalities and impractical to replicate with multimodal imaging. These include the AREDS, Beaver Dam Eye Study, Blue Mountains Eye study, and other population-based and AMD-based longitudinal datasets, which have provided a wealth of information on AMD epidemiology and genetics.
  • FIG. 11 shows an example deep learning framework combining multi-modal inputs, multi-task training, and a multi-attention mechanism (M3).
  • the framework consists of three deep learning models: the CFP model, the FAF model, and the CFP- FAF model.
  • the CFP model takes CFP images as its input and predicts RPD presence/absence as its output; the same idea applies to the FAF and CFP-FAF models.
  • The CFP and FAF models each have a CNN to extract features from the input image, followed by an attention module to analyze the features that contribute most to decision-making, fully-connected layers, and an output layer, which makes the prediction.
  • the CFP-FAF model has the same structure except that, instead of having its own CNN backbone, it receives the image features from both the CFP and the FAF models.
  • Multi-task training may be used to train the deep learning models. As shown in FIG. 11, this may include (i) multi-task learning and (ii) cascading task fine-tuning.
  • In multi-task learning, the models may be trained jointly, with each model considered as a parallel task, using a shared representation.
  • In cascading task fine-tuning, each model then undergoes additional training separately.
  • the aim of multi-task learning is to learn generalizable and shared representations for all the image scenarios, and the aim of cascading task fine-tuning is to perform additional training suitable for each separate image scenario.
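The two training phases may be illustrated with a deliberately simple linear stand-in for the CNNs: a shared "encoder" matrix trained jointly across both tasks, followed by per-task fine-tuning of each head with the encoder frozen. This is a toy sketch of the idea, not the disclosed training procedure.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two related tasks (stand-ins for the CFP and FAF tasks) share an encoder W;
# each task has its own linear head v[t].
X = rng.normal(size=(64, 10))
y = {"cfp": X @ rng.normal(size=10), "faf": X @ rng.normal(size=10)}

W = rng.normal(size=(10, 5))            # shared representation (encoder)
v = {t: np.zeros(5) for t in y}         # task-specific heads
lr = 0.005

def loss(t):
    return np.mean((X @ W @ v[t] - y[t]) ** 2)

loss_before = {t: loss(t) for t in y}

# Phase 1: multi-task learning -- joint steps update the shared encoder and all heads.
for _ in range(500):
    for t in y:
        err = (X @ W @ v[t] - y[t]) / len(X)
        v_new = v[t] - lr * 2 * (X @ W).T @ err            # head gradient step
        W = W - lr * 2 * X.T @ np.outer(err, v[t])         # shared-encoder step
        v[t] = v_new

# Phase 2: cascading task fine-tuning -- each head trains separately, encoder frozen.
for t in y:
    for _ in range(500):
        err = (X @ W @ v[t] - y[t]) / len(X)
        v[t] = v[t] - lr * 2 * (X @ W).T @ err

loss_after = {t: loss(t) for t in y}
```

Because phase 1 updates the shared encoder with gradients from both tasks, features useful to either task shape the common representation; phase 2 then specializes each output head.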
  • Multi-task training has important advantages over traditional single-task learning, where each model is trained separately.
  • Single-task training has the disadvantage that the performance of each model is limited by the features present on that particular image modality. Models trained in this way may also be more susceptible to overfitting.
  • multi-task training exploits the similarities (shared image features) and differences (task-specific image features) between the features present on the different image modalities. In this way, it usually has improved learning efficiency and accuracy.
  • what is learned for each image modality task can assist during training for the other image modality tasks. In this way, it benefits each model by sharing features that are generalizable between the image modalities. This may be particularly relevant for retinal lesions like RPD, where different imaging modalities (CFP and FAF) highlight very different features relating to the same underlying anatomy.
  • FIG. 4 shows an example of computing system 400 in which the components of the system are in communication with each other using connection 405.
  • Connection 405 can be a physical connection via a bus, or a direct connection into processor 410, such as in a chipset or system-on-chip architecture.
  • Connection 405 can also be a virtual connection, networked connection, or logical connection.
  • computing system 400 is a distributed system in which the functions described in this disclosure can be distributed within a datacenter, multiple datacenters, a peer network, throughout layers of a fog network, etc.
  • one or more of the described system components represents many such components each performing some or all of the function for which the component is described.
  • the components can be physical or virtual devices.
  • Example system 400 includes at least one processing unit (CPU or processor) 410 and connection 405 that couples various system components including system memory 415, read only memory (ROM) 420 or random access memory (RAM) 425 to processor 410.
  • Computing system 400 can include a cache of high-speed memory 412 connected directly with, in close proximity to, or integrated as part of processor 410.
  • Processor 410 can include any general purpose processor and a hardware service or software service, such as services 432, 434, and 436 stored in storage device 430, configured to control processor 410 as well as a special-purpose processor where software instructions are incorporated into the actual processor design.
  • Processor 410 may essentially be a completely self-contained computing system, containing multiple cores or processors, a bus, memory controller, cache, etc.
  • a multi-core processor may be symmetric or asymmetric.
  • computing system 400 includes an input device 445, which can represent any number of input mechanisms, such as a microphone for speech, a touch-sensitive screen for gesture or graphical input, keyboard, mouse, motion input, speech, etc.
  • Computing system 400 can also include output device 435, which can be one or more of a number of output mechanisms known to those of skill in the art.
  • multimodal systems can enable a user to provide multiple types of input/output to communicate with computing system 400.
  • Computing system 400 can include communications interface 440, which can generally govern and manage the user input and system output, and also connect computing system 400 to other nodes in a network. There is no restriction on operating on any particular hardware arrangement and therefore the basic features here may easily be substituted for improved hardware or firmware arrangements as they are developed.
  • Storage device 430 can be a non-volatile memory device and can be a hard disk or other types of computer readable media which can store data that are accessible by a computer, such as magnetic cassettes, flash memory cards, solid state memory devices, digital versatile disks, cartridges, battery backed random access memories (RAMs), read only memory (ROM), and/or some combination of these devices.
  • the storage device 430 can include software services, servers, services, etc.; when the code that defines such software is executed by the processor 410, it causes the system to perform a function.
  • a hardware service that performs a particular function can include the software component stored in a computer- readable medium in connection with the necessary hardware components, such as processor 410, connection 405, output device 435, etc., to carry out the function.
  • FIG. 5 illustrates an example machine learning environment 500.
  • the machine learning environment can be implemented on one or more computing devices 502A-N (e.g., cloud computing servers, virtual services, distributed computing, one or more servers, etc.).
  • the computing device(s) 502 can include training data 504 (e.g., one or more databases or data storage device, including cloud-based storage, storage networks, local storage, etc.).
  • the training data may include data from AREDS and/or AREDS2.
  • the training data 504 of the computing device 502 can be populated by one or more data sources 506 (e.g., data source 1, data source 2, data source n, etc.) over a period of time (e.g., t, t+1, t+n, etc.).
  • training data 504 can be labeled data (e.g., one or more tags associated with the data).
  • training data can be one or more images and a label (e.g., drusen status/size (none/small, medium, or large), the presence or absence of pigmentary abnormalities, and/or the presence or absence of RPD) can be associated with each image.
  • the computing device(s) 502 can continue to receive data from the one or more data sources 506 until the neural network 508 (e.g., convolutional neural networks, deep convolutional neural networks, artificial neural networks, learning algorithms, etc.) of the computing device(s) 502 is trained (e.g., has had sufficient unbiased data to respond to new incoming data requests and provide an autonomous or near-autonomous image classification).
  • the neural network can be a convolutional neural network, for example, utilizing five layer blocks, including convolutional blocks, convolutional layers, and fully connected layers (e.g., 'DeepSeeNet', DenseNet, ResNet, Inception v3, VGG16, or VGG19). While example neural networks are described, neural network 508 can be one or more neural networks of various types and is not limited to a single type of neural network or learning algorithm.
  • a feature selection can be generated (e.g., group correlated features such that one feature is used for each group).
  • features with non-zero coefficients are used in a survival model.
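The disclosure does not specify the exact grouping procedure, but one plausible sketch of "group correlated features such that one feature is used for each group" is to keep one representative per cluster of highly correlated feature columns; a penalized survival model would then retain only the representatives with non-zero coefficients. The threshold and all names below are assumptions.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def group_representatives(features, threshold=0.9):
    """Greedily group correlated feature columns; keep the first feature
    of each group as that group's representative for the survival model."""
    reps = []
    for name, col in features.items():
        if all(abs(pearson(col, features[r])) < threshold for r in reps):
            reps.append(name)
    return reps
```

For example, a feature perfectly correlated with an earlier one would be dropped, while an uncorrelated feature starts a new group.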
  • the training data can require an equal number of images per patient, and as such, if an image is missing, a substitute image can be generated based on the existing images (e.g., in order to provide sufficient training data while not biasing the training data).
  • the training data 504 can be checked for biases, for example, by checking the data source 506 (and corresponding user input) versus previously known unbiased data. Other techniques for checking data biases are also contemplated.
  • the data sources can be any of the sources of data for providing the input images (e.g., CFP, FAF, etc.) as described above in this disclosure.
  • the computing device(s) 502 can receive user (e.g., physician) input 510 related to the data source.
  • the user input 510 and the data source 506 can be temporally related (e.g., by time t, t+1, t+n, etc.).
  • the user input 510 and the data sources 506 can be synchronous in that the user input 510 corresponds and supplements the data source 506 in a manner of supervised or reinforced learning.
  • a data source 506 can provide a CFP and/or FAF image at time t and corresponding user input 510 can be input of drusen size, RPD presence, and/or pigmentary abnormalities of that CFP and/or FAF image at time t. While time t may actually differ in real-world time, the inputs are synchronized in time with respect to the data provided to the training data.
  • the training data 504 can be used to train a neural network 508 or learning algorithms (e.g., convolutional neural network, artificial neural network, etc.).
  • the neural network 508 can be trained, over a period of time, to automatically (e.g., autonomously) determine what the user input 510 would be, based only on received data 512 (e.g., imaging data, etc.). For example, by receiving a plurality of unbiased data and/or corresponding user input for a long enough period of time, the neural network will then be able to determine what the user input would be when provided with only the data.
  • a trained neural network 508 will be able to receive a CFP and/or FAF image (e.g., 512) and based on the CFP and/or FAF image determine the drusen size, RPD presence, and/or pigmentary abnormalities features or grading that a physician would manually identify (and that would have been provided as user input 510 during training). In some examples, this can be based on labels associated with the data as described above.
  • the output from the trained neural network can be provided to a survival model 514 for treating a patient. In some examples, the output from the trained neural network can be inputted directly into a survival model to predict a rate of progression of late AMD in the patient.
  • Trained neural network system 516 can include a trained neural network 508, received data 512, and survival model 514.
  • the received data 512 can be information related to a patient, as previously described above.
  • the received data 512 can be used as input to trained neural network 508.
  • Trained neural network 508 can then, based on the received data 512, label the received data and/or determine a recommended course of action for treating the patient, based on how the neural network was trained (as described above).
  • the recommended course of action or output of trained neural network 508 can be used as an input into a survival model 514 (e.g., to predict the risk of progression to late AMD for the patient to which the received data 512 corresponds).
  • the output from the trained neural network can be provided in a human readable form, for example, to be reviewed by a physician to determine a course of action.
  • the computer-readable storage devices, mediums, and memories can include a cable or wireless signal containing a bit stream and the like.
  • non-transitory computer-readable storage media expressly exclude media such as energy, carrier signals, electromagnetic waves, and signals per se.
  • Methods according to the above-described examples can be implemented using computer-executable instructions that are stored or otherwise available from computer readable media.
  • Such instructions can comprise, for example, instructions and data which cause or otherwise configure a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. Portions of computer resources used can be accessible over a network.
  • the computer executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, firmware, or source code. Examples of computer-readable media that may be used to store instructions, information used, and/or information created during methods according to described examples include magnetic or optical disks, flash memory, USB devices provided with non-volatile memory, networked storage devices, and so on.
  • Devices implementing methods according to these disclosures can comprise hardware, firmware and/or software, and can take any of a variety of form factors. Typical examples of such form factors include laptops, smart phones, small form factor personal computers, personal digital assistants, rackmount devices, standalone devices, and so on. Functionality described herein also can be embodied in peripherals or add-in cards. Such functionality can also be implemented on a circuit board among different chips or different processes executing in a single device, by way of further example.
  • the instructions, media for conveying such instructions, computing resources for executing them, and other structures for supporting such computing resources are means for providing the functions described in these disclosures.
  • the AREDS was a 12-year multi-center prospective cohort study of the clinical course, prognosis, and risk factors of AMD, as well as a phase III randomized clinical trial (RCT) to assess the effects of nutritional supplements on AMD progression.
  • 4,757 participants aged 55 to 80 years were recruited between 1992 and 1998 at 11 retinal specialty clinics in the United States.
  • the inclusion criteria were wide, from no AMD in either eye to late AMD in one eye.
  • the participants were randomly assigned to placebo, antioxidants, zinc, or the combination of antioxidants and zinc.
  • the AREDS dataset is publicly accessible to researchers by request at dbGaP.
  • the AREDS2 was a multi-center phase III RCT that analyzed the effects of different nutritional supplements on the course of AMD. 4,203 participants aged 50 to 85 years were recruited between 2006 and 2008 at 82 retinal specialty clinics in the United States. The inclusion criteria were the presence of either bilateral large drusen or late AMD in one eye and large drusen in the fellow eye. The participants were randomly assigned to placebo, lutein/zeaxanthin, docosahexaenoic acid (DHA) plus eicosapentaenoic acid (EPA), or the combination of lutein/zeaxanthin and DHA plus EPA. AREDS supplements were also administered to all AREDS2 participants, because they were by then considered the standard of care.
  • the primary outcome measure was the development of late AMD, defined as neovascular AMD or central GA. Institutional review board approval was obtained at each clinical site and written informed consent for the research was obtained from all study participants. The research was conducted under the Declaration of Helsinki and, for the AREDS2, complied with the Health Insurance Portability and Accountability Act. For both studies, at baseline and annual study visits, comprehensive eye examinations were performed by certified study personnel using a standardized protocol, and CFP (field 2, i.e., 30° imaging field centered at the fovea) were captured by certified technicians using a standardized imaging protocol. Progression to late AMD was defined by the study protocol based on the grading of CFP, as described below.
  • the AREDS2 ancillary study of FAF imaging was conducted at 66 selected clinic sites, according to the availability of imaging equipment. Sites were permitted to join the ancillary study at any time after FAF imaging equipment became available during the five-year study period. Hence, while some sites performed FAF imaging on their participants from the AREDS2 baseline visit onwards, other sites performed FAF imaging from later study visits onwards, and the remaining sites did not perform FAF imaging at any point.
  • the FAF images were acquired from the Heidelberg Retinal Angiograph (Heidelberg Engineering, Heidelberg, Germany) and fundus cameras with autofluorescence capability by certified technicians using standard imaging protocols.
  • the graders were certified technicians with over 10 years of experience in the detailed evaluation of retinal images for AMD. (These graders did not overlap at all with the four ophthalmologists described below).
  • RPD were defined as clusters of discrete round or oval lesions of hypoautofluorescence, usually similar in size, or confluent ribbon-like patterns with intervening areas of normal or increased autofluorescence; a minimum of 0.5 disc area (approximately five lesions) was required.
  • Two primary graders at the reading center independently evaluated FAF images (from both initial and subsequent study visits) for the presence of RPD. In the case of disagreement between the two primary graders, a senior grader at the reading center would adjudicate the final grade.
  • the mean number of images used per eye was 2.39: 2.39 (training set), 2.38 (validation set), and 2.39 (test set).
  • the number of images where multiple images were used from the same eye was 8,983 images of 2,432 eyes: 6,316 images of 1,708 eyes (training set), 864 images of 229 eyes (validation set), and 1,803 images of 495 eyes (test set).
  • the proportion with RPD was 27.1%: 27.9% (training set), 25.2% (validation set), and 25.0% (test set).
  • the proportion that had RPD from the first image used was 76.3%: 78.4% (training set), 71.2% (validation set), and 70.6% (test set).
  • genetic data were available for 2,889 AREDS participants and 1,826 AREDS2 participants.
  • SNPs were analyzed using a custom Illumina HumanCoreExome array.
  • two SNPs were analyzed individually: CFH rs1061170 and ARMS2 rs10490924, at the two loci with the highest attributable risk of late AMD.
  • AMD GRS was calculated for each participant according to methods known in the art.
  • the GRS is a weighted risk score based on 52 independent variants at 34 loci identified in a large genome-wide association study as having significant associations with risk of late AMD.
  • the online calculator cannot receive this detailed information.
  • the eligibility criteria for participant inclusion in the current analysis were: (i) absence of late AMD (defined as NV or any GA) at study baseline in either eye, since the predictions were made at the participant level, and (ii) presence of genetic information (in order to compare model performance with and without genetic information on exactly the same cohort of participants). Accordingly, the images used for the predictions were those from the study baselines only.
  • the ground truth labels used for both training and testing were the grades previously assigned to each CFP by expert human graders at the University of Wisconsin Fundus Photograph Reading Center.
  • the RPD grading was performed from FAF images (since RPD are detected by human experts with far greater accuracy on FAF images than on CFP).
  • the grading for geographic atrophy and pigmentary abnormalities was performed from CFP (since this remains the gold standard for grading pigmentary abnormalities and was traditionally considered the gold standard for grading geographic atrophy).
  • the reading center workflow includes a senior grader performing initial grading of each photograph for AMD severity using a 4-step scale and a junior grader performing detailed grading for multiple AMD-specific features. All photographs were graded independently and without access to the clinical information.
  • Label transfer was used between the FAF images and their corresponding CFP images; this means that the ground truth label obtained from the reading center for each FAF image was also applied to the corresponding CFP. Similarly, the labels from the FAF images were also applied to the CFP-FAF image pairs.
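The label transfer described above amounts to a simple mapping: each reading-center grade obtained from a FAF image is re-used as the ground truth for the paired CFP image. A minimal sketch, with hypothetical identifiers and label strings:

```python
# Reading-center grades assigned to FAF images (hypothetical identifiers)
faf_labels = {"eye_001": "RPD_present", "eye_002": "RPD_absent"}

# Pairing between eyes and their corresponding CFP image files
cfp_pairs = {"eye_001": "cfp_001.jpg", "eye_002": "cfp_002.jpg"}

# Label transfer: the FAF-derived grade becomes the CFP ground truth
cfp_labels = {cfp_pairs[eye]: label for eye, label in faf_labels.items()}
```

The same transferred labels would also serve as ground truth for the CFP-FAF image pairs.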
  • This new CNN contains 601 layers, comprising a total of over 14 million weights (learnable parameters) that are subject to training. Its main novel feature is that each layer passes on its output to all subsequent layers, such that every layer obtains inputs from all the preceding layers (not just the immediately preceding layer, as in previous CNNs).
  • DenseNet has demonstrated superior performance to many other slightly older CNNs in a range of image classification applications. However, for comparison of performance according to the CNN used, we trained additional deep learning models using four different CNNs frequently employed in image classification tasks: VGG version 16, VGG version 19, InceptionV3, and ResNet version 101 (i.e., eight additional deep learning models in total).
  • Model convergence was judged to have occurred when the loss on the training set no longer decreased; training was stopped five epochs (passes of the entire training set) later. All experiments were conducted on a server with 32 Intel Xeon CPUs, using three NVIDIA GeForce GTX 1080 Ti 11 GB GPUs for training and testing, with 512 GB of RAM available. In addition, image augmentation procedures were used, as follows, in order to increase the dataset size and to strengthen model generalizability: (i) rotation (180 degrees), (ii) horizontal flip, and (iii) vertical flip.
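The three augmentation operations listed above can be sketched on a toy image represented as a list of rows; a 180-degree rotation is simply the composition of the two flips. This is an illustrative sketch, not the training pipeline itself.

```python
def hflip(img):
    """Horizontal flip: mirror each row left-to-right."""
    return [row[::-1] for row in img]

def vflip(img):
    """Vertical flip: mirror the rows top-to-bottom."""
    return img[::-1]

def rot180(img):
    """180-degree rotation = horizontal flip composed with vertical flip."""
    return hflip(vflip(img))

img = [[1, 2],
       [3, 4]]

# Each original image yields four training samples
augmented = [img, hflip(img), vflip(img), rot180(img)]
```

Each operation is its own inverse, so applying it twice recovers the original image, and the label is unchanged by all three.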
  • the DL CNNs used Inception-v3 architecture, which is a state-of-the-art CNN for image classification; it contains 317 layers, comprising a total of over 21 million weights that are subject to training. Training was performed using two commonly used libraries: Keras and TensorFlow. All images were cropped to generate a square image field encompassing the macula and resized to 512 x 512 pixels. The hyperparameters were learning rate 0.0001 and batch size 32. The training was stopped after 5 epochs once the accuracy on the development set no longer increased.
  • CNNs are vulnerable to adversarial attacks (FIG. 16). Images can be attacked by adding a small adversarial perturbation to the original images; the perturbation is imperceptible to humans but misleads a standard CNN model into producing incorrect outputs, with a substantial decline in its predictive performance. As a result, CNNs under adversarial attacks would fail to assist and might even mislead human clinicians. Importantly, such a vulnerability also poses severe security risks and represents a barrier to the deployment of automated CNN-based systems in real-world use, especially in the medical domain where accurate diagnostic results are of paramount importance in patient care.
  • To alleviate the effect of noisy features learned during training, a robust CNN framework was developed (FIG. 17), in which a novel denoising operator was embedded into each convolutional layer to reduce the noise in its outputs, thereby combatting the effect of adversarial perturbations.
  • the denoising operator contained two layers: an inter-sample denoising layer and an intra-sample denoising layer.
  • the former utilized the entire batch of data to decrease the noise contained in feature representations, which might otherwise be mistaken under adversarial attacks as discriminative features.
  • the latter reduced the noise in each medical image itself, to further lower the noise in feature representations.
  • inter-sample denoising is the key to defend against both adversarial and transferable adversarial examples, probably because it utilizes the other samples to reduce the noise in feature representations of one sample in each batch.
  • intra-sample denoising can further enhance model robustness.
  • inter-sample denoising might require more testing time, when images are tested one by one. This is because inter-sample denoising requires a different set of images to reduce the noise in feature representations of each testing image.
  • the success of combining adversarial training and the method suggests that adversarial training might be complementary to the denoising layers in the method by further decreasing the noise manipulated by adversarial perturbations.
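The disclosure does not give the exact form of the two denoising layers, so the following is only one plausible illustration of the idea: inter-sample denoising uses the whole batch (here, by blending each sample's feature vector toward the batch mean), while intra-sample denoising smooths values within a single sample. The blending coefficient and function names are assumptions.

```python
def inter_sample_denoise(batch, alpha=0.5):
    """Blend each sample's features toward the batch mean, so noise that
    varies between samples (e.g., adversarial perturbations) is damped."""
    n = len(batch)
    mean = [sum(col) / n for col in zip(*batch)]
    return [[(1 - alpha) * v + alpha * m for v, m in zip(sample, mean)]
            for sample in batch]

def intra_sample_denoise(sample):
    """3-point moving average within one sample's feature vector."""
    out = []
    for i in range(len(sample)):
        window = sample[max(0, i - 1):i + 2]
        out.append(sum(window) / len(window))
    return out
```

This toy form also illustrates the testing-time cost noted above: inter-sample denoising needs other samples in the batch to denoise each image, so images tested one by one require an auxiliary set.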
  • Example 4 Deep learning models trained on the combined AREDS/AREDS2 training sets and validated on the combined AREDS/AREDS2 test sets
  • the prediction accuracy of the approaches was compared using the five-year C-statistic as the primary outcome measure.
  • the five-year C-statistic of the two DL approaches substantially exceeded that of both existing standards (Table 3).
  • the five-year C-statistic was 86.4 (95% confidence interval 86.2-86.6) for deep features/survival, 85.1 (85.0-85.3) for DL grading/survival, 82.0 (81.8-82.3) for retinal specialists/calculator, and 81.3 (81.1-81.5) for retinal specialists/SSS.
  • the performance of the risk prediction models was assessed by the C-statistic at five years from study baseline. Five years from study baseline was chosen as the interval for the primary outcome measure since this is the only interval where comparison can be made with the SSS, and the longest interval where predictions can be tested using the AREDS2 data.
  • the C-statistic represents the area under the receiver operating characteristic curve (AUC).
  • the C-statistic is computed as follows: all possible pairs of participants are considered where one participant progressed to late AMD and the other participant in the pair progressed later or not at all; out of all these pairs, the C-statistic represents the proportion of pairs where the participant who had been assigned the higher risk score was the one who did progress or progressed earlier.
  • a C-statistic of 0.5 indicates random predictions, while 1.0 indicates perfectly accurate predictions.
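The pairwise definition above can be implemented directly. The sketch below is illustrative (function and variable names are assumptions, and censoring is handled only in the simple form just described: a pair is evaluable when one participant progressed and the other's follow-up extends past that time).

```python
def c_statistic(times, events, risk_scores):
    """Proportion of evaluable pairs in which the participant assigned
    the higher risk score progressed earlier.
    times: time to late AMD or censoring; events: 1 if progressed;
    risk_scores: higher score = higher predicted risk."""
    concordant, ties, total = 0, 0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # evaluable pair: i progressed, and j progressed later
            # (or did not progress within a longer follow-up)
            if events[i] and times[i] < times[j]:
                total += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    ties += 1  # tied scores count half, per convention
    return (concordant + 0.5 * ties) / total
```

With risk scores that perfectly rank the progressors the statistic is 1.0, and with uninformative (identical) scores it is 0.5, matching the interpretation above.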
  • the Brier score was calculated from prediction error curves.
  • the Brier score is defined as the mean squared distance between the model's predicted probability and the actual late AMD, GA, or NV status, where a score of 0.0 indicates a perfect match.
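As a minimal sketch of the definition above (names are illustrative), the Brier score averages the squared distance between each predicted probability and the observed binary status:

```python
def brier_score(predicted_probs, outcomes):
    """Mean squared distance between predicted probability of late AMD
    and observed status (1 = developed late AMD, 0 = did not).
    0.0 indicates a perfect match."""
    n = len(outcomes)
    return sum((p - y) ** 2 for p, y in zip(predicted_probs, outcomes)) / n
```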
  • the Wald test was used to assess the statistical significance of each factor in the survival models. It corresponds to the ratio of each regression coefficient to its standard error.
  • the ‘survival’ package in R version 3.5.2 was used for Cox proportional hazards model evaluation.
  • saliency maps were generated to represent the image locations that contributed most to decision-making by the DL models (for drusen or pigmentary abnormalities). This was done by back-projecting the last layer of the neural network.
  • the Python package ‘keras-vis’ was used to generate the saliency map.
  • the performance of the CNN with the highest AUC was compared (separately for the FAF and the CFP task) with the performance of four ophthalmologists with a special interest in RPD, who manually graded the images by viewing them on a computer screen at full image resolution.
  • the ophthalmologists comprised two retinal specialists (in practice as a retinal specialist for 35 years (EC) and 2 years (TK)) and two post-residency retinal fellows (one second year (CH) and one first year (AT)). Prior to grading, all four ophthalmologists were provided with the same RPD imaging definitions as those used by the reading center graders (see above).
  • Each of five CNNs was used to analyze the FAF images in the test set. Their accuracy in detecting RPD presence, relative to the gold-standard (reading center grading of FAF images), was quantitated using multiple performance metrics. Separately, each of five CNNs was used to analyze CFP images in the test set. Their accuracy in detecting RPD presence, relative to the gold-standard (labels transferred from reading center grading of the corresponding FAF images), was quantitated in a similar way. The results are shown in Table 6 and in FIGS. 8A-8B.
  • For the FAF image analyses, DenseNet, relative to the other four CNNs, achieved the highest AUC, the primary performance metric, at 0.939 (95% confidence interval 0.927-0.950), and the highest kappa, the secondary performance metric, at 0.718 (0.685-0.751). Of the five other performance metrics, it was highest for three (sensitivity, accuracy, and F1-score), relative to the other CNNs. It achieved sensitivity 0.704 (0.667-0.741) and specificity 0.967 (0.958-0.975).
  • For the CFP image analyses, DenseNet achieved the highest AUC, at 0.832 (0.812-0.851), and the highest kappa, at 0.470 (0.426-0.511). Of the five other performance metrics, it was highest for four (sensitivity, accuracy, precision, and F1-score), relative to the other CNNs. It achieved sensitivity 0.538 (0.498-0.575) and specificity 0.904 (0.889-0.918).
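The per-class metrics reported in these examples (sensitivity, specificity, accuracy, kappa) all derive from a binary confusion matrix. The following is an illustrative sketch, not reading-center software; the names are assumptions.

```python
def confusion(y_true, y_pred):
    """Counts of true/false positives and negatives for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion(y_true, y_pred)
    n = tp + tn + fp + fn
    sensitivity = tp / (tp + fn)          # recall on RPD-present eyes
    specificity = tn / (tn + fp)          # recall on RPD-absent eyes
    accuracy = (tp + tn) / n
    # Cohen's kappa: observed agreement corrected for chance agreement
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((tn + fn) / n) * ((tn + fp) / n)
    pe = p_yes + p_no
    kappa = (accuracy - pe) / (1 - pe)
    return sensitivity, specificity, accuracy, kappa

sens, spec, acc, kappa = metrics([1, 1, 0, 0], [1, 0, 0, 0])
```

The kappa correction for chance agreement is why it is reported as the primary metric when comparing graders: high specificity alone can inflate raw accuracy when most eyes lack RPD.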
  • Example 8 Performance of automated deep learning models versus human practitioners in detecting reticular pseudodrusen
  • The highest performing CNN, DenseNet, was used to analyze images from a random subset of the test set (263 FAF and CFP corresponding image pairs).
  • the performance metrics for the detection of RPD presence were compared to those obtained by each of four ophthalmologists who manually graded the images (when viewed on a computer screen at full image resolution).
  • the performance metrics were also compared with CFP-derived labels (i.e. , reading center grading of the CFP images), using the FAF- derived labels as the ground truth.
  • the results are shown in Table 7.
  • the ROC curves for the two DenseNet deep learning models are shown in FIGS. 9A-9B.
  • the performance of the four ophthalmologists is shown as four single points; for the CFP task, the performance of the reading center grading is also shown as a single point.
  • For the FAF task, DenseNet, relative to the four ophthalmologists, achieved the highest kappa, the primary performance metric, at 0.789 (0.675-0.875). This was numerically higher than the kappa of one retinal specialist, substantially higher than that of the other retinal specialist, and very substantially higher than that of the two retinal fellows. Regarding accuracy and F1-score, DenseNet achieved the highest performance, at 0.937 (0.907-0.963) and 0.828 (0.725-0.898), respectively.
  • the two retinal specialists demonstrated high levels of specificity (0.995 (0.986-1.000) for both) and precision (0.961 (0.870-1.000) and 0.973 (0.906-1.000)), but at the expense of decreased sensitivity (0.498 (0.367-0.632) and 0.673 (0.544-0.800)).
  • the performance of the two retinal specialists was similar or very slightly superior to that of DenseNet (AUC 0.962), while the performance of the two fellows was either moderately or substantially inferior.
  • For the CFP task, DenseNet, relative to the four ophthalmologists, achieved the highest kappa, at 0.471 (0.330-0.606), the primary performance metric. This was very substantially higher than the kappa of all four ophthalmologists (range 0.105-0.180). It was also substantially higher than the reading center grading of CFP, at 0.258 (0.130-0.387). Regarding accuracy and F1-score, DenseNet achieved the highest performance, at 0.844 (0.798-0.886) and 0.565 (0.434-0.674), respectively.
  • the performance metrics were generally inferior for subgroup (i) (i.e., eyes with RPD in evolution) and superior for subgroup (ii).
  • for the FAF analyses, the AUC was 0.827 versus 0.960, respectively, and the kappa was 0.383 versus 0.766, respectively.
  • for the CFP analyses, the AUC was 0.636 and 0.863, respectively, and the kappa was 0.180 and 0.507, respectively.
  • Example 10 Attention maps generated on fundus autofluorescence images and color fundus photographs by deep learning model evaluation
  • FIGS. 10A-10C show the study participants' original FAF image in the top left, the FAF image with the deep learning attention map overlaid in the bottom left, the original CFP image in the top right, and the CFP image with the deep learning attention map overlaid in the bottom right.
  • the areas of highest signal correspond very well with both (i) the retinal locations observed to contain RPD in the original FAF image (even though the deep learning model never received the FAF image as input), and (ii) the areas of highest signal in the FAF image attention map.
  • the CFP attention map has areas of high signal in both (i) the area (superonasal macula) with RPD possibly visible to humans on the CFP (broken gray arrow), and (ii) other areas where RPD appear invisible on the CFP but visible on the corresponding FAF image.
  • RPD are observed in the original FAF image clearly as widespread ribbon-like patterns of round and oval hypoautofluorescent lesions with intervening areas of normal and increased autofluorescence.
  • the area affected is large, affecting almost the whole macula but sparing the central macula, i.e., in a doughnut configuration.
  • the doughnut configuration is captured well on the attention map, i.e., the areas of highest signal correspond very well with the retinal locations observed to contain RPD in the FAF image.
  • the superior peripheral macula appears to contain RPD (broken gray arrow), but they are not clearly visible in the inferior peripheral macula.
  • the CFP attention map has areas of high signal in both (i) the area (superior macula) with RPD visible to humans on the CFP (broken gray arrow), and (ii) other areas (inferior macula) where RPD appear invisible on the CFP but visible on the corresponding FAF image (even though the deep learning model never received the FAF image as input).
  • RPD are observed in the original FAF image as ribbon-like patterns of round and oval hypoautofluorescent lesions with intervening areas of normal and increased autofluorescence. These are relatively widespread across the macula (black arrows; corresponding almost to a doughnut configuration), as well as the area superior to the optic disc (black arrow), but are less apparent in the inferior and inferotemporal macula (broken black arrows).
  • the areas of highest signal correspond very well with the retinal locations observed to contain RPD in the original FAF image (black arrows).
  • the CFP attention map has areas of high signal in both (i) the area (superotemporal macula) with RPD potentially visible to humans on the CFP (broken gray arrow), and (ii) another area (inferior to the optic disc) that might potentially contain additional signatures of RPD presence.
  • the areas of the fundus images that contributed most consequentially to RPD detection were located in the outer areas of the central macula within the vascular arcades, approximately 3-4 mm from the foveal center.
  • these ‘high attention’ areas correspond well to the typical localization of clinically observable RPD within fundus images.
  • as shown in FIGS. 10A-10C, clinically observable RPD can be located within these high attention areas.
  • the outputs of the deep learning models display a degree of face validity and interpretability for the detection of RPD.
  • Example 11 Exploratory error analyses
  • the method of detecting RPD from retinal images was externally validated by measuring its performance on images from a completely different population on a different continent: participants in the population-based Rotterdam Eye Study (Rotterdam, Netherlands).
  • the study population consisted of 278 eyes from 230 participants. Of the 278 eyes, 73 were positive for RPD and the remaining 205 were negative (including 113 eyes with soft drusen but no RPD and 92 control eyes with neither soft drusen nor RPD).
  • the ground truth used to measure performance were the labels (RPD present/absent) from reading center grading (EyeNED Reading Center, Rotterdam), based on the combination of CFP, FAF, and near-infrared imaging.
  • Example 13 Multi-modal, multi-task, multi-attention (M3) deep learning framework
  • non-M3 deep learning models were created, one for each image scenario, in order to compare performance between these and the M3 models.
  • the structure of the non-M3 CFP model and the non-M3 FAF model represents what is used in existing studies of other medical computer vision tasks to achieve state-of-the-art performance.
  • the non-M3 models created were expected to have a high level of performance, in order to set a high standard for the M3 models.
  • the non-M3 model comprised a CNN backbone, followed by the fully-connected and output layers. To ensure a fair comparison, InceptionV3 was used as the CNN backbone for both the non-M3 and the M3 models.
  • InceptionV3 is a state-of-the-art CNN architecture that is commonly used in medical computer vision applications. For the same reasons, the fully-connected and output layers were exactly the same as those used in the M3 models.
  • the non-M3 model used a typical concatenation to combine the CFP and FAF image features from the InceptionV3 CNN backbones. Unlike the M3 models, the three non-M3 models were trained separately and did not use attention mechanisms.
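The concatenation-based late fusion described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the 2048-dimensional feature vectors (typical of a global-average-pooled InceptionV3) and the single linear output layer are assumptions for illustration, not the patent's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature vectors from the two InceptionV3 backbones
# (2048-dimensional global-average-pooled features each is assumed).
cfp_features = rng.standard_normal(2048)
faf_features = rng.standard_normal(2048)

# Late fusion by simple concatenation, as in the non-M3 CFP-FAF model.
fused = np.concatenate([cfp_features, faf_features])  # shape (4096,)

# A fully-connected layer then maps the fused vector to a single logit,
# and a sigmoid converts it to a probability of RPD presence.
weights = rng.standard_normal(4096) * 0.01  # illustrative weights
bias = 0.0
logit = fused @ weights + bias
probability = 1.0 / (1.0 + np.exp(-logit))
```

In a trained network the weights would come from backpropagation; here they are random, so only the shapes and the fusion mechanism are meaningful.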
  • for each image scenario, an M3 model was trained 10 times using the same training/validation/test split shown in Table 1, to create 10 individual M3 models (i.e. 30 models in total across the three scenarios).
  • each non-M3 model was likewise trained 10 times (i.e. another 30 models), using the same training/validation/test split. This was to allow a fair comparison between the two model types, including meaningful statistical analysis (as described below). Both the M3 and the non-M3 models shared the same hyperparameters and training procedures to ensure a fair comparison (except that the M3 models had an additional cascading-task fine-tuning step, as shown in FIG. 11).
  • the InceptionV3 CNN backbones were pre-trained using ImageNet, an image database of over 14 million natural images with corresponding labels, using methods described previously. During the training process, each input image was scaled to 512x512 pixels. The model parameters were updated using the Adam optimizer (learning rate of 0.001) for every minibatch of 16 images. An early stop procedure was applied to avoid overfitting: training was stopped if the loss on the validation set had not decreased for 5 epochs. The M3 models completed training within 30 epochs, whereas the non-M3 models completed training within 10 epochs.
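The early-stop rule described above can be illustrated in isolation. This is a plain-Python sketch of one plausible reading of that rule (stop once the validation loss has not improved for 5 consecutive epochs); the function name and the simulated loss values are hypothetical.

```python
def train_with_early_stopping(val_losses, patience=5):
    """Return the epoch at which training stops under an early-stop
    rule: halt once the validation loss has not improved for
    `patience` consecutive epochs. `val_losses` stands in for the
    validation loss observed after each training epoch."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            return epoch  # training stops here
    return len(val_losses)  # ran to completion without triggering

# Simulated validation losses: improvement stalls after epoch 4,
# so training stops 5 epochs later, at epoch 9.
losses = [0.9, 0.7, 0.6, 0.55, 0.56, 0.57, 0.58, 0.59, 0.60, 0.61]
stop_epoch = train_with_early_stopping(losses, patience=5)
```

Deep learning frameworks provide equivalent callbacks (e.g. patience-based early stopping); the sketch only makes the stopping criterion explicit.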
  • Example 14 Evaluation of the deep learning models in comparison with each other
  • each model was evaluated against the gold standard reading center grades on the full test set of images.
  • the following metrics were calculated: F1-score, area under the receiver operating characteristic curve (AUROC), sensitivity (also known as recall), specificity, Cohen's kappa, accuracy, and precision.
  • the F1-score (which incorporates sensitivity and precision into a single metric) was the primary performance metric.
  • the AUROC was the secondary performance metric.
  • the performance of the deep learning models was evaluated separately for the three imaging scenarios; for each scenario, the performance of the M3 models was compared with those of the non-M3 models.
  • the Wilcoxon rank sum test was used to compare the F1-scores of the 10 M3 and 10 non-M3 models (separately for each imaging modality).
  • the differential performance of the models was analyzed by examining the distribution of cases correctly classified by both models, neither model, the non-M3 model only, or the M3 model only. For these analyses, bootstrapping was performed with 50 iterations, with one of the 10 models selected randomly for each iteration. Similar methods were followed for the other two AMD features (geographic atrophy and pigmentary abnormalities).
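The statistical comparison described above can be sketched with stdlib Python only. The F1-score values below are purely hypothetical (not the study's results); the rank-sum test is computed via the equivalent Mann-Whitney U statistic with a normal approximation and no tie or continuity correction, so it is an illustrative approximation rather than a full implementation.

```python
import math
import random

# Hypothetical F1-scores from 10 M3 and 10 non-M3 training runs
# (illustrative values only, not results from the study).
f1_m3     = [64.1, 65.0, 63.2, 66.4, 64.8, 63.9, 65.5, 64.2, 66.0, 63.7]
f1_non_m3 = [49.5, 51.2, 48.0, 50.3, 49.9, 52.1, 47.8, 50.6, 49.2, 51.0]

# Mann-Whitney U (equivalent to the Wilcoxon rank-sum test):
# count pairs in which an M3 score exceeds a non-M3 score.
u = sum(1 for a in f1_m3 for b in f1_non_m3 if a > b)
n1, n2 = len(f1_m3), len(f1_non_m3)

# Normal approximation to the null distribution of U.
mean_u = n1 * n2 / 2
sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (u - mean_u) / sd_u
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

# Bootstrap-style sampling of one of the 10 models per iteration,
# mirroring the 50-iteration analysis described above.
random.seed(0)
sampled = [random.choice(f1_m3) for _ in range(50)]
```

With complete separation between the two groups (every M3 score above every non-M3 score), U reaches its maximum of 100 and the approximate two-sided p-value is well below 0.01, matching the intuition that such separation is unlikely under the null hypothesis.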
  • FIG. 12 shows box plots showing the F1 score results of the multi-modal, multi-task, multi-attention (M3) and standard (non-M3) deep learning convolutional neural networks for the detection of reticular pseudodrusen from color fundus photographs (CFP) alone, their corresponding fundus autofluorescence (FAF) images alone, or the CFP-FAF image pairs, using the full test set.
  • CFP = color fundus photographs; FAF = fundus autofluorescence
  • Each model was trained and tested 10 times (i.e. 60 models in total), using the same training and testing images each time.
  • the F1-scores were substantially higher for the FAF and CFP-FAF scenarios than for the CFP scenario. In all three image scenarios, the F1-score of the M3 model was significantly and substantially higher than that of the non-M3 model.
  • Table 9 shows performance results of the M3 and non-M3 deep learning convolutional neural networks for the detection of reticular pseudodrusen from color fundus photographs (CFP) alone, their corresponding fundus autofluorescence (FAF) images alone, or the CFP-FAF image pairs, using the full test set.
  • the median and interquartile range (brackets) are shown for each performance metric.
  • using the same default cut-off threshold of 0.5, the sensitivity of the M3 models was substantially higher for all three image scenarios, and particularly for the CFP scenario.
  • the M3 models had higher AUROC values for all three image scenarios, suggesting that the M3 models could better distinguish positive and negative cases.
  • the differential performance of the models was further analyzed by examining the distribution of cases correctly classified by both models, neither model, the M3 model only, or the non-M3 model only, as shown in FIG. 13. Analysis of the positive cases demonstrated a relatively high frequency where only the M3 model was correct, particularly for the CFP image scenario (mean 23.7%, SD 9.1%), and a very low frequency of cases where only the non-M3 model was correct (mean 6.1%, SD 4.1%). Similarly, in the FAF scenario, the equivalent figures were 14.2% (SD 6.1%) and 2.1% (SD 1.1%), respectively.
  • the performance of the deep learning models was compared with the performance of 13 ophthalmologists who manually graded the same images (when viewed on a computer screen at full image resolution).
  • the test set of images was a random subset of the full test set (at the participant level) and comprised 100 CFP images, and the 100 corresponding FAF images, from 100 different participants (68 positive cases and 32 negative cases).
  • Each model was trained and tested 10 times, using the same training and testing images each time.
  • the ophthalmologists performed the grading independently of each other, and separately for the two image scenarios (i.e. CFP-alone then FAF-alone).
  • the ophthalmologists comprised three different levels of seniority and specialization in retinal disease: ‘attending’ level (highest seniority) specializing in retinal disease (4 people), attending level not specializing in retinal disease (4 people), and ‘fellow’ level (lowest seniority) (5 people).
  • the results are shown in FIGS. 15A and 15B and Table 10.
  • the performance of the deep learning models is shown by their ROC curves, with the performance of each ophthalmologist shown as a single point.
  • the median F1-scores of the ophthalmologists were 31.14 (IQR 10.43), 35.04 (IQR 5.34), and 40.00 (IQR 9.64), for the attending (retina), attending (non-retina), and fellow levels, respectively.
  • This low level of human performance was expected, since RPD are typically observed very poorly on CFP, even at the gold standard level of reading center experts.
  • the median F1-score was 64.35 (IQR 6.29) for the M3 models and 49.14 (IQR 24.58) for the non-M3 models.
  • the F1-scores of the M3 models were approximately 84% higher than those of the ophthalmologists (p < 0.0001). Indeed, the performance of the M3 models was twice as high as that of the retinal specialists at attending level (the most senior ophthalmologists, specializing in retinal disease).
  • the median F1-scores of the ophthalmologists were 81.81 (IQR 3.43), 68.32 (IQR 5.86), and 79.41 (IQR 4.83), for the attending (retina), attending (non-retina), and fellow levels, respectively.
  • the median F1-score was 85.25 (IQR 5.24) for the M3 models and 78.51 (IQR 8.51) for the non-M3 models.
  • the F1-scores of the M3 models were significantly higher than those of the ophthalmologists (p < 0.001).
  • Table 10 provides performance results of the M3 and non-M3 deep learning convolutional neural networks, in comparison with those of 13 ophthalmologists, for the detection of reticular pseudodrusen from color fundus photographs (CFP) alone or their corresponding fundus autofluorescence (FAF) images alone, using a random subset of the test set.
  • the median and interquartile range (brackets) are shown for each performance metric.
  • FIGS. 14A-14C show representative examples where the non-M3 model missed RPD presence but the M3 model correctly detected it.
  • the non-M3 models had only one or very few focal areas of high signal; often, these did not correspond with retinal areas where RPD are typically located.
  • the M3 models tended to demonstrate more widespread areas of high signal that corresponded well with retinal areas where RPD are located (e.g. peripheral macula).
  • Example 17 External validation of deep learning models using a secondary dataset not involved in model training
  • a secondary and separate dataset was used to perform external validation of the trained deep learning models in the detection of RPD.
  • the secondary dataset was the dataset of images, labels, and accompanying clinical information from a previously published analysis of RPD in the Rotterdam Study.
  • eyes with and without RPD were selected from the Rotterdam Study, a prospective cohort study investigating risk factors for chronic diseases in the elderly. The study adhered to the tenets in the Declaration of Helsinki and institutional review board approval was obtained.
  • the dataset comprised 278 eyes of 230 patients aged 65 years and older, selected from the last examination round of the Rotterdam Study and for whom three image modalities were available (CFP, FAF, and NIR).
  • the ground truth labels for RPD presence/absence came from human expert graders locally in the Rotterdam Study.
  • RPD were defined as indistinct, yellowish interlacing networks with a width of 125 to 250 µm on CFP; groups of hypoautofluorescent lesions in regular patterns on FAF; and groups of hyporeflectant lesions against a mildly hyperreflectant background in regular patterns on NIR images.
  • the results are shown in Table 11.
  • the F1-scores of the three M3 models were 78.74 (CFP-alone), 65.63 (FAF-alone), and 79.69 (paired CFP-FAF).
  • the equivalent AUROC values were 96.51, 90.83, and 95.03, respectively.
  • the F1-score of the FAF M3 model was inferior on the external dataset, and the AUROC was modestly inferior.
  • the F1-score of the CFP-FAF M3 model on the external dataset was very similar to that for the primary dataset, and the AUROC was actually superior on the external dataset.
  • Example 18 Automated detection of geographic atrophy and pigmentary abnormalities by multi-modal, multi-task, multi-attention (M3) deep learning models
  • M3 deep learning models were trained to detect two other important features of AMD, geographic atrophy and pigmentary abnormalities.
  • the median F1-scores of the M3 models were numerically higher than those of the non-M3 models. The differences were statistically significant for the CFP-only and FAF-only scenarios (p < 0.001 and p < 0.01, respectively). The superiority of the M3 models was particularly evident for the clinically important CFP-only scenario.
  • the median F1-score was 83.99 (IQR 1.80) for the M3 model and 80.20 (IQR 1.48) for the non-M3 model.
  • the model with the highest F1-score was the M3 model in the CFP-FAF scenario, at 85.45 (IQR 1.24).

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Organic Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Theoretical Computer Science (AREA)
  • Public Health (AREA)
  • Analytical Chemistry (AREA)
  • Wood Science & Technology (AREA)
  • Pathology (AREA)
  • Genetics & Genomics (AREA)
  • Zoology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Quality & Reliability (AREA)
  • Microbiology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biochemistry (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biotechnology (AREA)
  • Immunology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)

Abstract

Systems and methods for predicting the risk of late age-related macular degeneration (AMD) are disclosed. The method may comprise receiving one or more color fundus photography (CFP) images of both eyes of a patient, classifying each CFP image, and predicting the risk of late AMD by estimating a time until the onset of late AMD. Classifying each CFP image may comprise extracting one or more deep features of macular drusen and pigmentary abnormalities in each CFP image, grading the drusen and pigmentary abnormalities, and/or detecting the presence of reticular pseudodrusen (RPD) in each CFP image. Predicting the risk of late AMD may comprise estimating a time until the onset of late AMD with a Cox proportional hazards model, using the presence of RPD, the one or more deep features, and/or the graded pigmentary abnormalities and drusen.
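The Cox proportional-hazards step mentioned in the abstract can be illustrated with a relative-hazard computation. The coefficients and feature values below are purely hypothetical (not fitted to any study data); the sketch only shows how graded AMD features and RPD presence could enter such a model once coefficients have been estimated.

```python
import math

# Hypothetical Cox model coefficients for late-AMD risk features
# (illustrative values only).
coefficients = {
    "rpd_present": 0.8,       # presence of reticular pseudodrusen
    "drusen_grade": 0.5,      # graded drusen severity
    "pigmentary_grade": 0.6,  # graded pigmentary abnormalities
}

def relative_hazard(features):
    """exp(beta . x): hazard of late AMD relative to baseline
    (all covariates zero) under the proportional-hazards assumption."""
    linear_predictor = sum(coefficients[k] * v for k, v in features.items())
    return math.exp(linear_predictor)

high_risk = {"rpd_present": 1, "drusen_grade": 3, "pigmentary_grade": 1}
low_risk  = {"rpd_present": 0, "drusen_grade": 1, "pigmentary_grade": 0}

# Hazard ratio between the two hypothetical eyes.
ratio = relative_hazard(high_risk) / relative_hazard(low_risk)
```

Fitting the coefficients themselves requires maximizing the Cox partial likelihood over time-to-event data (e.g. with a survival-analysis library such as lifelines); the sketch assumes that fitting has already been done.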
EP21711144.2A 2020-02-18 2021-02-18 Méthodes et systèmes pour prédire des taux d'évolution de la dégénérescence maculaire liée à l'âge Pending EP4107290A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202062978070P 2020-02-18 2020-02-18
PCT/US2021/018589 WO2021168121A1 (fr) 2020-02-18 2021-02-18 Méthodes et systèmes pour prédire des taux d'évolution de la dégénérescence maculaire liée à l'âge

Publications (1)

Publication Number Publication Date
EP4107290A1 true EP4107290A1 (fr) 2022-12-28

Family

ID=74867683

Family Applications (1)

Application Number Title Priority Date Filing Date
EP21711144.2A Pending EP4107290A1 (fr) 2020-02-18 2021-02-18 Méthodes et systèmes pour prédire des taux d'évolution de la dégénérescence maculaire liée à l'âge

Country Status (5)

Country Link
US (1) US20230093471A1 (fr)
EP (1) EP4107290A1 (fr)
AU (1) AU2021224660A1 (fr)
CA (1) CA3177173A1 (fr)
WO (1) WO2021168121A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230005620A1 (en) * 2021-06-30 2023-01-05 Johnson & Johnson Vision Care, Inc. Systems and methods for identification and referral of at-risk patients to eye care professional

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3525659A4 (fr) * 2016-10-13 2020-06-17 Translatum Medicus, Inc. Systèmes et procédés de détection d'une maladie oculaire

Also Published As

Publication number Publication date
US20230093471A1 (en) 2023-03-23
AU2021224660A1 (en) 2022-10-06
WO2021168121A1 (fr) 2021-08-26
CA3177173A1 (fr) 2021-08-26

Similar Documents

Publication Publication Date Title
US10722180B2 (en) Deep learning-based diagnosis and referral of ophthalmic diseases and disorders
Li et al. Deep learning-based automated detection of glaucomatous optic neuropathy on color fundus photographs
Keel et al. Development and validation of a deep‐learning algorithm for the detection of neovascular age‐related macular degeneration from colour fundus photographs
He et al. Multi-label ocular disease classification with a dense correlation deep neural network
Reguant et al. Understanding inherent image features in CNN-based assessment of diabetic retinopathy
WO2018045363A1 (fr) Procédé de criblage pour la détection automatisée de maladies dégénératives de la vision à partir d'images de fond d'œil en couleur
Coan et al. Automatic detection of glaucoma via fundus imaging and artificial intelligence: A review
US20210407637A1 (en) Method to display lesion readings result
Sengar et al. EyeDeep-Net: a multi-class diagnosis of retinal diseases using deep neural network
Khanna et al. Deep learning based computer-aided automatic prediction and grading system for diabetic retinopathy
Islam et al. Enhancing lung abnormalities detection and classification using a Deep Convolutional Neural Network and GRU with explainable AI: A promising approach for accurate diagnosis
Singh et al. A novel hybridized feature selection strategy for the effective prediction of glaucoma in retinal fundus images
US20230093471A1 (en) Methods and systems for predicting rates of progression of age-related macular degeneration
Bhati et al. An interpretable dual attention network for diabetic retinopathy grading: IDANet
Bilal et al. NIMEQ-SACNet: A novel self-attention precision medicine model for vision-threatening diabetic retinopathy using image data
Islam et al. Enhancing lung abnormalities diagnosis using hybrid DCNN-ViT-GRU model with explainable AI: A deep learning approach
US20240108276A1 (en) Systems and Methods for Identifying Progression of Hypoxic-Ischemic Brain Injury
Sridhar et al. Artificial intelligence in medicine: diabetes as a model
Xu et al. Computer aided diagnosis of diabetic retinopathy based on multi-view joint learning
Rafid et al. An early-stage diagnosis of diabetic retinopathy based on ensemble framework
Meel et al. Melatect: A Machine Learning Model Approach For Identifying Malignant Melanoma in Skin Growths
Zhou et al. Discovering abnormal patches and transformations of diabetics retinopathy in big fundus collections
Surabhi et al. Diabetic Retinopathy Classification using Deep Learning Techniques
Rathor et al. Deep Learning based Diabetic Retinopathy Detection using Image Processing
Arif et al. Fundus images classification for Diabetic Retinopathy using machine learning technique

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: UNKNOWN

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20220907

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
P01 Opt-out of the competence of the unified patent court (upc) registered

Effective date: 20230526