WO2023121575A1 - Determining the age and arrest status of embryos using a single deep learning model - Google Patents


Info

Publication number
WO2023121575A1
Authority
WO
WIPO (PCT)
Prior art keywords
images
steps
embryos
embryo
day
Prior art date
Application number
PCT/TR2021/051472
Other languages
French (fr)
Inventor
Muhammed Yasin OZKUL
Mustafa Mert TUNALI
Toprak Mustafa OZTURK
Derya Unutmaz
Original Assignee
Kodmed Saglik Ve Bilisim Teknolojileri A.S
Priority date
Filing date
Publication date
Application filed by Kodmed Saglik Ve Bilisim Teknolojileri A.S
Priority to PCT/TR2021/051472
Publication of WO2023121575A1

Classifications

    • G06T 7/0012 — Image analysis; inspection of images; biomedical image inspection
    • G06N 3/0464 — Neural network architectures; convolutional networks [CNN, ConvNet]
    • G06N 3/082 — Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N 3/09 — Learning methods; supervised learning
    • G06N 3/096 — Learning methods; transfer learning
    • G06T 2207/10056 — Image acquisition modality; microscopic image
    • G06T 2207/20081 — Special algorithmic details; training; learning
    • G06T 2207/20084 — Special algorithmic details; artificial neural networks [ANN]
    • G06T 2207/30044 — Subject of image; biomedical image processing; fetus; embryo

Definitions

  • The present invention relates to In-vitro Fertilization (IVF) and methods for selecting embryos.
  • In particular, the present invention relates to an artificial-intelligence-supported monitoring system that determines the age and arrest status of embryos using a single deep learning model.
  • IVF procedures begin with an ovarian stimulation phase, which stimulates egg production.
  • Retrieved oocytes are fertilized in vitro with sperm, which penetrates the Zona Pellucida, a glycoprotein layer surrounding the oocyte, to form a zygote.
  • An embryo develops over a period of around 5 days, by which time it has formed a blastocyst.
  • The blastocyst is still surrounded by the Zona Pellucida, from which it hatches before implanting into the endometrial wall.
  • The selection of the best embryo is critical to ensure a positive pregnancy outcome.
  • Generally, an embryologist visually assesses the embryos using a microscope to make this decision.
  • The global market size of IVF was estimated at USD 21.9 billion in 2021, and with a predicted Compound Annual Growth Rate (CAGR) of 6.5%, it is forecast to reach USD 33.9 billion by 2028.
  • Embryo selection plays a crucial role in determining successful fertilization, and accounts for an estimated one-third of all implantation failures, yet it is often predicated on the subjective assessment of a single embryologist. Embryologists utilize guidelines to aid in their classification, but this is a purely visual process that is subject to inter- and intra-observer bias. Thus, embryo selection is a manual process that involves a subjective assessment of embryos by an embryologist through visual inspection. Factors such as field experience, workload, and microscope quality are just a few of the elements that can impact an embryologist's ability to accurately grade embryos. Furthermore, every IVF clinic utilizes a different approach to collecting, culturing, and evaluating embryos, with different laboratory environments influencing embryo development.
  • Deep learning is one of the most popular paradigms in AI. Deep learning allows computational models composed of multiple layers of artificial neurons to learn patterns in data on their own. Deep learning methods can be modified according to data type in order to better represent the data being learned.
  • Deep Convolutional Neural Networks (DCNNs) are a specialized class of deep learning models designed for visual data: they seek spatial patterns within the given data to minimize prediction error. DCNNs are commonly applied to image classification problems, object detection, image segmentation, and similar tasks.
  • Model performance is largely dependent on the type, quality, and quantity of data available.
  • When models are trained and tested under stringent, controlled conditions, performance metrics tend to be significantly higher than when the models are tested under different scenarios.
  • Additionally, models can falter and completely misinterpret the data they are processing when presented with input they have not been trained on.
  • The document US10748288B2, which is another state-of-the-art teaching, discloses methods and systems for determining the quality of an oocyte to reach various reproductive milestones, including fertilizing, developing into a viable embryo (blastocyst), implanting into the uterus, and reaching a clinical pregnancy, through non-invasive visual assessment from a single image using artificial intelligence software.
  • US10395211B2 discloses a computer system that automatically converts a set of training images of cells (e.g., oocytes or pronuclear embryos) and related outcome metadata into a description document by extracting features (e.g., cytoplasm features) from the pixel values of the training images and associating the extracted features with the outcome metadata. Based on the description document, the system automatically computes a decision model that can be used to predict outcomes of new cells. To predict outcomes of new cells, a computer system automatically extracts features from images that describe the new cells and predicts one or more outcomes by applying the decision model. The features extracted from the images that describe the new cells correspond to features selected for inclusion in the decision model and are calculated in the same way as the corresponding features extracted from the training images.
  • In the present invention, a single model determines the age and arrest status of embryos from microscopic image captures alone. In this respect, training the model using a relatively small and diverse dataset is essential.
  • One of the aims of the invention is to provide an AI that determines the age and arrest status of embryos through morphological features.
  • Another aim of the invention is to enable a model using a relatively small and diverse data set.
  • Another aim of the invention is to combine the detection of age and arrest status into a singular model.
  • Another aim of the invention is to provide complete automation of the process.
  • Automation of the process allows for digitalization of embryologists’ workflow, allowing for optimal/non-rushed evaluation of embryos, without having to be concerned about the detriments of embryos being outside of their incubators, resulting in more confident assessments.
  • Automated systems utilizing deep learning can determine the age and arrest status of embryos from a patient in less than a second, compared to the minutes required by the average embryologist.
  • In brief, the present invention outperformed experienced and successful embryologists in terms of sensitivity, speed, and quality of outcomes.
  • Embryology is a vital aspect of the IVF process, and approximately one-third of all failed IVF procedures are related to embryo quality.
  • Despite embryologists' roles being so critical, the majority of their assessments are purely subjective (refined through clinical experience).
  • The present invention offers a tool to aid the embryologist in the decision-making process by providing more objective, critical data in the form of the AI model.
  • Finally, the monitoring system is cost effective, to maximize the return on investment and profit.
  • Figure 1 shows the process of generating the compiled dataset of the present invention.
  • Figure 2 shows the processing of the dataset and the training of the deep learning model in the present invention.
  • Figure 3 shows the labeling process of the present invention.
  • Figure 4 shows the input (a), applying a grid to the input (b), region proposal (c), class predictions (d), detected objects (e), and finally cropped objects (f) of the present invention.
  • Figure 5 shows examples of embryo images according to age and arrest status in the present invention.
  • Figure 6 depicts examples of embryos in their original state and under a heatmap. Heatmaps graphically illustrate the areas of images that the model prioritizes in deducing its prediction, according to the present invention.
  • Figure 7 presents a table with the best five performing models from the training experiments, according to the present invention.
  • Generally, this invention relates to a method and apparatus for estimating an embryo viability score from images.
  • Specifically, this invention may be utilized for computationally generating an Artificial Intelligence (AI) model.
  • The present invention utilizes microscopic datasets of human embryo images taken from Day 1 to Day 5.
  • A proprietary algorithm automatically crops grouped embryos into individual images.
  • Each de-identified single-embryo image is graded according to its arrest status on different days.
  • Embryo images, taken chronologically from day 1 to day 5, are differentiated from one another by training deep learning models with embryologist-graded microscopic images.
  • Said models are trained to identify embryos that have ceased to develop in vitro, herein referred to as "arrested".
  • The non-arrested embryos are considered to be healthy and physiologically in accordance with their chronological age.
  • Arrested embryos appear distinctly as cells with abnormal morphology, fragmentation, and mitotic activity.
  • The non-arrested embryo classes (Day 1, 2, 3, 4, 5) and the arrested embryo class constitute a total of 6 classes in the present invention.
  • Figures 1, 2, and 3 show an AI model configured to estimate an embryo viability score from at least one image of an embryo.
  • Figure 1 is a schematic flow chart of the generation of the compiled dataset from images collected from IVF patients, according to an embodiment.
  • A plurality of images is obtained from one or more data sources. Each image is captured during a pre-determined time window after IVF.
  • The embryo images are obtained from patients undergoing routine infertility treatment.
  • The images can be sourced from IVF clinics, with images captured using optical light microscopy.
  • Routinely recorded images of grouped embryos in their culture media are taken using the camera attachment of a stereo microscope. Images are captured once per day post-fertilization (day 1 to day 5).
  • A proprietary algorithm is applied to the images, which automatically identifies individual embryos in a group and crops them into individual images, as shown in figure 4.
  • The embryo images were then organized according to their chronological age. Poor-quality images were subsequently excluded. An image is deemed poor quality if embryologists could not clearly assess it, as in cases of blurry images resulting from out-of-focus captures.
  • Said proprietary algorithm involves an object detection model, YOLOv4, trained using the same dataset, which was annotated using an annotator. Training and procedures such as image augmentation were done using default settings.
  • To prevent sample-size imbalance among the categories, only day 4 and 5 images are taken from the same subjects. After removing poor-quality images that failed to meet the quality control criteria, the remaining images are classified under 6 classes: Day 1, Day 2, Day 3, Day 4, Day 5, and Arrest.
  • The arrest class contains images of embryos of all ages that have ceased to develop in vitro.
  • Figure 5 shows the different sizes, cell counts, and appearances that embryos can exhibit. There are, however, certain morphological landmarks that aid in appropriate classification; examples of landmarks used to determine the age of embryos are cell count, pronucleus presence, and blastocoel formation.
  • Images are split into training and testing datasets containing 80% and 20% of the usable embryo images, respectively. Splitting is done on a subject basis, i.e., all images gathered from one subject are placed in either the training or the testing dataset, to prevent bias.
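A subject-wise 80/20 split like the one described can be sketched as follows. This is only an illustration; names such as `subject_ids` are hypothetical, as the patent does not disclose its implementation.

```python
import random

def subject_wise_split(image_paths, subject_ids, test_frac=0.2, seed=42):
    """Split images so that all images from one subject land entirely in
    either the training set or the testing set, preventing subject leakage."""
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_test = max(1, int(len(subjects) * test_frac))
    test_subjects = set(subjects[:n_test])
    train, test = [], []
    for path, subj in zip(image_paths, subject_ids):
        (test if subj in test_subjects else train).append(path)
    return train, test
```

Splitting by subject rather than by image is the key point: otherwise near-identical images of one embryo could appear on both sides of the split and inflate test metrics.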
  • Images are randomly rotated clockwise and counterclockwise at each training step for data augmentation.
  • Data augmentation is a technique harnessed to overcome the heavy reliance of deep learning models on big data; it enhances both the quality and quantity of available training data.
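As a rough illustration of such a rotation step (the patent does not specify the rotation angles; label-preserving quarter-turn rotations with NumPy are assumed here):

```python
import numpy as np

def random_rotate(image, rng=None):
    """Randomly rotate an image one quarter-turn clockwise or
    counterclockwise, or leave it unchanged. The label is unaffected."""
    rng = rng or np.random.default_rng()
    k = rng.choice([-1, 0, 1])  # -1: clockwise, 0: unchanged, 1: counterclockwise
    return np.rot90(image, k)
```

Applied fresh at every training step, this exposes the model to new orientations of the same embryos without collecting additional images.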
  • Convolutional Neural Network (CNN) architectures such as Xception, Inception-v3, VGG, and DenseNet are used to predict the chronological age and arrest status of an embryo.
  • All models are initialized with weights pre-trained on the ImageNet dataset.
  • The last fully connected layer of each model is discarded (i.e., the "top" of the network) and replaced with a randomly initialized final layer that classifies embryos into 6 categories, in contrast to ImageNet's 1000, while still maintaining the useful features previously learned from ImageNet.
  • A dropout layer is added to prevent rapid overfitting.
  • All layers of the original model are frozen so that only the final layer is fine-tuned. This procedure is usually called transfer learning and is proven to be beneficial with regard to learning efficiency.
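The transfer-learning setup described above — an ImageNet-pretrained base with the top discarded, a dropout layer, and a randomly initialized 6-way head — can be sketched in Keras roughly as follows. The dropout rate and average pooling are assumptions; the Adam optimizer and 0.0001 learning rate follow the best model reported below.

```python
import tensorflow as tf

def build_transfer_model(num_classes=6, weights="imagenet", dropout_rate=0.5):
    """Xception base with pre-trained weights, original top discarded,
    all base layers frozen, and a new dropout + softmax classification head."""
    base = tf.keras.applications.Xception(
        include_top=False, weights=weights,
        input_shape=(299, 299, 3), pooling="avg")
    base.trainable = False  # freeze: only the new head is trained
    x = tf.keras.layers.Dropout(dropout_rate)(base.output)
    out = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    model = tf.keras.Model(base.input, out)
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss="categorical_crossentropy",
                  metrics=["categorical_accuracy", tf.keras.metrics.AUC()])
    return model
```

Because the base is frozen, only the small randomly initialized head is fitted, which is what makes training on a relatively small dataset feasible.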
  • The Receiver Operating Characteristic's Area Under Curve (ROC AUC) and the Categorical Accuracy (ACC) of the models are used for comparisons between them.
  • The ROC curve is a graph of the true positive rate (TPR) [sensitivity] against the false positive rate (FPR) [1 − specificity] at various threshold settings for binary classifiers.
  • The numeric value of the area under the ROC curve (AUC) is used as a metric measuring successful decision making by the model. This work is multi-class by nature; therefore, the ROC curve is constructed in a one-vs-rest fashion in order to apply the metric: each class is evaluated against all other classes combined. The overall AUC is then calculated by averaging each class's one-vs-rest AUC score.
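The one-vs-rest averaging just described can be sketched with scikit-learn's binary `roc_auc_score`. This is illustrative only; column `c` of `y_prob` is assumed to hold the predicted probability of class `c`.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def one_vs_rest_auc(y_true, y_prob):
    """Average each class's one-vs-rest ROC AUC.
    y_true: integer labels 0..K-1; y_prob: (n_samples, K) probabilities."""
    y_true = np.asarray(y_true)
    per_class = []
    for c in range(y_prob.shape[1]):
        binary = (y_true == c).astype(int)  # this class vs. all others combined
        per_class.append(roc_auc_score(binary, y_prob[:, c]))
    return float(np.mean(per_class)), per_class
```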
  • Categorical accuracy is calculated as the number of correctly classified images divided by the total number of images evaluated.
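In code, categorical accuracy is simply the fraction of samples whose highest-probability class matches the ground-truth label; a minimal sketch:

```python
import numpy as np

def categorical_accuracy(y_true, y_prob):
    """Fraction of samples whose argmax prediction matches the label."""
    predicted = np.argmax(y_prob, axis=1)
    return float(np.mean(predicted == np.asarray(y_true)))
```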
  • All implementation was done in the Python language using TensorFlow. All models are trained using the training dataset with a batch size of 2–32, with early stopping conducted to avoid overfitting. Evaluations were carried out with the testing dataset using the evaluation metrics mentioned before. All training and validation are done using an Nvidia Tesla V100 Graphics Processing Unit (GPU). In a preferred embodiment, an auto-crop algorithm which detects, isolates, and captures each embryo in grouped cultures was built using YOLOv4. In performance testing, the model is shown to have an Intersection over Union (IoU) score of 87.43%, 100% recall (sensitivity), and 98% precision (specificity).
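For reference, the Intersection over Union score quoted for the auto-crop model measures the overlap between predicted and ground-truth bounding boxes; a minimal sketch for axis-aligned `(x1, y1, x2, y2)` boxes:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # intersection rectangle (empty if the boxes do not overlap)
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```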
  • The present invention does not complete these two tasks in a stepwise manner, but rather compares 6 classes against one another simultaneously (D1 vs D2 vs D3 vs D4 vs D5 vs Arrest).
  • Figure 6 shows images of embryos in their original forms and under heat maps, depicting the areas that the model prioritizes during evaluation for decision making.
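The document does not name the heat-map technique; Grad-CAM is one common way to produce such maps for a Keras CNN, sketched here under that assumption (the `last_conv_name` parameter is hypothetical — it names whichever convolutional layer is last in the chosen architecture):

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, last_conv_name, class_index):
    """Return a [0, 1] heatmap over the last conv layer's spatial grid,
    highlighting regions that drive the score of `class_index`."""
    grad_model = tf.keras.Model(
        model.inputs, [model.get_layer(last_conv_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[None, ...])
        score = preds[:, class_index]
    grads = tape.gradient(score, conv_out)        # d(score)/d(conv activations)
    weights = tf.reduce_mean(grads, axis=(1, 2))  # average the grads spatially
    cam = tf.reduce_sum(conv_out * weights[:, None, None, :], axis=-1)
    cam = tf.nn.relu(cam)[0]                      # keep positive evidence only
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()
```

Upsampled to the input resolution and overlaid on the embryo image, the map shows which regions the model relied on for its prediction.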
  • Figure 7 shows the 5 architectures and hyperparameters with the highest training accuracies.
  • The most successful model used the Xception architecture, trained with the Adam optimizer and a 0.0001 learning rate.
  • It has an overall accuracy of 72.7% and an AUC score of 0.93.
  • Embryos in Day 1 had the highest score among all 6 classes, with an AUC of 1.00.
  • Day 5 is a close second, with an AUC score of 0.99.
  • Day 2, Day 3, and Day 4 had AUC scores of 0.96, 0.91, and 0.90, respectively.
  • Arrest classification has the lowest AUC, at a value of 0.85.
  • Day 1 has the highest accuracy at 93%, followed again by Day 5 at an accuracy of 83%.
  • Arrest classification did not have the lowest accuracy, at 61%.
  • Day 4, with an accuracy of 56%, has the lowest accuracy.
  • Day 2 and Day 3 had accuracies of 73% and 66%, respectively.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to In-vitro Fertilization (IVF) and methods for selecting embryos. In particular, the present invention relates to an artificial-intelligence-supported monitoring system that determines the age and arrest status of embryos using a single deep learning model. The invention describes a method for monitoring embryos: collecting embryo images from IVF patients, uploading the images to a server, cropping the images, assigning a unique ID to each cropped image, generating a compiled dataset comprising grading scores by embryologists, a data augmentation process in which images are randomly rotated clockwise and counterclockwise at each training step, training the deep learning model on the ground-truth labels and preprocessed images, and detecting both the arrest status and the physiological age of embryos through a single deep learning model.

Description

DETERMINING THE AGE AND ARREST STATUS OF EMBRYOS USING A SINGLE DEEP LEARNING MODEL
TECHNICAL FIELD
In general, the present invention relates to In-vitro Fertilization (IVF) and methods for selecting embryos. In particular, the present invention relates to an artificial-intelligence-supported monitoring system that determines the age and arrest status of embryos using a single deep learning model.
BACKGROUND
Conceiving a child is a problem for one in seven couples. Thus, there is a high demand for solutions such as in vitro fertilization (IVF), which involves fertilizing and then developing embryos in the laboratory for 5-6 days and selecting up to two of the highest-graded embryos, which are then implanted into the patient. However, the live-birth rate (LBR) for the first cycle in all women undergoing IVF treatment is around 30%, which can be a major financial, physical, and emotional problem for the couples.
IVF procedures begin with an ovarian stimulation phase, which stimulates egg production. Retrieved oocytes are fertilized in vitro with sperm, which penetrates the Zona Pellucida, a glycoprotein layer surrounding the oocyte, to form a zygote. An embryo develops over a period of around 5 days, by which time it has formed a blastocyst. At around 5 days the blastocyst is still surrounded by the Zona Pellucida, from which it hatches, to then implant into the endometrial wall. The selection of the best embryo is critical to ensure a positive pregnancy outcome. Generally, an embryologist visually assesses the embryos using a microscope to make this decision.
The global market size of IVF was estimated at USD 21.9 billion in 2021, and with a predicted Compound Annual Growth Rate (CAGR) of 6.5%, it is forecast to reach USD 33.9 billion by 2028.
Despite the size of the industry and its meteoric rise in popularity, the live-birth rate (LBR) for the first cycle in all women undergoing IVF treatment was 29.5% (95% CI, 29.3%-29.7%). The success rate of IVF remains unacceptably low, and a proposed solution is the utilization of state-of-the-art technologies such as Artificial Intelligence (AI) to increase success rates.
However, despite growing demand and investment, some aspects of the IVF process remain an imperfect science. Embryo selection plays a crucial role in determining successful fertilization, and accounts for an estimated one-third of all implantation failures, yet it is often predicated on the subjective assessment of a single embryologist. Embryologists utilize guidelines to aid in their classification, but this is a purely visual process that is subject to inter- and intra-observer bias. Thus, embryo selection is a manual process that involves a subjective assessment of embryos by an embryologist through visual inspection. Factors such as field experience, workload, and microscope quality are just a few of the elements that can impact an embryologist's ability to accurately grade embryos. Furthermore, every IVF clinic utilizes a different approach to collecting, culturing, and evaluating embryos, with different laboratory environments influencing embryo development.
While still controversial, many attempts have been made to utilize Artificial Intelligence (AI) to aid in providing more objective insights into the matter.
Deep learning is one of the most popular paradigms in AI. Deep learning allows computational models composed of multiple layers of artificial neurons to learn patterns in data on their own. Deep learning methods can be modified according to data type in order to better represent the data being learned. One instance of this specialized deep learning class is the Deep Convolutional Neural Network (DCNN). DCNNs are specifically designed for visual data, as they seek spatial patterns within given data to minimize prediction error. DCNNs are commonly applied to image classification problems, object detection, image segmentation, etc.
One fair criticism of deep learning is that a model's performance is largely dependent on the type, quality, and quantity of data available. When tested models are trained under stringent and controlled environments, performance metrics tend to be significantly higher than if the models were tested under different scenarios. Additionally, models can falter and completely misinterpret the data they are processing when presented with input they have not been trained on.
The state-of-the-art document US9404908B2 describes non-invasive time-lapse imaging of human embryos from the zygote to the blastocyst stage to determine the ploidy of an embryo. A subset of these parameters is also highly predictive of blastocyst quality and thus assists in embryo selection. These findings show that human embryo development is characterized by precise timing in developmental windows, and that aneuploid embryos have altered timing suggesting perturbation of key cell cycle processes.
The document US10748288B2, which is another state-of-the-art teaching, discloses methods and systems for determining the quality of an oocyte to reach various reproductive milestones, including fertilizing, developing into a viable embryo (blastocyst), implanting into the uterus, and reaching a clinical pregnancy, through non-invasive visual assessment from a single image using artificial intelligence software.
The teaching of the document US10395211B2 discloses a computer system that automatically converts a set of training images of cells (e.g., oocytes or pronuclear embryos) and related outcome metadata into a description document by extracting features (e.g., cytoplasm features) from the pixel values of the training images and associating the extracted features with the outcome metadata. Based on the description document, the system automatically computes a decision model that can be used to predict outcomes of new cells. To predict outcomes of new cells, a computer system automatically extracts features from images that describe the new cells and predicts one or more outcomes by applying the decision model. The features extracted from the images that describe the new cells correspond to features selected for inclusion in the decision model and are calculated in the same way as the corresponding features extracted from the training images.
In the state of the art, performance metrics tend to be significantly higher than if the models were tested under different scenarios. Additionally, models can falter and completely misinterpret the data they are processing when presented with input they have not been trained on.
A single model determines the age of embryos and arrest status through microscopic image captures alone in the present invention. In this respect, training the model using a relatively small and diverse data set is essential.
One of the aims of the invention is to provide an AI that determines the age and arrest status of embryos through morphological features. Another aim of the invention is to enable a model using a relatively small and diverse dataset.
Another aim of the invention is to combine the detection of age and arrest status into a singular model.
Another aim of the invention is to provide complete automation of the process.
Automation of the process allows for digitalization of embryologists’ workflow resulting in embryos remaining outside of their incubators for a shorter duration. This lowers the risk of homeostatic balance offset, leading to a higher probability of successful embryo development.
Automation of the process allows for digitalization of embryologists’ workflow, allowing for optimal/non-rushed evaluation of embryos, without having to be concerned about the detriments of embryos being outside of their incubators, resulting in more confident assessments.
Automated systems utilizing deep learning can determine the age and arrest status of embryos from a patient in less than a second, compared to the minutes required by the average embryologist.
In brief, the present invention outperformed experienced and successful embryologists in terms of sensitivity, speed, and quality of outcomes.
Embryology is a vital aspect of the IVF process, and approximately one-third of all failed IVF procedures are related to embryo quality. Despite embryologists' roles being so critical, the majority of their assessments are purely subjective (refined through clinical experience). The present invention offers a tool to aid the embryologist in the decision-making process by providing more objective, critical data in the form of the AI model. Finally, it is desirable that the monitoring system be cost effective, to maximize the return on investment and profit.
BRIEF DESCRIPTION OF THE DRAWINGS
The figures, whose descriptions are given below, aim to exemplify the monitoring of IVF processes with deep learning assistance, whose advantages with respect to the state of the art are summarized above and will be discussed in detail hereinafter. The figures should not be construed as limiting the scope of protection as defined in the claims, nor should the claims be interpreted solely by reference to the figures without regard to the technique in the description.
Figure 1 shows the process of generating the compiled dataset of the present invention.
Figure 2 shows the processing of the dataset and the training of the deep learning model in the present invention.
Figure 3 shows the labeling process of the present invention.
Figure 4 shows the input (a), applying a grid to the input (b), region proposal (c), class predictions (d), detected objects (e), and finally cropped objects (f) of the present invention.
Figure 5 shows examples of embryo images according to age and arrest status in the present invention.
Figure 6 depicts examples of embryos in their original state and under a heatmap. Heatmaps graphically illustrate the areas of images that the model prioritizes in deducing its prediction, according to the present invention.
Figure 7 presents a table with the best five performing models from the training experiments, according to the present invention.
DETAILED DESCRIPTION OF THE INVENTION
Generally, this invention relates to a method and apparatus for estimating an embryo viability score from images. Specifically, this invention may be utilized for computationally generating an Artificial Intelligence (AI) model. The present invention utilizes microscopic datasets of human embryo images taken from Day 1 to Day 5. A proprietary algorithm automatically crops grouped embryos into individual images. Subsequently, each de-identified single-embryo image is graded according to its arrest status on different days. Embryo images, taken chronologically from day 1 to day 5, are differentiated from one another by training deep learning models with embryologist-graded microscopic images. In addition, said models are trained to identify embryos that have ceased to develop in vitro, herein referred to as "arrested". The non-arrested embryos are considered to be healthy and physiologically in accordance with their chronological age. Arrested embryos appear distinctly as cells with abnormal morphology, fragmentation, and mitotic activity.
Non-arrested embryo classes (consisting of Day 1, 2, 3, 4, 5) and the arrested embryo class constitute a total of 6 classes in the present invention.
Figures 1, 2 and 3 show an AI model configured to estimate an embryo viability score from at least one image of an embryo. Figure 1 is a schematic flow chart of the generation of a compiled dataset from images collected from IVF patients according to an embodiment. A plurality of images is obtained from one or more data sources. Each image is captured during a pre-determined time window after IVF. The embryo images are obtained from patients undergoing routine infertility treatment. The images can be sourced from IVF clinics and captured using optical light microscopy. In a preferred embodiment, routinely recorded images of grouped embryos in their culture media are taken using the camera attachment of a stereo microscope. Images are captured once per day post-fertilization (Day 1 to Day 5). A proprietary algorithm is applied to the images, which automatically identifies individual embryos within a group and crops them into individual images, as shown in Figure 4. The embryo images are then organized according to their chronological age. Poor-quality images are subsequently excluded. An image is deemed to be of poor quality if embryologists cannot clearly assess it, such as in cases of blurry images resulting from out-of-focus captures.
In an embodiment of the invention, said proprietary algorithm involves an object detection model known as YOLOv4, trained on the same dataset after it was annotated by an annotator. Training and procedures such as image augmentation were done using default settings.
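By way of illustration only, the cropping step that follows detection may be sketched as below. The patent does not disclose the proprietary algorithm's internals; the box layout (x_min, y_min, x_max, y_max, confidence), the confidence threshold and the padding margin are assumptions of this sketch, not taken from the source.

```python
def crop_detected_embryos(image, detections, conf_threshold=0.5, margin=10):
    """Crop each detected embryo from a grouped-culture image.

    `image` is a nested list (rows x columns); `detections` is a list of
    (x_min, y_min, x_max, y_max, confidence) boxes, such as those a
    detector like YOLOv4 could produce after post-processing.  The box
    layout, threshold and margin are illustrative assumptions.
    """
    height, width = len(image), len(image[0])
    crops = []
    for x_min, y_min, x_max, y_max, conf in detections:
        if conf < conf_threshold:
            continue  # discard low-confidence proposals
        # pad the box slightly and clamp it to the image bounds
        x0, y0 = max(0, x_min - margin), max(0, y_min - margin)
        x1, y1 = min(width, x_max + margin), min(height, y_max + margin)
        crops.append([row[x0:x1] for row in image[y0:y1]])
    return crops
```

In practice the detector would supply the `detections` list; plain nested lists stand in for image arrays here to keep the sketch dependency-free.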
In an embodiment of the invention, to prevent sample size imbalance among the categories, only day 4 and 5 images are taken from same subjects. After removing poor quality images that failed to meet our quality control criteria, the number of remaining images is classified under 6 classes: Day 1 , Day 2, Day 3, Day 4, Day 5, and Arrest. The arrest class contained images of embryos that have ceased to develop in vitro from all ages.
Figure 5 shows the different sizes, cell counts and appearances that embryos can present. Certain morphological landmarks may aid in appropriate classification; examples of morphological landmarks used to determine the age of embryos are cell count, pronucleus presence and blastocoel formation.
In an embodiment, images are split into training and testing datasets containing 80% and 20% of the usable embryo images, respectively. Splitting is done on a subject basis, i.e., all images gathered from one subject are placed in either the training or the testing dataset, to prevent bias. In the present invention, images are randomly rotated clockwise and counterclockwise at each training step for data augmentation.
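A minimal sketch of the subject-wise split described above. The `(subject_id, image)` pairing, the fixed seed and the function name are illustrative; only the 80/20 ratio and the no-subject-overlap constraint come from the description.

```python
import random

def split_by_subject(images, train_fraction=0.8, seed=42):
    """Split (subject_id, image) pairs so no subject spans both sets.

    This guarantees that all images from a given subject land entirely
    in the training set or entirely in the testing set, preventing
    subject-level leakage between the two datasets.
    """
    subjects = sorted({subject for subject, _ in images})
    rng = random.Random(seed)  # fixed seed for a reproducible split
    rng.shuffle(subjects)
    n_train = int(len(subjects) * train_fraction)
    train_subjects = set(subjects[:n_train])
    train = [(s, img) for s, img in images if s in train_subjects]
    test = [(s, img) for s, img in images if s not in train_subjects]
    return train, test, train_subjects
```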
Data augmentation is a technique harnessed to overcome the heavy reliance of deep learning models on big data; it enhances both the quality and quantity of the available data.
Convolutional Neural Networks (CNNs) have been used extensively for embryo imaging. A CNN receives an input (the embryo image) and the corresponding label for that input as its output (the chronological age of the embryo and its arrest status) in order to learn useful features that lead to lower classification errors.
In an embodiment of the invention, popular CNN architectures such as Xception, Inception-v3, VGG and DenseNet are used to predict the chronological age and arrest status of an embryo.
All models are initialized with weights pre-trained on the ImageNet dataset. The last fully connected layer of each model (i.e., the "top" of the network) is discarded and replaced with a randomly initialized final layer that classifies embryos into 6 categories, in contrast to ImageNet's 1000, while still retaining the useful features learned from ImageNet. Before the final layer, a dropout layer is inserted to prevent rapid overfitting. After these modifications, all layers of the original model are frozen so that the final layer can be fine-tuned. This procedure is commonly called transfer learning and is proven to be beneficial with regard to learning efficiency.
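A sketch of this transfer-learning setup using `tf.keras` follows, as one plausible implementation rather than the invention's actual code. `weights=None` is used here only so the sketch runs without downloading pre-trained weights; the described method would pass `weights="imagenet"`. The input shape, dropout rate and learning rate are illustrative.

```python
import tensorflow as tf

def build_transfer_model(input_shape=(299, 299, 3), n_classes=6, dropout_rate=0.5):
    # Base network without its classification "top".  The described method
    # loads ImageNet weights (weights="imagenet"); None is used here only
    # so the sketch runs without a download.
    base = tf.keras.applications.Xception(
        include_top=False, weights=None, input_shape=input_shape, pooling="avg")
    base.trainable = False  # freeze the pre-trained layers; only the new head trains
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Dropout(dropout_rate),                   # guards against quick overfitting
        tf.keras.layers.Dense(n_classes, activation="softmax"),  # 6-way classifier head
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss="categorical_crossentropy",
        metrics=["categorical_accuracy"])
    return model
```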
In the present invention, the Area Under the Receiver Operating Characteristic Curve (ROC AUC) and the Categorical Accuracy (ACC) of the models are used for comparisons between them.
The ROC curve is a graph of the true positive rate (TPR) [sensitivity] against the false positive rate (FPR) [1 - specificity] at various threshold settings for binary classifiers. The numeric value of the area under the ROC curve (AUC) is used as a metric measuring successful decision making by the model. This work is multi-class by nature; therefore, the ROC curve is constructed in a one-vs-rest fashion in order to apply the metric, meaning each class is evaluated against all other classes combined. The overall AUC is then calculated by averaging each class's one-vs-rest AUC score.
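The one-vs-rest, macro-averaged AUC described above can be computed, for example, with the rank-sum (Mann-Whitney) formulation. The following dependency-free sketch is illustrative and not part of the claimed method; function names are this sketch's own.

```python
def binary_auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney) formulation.

    labels: 1 for the positive class, 0 otherwise; scores: predicted
    probability of the positive class.  Ties receive their average rank.
    """
    pairs = sorted(zip(scores, labels))
    pos = sum(labels)
    neg = len(labels) - pos
    rank_sum, i = 0.0, 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1  # extend over a tie group of equal scores
        avg_rank = (i + 1 + j) / 2.0  # average 1-based rank of the tie group
        rank_sum += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    return (rank_sum - pos * (pos + 1) / 2.0) / (pos * neg)

def macro_one_vs_rest_auc(y_true, prob_matrix):
    """Average each class's one-vs-rest AUC, as the description states."""
    n_classes = len(prob_matrix[0])
    aucs = []
    for c in range(n_classes):
        labels = [1 if y == c else 0 for y in y_true]
        scores = [row[c] for row in prob_matrix]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / n_classes
```

In practice a library routine such as scikit-learn's `roc_auc_score(..., multi_class="ovr")` would typically be used instead.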
Categorical accuracy is calculated as:
ACC = (TP + TN) / (TP + TN + FP + FN)
The denotations for the formula above are as follows: TP (True Positive), TN (True Negative), FP (False Positive), FN (False Negative).
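For illustration, the per-class one-vs-rest counts and the overall categorical accuracy may be computed as follows; the function names are this sketch's own, not the patent's.

```python
def per_class_counts(y_true, y_pred, n_classes):
    """One-vs-rest TP/TN/FP/FN counts for each class (illustrative)."""
    counts = {}
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        tn = len(y_true) - tp - fp - fn  # everything not involving class c
        counts[c] = {"TP": tp, "TN": tn, "FP": fp, "FN": fn}
    return counts

def categorical_accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```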
Multiple models of each architecture were trained and tested with a wide range of hyper-parameter settings to obtain the best configuration. Each model was trained using different dropout rates, learning rates and optimizers. Learning rates (LR) were tested within a range of 0.0001 to 0.01, dropout rates within a range of 0 to 0.7, and the tested optimizers were Adam, RMSProp and Stochastic Gradient Descent (SGD). All other hyper-parameters regarding the model and training settings were left at their defaults. Consequently, a single model using a fairly small dataset can detect the arrest status as well as the physiological age of the embryo. Using a single model also provides energy efficiency during the process. Deep learning is a black box by nature, and as a result the methodology by which the model determines its most confident prediction cannot be stated exactly. However, our model did not suffer from over- or underfitting and was successful in learning by itself.
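The hyper-parameter sweep described above may be sketched as an exhaustive grid search. `train_and_score` stands in for the unspecified training-and-evaluation routine and is a hypothetical placeholder; the candidate values below reflect the ranges reported in the description.

```python
import itertools

def grid_search(train_and_score, learning_rates, dropout_rates, optimizers):
    """Try every hyper-parameter combination and keep the best.

    `train_and_score` is a caller-supplied function (hypothetical here)
    that trains a model with the given settings and returns a validation
    score to be maximized.
    """
    best_score, best_config = float("-inf"), None
    for lr, dropout, opt in itertools.product(learning_rates, dropout_rates, optimizers):
        score = train_and_score(lr=lr, dropout=dropout, optimizer=opt)
        if score > best_score:
            best_score, best_config = score, (lr, dropout, opt)
    return best_config, best_score

# Candidate values drawn from the ranges reported in the description
LEARNING_RATES = [0.0001, 0.001, 0.01]
DROPOUT_RATES = [0.0, 0.3, 0.5, 0.7]
OPTIMIZERS = ["adam", "rmsprop", "sgd"]
```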
Overfitting is a consequence of models not learning, rather memorizing the training data, leading to poor generalization in the validation and/or test datasets.
Underfitting is a consequence of models neither learning nor memorizing the training data, leading to poor generalization in the validation and/or test datasets.
In a preferred embodiment, all implementation was done using the Python language and TensorFlow. All models are trained using the training dataset with a batch size of 2-32, with early stopping conducted to avoid overfitting. Evaluations were carried out on the testing dataset using the evaluation metrics mentioned before. All training and validation are done using an Nvidia Tesla V100 Graphics Processing Unit (GPU). In a preferred embodiment, an auto-crop algorithm which detects, isolates, and captures each embryo in grouped cultures was built using YOLOv4. When performance testing the model, it is shown to have an Intersection over Union (IoU) score of 87.43%, 100% recall (sensitivity) and 98% precision.
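For reference, the reported Intersection over Union metric for two axis-aligned bounding boxes can be computed as in this minimal sketch; the `(x_min, y_min, x_max, y_max)` box layout is an assumption of the sketch.

```python
def intersection_over_union(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    # Overlap rectangle (may be empty)
    ix_min, iy_min = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix_max, iy_max = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```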
In a preferred embodiment, 47 different models are trained and evaluated, each on the same training and validation datasets, for an essentially twofold task: first, comparing the morphological characteristics of the embryo in question (EIQ) against the features learned for each day from Day 1-5; and second, comparing the morphological characteristics of the EIQ against the features learned for arrested versus non-arrested embryos.
To clarify, the present invention does not complete these two tasks in a stepwise manner, but rather compares 6 classes against one another simultaneously (D1 vs D2 vs D3 vs D4 vs D5 vs Arrest).
Figure 6 shows images of embryos in their original forms and under heat maps, depicting the areas that the model prioritizes during evaluation for decision making.
Figure 7 shows the 5 architectures and hyperparameters with the highest training accuracies. Out of all 47 experiments, the most successful model used the Xception architecture, trained with the Adam optimizer and a 0.0001 learning rate. When evaluated on the training dataset (n = 564) it has an overall accuracy of 72.7% and an AUC score of 0.93. Embryos in Day 1 had the highest score among all 6 classes with an AUC score of 1.00; Day 5 is a close second with an AUC score of 0.99. Day 2, 3 and 4 had AUC scores of 0.96, 0.91 and 0.90, respectively. Arrest classification has the lowest AUC at a value of 0.85. Like the AUC score, Day 1 has the highest accuracy at 93%, followed again by Day 5 at an accuracy of 83%. Despite having the lowest AUC score, Arrest classification did not have the lowest accuracy at 61%; Day 4, with an accuracy of 56%, has the lowest accuracy. Day 2 and Day 3 had accuracies of 73% and 66%, respectively.

Claims

1) A method for analyzing embryos comprising the steps of: collection of embryo images from in-vitro fertilization patients, uploading images to a server, cropping images, giving unique identifications to each cropped image, generating a compiled dataset comprising grading scores by embryologists, training the deep learning model through ground truth labels and preprocessed images and, detecting both arrest status and physiological age of embryos through a single deep learning model.
2) The method of claim 1, further comprising the steps of: taking each embryo image from day 1 to day 5 separately to train the deep learning model.
3) The method of claim 1, further comprising the steps of: exclusion of poor-quality images through the assessment of embryologists.
4) The method of claim 1, further comprising the steps of: preventing sample size imbalance among the categories by taking day 4 and day 5 images from the same subjects.
5) The method of claim 1, further comprising the steps of: using morphological landmarks of cell count or pronucleus presence or blastocoel formation to determine the age of embryos.
6) The method of claim 1, further comprising the steps of: splitting images into training and testing datasets.
7) The method of claim 6, further comprising the steps of: splitting 75%-85% of usable images into the training dataset.
8) The method of claim 1, further comprising the steps of: giving embryo images as an input to CNNs, with the corresponding label for that input as its outputs, which are the chronological age of the embryo and its arrest status.
9) The method of claim 1, further comprising the steps of: discarding the last fully connected layer of each model and replacing it with a randomly initialized final layer which classifies embryos into 6 different categories.
10) The method of claim 1, further comprising the steps of: preventing quick overfitting by putting a dropout layer before the final layer in the deep learning training process.
11) The method of claim 1, further comprising the steps of: training using the dataset with a batch size between 2-32, with early stopping.
12) The method of claim 1, further comprising the steps of: collecting images through a photo detector mounted on a microscope.
13) The method of claim 1, further comprising the steps of: a process of data augmentation where images are randomly rotated clockwise and counterclockwise at each training step.
14) The method of claim 1, further comprising the steps of: resizing images to between 200-299 x 200-299 pixels.
15) The method of claim 1, further comprising the steps of: normalizing the incoming data of 0-255 and converting it to between 0-1.
16) The method of claim 1, further comprising the steps of: applying dropout, randomly shutting down neurons during training.
17) The method of claim 16, further comprising the steps of: the applied dropout is set between a 0-0.7 dropout rate.
18) The method of claim 17, further comprising the steps of: the optimizer updates the weights and biases according to the categorical cross-entropy between the model's predictions and the ground truth labels at the current step.

19) An embryo analyzing system configured to computationally decide the arrest status and physiological age of the embryo, wherein the computational system comprises: a photo detector to collect embryo images, a server having a collection of embryo images from IVF patients in which said images are cropped and given unique IDs, a dataset in the said server comprising grading scores by embryologists, and a single deep learning model trained through ground truth labels and preprocessed images with 6 classes of day 1, day 2, day 3, day 4, day 5, and arrest.
PCT/TR2021/051472 2021-12-23 2021-12-23 Determining the age and arrest status of embryos using a single deep learning model WO2023121575A1 (en)
