CN112582022B - System and method for non-invasive embryo transfer priority rating - Google Patents

Info

Publication number
CN112582022B
Authority
CN
China
Prior art keywords
embryo
cnv
mos
euploid
classifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010705776.9A
Other languages
Chinese (zh)
Other versions
CN112582022A (en)
Inventor
邹央云
姚雅馨
陆思嘉
薄世平
夏滢颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xukang Medical Science & Technology Suzhou Co ltd
Original Assignee
Xukang Medical Science & Technology Suzhou Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xukang Medical Science & Technology Suzhou Co ltd
Priority to CN202010705776.9A
Publication of CN112582022A
Priority to PCT/CN2021/107600 (WO2022017414A1)
Application granted
Publication of CN112582022B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Abstract

The present invention provides a combination of biological features useful for predicting embryo euploid probability and for ranking embryo implantation priority based on embryo culture fluid and/or blastocoel fluid. The invention also relates to a method of inputting the magnitudes of these biological features into a classification system, such as a random forest, to predict embryo euploid probabilities, and a method of using the predicted euploid probabilities to rank the implantation priority of embryos. The invention further provides the use of reagents for determining said combination of biological features in the prediction and rating methods of the invention, as well as systems and computer program products for use in said prediction and rating methods.

Description

System and method for non-invasive embryo transfer priority rating
The present invention relates to the field of assisted reproduction technology, and more particularly to a non-invasive chromosome screening (NICS) method for determining embryo transfer priority in in vitro fertilization-embryo transfer (IVF-ET), to related combinations of biological features, and to a classification, prediction and ranking system and computer program product.
Background
In vitro fertilization-embryo transfer (IVF-ET) is increasingly adopted for the treatment of infertility. Millions of IVF-ET cycles are performed worldwide each year. Conventional IVF-ET procedures, which typically transfer more than one embryo per treatment cycle, result in a multiple pregnancy rate of about 20%. Multiple pregnancies increase the occurrence of adverse events that may harm maternal and fetal health. Elective single embryo transfer (eSET) is the most effective method of reducing multiple pregnancy and has become increasingly accepted internationally, but this technique is often accompanied by relatively low pregnancy rates. Therefore, selecting good-quality embryos for eSET that can lead to the best clinical outcome is the key to successful implementation of IVF-ET, and this in turn places demands on systems for evaluating embryo implantation potential.
In IVF treatment, errors during early embryonic development often lead to aneuploid human embryos, resulting in low clinical pregnancy rates and high miscarriage rates (B. Hodes-Wertz et al., Idiopathic recurrent miscarriage is caused mostly by aneuploid embryos. Fertil. Steril. 98, 675-680 (2012)). Pre-implantation genetic screening is widely used to screen embryos for normal ploidy. The screening method mainly used at present is preimplantation genetic testing for aneuploidy (PGT-A), in which embryo biopsies are performed for chromosome ploidy analysis before transfer of the embryos into the uterus. Clinical studies have demonstrated that the use of PGT-A improves the IVF clinical outcome of eSET (R.T. Scott et al., Comprehensive chromosome screening is highly predictive of the reproductive potential of human embryos: a prospective, blinded, nonselection study. Fertil. Steril. 97, 870-875 (2012)). However, the embryo biopsy procedure is invasive and may have adverse effects on the clinical outcome of embryo implantation and unknown health risks for the long-term development of the embryo. It has been shown that the greater the number of trophoblast cells taken in a biopsy, the lower the embryo implantation rate (S. Zhang et al., Number of biopsied trophectoderm cells is likely to affect the implantation potential of blastocysts with poor trophectoderm quality. Fertil. Steril. 105, 1222-1227 (2016); L. Guzman et al., The number of biopsied trophectoderm cells may affect pregnancy outcomes. J. Assist. Reprod. Genet. 36, 145-151 (2019)). Furthermore, embryo biopsies require specialized equipment and skill, are difficult to standardize, and present a significant challenge to each SET treatment. Therefore, there is a need for effective non-invasive chromosome screening methods for use in IVF clinical practice to prioritize embryo implantation and improve the clinical outcome of IVF.
Recently, the development of whole genome sequencing technology and the discovery of embryonic DNA in embryo culture fluid have made it possible to perform non-invasive pre-implantation screening of embryo karyotype and ploidy using embryo culture fluid as the sample.
After Stigliani et al. first observed the presence of genomic DNA in embryo culture fluids (S. Stigliani et al., Mitochondrial DNA content in embryo culture medium is significantly associated with human embryo fragmentation. Hum. Reprod. 28, 2652-2660 (2013)), several studies reported the use of embryo culture fluids or blastocoel fluids for the analysis of chromosome ploidy. Depending on the study design, the reported ploidy concordance rate with whole embryos varied widely, from 33% to 100%. This may be due to the presence of various confounding factors in the culture fluid, including, for example, mixed sources of DNA, such as maternal DNA, DNA from apoptotic embryonic cells, and DNA from normal embryonic cells. This makes it a great challenge in the art to use embryo culture fluids to accurately reflect the aneuploidy of embryos and to construct an effective prediction and ranking system.
Furthermore, there have been some reports of clinical applications of non-invasive chromosome screening (NICS). For example, Xu et al. performed clinical NICS and obtained 5 live births from 7 couples (J. Xu et al., Noninvasive chromosome screening of human embryos by genome sequencing of embryo culture medium for in vitro fertilization. Proc. Natl. Acad. Sci. U.S.A. 113, 11907-11912 (2016)). Fang et al. obtained a 58% sustained pregnancy rate in a pilot clinical study using NICS and reported 27 normal live births (R. Fang et al., Chromosome screening using culture medium of embryos fertilised in vitro: a pilot clinical study. J. Transl. Med. 17, 73 (2019)). However, large-scale embryo validation studies are still lacking. There is an unmet need in the art for an effective embryo implantation priority ranking system that has been validated clinically through large-scale, blinded studies.
Summary of the Invention
Through intensive research, the inventors have developed a computer-aided method for predicting the probability of blastocyst aneuploidy, based on non-invasive chromosome screening (NICS) of embryo culture fluid and a machine learning algorithm (random forest, RF), and have verified the method in a prospective clinical observational study. In conjunction with studies of the clinical outcome of embryo transfer, the present inventors have further established a ranking system for assessing embryo implantation priority based on predicted euploid probability levels. The rating system of the present invention balances two important indicators well: the euploid predictive value and specificity. Moreover, clinical trial results demonstrate that screening embryos for clinical IVF-ET using the embryo implantation priority ranking system of the invention can effectively improve clinical outcomes of IVF-ET, such as sustained pregnancy rates and miscarriage rates, and has the advantage of reducing the cycle cancellation rate of Frozen Embryo Transfer (FET).
Thus, in one aspect, the present invention provides a classification system and method that can evaluate a set of biological characteristics of a subject (a test embryo to be transplanted) using a classifier, such as a random forest, by computer-aided execution.
In yet another aspect, based at least in part on the classification system of the present invention, the present inventors provide methods and systems for characterizing a physiological feature of a subject (i.e., the euploid status of an embryo), comprising: first, obtaining a physiological sample of the subject (i.e., the "spent" culture fluid and/or blastocoel fluid, especially blastocyst culture fluid, produced during in vitro culture of a test embryo); then, extracting the magnitudes of a set of biological features (11 in total) from the sample; and classifying the sample based on said feature magnitudes using the classification system of the present invention, said classification being related to a physiological characteristic (euploid/aneuploid status) of the subject (test embryo) (preferably, the classification may be expressed as a predicted euploid probability of the test embryo). Typically, the classification system comprises a machine learning system, such as a random forest classifier.
In yet another aspect, the present invention provides an index and corresponding rating system and method that can be used to indicate the implantation priority of a test embryo based on the physiological characterization method of the present invention.
In the system and method of the present invention, preferably, in order to build a classifier model for classification, CNVs are detected on samples of a plurality of training subjects (i.e., culture fluid and/or blastocoel fluid samples of embryos with known chromosomal ploidy), the magnitudes of the 11 biological features of the present invention and the corresponding classification labels are extracted, and a training data set is constructed. Typically, the biological characteristic magnitude obtained from each training subject is contained in the training data vector corresponding to that subject. In the present invention, preferably, the training data set comprises at least 100, 200, 300 or 500 training vectors.
In one embodiment, a machine learning classification system, particularly a random forest, is trained, preferably using a training data set, to generate a euploid probability prediction model based on embryo culture fluid and/or blastocoel fluid.
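To make the idea of training a forest on labeled feature vectors concrete, the following is a deliberately tiny sketch written only with the Python standard library. It is not the model of the invention: each "tree" is a single-threshold stump on one randomly chosen feature, fitted to a bootstrap sample, and the two-feature vectors (number of chromosomes with a CNV, highest mosaic ratio in percent) and their values are invented for illustration. A real system would use the 11 features and an established random forest implementation.

```python
import random

def fit_stump(X, y, feature):
    """Best single-threshold split on one feature; each leaf stores its euploid fraction."""
    best = None
    values = sorted({row[feature] for row in X})
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [label for row, label in zip(X, y) if row[feature] <= t]
        right = [label for row, label in zip(X, y) if row[feature] > t]
        p_left, p_right = sum(left) / len(left), sum(right) / len(right)
        # Size-weighted impurity, proportional to the Gini impurity of the split.
        impurity = len(left) * p_left * (1 - p_left) + len(right) * p_right * (1 - p_right)
        if best is None or impurity < best[0]:
            best = (impurity, t, p_left, p_right)
    if best is None:  # the feature is constant in this bootstrap sample
        p = sum(y) / len(y)
        return (feature, float("inf"), p, p)
    return (feature, best[1], best[2], best[3])

def fit_forest(X, y, n_trees=25, seed=0):
    """Bagging: every stump sees a bootstrap sample and one randomly chosen feature."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        feature = rng.randrange(d)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest

def predict_euploid_probability(forest, row):
    """Average the euploid fraction of the leaf that each stump assigns to the row."""
    votes = [p_left if row[f] <= t else p_right for f, t, p_left, p_right in forest]
    return sum(votes) / len(votes)

# Invented training vectors: [chromosomes with a CNV, highest mosaic ratio in %].
X = [[0, 0], [0, 10], [1, 20], [0, 5], [1, 15], [2, 80], [3, 100], [2, 70], [4, 90], [3, 60]]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # 1 = euploid, 0 = aneuploid (whole-embryo label)
forest = fit_forest(X, y)
```

Averaging leaf fractions across trees is what turns a set of hard splits into a probability-like score, which is the quantity the later rating steps consume.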
In one embodiment, the prediction performance of the model is evaluated by plotting ROC curves for the predictive model using a training data set and cross validation (e.g., 10 fold cross validation). In one embodiment, the accuracy (AUC of ROC curve), specificity, sensitivity, and/or Negative Predictive Value (NPV) of the model prediction is above 0.7, preferably above 0.8, more preferably above 0.9.
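As a hedged illustration of the evaluation step, the sketch below computes the area under an ROC curve from predicted scores and known labels, and builds the index folds for k-fold cross validation. The function names are illustrative, and the choice of label 1 for euploid with the predicted euploid probability as the score is an assumption made for this example; the AUC is computed by pair counting, which is equivalent to integrating the ROC curve.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (label 1, label 0) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def k_fold_indices(n, k=10):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds
```

In a cross-validated evaluation, each fold in turn is held out for scoring while the model is trained on the remaining folds, and the held-out scores are pooled before calling `roc_auc`.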
In one embodiment, the test data is classified using a trained classifier model to obtain the predicted euploid probability of the test embryo.
In yet another embodiment, embryo transfer is prioritized based on the euploid probability of the test embryo predicted by the model. Preferably:
when the probability value is greater than about 0.90, more preferably greater than about 0.94, the embryo is determined to be a first transfer priority;
when the probability value is between 0.60 and 0.90, more preferably between 0.70 and 0.94, the embryo is determined to be a second transfer priority; and
when the probability value is less than 0.60, more preferably less than 0.70, the embryo is determined to be a third transfer priority.
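The three-tier rule above can be sketched as a small function. The function name and the handling of values that fall exactly on a cutoff are assumptions of this example (the text does not say which tier a value of exactly 0.94 or 0.70 belongs to); the defaults use the more preferred cutoffs.

```python
def transfer_priority(euploid_probability, upper=0.94, lower=0.70):
    """Map a predicted euploid probability to transfer priority 1, 2 or 3.

    Assumption: a value exactly equal to a cutoff is placed in the second tier.
    """
    if not 0.0 <= euploid_probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if euploid_probability > upper:
        return 1
    if euploid_probability >= lower:
        return 2
    return 3
```

Passing `upper=0.90, lower=0.60` gives the broader, less preferred variant of the same rule.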
As demonstrated by the clinical results in the examples, the use of the rating system of the present invention can help to improve embryo transfer outcomes (e.g., the miscarriage rate and the sustained pregnancy rate) and reduce the FET cycle cancellation rate.
In further embodiments, the embryo priority rating system of the present invention may be used in combination with a priority rating system based on embryo morphology.
Brief Description of Drawings
FIG. 1 shows an illustrative example of training set construction, in which a chimerism ratio of > 50% is used as the threshold for reporting an embryo as "abnormal".
FIG. 2 shows ROC curves for different prediction models established with different chimerism ratios as thresholds.
FIG. 3 shows ROC curves for different prediction models established at different resolutions using different chimerism ratios as thresholds.
FIG. 4 shows (A) a schematic of the establishment of an embryo ranking system by machine learning methods; and (B) the concordance rate between the euploid probability predicted by the model from the blastocyst culture fluid and the chromosomal ploidy state determined using the entire blastocyst.
FIG. 5 shows the performance of the embryo ranking strategy using non-invasive chromosome screening (NICS) results. (A) An ROC curve of the random forest prediction model is generated, and embryos are classified into grades A, B and C using different euploid probability values obtained by the random forest method as thresholds. (B) The proportion of embryos of each grade in the total sample: grade A, 37%; grade B, 29%; grade C, 34%. (C) The proportion of euploid embryos in each grade rated based on the NICS results: grade A, 91.4%; grade B, 76.4%; grade C, 34.6%.
FIG. 6 shows the ploidy patterns of embryos of different grades obtained from culture fluid. CNV represents copy number variation; Arm represents an abnormality at the level of the chromosome arm; Whole indicates an abnormality at the level of the whole chromosome; MicroCNV indicates a chromosomal segment abnormality of small segment size; Mos-30, Mos-40, Mos-50, Mos-60 and Mos-70 represent mosaic ratios of 30%, 40%, 50%, 60% and 70%, respectively; Mos-100 indicates a chromosomal abnormality without chimerism.
FIG. 7 shows a comparison of the effect of different chimerism threshold values on model performance using an expanded training set.
FIG. 8 shows the embryo ranking strategy for ranking embryos into grades A, B and C using the expanded training set.
FIG. 9 shows a flow chart of a prospective observational study.
FIG. 10 shows the pregnancy outcomes for patients with different NICS ratings.
FIG. 11 shows a schematic representation of the prioritization of clinical embryo transfer using the NICS screening method. Grade A embryos are the first transfer priority; grade B embryos can be considered when there are no grade A embryos in the transfer cycle. If there are only grade C embryos, the patient should be informed of the rating and the risk of miscarriage and may choose whether to continue the transfer.
Detailed Description
Preimplantation Genetic Screening (PGS) is an early prenatal screening method for detecting whether abnormalities of genetic material exist in an embryo by performing biopsy and genetic analysis on an early embryo before embryo implantation. In general, PGS is performed on embryos that have developed to day 3 (cleavage stage) or day 5 (blastocyst stage), with a small number of cells removed by biopsy for analysis. Cleavage-stage biopsy removes 1-2 blastomeres in most cases. Blastocyst biopsy typically removes 3-6 trophoblast cells.
In contrast to invasive PGS screening methods, the invention provides a non-invasive screening method (NICS) based on embryo culture fluid and/or blastocoel fluid. The NICS method of the invention extracts a specific combination of biological features suitable for predicting the euploid probability of the embryo's chromosomes by detecting and analyzing CNVs larger than 10 Mb across all 24 chromosomes (22 autosomes plus X and Y). Clinical tests show that this screening method based on blastocyst culture fluid and/or blastocoel fluid can effectively select high-quality embryos, helps to improve the clinical outcome after the embryos are transferred into the uterus, and has the advantage of reducing the cancellation rate of embryo transfer cycles.
Before the present invention is described in detail, it is to be understood that this invention is not limited to the particular methodology and experimental conditions set forth herein as such may vary. In addition, the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting.
Definitions
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art. For the purposes of the present invention, the following terms are defined below.
The term "about," when used in conjunction with a numerical value, is intended to encompass a numerical value within a range having a lower limit that is 5% less than the stated numerical value and an upper limit that is 5% greater than the stated numerical value.
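As a small worked example of this definition, the sketch below computes the range covered by "about" a value and checks membership. The helper names are illustrative only, and treating the limits as inclusive is an assumption of this example.

```python
def about_range(value, tolerance=0.05):
    """Return (lower, upper) limits that are 5% below and 5% above the stated value."""
    low, high = value * (1 - tolerance), value * (1 + tolerance)
    return (min(low, high), max(low, high))  # min/max keeps the order for negative values

def is_about(x, value, tolerance=0.05):
    """True when x falls inside the 'about' range of value (limits treated as inclusive)."""
    low, high = about_range(value, tolerance)
    return low <= x <= high
```

Under this reading, "about 0.90" spans roughly 0.855 to 0.945, so a probability of 0.93 is "about 0.90" while 0.95 is not.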
The term "and/or" when used to connect two or more selectable items should be understood to mean either one of the selectable items or any two or more of the selectable items.
As used herein, the term "comprising" or "comprises" is intended to mean including the stated elements, integers or steps, but not excluding any other elements, integers or steps. When the term "comprising" or "includes" is used herein, unless otherwise specified, it also encompasses the case consisting only of the stated elements, integers or steps.
As used herein, "biological characteristic", "biological feature" or "feature" refers to any biological characteristic that can be objectively measured as an indicator of an embryo's euploid properties. For the purposes of the present invention, biological features include chromosomal ploidy patterns extracted from embryo culture fluid and/or blastocoel fluid, including, for example, the location of chromosomal copy number abnormalities (CNVs), region sizes, chimerism ratios, and the like. The "magnitude" or "feature value" of a biological feature refers to information associated with the biological feature that can be used to characterize the likelihood that the embryo corresponding to the sample is euploid or aneuploid. For example, such information may be a quantity that qualitatively or quantitatively indicates the following properties of the CNVs: whether the embryo is reported as "normal" or "abnormal" when a specific CNV pattern is used as the threshold, the number of chromosomes involved in CNVs, whether a CNV involves the sex chromosomes, the chimerism ratio of the CNV with the highest chimerism ratio, and the type of the CNV with the highest chimerism ratio. In this context, each feature may be represented as a dimension of a vector space, where each vector is a multi-dimensional vector comprising a plurality of feature magnitudes associated with a particular subject. Accordingly, the number of dimensions of the vector space corresponds to the size of the set of biological features. A machine learning algorithm may be trained to find patterns in the magnitudes of the features of a feature set, in order to generate a classifier model for predictive classification of test data.
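One way to picture a feature vector is sketched below. It encodes only the five example properties named in the paragraph above, not the full set of 11 features, and the `CnvCall` record, the integer codes for CNV type, the 50% reporting threshold and the rule that the report is driven by the highest mosaic ratio are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical integer encoding of the CNV type (size level).
LEVEL_CODES = {"none": 0, "segment": 1, "arm": 2, "whole": 3}

@dataclass(frozen=True)
class CnvCall:
    chromosome: str    # "1" to "22", "X" or "Y"
    level: str         # "segment", "arm" or "whole"
    mosaic_pct: float  # mosaic ratio in percent, 0 to 100

def feature_vector(calls, report_threshold_pct=50.0):
    """Turn the CNV calls of one sample into a fixed-length numeric vector.

    Order: [reported abnormal (0/1), chromosomes involved, sex chromosome involved (0/1),
            highest mosaic ratio, type code of the CNV with the highest mosaic ratio]
    """
    if not calls:
        return [0, 0, 0, 0.0, LEVEL_CODES["none"]]
    top = max(calls, key=lambda c: c.mosaic_pct)
    return [
        int(top.mosaic_pct > report_threshold_pct),
        len({c.chromosome for c in calls}),
        int(any(c.chromosome in ("X", "Y") for c in calls)),
        top.mosaic_pct,
        LEVEL_CODES[top.level],
    ]
```

Every sample maps to a vector of the same length, which is what allows samples with different numbers of CNV calls to be fed to one classifier.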
As used herein, "classifier" refers to machine learning algorithms such as support vector machines, AdaBoost classifiers, logistic regression, elastic nets, gradient tree boosting systems, naive Bayes classifiers, neural networks, Bayesian neural networks, k-nearest neighbor classifiers, and random forests. The invention especially contemplates random forests, but also the other classifiers listed, and combinations of one or more classifiers.
The term "training set" refers herein to a set of training samples comprising a plurality of culture fluid and/or blastocoel fluid samples from a plurality of embryos of known ploidy. In the present invention, a training set is used to develop a predictive model to analyze test samples. In one embodiment, the training set includes at least 100 samples. The biological features extracted from the training samples are herein included in the training data vectors.
The term TN (true negative), as it relates to the classification system and method of the present invention, refers herein to euploid embryos that are correctly classified by the model, using the whole embryo as the gold standard. TP (true positive) refers to aneuploid embryos correctly classified by the model. FN (false negative) refers to aneuploid embryos that were misclassified by the model as euploid. FP (false positive) refers to euploid embryos that were misclassified by the model as aneuploid.
Herein, the euploid predictive value (NPV, negative predictive value) is calculated as TN/(TN + FN). The higher the NPV, the more likely it is that a transferred embryo is euploid, thus reducing the failure rate of pregnancy.
In this context, specificity is calculated as TN/(TN + FP). Provided that the false negative rate is relatively low, high specificity ensures that as many embryos as possible are available for transfer, thereby reducing the embryo transfer cycle cancellation rate.
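The two formulas can be checked with a short sketch. It assumes the standard confusion-matrix convention with euploid as the negative class, so that FN counts aneuploid embryos called euploid and FP counts euploid embryos called aneuploid; the function names and the string labels are illustrative.

```python
def confusion_counts(truth, predicted):
    """Count TN, FP, FN, TP with 'euploid' as the negative class.

    truth holds the whole-embryo (gold standard) status, predicted the model call.
    """
    counts = {"TN": 0, "FP": 0, "FN": 0, "TP": 0}
    for t, p in zip(truth, predicted):
        if t == "euploid":
            counts["TN" if p == "euploid" else "FP"] += 1
        else:
            counts["FN" if p == "euploid" else "TP"] += 1
    return counts

def npv(tn, fn):
    """Euploid predictive value: share of embryos called euploid that truly are euploid."""
    return tn / (tn + fn)

def specificity(tn, fp):
    """Share of truly euploid embryos that the model calls euploid."""
    return tn / (tn + fp)
```

NPV is conditioned on the model's call and specificity on the true status, which is why a rating system can trade one against the other by moving its probability cutoffs.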
Herein, embryo transfer cycles have the meaning generally known to those skilled in the art, including fresh cycles and frozen embryo transfer cycles (FETs). In some embodiments, the cycle is preferably a frozen embryo transfer cycle.
In this context, the term "subject" (including "test subject" and "training subject") refers to an embryo obtained by in vitro fertilization technology, in a manner complying with the ethical principles of assisted reproduction technology prescribed by a qualified regulatory body, with an in vitro culture period of no more than 10 days from the start of fertilization, in particular a blastocyst from D5 to D6.
The terms "culture fluid", "culture broth" and "spent culture broth" are used interchangeably herein to refer to the used culture medium produced during the in vitro culture of embryos in IVF-ET, i.e., the medium remaining after removal of the cultured embryo. Similarly, as used herein, "blastocyst culture fluid", "blastocyst broth" and "spent blastocyst broth" are used interchangeably to refer to the portion of medium used to culture embryos to the blastocyst stage during in vitro culture of embryos in IVF-ET, i.e., the medium remaining after removal of the blastocyst. In some embodiments of the invention, an embryo culture fluid of D1-D6 is used, preferably an embryo culture fluid of D3-D5, D4-D6 or D5-D6, for example, an embryo culture fluid of D3, D4, D5 or D6.
As will be appreciated by those skilled in the art, the culture medium may or may not be changed during the in vitro culture of IVF embryos. Thus, in some embodiments, the culture medium used may be an embryo culture medium that has not been changed from day 1 of in vitro culture (D1) to the day of blastocyst removal (e.g., D5 or D6). In other embodiments, the culture medium used may be one that is used for embryo culture after a fluid change, for example, a culture medium used for culturing embryos to blastocysts (e.g., D5-D6) after a fluid change from D3.
In this context, terms such as D1, D3, D4, D5, D6, and the like, when referring to embryos and embryo culture fluids, refer to day 1, day 3, day 4, day 5, day 6 embryos cultured in vitro and culture fluids thereof. Similarly, embryos and embryo culture solutions of D1-D6 refer to embryos and culture solutions thereof from day 1 to day 6 of in vitro culture, including, for example, but not limited to, embryos and culture solutions thereof from day 1 to day 3 of in vitro culture (D1-D3), from day 3 to day 5 (D3-D5), from day 3 to day 6 (D3-D6), from day 4 to day 5 (D4-D5), from day 4 to day 6 (D4-D6), and from day 5 to day 6 (D5-D6).
As used herein, the term "sample" or "culture fluid sample" or "blastocoel fluid sample" refers to a sample of culture fluid or blastocoel fluid obtained from an embryo cultured in vitro using non-invasive means, which contains DNA nucleic acid from the embryo cultured in vitro. In some embodiments, the sample according to the invention is a culture fluid sample. In other embodiments, the sample according to the invention is a blastocoel fluid sample, and methods for collecting blastocoel fluid without destroying the embryo are known in the art.
As used herein, the term "culture fluid genetic test", such as "culture fluid CNV pattern test", refers to the use of spent culture medium (i.e., spent culture fluid after removal of embryos) produced during the in vitro culture of embryos to reflect the genetic properties of the embryos, such as chromosomal copy number variation, by genetically testing for free DNA present in the culture fluid.
As used herein, the term "detection reagent" refers broadly to a reagent used to detect the CNV pattern of DNA nucleic acids in a sample. These reagents may be used to form the kits of the invention for use in the prediction and ranking methods of the invention.
As used herein, the term "sustained pregnancy rate" refers to the rate of sustained pregnancies, where any pregnancy lasting more than 12 weeks is considered a sustained pregnancy. An ectopic pregnancy is counted as a clinical pregnancy rather than as a miscarriage.
In the present invention, the term "clinical pregnancy rate" means the number of cycles in which a gestational sac is observed by vaginal ultrasound at 7-8 gestational weeks, divided by the total number of transfer cycles.
In the present invention, the term "miscarriage rate" refers to the number of pregnancy failures after a gestational sac has been recorded by vaginal ultrasound, divided by the total number of clinical pregnancies.
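The two rates that are defined above with an explicit numerator and denominator can be written out as follows. The function names and the example counts used afterwards are hypothetical and are not data from the study.

```python
def clinical_pregnancy_rate(cycles_with_gestational_sac, transfer_cycles):
    """Cycles with a gestational sac on ultrasound divided by all transfer cycles."""
    if transfer_cycles <= 0:
        raise ValueError("transfer_cycles must be positive")
    return cycles_with_gestational_sac / transfer_cycles

def miscarriage_rate(pregnancy_losses, clinical_pregnancies):
    """Pregnancy losses after a recorded gestational sac divided by clinical pregnancies."""
    if clinical_pregnancies <= 0:
        raise ValueError("clinical_pregnancies must be positive")
    return pregnancy_losses / clinical_pregnancies
```

For instance, 30 cycles with a gestational sac out of 50 transfer cycles gives a clinical pregnancy rate of 0.6, and 3 losses among those 30 clinical pregnancies gives a miscarriage rate of 0.1. Note that the two rates use different denominators.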
The term "CNV", also known as copy number variation or copy number abnormality, refers herein to an abnormality in the copy number of a genomic nucleic acid sequence. The abnormality may be a change in chromosome copy number caused by a deletion, insertion, duplication, multiplication, inversion, translocation, or the like. In general, CNVs can be structural alterations at the microscopic and sub-microscopic levels. Genomic structural variation at the microscopic level includes chromosomal structural variation that is visible under the microscope. Genomic structural variation at the sub-microscopic level refers to genomic structural variation with fragment lengths ranging from 1 kb to 3 Mb. In the present invention, CNV refers to a CNV with a fragment length greater than 1 Mb, and in some embodiments preferably greater than 10 Mb. In some preferred embodiments, the CNVs of the invention can be divided into the following three levels, depending on the size of the chromosomal region involved: copy number variation of fragments with sizes above 10 Mb, copy number variation at the arm level (short arm and long arm), and copy number variation of the entire chromosome.
Herein, the resolution of a CNV refers to the size level at which the copy number abnormality is detected. For example, a 10M resolution CNV refers herein to a copy number abnormality, such as a duplication or deletion, of a chromosome fragment of 10 Mb or more (e.g., a 45 Mb fragment). An arm resolution CNV refers herein to a copy number abnormality, such as a deletion or duplication, at the level of the short or long arm. For example, if the short arm of chromosome 17 is duplicated, increasing its copy number from the original 2 copies to 3 copies, the arm resolution CNV can be expressed as +17p(x3); if the short arm of chromosome 17 is deleted, reducing its copy number from 2 copies to 1 copy, the arm resolution CNV can be expressed as -17p(x1). Similarly, a chromosome resolution CNV refers herein to an abnormality in the copy number of the entire chromosome, such as a deletion or duplication of the entire chromosome. For example, if chromosome 22 is duplicated, increasing the copy number of the entire chromosome from the original 2 copies to 3 copies, the chromosome resolution CNV can be expressed as +22(x3); if the whole of chromosome 22 is deleted, reducing the copy number of the entire chromosome from 2 copies to 1 copy, the chromosome resolution CNV can be expressed as -22(x1). It will be appreciated that, owing to the chimeric nature of the embryo, CNVs of these resolutions may be present in chimeric form in embryos produced by in vitro fertilization; for example, the chimerism ratio may be 30%, or greater than 70%, or 100%. Likewise, owing to the heterogeneous origin of DNA in embryo culture fluid and blastocoel fluid, CNVs detected in culture fluid and blastocoel fluid may also be present at a chimerism ratio, which may be, for example, 30%, or greater than 70%, or 100%.
Herein, the term "aneuploidy" in relation to CNV refers to an imbalance of genetic material caused by the acquisition or loss of an entire chromosome or a portion of a chromosome. For the purposes of the present invention, it is understood that aneuploidy exists for CNVs of different resolutions, for example 10M, arm or chromosome level CNVs.
In this context, the term "chimeric" in relation to CNV means that the presence of both euploid and aneuploid karyotypes are recognized on the embryo or culture fluid and blastocoel fluid for a particular chromosomal region (which may be whole chromosomes, chromosomal arms, chromosomal segments). "mosaic ratio" means the percentage of aneuploid karyotypes and may range from 0% to 100%. When the mosaic ratio is 0%, it means that only the euploid karyotype is recognized; on the other hand, when the mosaic ratio is 100%, it means that only an aneuploid karyotype is recognized.
In this context, the CNV information of the embryo or of the culture fluid and/or blastocoel fluid can be expressed using conventional CNV karyotype nomenclature and the corresponding symbols. For example, p represents the short arm of a chromosome and q represents the long arm; mos denotes a chimera, and the percentage that follows it denotes the chimerism ratio; "+" indicates a copy number gain; "-" indicates a copy number loss; "x" precedes the copy number. In contrast to abnormal karyotypes, the normal human karyotype is: 46,XX for females and 46,XY for males.
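The notation above can be illustrated with a short sketch. The function name `format_cnv` and its arguments are hypothetical and chosen only for illustration; the sketch assumes a normal copy number of 2 and covers only the arm-level and whole-chromosome forms shown in the examples (+17p(x3), -22(x1)), not the mos annotation.

```python
def format_cnv(chromosome, copy_number, arm=""):
    """Return karyotype-style text such as '+17p(x3)' or '-22(x1)'.

    chromosome: '1'..'22', 'X' or 'Y'; arm: '' (whole chromosome), 'p' or 'q'.
    Assumption: the normal (euploid) copy number is 2.
    """
    if arm not in ("", "p", "q"):
        raise ValueError("arm must be '', 'p' or 'q'")
    if copy_number == 2:
        raise ValueError("a copy number of 2 is not a CNV in this sketch")
    sign = "+" if copy_number > 2 else "-"  # '+' marks a gain, '-' a loss
    return f"{sign}{chromosome}{arm}(x{copy_number})"


short_arm_gain = format_cnv("17", 3, arm="p")
whole_chromosome_loss = format_cnv("22", 1)
```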
Various aspects of the invention are described in further detail below.
Method of the invention
In one aspect, the present invention provides a NICS (non-invasive chromosome screening) method comprising the steps of:
(1) obtaining test data from a culture solution and/or a blastocoel fluid of a test embryo;
(2) classifying the test data using a trained classifier model;
(3) obtaining a predicted euploid probability of the test embryo based on the classification; and
(4) determining the transfer priority of the test embryo.
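As a rough illustration of how step (4) could follow from step (3), the sketch below orders embryos by their predicted euploid probability. Ordering from highest to lowest probability is an assumption made here for illustration, and `rank_embryos` and the embryo identifiers are hypothetical.

```python
def rank_embryos(euploid_probabilities):
    """Return embryo IDs ordered from highest to lowest predicted euploid probability.

    euploid_probabilities: dict mapping an embryo ID to the probability output
    by the classifier in step (3).
    """
    return sorted(euploid_probabilities, key=euploid_probabilities.get, reverse=True)


ranking = rank_embryos({"E1": 0.35, "E2": 0.91, "E3": 0.62})
```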
In still another aspect, the present invention also provides a test data classification method, an embryo euploid probability prediction method, and an embryo transfer priority ranking method related to the NICS method of the present invention.
Various aspects of the methods of the invention are described further below. These descriptions are mainly made by taking a culture solution as an example. However, as will be appreciated by those skilled in the art, any of the technical features and aspects above and below relating to the culture solution, and combinations thereof, are equally applicable to the blastocoel fluid of the present invention, and combinations of the culture solution and the blastocoel fluid. Therefore, the technical scheme of using the blastocoel fluid of the invention instead of the culture solution and the technical scheme of using the blastocoel fluid of the invention in addition to the culture solution are both covered in the scope of the invention. In the detailed description that follows, when reference is made to "culture fluid" it is to be understood that the term also encompasses "blastocoel fluid" and "culture fluid and blastocoel fluid" unless explicitly stated otherwise.
I. Culture fluid and/or blastocoel fluid CNV patterns and detection thereof
The NICS screening method of the present invention is based on the identification and application of CNV patterns in embryo culture fluids and/or blastocoel fluids.
For the purposes of the present invention, the term "CNV pattern", also referred to herein as a "chromosomal ploidy pattern", refers to the CNV characteristics identified on a sample, including, but not limited to, the resolution, type and chimerism ratio of the CNVs, whether the CNVs involve sex chromosomes, the number of abnormal chromosomes carrying CNVs, the chimerism ratio of the CNV with the highest chimerism ratio, and the type of the CNV with the highest chimerism ratio. In some embodiments, the CNV pattern of the culture fluid and/or blastocoel fluid can be identified by mapping genomic sequence information obtained on the culture fluid and/or blastocoel fluid samples onto a reference genome (e.g., a full-length human genomic sequence, such as the human hg19 genome). In some embodiments, CNV patterns identified from embryo culture fluid and/or blastocoel fluid may be used to predict the euploid probability of an embryo, preferably as biological features input to the classifier in the computer-assisted methods of the invention.
In one embodiment of the invention, the culture fluid and/or blastocoel fluid CNV pattern detection involves detecting CNVs selected from 10M resolution, arm resolution and chromosome resolution, or a combination thereof. In another embodiment of the invention, the culture fluid and/or blastocoel fluid CNV pattern detection involves detecting the presence or absence in the sample of 10M resolution, arm resolution or chromosome resolution CNVs, or a combination thereof, at greater than a defined chimerism ratio, preferably a 50% chimerism ratio.
Various techniques have been developed for copy number karyotyping of all 23 pairs of human chromosomes (22 pairs of autosomes and 1 pair of sex chromosomes), such as karyotyping methods based on comprehensive chromosome screening (CCS). These techniques are also applicable to the culture fluid and/or blastocoel fluid CNV pattern analysis of the present invention. See: Chen et al., Can comprehensive chromosome screening technology improve IVF/ICSI outcomes? A meta-analysis. PLoS ONE 10, e0140779 (2015). CCS-based karyotyping methods can be performed in a variety of ways, and typically include a DNA amplification step and an evaluation step for the amplified DNA. For DNA amplification, the means may include multiple annealing and looping-based amplification cycles (MALBAC), primer extension preamplification PCR (PEP-PCR), degenerate oligonucleotide-primed PCR (DOP-PCR), and multiple displacement amplification (MDA). For the evaluation of amplified DNA, for example, array CGH (comparative genomic hybridization), SNP (single nucleotide polymorphism) arrays, NGS (next generation sequencing)-based CCS, and qPCR-based CCS can be used. The practice of these methods is known in the art.
In a preferred embodiment of the invention, the CNV pattern of the culture fluid and/or blastocoel fluid is identified using NGS-based CNV detection methods. For NGS-based CNV detection, see Exome sequencing and whole genome sequencing for the detection of copy number variation, Expert Rev. Mol. Diagn. Early online, 1-10 (2015).
In yet another preferred embodiment of the present invention, an exemplary method for identifying culture fluid and/or blastocoel fluid CNV patterns comprises the steps of whole genome amplification, NGS sequencing and sequence copy number analysis, which more preferably comprises the steps of:
-obtaining whole genome sequencing data on a culture fluid and/or blastocoel fluid sample;
-extracting high quality reads from the sequencing data and mapping them to a human reference genome (e.g., the hg19 genome);
-counting the number of reads along the whole genome after removing duplicate reads, using a bin size of 1 Mb, wherein the relative copy number is expressed after normalization by GC content and against a reference data set;
-segmenting the copy number of each bin by a circular binary segmentation (CBS) algorithm, merging bins with similar trends, and calculating the final copy number segmentation.
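The read counting and scaling steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented pipeline: duplicates are removed simply by collapsing identical start positions, GC-content correction and CBS segmentation are omitted, and the function names and inputs are hypothetical.

```python
from collections import Counter

BIN_SIZE = 1_000_000  # 1 Mb bins, as described in the text


def bin_counts(read_positions, n_bins):
    """Count reads per 1 Mb bin after collapsing duplicate start positions.

    Assumption: two reads with the same start position are duplicates.
    """
    counts = Counter(pos // BIN_SIZE for pos in set(read_positions))
    return [counts.get(i, 0) for i in range(n_bins)]


def relative_copy_number(sample_counts, reference_counts, ploidy=2):
    """Scale sample bin counts against a reference profile.

    Both inputs cover the same bins. Each bin's share of the sample's reads is
    divided by the same bin's share of the reference reads and multiplied by
    the normal ploidy. GC correction is not included in this sketch.
    """
    sample_total = sum(sample_counts)
    reference_total = sum(reference_counts)
    copy_numbers = []
    for s, r in zip(sample_counts, reference_counts):
        if r == 0 or sample_total == 0:
            copy_numbers.append(None)  # bin cannot be assessed
            continue
        ratio = (s / sample_total) / (r / reference_total)
        copy_numbers.append(ploidy * ratio)
    return copy_numbers
```

In practice the per-bin values would then be passed to a segmentation step such as CBS to merge neighbouring bins with similar copy number.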
In some embodiments, the genome amplification method is the ChromInst amplification method. See Chinese patent ZL 201610264059.0.
In this method, the Coefficient of Variation (CV), calculated as the ratio of the standard deviation of the read density to its mean, can be used to assess whether the amplification was successful.
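The coefficient of variation defined above is the standard deviation of the read density divided by its mean. A minimal sketch follows; the use of the population standard deviation and the function name are assumptions, and no pass/fail cutoff is implied because the text does not state one.

```python
from statistics import mean, pstdev


def coefficient_of_variation(read_densities):
    """Return CV = standard deviation / mean of the per-bin read densities.

    Assumption: the population standard deviation is used.
    """
    m = mean(read_densities)
    if m == 0:
        raise ValueError("mean read density is zero")
    return pstdev(read_densities) / m
```

A lower CV indicates more uniform coverage across bins.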
After CNV detection, a chromosome copy number map is preferably drawn to visualize the CNVs. For example, the CNV profile can be visualized for all 24 chromosomes by using an R program to plot the copy number of each bin.
For methods for determining copy number variation based on NGS, see also Genome-wide detection of single nucleotide and copy number variations of a single human cell, Science 338: 1622-.
Extracting biological characteristics and their magnitudes
In the present invention, after identifying the CNV pattern of the sample nucleic acid on the culture fluid and/or blastocoel fluid samples, the biological characteristics and their magnitudes used in the prediction model of the present invention will be extracted.
In this regard, through intensive study, the present inventors propose that, in addition to the abnormal chromosome percentage, characteristics such as chromosome gain/loss, CNV fragment size, genomic position, and abnormality patterns should also be considered in calculating the probability that an embryo is normal. Thus, in an embodiment of the invention, the features of the machine learning classifier used in the invention include the following: features associated with the chromosomal chimerism ratio, the size of aneuploid fragments, and abnormalities involving multiple chromosomes.
In a preferred embodiment, the following biological characteristics are extracted from the culture fluid and/or blastocoel fluid genetic test results: the "normal"/"abnormal" report of the embryo when specific CNV patterns are used as thresholds, the number of chromosomes involved in the CNVs, whether the CNVs involve sex chromosomes, the chimerism ratio of the CNV with the highest chimerism ratio, and the type of the CNV with the highest chimerism ratio.
In a more preferred embodiment, the present invention relates to the following 11 biological characteristics and their use in the classification, prediction and ranking methods of the present invention:
Feature 1: 10M resolution CNV result (also abbreviated as "10M CNV"). This feature describes the "normal"/"abnormal" report of the embryo when the presence or absence of a 10M resolution CNV is used as the threshold. Accordingly, the feature value is designated "normal" (Normal, N) when the sample has no CNV at a level of 10M or more, and is designated "abnormal" (Abnormal, A) otherwise.
Feature 2: 10M resolution CNV result redefined at a prescribed chimerism ratio (also abbreviated as "10M_mos CNV"). This feature describes the "normal"/"abnormal" report of the embryo when the presence or absence of a 10M resolution CNV greater than a prescribed chimerism ratio is used as the threshold. Accordingly, the feature value is designated "normal" (N) when the sample has no 10M resolution CNV greater than the prescribed chimerism ratio, and is designated "abnormal" (A) otherwise.
Feature 3: arm resolution CNV result (also abbreviated as "Arm CNV"). This feature describes the "normal"/"abnormal" report of the embryo when the presence or absence of an arm resolution CNV is used as the threshold. The feature value is designated "normal" (N) when the sample has no CNV at the arm level, and is designated "abnormal" (A) otherwise.
Feature 4: arm resolution CNV result redefined at a prescribed chimerism ratio (also abbreviated as "Arm_mos CNV"). This feature describes the "normal"/"abnormal" report of the embryo when the presence or absence of an arm resolution CNV greater than a prescribed chimerism ratio is used as the threshold. Accordingly, the feature value is designated "normal" (N) when the sample has no arm resolution CNV greater than the prescribed chimerism ratio, and is designated "abnormal" (A) otherwise.
Feature 5: chromosome resolution CNV result (also abbreviated as "Chrom CNV"). This feature describes the "normal"/"abnormal" report of the embryo when the presence or absence of a chromosome resolution CNV is used as the threshold. The feature value is designated "normal" (N) when the sample has no CNV at the chromosome level, and is designated "abnormal" (A) otherwise.
Feature 6: chromosome resolution CNV result redefined at a prescribed chimerism ratio (also abbreviated as "Chrom_mos CNV"). This feature describes the "normal"/"abnormal" report of the embryo when the presence or absence of a chromosome resolution CNV greater than a prescribed chimerism ratio is used as the threshold. Accordingly, the feature value is designated "normal" (N) when the sample has no chromosome resolution CNV greater than the prescribed chimerism ratio, and is designated "abnormal" (A) otherwise.
Feature 7: count of euploid results at the different CNV resolutions (also abbreviated as "NormalNum"). This feature describes how many of the aforementioned 6 reporting thresholds yield a "normal" report for the embryo. Accordingly, the feature value equals the number of features among features 1 to 6 whose value is "normal".
Feature 8: number of abnormal chromosomes (also abbreviated as "Ab_ChromNum"). This feature describes the number of abnormal chromosomes carrying CNVs. Accordingly, the feature value equals the number of abnormal chromosomes detected in the sample. For example, if CNVs at a level of 10M or more are detected in the sample only on chromosomes 14 and 17, the value of this feature is 2.
Feature 9: maximum CNV chimerism ratio (also abbreviated as "Mos_rate"). This feature describes the chimerism ratio of the CNV with the highest chimerism ratio among all CNVs detected. Accordingly, the feature value is the numerical value of that chimerism ratio, ranging from 0% to 100%.
Feature 10: type of the CNV with the maximum chimerism ratio (also abbreviated as "Mos_Type"). This feature describes the type of the CNV with the highest chimerism ratio among all CNVs detected. Accordingly, the feature value is designated as "whole chromosome" (whole), "short arm" (p), "long arm" (q), or the CNV fragment size (e.g., 45 Mb).
Feature 11: sex chromosome abnormality (also abbreviated as "Ab_SexChrom"). This feature describes whether CNV abnormalities are present only on the sex chromosomes. Accordingly, the feature value is designated "YES" if only the sex chromosomes carry CNV abnormalities, and "NO" otherwise.
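The 11 features can be derived from a list of CNV calls roughly as sketched below. The input format (one dict per CNV with `chrom`, `kind`, `size_mb` and `mosaic` keys) and the function name are hypothetical. The sketch also assumes that only CNVs of 10 Mb or more are considered, that a whole-chromosome CNV also counts as an arm-level finding, and that a CNV counts toward a "_mos" feature only when its chimerism ratio is strictly greater than the threshold; these choices are illustrative assumptions rather than statements of the original method.

```python
def extract_features(cnv_calls, mos_threshold=50.0):
    """Build the 11 feature values from CNV calls found in one sample.

    Each call is a dict: chrom ('1'..'22', 'X', 'Y'), kind ('whole', 'p', 'q'
    or 'segment'), size_mb (fragment size in Mb), mosaic (chimerism ratio, %).
    """
    calls = [c for c in cnv_calls if c["size_mb"] >= 10]  # 10M level or larger

    def flag(selected):
        return "A" if selected else "N"  # abnormal if any CNV was selected

    def over(selected):
        return [c for c in selected if c["mosaic"] > mos_threshold]

    arm = [c for c in calls if c["kind"] in ("p", "q", "whole")]
    chrom = [c for c in calls if c["kind"] == "whole"]
    features = {
        "10M CNV": flag(calls),
        "10M_mos CNV": flag(over(calls)),
        "Arm CNV": flag(arm),
        "Arm_mos CNV": flag(over(arm)),
        "Chrom CNV": flag(chrom),
        "Chrom_mos CNV": flag(over(chrom)),
    }
    # Feature 7: number of "normal" values among features 1 to 6.
    features["NormalNum"] = sum(v == "N" for v in features.values())
    # Feature 8: number of distinct chromosomes carrying a CNV.
    features["Ab_ChromNum"] = len({c["chrom"] for c in calls})
    # Features 9 and 10: ratio and type of the CNV with the highest ratio.
    top = max(calls, key=lambda c: c["mosaic"], default=None)
    features["Mos_rate"] = top["mosaic"] if top else 0.0
    if top is None:
        features["Mos_Type"] = None
    elif top["kind"] == "segment":
        features["Mos_Type"] = f'{top["size_mb"]:g}Mb'
    else:
        features["Mos_Type"] = top["kind"]
    # Feature 11: YES only if every detected CNV lies on a sex chromosome.
    sex_only = bool(calls) and all(c["chrom"] in ("X", "Y") for c in calls)
    features["Ab_SexChrom"] = "YES" if sex_only else "NO"
    return features


example = extract_features([
    {"chrom": "14", "kind": "segment", "size_mb": 45, "mosaic": 30.0},
    {"chrom": "17", "kind": "p", "size_mb": 25, "mosaic": 70.0},
])
```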
In addition to the 11 features described above, the biological features used in the classification, prediction and ranking methods of the present invention may also include other biological characteristics that can be objectively measured as a characteristic indicator indicative of the properties of an embryo euploid, such as additional 1,2, 3 or more features, as will be appreciated by those skilled in the art. Such biological characteristics may include, for example, but are not limited to, copy number abnormalities of chromosome fragments of 1M or more (i.e., 1M resolution CNV), copy number abnormalities of chromosome fragments of 4M or more (i.e., 4M resolution CNV), and chimerism ratios thereof.
Class label and chimerism ratio reporting thresholds
In the present invention, in order to construct a suitable training data set, each training data vector in the data set should also contain a classification label as to whether the corresponding embryo is euploid or aneuploid.
In the case of embryo ploidy analysis, the present invention employs two classifications to describe the euploid/aneuploid status of the embryo: "normal" (corresponding to a "euploid" condition) and "abnormal" (corresponding to an "aneuploid" condition), and the inventors have found that setting certain "normal"/"abnormal" embryo reporting thresholds for the purposes of the present invention is advantageous for the modeling of the present invention.
More specifically, in order to maximize the concordance between the predictions of the culture fluid and/or blastocoel fluid model and the whole embryo results, the present inventors compared the performance of prediction models obtained by setting different reporting thresholds (see examples). It was found that setting a CNV chimerism ratio threshold for the "normal"/"abnormal" embryo report is advantageous for building the model.
Thus, in a preferred embodiment, the classification labels of the present invention are defined as follows: an embryo is designated "normal" or "abnormal" based on whether it has a CNV (e.g., at a level of 1M or more, preferably 10M or more) greater than a specified chimerism ratio, wherein the embryo is reported as "abnormal" if a CNV greater than the specified chimerism ratio is present, and as "normal" if no CNV greater than the specified chimerism ratio is present. Accordingly, when the 10M_mos, Arm_mos, and Chrom_mos features are extracted on a culture fluid and/or blastocoel fluid sample, the values of these features are also designated "normal" or "abnormal" according to the specified chimerism ratio threshold.
In a preferred embodiment, the defined chimerism ratio threshold is selected from the group consisting of: 20%, 30%, 40%, 50%, 60%, 70%, 80%, and most preferably 50%.
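The labeling rule can be sketched as below. The function name and input format are hypothetical; the candidate thresholds are those listed above, with 50% as the default, and "greater than" is taken literally, so a CNV exactly at the threshold is reported as "normal".

```python
CANDIDATE_THRESHOLDS = (20, 30, 40, 50, 60, 70, 80)  # percentages listed in the text


def classification_label(embryo_chimerism_ratios, threshold=50):
    """Label an embryo from the chimerism ratios (%) of the CNVs found in it."""
    if threshold not in CANDIDATE_THRESHOLDS:
        raise ValueError("threshold is not one of the listed candidate values")
    if any(ratio > threshold for ratio in embryo_chimerism_ratios):
        return "abnormal"
    return "normal"
```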
Accordingly, in view of the effect of chimerism ratio reporting thresholds on the performance of the model, in one embodiment, the classification, prediction and rating method of the present invention further comprises the step of selecting chimerism ratio reporting thresholds prior to using the model for test data classification. For example, the method further comprises:
-using different chimerism ratios as the threshold for reporting an embryo as "abnormal" in the classification labels and in features 10M_mos CNV, Arm_mos CNV, and Chrom_mos CNV, and generating the corresponding prediction models,
-evaluating and comparing the performance of the models generated using the different reporting thresholds by means of ROC curves,
-selecting the chimerism ratio threshold that achieves optimal model performance for generating the classifier prediction model and for test data classification.
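The threshold selection steps above can be sketched as follows. Using the area under the ROC curve (AUC) as the single comparison metric is an assumption made for illustration, and the function names and input layout are hypothetical. The AUC is computed with the rank-based identity: the fraction of (euploid, aneuploid) pairs in which the euploid embryo receives the higher predicted probability, with ties counted as one half.

```python
def auc(labels, scores):
    """Area under the ROC curve.

    labels: 1 for euploid ("normal"), 0 for aneuploid ("abnormal").
    scores: predicted euploid probabilities.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def select_threshold(results_by_threshold):
    """Pick the chimerism ratio threshold whose model has the highest AUC.

    results_by_threshold maps a threshold (%) to (labels, scores) obtained
    from the model built with that threshold.
    """
    aucs = {t: auc(y, s) for t, (y, s) in results_by_threshold.items()}
    best = max(aucs, key=aucs.get)
    return best, aucs


best, aucs = select_threshold({
    30: ([1, 0, 1, 0], [0.6, 0.5, 0.4, 0.7]),
    50: ([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]),
})
```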
Method and system for classifying data using classification system
Machine learning algorithm-based classification systems have been applied in the medical field for data analysis and data mining in order to extract important information and patterns contained in large data sets. In general, learning machines contain algorithms that can be trained using data with known classifications to achieve generalization. Thereafter, the trained learning machine algorithm may be applied to new data with unknown classifications, which are sorted according to learned patterns.
In the present invention, the present inventors provide a method for classifying data (test data, i.e., biological characteristic quantity values) obtained from a sample of a test embryo culture fluid and/or a blastocoel fluid by data analysis and mining. These methods generally involve preparing or obtaining training data, training a classifier, generating a predictive model, and evaluating test data using the model. In the method, preferably classifiers are used, such as learning machines, including for example Support Vector Machines (SVMs), AdaBoost, logistic regression, naive bayes classifiers, classification trees, k-nearest neighbor classifiers, neural networks, random forests and/or combinations thereof. The trained classifier may then output a classification of the test sample based on the test data, such as a predicted euploid probability of the sample corresponding to an embryo.
Thus, in some embodiments, the classification method of the present invention comprises the steps of:
In a first step, a classifier is used to describe a predetermined data set. This is a "learning step" performed on the "training" data.
The training database is a computer-implemented data store reflecting magnitudes of a plurality of biological features of a plurality of subjects having known classifications associated with euploid status of embryos of the subjects. The storage format of the data may be a flat file, a database, a table, or any other retrievable data storage format known in the art. In an exemplary embodiment, the training data is stored as a plurality of vectors, each vector corresponding to a subject and including magnitudes of a plurality of biological features of the subject and classification tags relating to the embryo euploid status of the subject. Typically, each vector contains one respective entry for each of a plurality of biometric quantities. The training database may be linked to a network, such as the internet, so that its contents may be remotely accessed by authorized entities (e.g., individual users or computer programs). Alternatively, the training database may be located in a network-isolated computer.
In the second step (which is an optional step), the classifier is applied to a "validation" database, or cross-validation is used to observe model performance metrics, including accuracy, sensitivity and specificity. For example, in some embodiments, only a portion of the training database may be used for the learning step, while the remainder of the training database is used as the validation database.
Third, the biometric quantity values from the test subject are submitted to a classification system that outputs a classification (e.g., embryo euploidy probability) calculated for the test subject.
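The optional validation step can be sketched as follows: part of the training data is held out, and accuracy, sensitivity and specificity are computed on the held-out part. The 20% hold-out fraction, the choice of "A" (abnormal) as the positive class, and the function names are assumptions made for illustration.

```python
import random


def split_training_data(vectors, validation_fraction=0.2, seed=0):
    """Shuffle the vectors and return (learning set, validation set)."""
    shuffled = list(vectors)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


def performance(true_labels, predicted_labels, positive="A"):
    """Return accuracy, sensitivity and specificity for two label lists."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        "specificity": tn / (tn + fp) if tn + fp else None,
    }
```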
In accordance with the classification method of the present invention, in one aspect, the invention also provides a classification system and classifier suitable for performing the method.
Classification system applicable to classification method of the invention
The classification system of the present invention may comprise computer-executable software, firmware, hardware or various combinations thereof. For example, the classification system may include a processor and supporting data storage. Further, the classification system may be implemented on multiple devices or other components, local or remote to one another. Moreover, the software may be stored on a non-transitory computer readable medium and, when executed on a computer, cause the computer to perform a series of steps.
The classification system of the present invention may include data storage, such as network accessible storage, local storage, remote storage, or a combination thereof. The data storage may utilize a redundant array of inexpensive disks ("RAID"), tape, disk, a storage area network ("SAN"), an internet small computer system interface ("iSCSI") SAN, a Fibre Channel SAN, a Common Internet File System ("CIFS"), network attached storage ("NAS"), a network file system ("NFS"), or other computer accessible storage. In one or more embodiments, the data store may be a database, such as an Oracle database, a Microsoft SQL Server database, a DB2 database, a MySQL database, a Sybase database, an object-oriented database, a hierarchical database, or other database. In one embodiment, the data store may utilize a flat file structure to store data.
Classifiers suitable for use with the present invention
Various methods for classification are known in the art, including the use of classifiers such as support vector machines, AdaBoost, decision trees, Bayesian classifiers, Bayesian belief networks, naive Bayes classifiers, k-nearest neighbor classifiers, case-based reasoning, logistic regression, neural networks, random forests, or any combination thereof (see, e.g., Han J and Kamber M, 2006, Chapter 6, Data Mining: Concepts and Techniques, 2nd edition, Elsevier: Amsterdam). Any such classifier or combination of classifiers may be used in the classification methods and systems of the present invention, as described herein.
By way of non-limiting example, classifiers that may be used in the classification of data of the present invention include, but are not limited to, support vector machines, genetic algorithms, logistic regression, naive Bayes classifiers, classification trees, k-nearest neighbor classifiers, neural networks, elastic nets, Bayesian neural networks, random forests, gradient boosting trees, and/or AdaBoost. As discussed herein, the classifier may be trained using training data.
In a preferred embodiment, the classification system of the invention comprises a random forest classifier.
A random forest is an ensemble learner based on a bagging strategy and a plurality of decision trees. The class it outputs is the mode of the classes output by the individual trees. The randomness of a random forest is reflected mainly in two aspects: (1) when each tree is trained, a subset is drawn from all training samples for training (namely, bootstrap sampling), and the remaining (out-of-bag) data are used to estimate the error; (2) at each node, a subset of all the features is randomly selected for calculating the optimal split.
In a preferred embodiment of the present invention, the random forest model may be generated as follows: 1) drawing subsets of the training data by bootstrap resampling; 2) generating a plurality of decision trees (e.g., 500), each from a subset of the training data set and a subset of the features; 3) averaging the euploid probability predictions from the plurality of decision trees to obtain the prediction of the model; 4) evaluating the model using the out-of-bag (OOB) error rate; and 5) iteratively training the model until an optimal model is obtained.
The new observations can then be classified using a trained random forest model. For this purpose, each classification tree in the random forest is used to classify the new observed values, and the euploid probability prediction values from each tree are averaged to obtain the prediction value of the model.
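Two of the mechanics described above, bootstrap sampling with its out-of-bag remainder and the averaging of per-tree predictions, can be sketched as follows. This is not a full random forest: tree construction and random feature selection at each node are omitted, and each "tree" is represented by a hypothetical callable that returns a euploid probability.

```python
import random


def bootstrap_sample(n_samples, rng):
    """Draw n_samples indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag


def forest_probability(trees, features):
    """Average the euploid probabilities predicted by each tree."""
    votes = [tree(features) for tree in trees]
    return sum(votes) / len(votes)


rng = random.Random(0)
in_bag, out_of_bag = bootstrap_sample(10, rng)

# Stand-in "trees": fixed-output callables used only to show the averaging.
toy_trees = [lambda f: 1.0, lambda f: 0.0, lambda f: 0.5]
probability = forest_probability(toy_trees, {})
```

The out-of-bag indices are the samples a given tree never saw during training, which is why they can serve as that tree's internal test set for the OOB error estimate.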
Tools for implementing random forests are available, for example, for R, a language and environment for statistical computing. For example, the "randomForest" package in R version 3.6.1 includes tools for creating, processing, and utilizing random forests.
Method embodiment of the present invention for classifying test data
Based on the foregoing classification systems and classifiers, in some embodiments, the invention also provides a method of classifying test data comprising a magnitude of a biological feature, which is a combination of magnitudes of each feature in a set of biological features, extracted from CNV pattern analysis results obtained on a subject embryo's culture fluid and/or blastocoel fluid.
In one embodiment, the method of classifying test data of the present invention comprises:
(a) accessing a set of electronically stored training data vectors, each training data vector representing a subject and containing a biometric quantity value for the subject (i.e., a combination of the quantity values of each feature in a set of biometric features), the training data vectors further including classification tags relating to the subject's embryonic euploid status;
(b) training a classifier or ensemble classifier as described herein using an electronically stored set of training data vectors;
(c) receiving test data;
(d) evaluating the test data using the trained classifier and/or the ensemble classifier; and
(e) based on the evaluating step, a classification of the test object, preferably a euploid probability of the test object, is output.
The output according to the present invention may include, for example, presenting information about the classification of the test object in electronic form.
In yet another embodiment, the present invention provides a method of classifying test data, preferably the method is an ex vivo computer-implemented method,
wherein the test data comprises magnitudes for a set of biological characteristics of a human test embryo,
the method comprises the following steps:
-receiving, on at least one processor, said test data comprising quantities of said set of features, wherein said quantities are a combination of quantities extracted for each feature of the set of features on a culture fluid and/or blastocoel fluid sample from the human test embryo;
-evaluating, on the at least one processor, the test data using a classifier, wherein said classifier has been trained using an electronically stored set of training data vectors, each training data vector of the set of training data vectors representing a respective individual human embryo and comprising a magnitude of each feature of said set of features extracted on a culture fluid and/or blastocoel fluid sample of the respective embryo, and each training data vector further comprising a classification label as to whether the respective embryo is euploid or aneuploid, preferably the classification label reporting "normal" or "abnormal" based on whether the embryo has a CNV (e.g., at a level of 1M or more, preferably 10M or more) greater than a specified chimerism ratio;
-outputting, using the at least one processor, a classification relating to the human test embryo, wherein said classification is a euploid probability of said embryo,
wherein the set of biological features comprises the following 11 features: 10M CNV, 10M_mos CNV, Arm CNV, Arm_mos CNV, Chrom CNV, Chrom_mos CNV, NormalNum, Ab_ChromNum, Mos_rate, Mos_Type, Ab_SexChrom,
preferably, a CNV chimerism ratio selected from the following is used as the threshold for reporting "normal" or "abnormal" in the classification labels and in features 10M_mos CNV, Arm_mos CNV, and Chrom_mos CNV: 20%, 30%, 40%, 50%, 60%, 70%, 80%, preferably 50%, wherein "abnormal" is reported above the threshold and "normal" is reported otherwise.
In yet another embodiment, the present invention provides a method of classifying test data, preferably the method is an ex vivo computer-implemented method,
wherein the test data comprises magnitudes for a set of biological characteristics of a human test embryo,
the method comprises the following steps:
-accessing, using at least one processor, an electronically stored set of training data vectors, each training data vector of the set of training data vectors representing a respective individual human embryo and comprising a magnitude of each feature of the set of features determined on a culture fluid and/or blastocoel fluid sample of the respective embryo, and each training data vector further comprising a classification label as to whether the respective embryo is euploid or aneuploid, preferably the classification label reporting "normal" or "abnormal" based on whether the embryo has a CNV greater than a specified chimerism ratio;
-training a classifier using the electronically stored training data vector;
-receiving, on the at least one processor, said test data, wherein said test data comprises a combination of the quantities determined for each feature of the set of features in a culture fluid and/or blastocoel fluid sample from a human test embryo;
-evaluating, on the at least one processor, the test data using said classifier;
-outputting, using the at least one processor, a classification relating to the human test embryo, wherein said classification is a euploid probability of said embryo,
wherein the set of biological features comprises the following 11 features: 10M CNV, 10M_mos CNV, Arm CNV, Arm_mos CNV, Chrom CNV, Chrom_mos CNV, NormalNum, Ab_ChromNum, Mos_rate, Mos_Type, Ab_SexChrom,
preferably, a CNV chimerism ratio selected from the following is used as the threshold for reporting "normal" or "abnormal" in the classification labels and in features 10M_mos CNV, Arm_mos CNV, and Chrom_mos CNV: 20%, 30%, 40%, 50%, 60%, 70%, 80%, preferably 50%, wherein "abnormal" is reported above the threshold and "normal" is reported otherwise.
In some embodiments of the methods of the invention, the classifier is selected from the group consisting of random forests, AdaBoost, naive Bayes, support vector machines, neural networks, genetic algorithms, elastic nets, gradient boosting trees, Bayesian neural networks, k-nearest neighbors, or combinations thereof.
In some embodiments of the methods of the invention, the classifier comprises a random forest.
In some embodiments of the method of the invention, the classifier is a random forest classifier comprising 100-600 trees, for example 100 or more than 100 trees, for example 200 or more than 200 trees, 300 or more than 300 trees, 400 or more than 400 trees, or more preferably 500 trees.
In some embodiments of the methods of the invention, the training data set comprises at least 100, 200, 300, 400 or 500 training data vectors.
In some embodiments of the methods of the invention, the method further comprises: plotting an ROC curve of the trained classifier model and evaluating the performance of the model, preferably plotting the ROC curve of the model through cross-validation using the training data set, and preferably the accuracy of the model is greater than 0.9.
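The points of an ROC curve can be computed from predicted euploid probabilities as sketched below. The function name is hypothetical, euploid is assumed to be the positive class (label 1), and plotting itself is left to whatever graphics tool is in use.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs.

    labels: 1 for euploid, 0 for aneuploid; scores: predicted euploid
    probabilities. One point is produced per distinct score cutoff, from the
    strictest cutoff to the most permissive, starting at (0.0, 0.0).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are required")
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= cutoff and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= cutoff and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


curve = roc_points([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
```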
In some embodiments of the methods of the invention, prior to using the trained classifier for test data classification, the method further comprises:
-using different chimerism ratios as the threshold at which the classification label and the features 10M_mos CNV, arm_mos CNV, and chrom_mos CNV report an embryo as "abnormal",
-evaluating and comparing the effectiveness of classifier models generated using different thresholds by means of ROC curves,
-selecting a chimerism ratio threshold that achieves optimal model performance for generating the classifier model and the test data classification.
In some embodiments of the method of the invention, the extraction of said characteristic quantity values comprises the detection of 10M, arm and chromosome resolution CNV and the chimerism ratio thereof on a culture broth and/or blastocoel fluid sample,
preferably, the CNV detection comprises: whole genome amplification, NGS sequencing and sequence copy number analysis were performed on broth and/or blastocoel fluid nucleic acids, thereby identifying CNVs and their chimerism ratios.
In some embodiments of the methods of the invention, the culture broth and/or blastocoel fluid used to extract the biometric quantity is blastocyst culture broth and/or blastocoel fluid, particularly D5-D6 blastocyst culture broth and/or blastocyst fluid. In some embodiments, the medium used to extract the biometric quantities is a D1-D6 embryo medium, such as a D3-D5, D3-D6, D4-D6, or D5-D6 embryo medium.
In some embodiments of the methods of the invention, the method further comprises a step of extracting a biometric quantity value from the culture fluid and/or blastocoel fluid of the test embryo, comprising: blastocyst broth and/or blastocoel fluid is obtained, preferably using NGS, and tested for 10M, arm and chromosome resolution CNV and chimerism ratios thereof.
In some embodiments of the methods of the invention, the embryo is from IVF, preferably from ICSI (intracytoplasmic sperm injection).
Embryo euploid prediction method
In one aspect, the invention also provides a method for predicting the probability of embryo euploidy based on embryo culture fluid and/or blastocoel fluid. In some embodiments, the prediction method of the present invention comprises: the test embryos are classified by the test data classification method to obtain the predicted euploid probability of the test embryos.
In some embodiments, the prediction method of the present invention comprises:
(a) obtaining a culture fluid and/or a blastocyst cavity fluid sample of a test embryo, preferably a blastocyst culture fluid sample, more preferably D5-D6 blastocyst culture fluid;
(b) measuring the 10M, arm, and chromosome resolution CNVs and the corresponding chimerism ratios in the sample, and extracting the following 11 features of the sample: 10M CNV, 10M_mos CNV, arm CNV, arm_mos CNV, chrom CNV, chrom_mos CNV, NormalNum, Ab_ChromNum, mos_rate, mos_type, Ab_SexChrom;
(c) accepting, on at least one processor, test data comprising magnitudes of the 11 features;
(d) evaluating, using the at least one processor, the test data by a classifier, wherein the classifier has been trained using an electronically stored set of training data vectors, each training data vector of the set of training data vectors representing a respective human embryo and comprising a magnitude for each of the 11 features extracted on a broth and/or blastocoel fluid sample of the respective embryo, and each training data vector further comprising a classification label for whether the respective embryo is euploid or aneuploid;
(e) outputting, using the at least one processor, a classification relating to the human test embryo based on the evaluating step (d), wherein the classification is a euploid probability of the embryo.
V. Embryo transfer priority rating method
In a further aspect, the present invention provides a method of determining embryo transfer priority based on embryo culture fluid and/or blastocoel fluid, comprising:
-predicting the euploid probability of a test embryo from a culture fluid and/or blastocoel fluid sample from said embryo according to the test data classification method of the present invention;
-determining embryo transfer priority based on the obtained probability values.
In some embodiments, the determined embryo priority is an order of priority for embryo transfer based on the probability values. The higher the predicted euploid probability value of an embryo, the more advanced the priority of embryo transfer.
In some embodiments, the classification level of embryo transfer priority may be further divided according to different probability value ranges. For example, there may be 2 levels, 3 levels, 4 levels, or 5 or more levels. In some embodiments, the classification levels are three levels of priority, including a first priority, a second priority, and a third priority. In another embodiment, the classification levels are four levels of priority, including a first priority, a second priority, a third priority, and a fourth priority. In these embodiments, the embryo priority determined based on the probability value is a priority classification level.
In some embodiments, one skilled in the art can determine the predicted euploid probability thresholds used to divide embryos into priority classification levels based on the negative predictive value (NPV, i.e., the euploid predictive value) and the specificity of the classification model.
In some preferred embodiments, the priority classification level is 3. Preferably, when the probability value is greater than about 0.90, more preferably greater than about 0.94, the embryo is determined to be a first transfer priority, wherein preferably the NPV is greater than 90%, e.g., 91%, 92%;
when the probability value is between 0.90-0.60, more preferably between 0.94-0.70, the embryo is determined as a second transfer priority, wherein preferably the NPV is greater than 70%, more preferably 75-85%;
when the probability value is less than 0.60, more preferably less than 0.70, the embryo is determined to be a third transfer priority,
preferably, the first-priority embryos have a chromosomal ploidy pattern selected from the group consisting of: 100% euploid, CNVs with a low chimerism ratio, and small chromosome segment CNVs,
the third-priority embryos have a chromosomal ploidy pattern selected from the group consisting of: CNVs with a high chimerism ratio, and large-segment CNVs, especially at the arm or chromosome level.
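As a non-limiting illustration, the three-level classification above can be sketched in Python as follows. The sketch uses the more preferred cutoffs of 0.94 and 0.70; the function name `transfer_priority` and the assignment of probabilities exactly equal to a cutoff to the second priority are assumptions of this sketch.

```python
def transfer_priority(p_euploid, first_cutoff=0.94, third_cutoff=0.70):
    """Map a predicted euploid probability to transfer priority 1, 2 or 3.

    Above first_cutoff: first priority.
    From third_cutoff up to and including first_cutoff: second priority.
    Below third_cutoff: third priority.
    """
    if not 0.0 <= p_euploid <= 1.0:
        raise ValueError("p_euploid must be a probability between 0 and 1")
    if p_euploid > first_cutoff:
        return 1
    if p_euploid >= third_cutoff:
        return 2
    return 3
```

Passing `first_cutoff=0.90` and `third_cutoff=0.60` gives the broader preferred ranges stated above.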
In some embodiments, in the training dataset used to construct the model, a ranking threshold of priorities is determined such that the proportion of first priority embryos is no less than 30%, e.g., 40%, 50%; or the ratio of the first priority embryo to the second priority embryo is not less than 50%, 60%, or 70%.
In some embodiments, suggestions are given regarding the order of transfer of embryos. In some embodiments, a recommendation is given for the order of embryo transfer further in conjunction with the morphological rating of the embryos.
In connection with the embryo transfer sequence proposal of the present invention, during the embryo transfer process, as will be appreciated by those skilled in the art, the physician may decide, based on the circumstances, whether to proceed with the transfer and/or determine the embryo for transfer in a manner consistent with good medical practice based on empirical/clinical judgment, taking into account factors related to embryo transfer such as the patient's actual condition (e.g., age, medical history, etc.), embryo morphology, etc.
Thus, in one aspect, the embryo transfer recommendation provided by the methods of the present invention may be incorporated into the decision of a physician as a decision parameter, for example, to improve the clinical outcome of an embryo transfer.
As demonstrated in the examples, the ranking method of the present invention is relevant to improving the clinical outcome of embryo transfer.
Thus, in one embodiment, the invention also provides the use of the rating method of the invention for improving the clinical outcome of embryo transfer.
In one embodiment, the present invention provides a method for improving embryo transfer, preferably wherein the embryo transfer is a single frozen-thawed blastocyst transfer, the method comprising:
according to the ranking method of the present invention, the transfer priority of a plurality of test embryos from an individual patient for one IVF cycle is determined;
sorting the plurality of test embryos according to the determined priorities;
providing an embryo transfer recommendation that, preferably,
preferentially transferring the embryo if there is an embryo of the first transfer priority, and more preferably transferring an embryo with a higher predicted euploid probability;
if there is no embryo of the first transfer priority, then a recommendation is given to transfer an embryo of the second transfer priority;
if only embryos of the third transfer priority are available, the patient is informed of the associated risk of miscarriage and is advised to start a new cycle to obtain new embryos for ranking.
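As a non-limiting illustration, the recommendation logic above can be sketched in Python as follows. The function name `recommend_transfer`, the embryo identifiers, the wording of the advice strings, and the use of the 0.94 and 0.70 cutoffs are assumptions of this sketch; the output is a suggestion for the physician and not a decision.

```python
def recommend_transfer(embryos, first_cutoff=0.94, third_cutoff=0.70):
    """Rank the embryos of one cycle and suggest which one to transfer.

    embryos: dict mapping a (hypothetical) embryo id to its predicted
    euploid probability.
    """
    if not embryos:
        raise ValueError("no embryos to rank")
    # Sort from the highest to the lowest predicted euploid probability.
    ranked = sorted(embryos.items(), key=lambda item: item[1], reverse=True)
    best_id, best_p = ranked[0]
    if best_p > first_cutoff:
        suggested, advice = best_id, "transfer first-priority embryo"
    elif best_p >= third_cutoff:
        suggested, advice = best_id, "transfer second-priority embryo"
    else:
        # Only third-priority embryos are available.
        suggested = None
        advice = "inform patient of miscarriage risk; advise a new cycle"
    return {
        "ranking": [embryo_id for embryo_id, _ in ranked],
        "suggested": suggested,
        "advice": advice,
    }
```

Because the embryos are sorted by probability, the suggested embryo is always the one with the highest predicted euploid probability within the best available priority level.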
In some preferred embodiments, the invention relates to the use of the rating methods of the invention to improve clinical outcomes of IVF-ET, such as sustained pregnancy rates and miscarriage rates, and/or to reduce cycle (e.g., FET cycle) cancellation rates.
Classification, prediction and rating system and computer program product implementing the method of the present invention
The present invention provides a system that facilitates the implementation of the method of the present invention. An exemplary system includes a storage device for storing a training data set and/or a testing data set and a computer for executing a learning machine (e.g., a classifier or ensemble classifier as described herein). The computer is also operable to collect a training data set from the database, pre-process the training data set, train the learning machine using the pre-processed training data set, accept input test data, classify the test data using the trained learning machine, and output information regarding the classification of the test data.
The example system may also include a communication device to remotely receive the test data set and/or the training data set. In such cases, the computer may be operable to store the training data set in the storage device prior to preprocessing the training data set, and to store the test data set in the storage device prior to preprocessing the test data set. The example system may also include a display device for displaying the classification of the test data.
The term "computer" as used herein should be understood to include at least one hardware processor, which may use at least one memory. The at least one memory may store a set of instructions. The instructions may be stored permanently or temporarily in the one or more memories of the computer. The processor executes instructions stored in the one or more memories to process data. The set of instructions may include a plurality of instructions for performing one or more particular tasks (e.g., classification and/or rating tasks as described herein). The set of instructions to perform the particular task may be represented as a program, software program, or software.
As described above, a computer executes instructions stored in one or more memories to process data. The data processing may be, for example, in response to a computer user command, in response to previous processing, in response to a request by another computer, and/or any other input.
The computer used to implement embodiments at least in part may be a general purpose computer. However, a computer may also utilize any of a variety of other technologies, including a special purpose computer, a computer system including a microcomputer, minicomputer, or mainframe, e.g., a programmed microprocessor, microcontroller, peripheral integrated circuit elements, CSIC (customer specific integrated circuit) or ASIC (application specific integrated circuit) or other integrated circuit, logic circuitry, digital signal processor, programmable logic device such as an FPGA, PLD, PLA or PAL, or any other device or arrangement of devices capable of implementing at least some of the steps of the method of the invention.
It will be appreciated that the processors and/or memories of the computers need not be physically located in the same geographic location in order to practice the method of the present invention. That is, the various processors and memories used by the computer may be located in geographically distinct locations and connected so as to communicate in any suitable manner. Further, it is to be understood that the various processors and/or memories may be comprised of different physical components of the device. Thus, it is not required that the processor be a single device component at one location and the memory be another single device component at another location. That is, for example, it is contemplated that the processor may be two or more device components located at two different physical locations. The two or more different device components may be connected in any suitable manner, such as a network. Further, the memory may comprise two or more portions of memory in two or more physical locations.
Various techniques may be used to provide communications between the various computers, processors, and/or memories, as well as to allow the processors and/or memories of the present invention to communicate with any other entity; for example, to fetch further instructions or to access and use remote memory. Technologies for providing such communications may include a network, the internet, an intranet, an extranet, a LAN, an ethernet, or any client server system that provides communications. Such communication techniques may use any suitable protocol, such as TCP/IP, UDP, or OSI.
Further, it should be understood that the computer instructions or sets of instructions for implementing and operating the invention will be in the form of suitable computer-readable instructions.
In some embodiments, various user interfaces may be utilized to allow a user to have an interactive session with a computer or machine for at least partially implementing embodiments of the present invention. The user interface may be in the form of a dialog box. The user interface may also include a mouse, touch screen, keyboard, voice reader, voice recognizer, dialog screen, menu box, list, check box, toggle switch, button, or other device that allows a user to receive information regarding the operation of the computer and/or provide information to the computer. Thus, the user interface may be any means of providing communication between a user and a computer. For example, the information provided to the computer by the user through the user interface may be commands, data, or some other form of input.
It is also contemplated that the user interface of the present invention may interact with another computer of the non-user, for example, to transmit and receive information. Thus, the other computer may be characterized as a user. Further, it is contemplated that the user interface utilized in the systems and methods of the present invention can interact with another computer portion or portions, as well as with the user portion.
Classification system
In one aspect, the present invention provides a classification system adapted to perform the classification method. In some embodiments, the classification system of the invention may utilize a machine learning algorithm to evaluate a set of biological characteristics of a subject (an embryo to be transplanted) by way of computer-assisted execution, for example, on a random forest.
In a preferred embodiment, the classification system of the invention is used for predicting embryo euploid probability and comprises:
-at least one processor coupled to an electronic storage means containing an electronic representation of a classifier, the classifier being a trained classifier,
preferably, wherein the classifier is trained by using an electronically stored set of training data vectors, wherein each training data vector of the set of training data vectors represents an individual human embryo and comprises a magnitude of each of 11 features extracted on the broth and/or blastocoel fluid of the respective embryo, and each training data vector further comprises a classification label as to whether the respective embryo is euploid or aneuploid, preferably the classification label reports "normal" or "abnormal" based on whether the embryo has a CNV at or above the 10M level with a chimerism ratio greater than a specified value;
wherein the 11 features are: 10M CNV, 10M_mos CNV, arm CNV, arm_mos CNV, chrom CNV, chrom_mos CNV, NormalNum, Ab_ChromNum, mos_rate, mos_type, Ab_SexChrom;
wherein the processor is configured to accept test data comprising a magnitude for each of the 11 features extracted on the broth and/or blastocoel fluid of a test embryo,
wherein the processor is further configured to evaluate the test data using the electronic representation of the classifier and output a classification of the corresponding embryo in the test data source broth and/or blastocoel fluid based on the evaluation, wherein the classification is preferably a predicted embryo euploid probability value.
Prediction system
In one embodiment, the present invention provides a system for predicting embryo euploid probability, the system comprising:
-at least one processor coupled to an electronic storage means containing an electronic representation of a classifier, the classifier being a trained classifier,
preferably, wherein said classifier is trained by using an electronically stored set of training data vectors, wherein each training data vector of the set of training data vectors represents an individual human embryo and comprises a magnitude for each of 11 features extracted on the broth and/or blastocoel fluid of the respective embryo, and each training data vector further comprises a classification label regarding whether the respective embryo is euploid or aneuploid, wherein said 11 features are: 10M CNV, 10M_mos CNV, arm CNV, arm_mos CNV, chrom CNV, chrom_mos CNV, NormalNum, Ab_ChromNum, mos_rate, mos_type, Ab_SexChrom;
wherein the processor is configured to accept test data comprising a magnitude for each of the 11 features extracted on the broth and/or blastocoel fluid of a test embryo,
wherein the processor is further configured to evaluate the test data using the electronic representation of the classifier and output a classification of the corresponding embryo in the test data source broth and/or blastocoel fluid based on the evaluation, wherein the classification is a predicted embryo euploid probability value.
In some embodiments, the prediction system of the present invention further comprises: a sequencing module (e.g., a sequencer, particularly an NGS sequencer) for receiving a nucleic acid sample and providing nucleic acid sequence information from the sample regarding the broth and/or blastocoel fluid, and a CNV analysis and output module for identifying CNVs from the nucleic acid sequence information and outputting information regarding CNV types and chimerism.
In some embodiments, in the prediction system of the present invention, the processor is further configured to receive training data vectors, train a classifier, particularly a random forest classifier, using the training data vectors, and generate a random forest model for predicting embryo euploid probabilities.
Rating system
In one embodiment, the present invention provides a system for embryo transfer priority assessment, comprising:
-an embryo euploid probability module capable of performing the steps of the classification method according to claim 1 or 2;
-an embryo transfer prioritization module capable of performing the steps of the embryo transfer prioritization method of claim 15; and optionally
-a presentation module capable of presenting the transfer priority of the test embryo and optionally giving an embryo transfer recommendation based on the priority.
Computer product
In one embodiment, the invention provides a non-transitory computer readable medium having stored thereon computer program instructions for execution by a computer or computer system to implement the steps of the classification method or prediction method or rating method according to any one of the preceding claims.
The invention relates to a combination of biological characteristics and the use thereof in the inventive prediction and rating method
In yet another aspect, the present invention also provides a method of extracting a set of biological characteristics of a pre-implantation embryo, wherein the method comprises:
on a sample of in vitro culture fluid and/or blastocoel fluid from an embryo, preferably a blastocyst culture fluid sample, the 10M, arm, and chromosome resolution CNVs and their corresponding chimerism ratios are measured, and the following 11 feature values of the sample are extracted: 10M CNV, 10M_mos CNV, arm CNV, arm_mos CNV, chrom CNV, chrom_mos CNV, NormalNum, Ab_ChromNum, mos_rate, mos_type, Ab_SexChrom,
preferably, the method further comprises predicting the euploid probability of the embryo based on the magnitude of the extracted biological feature.
In one embodiment, the method of the present invention further comprises:
-accepting, on at least one processor, test data comprising magnitudes of the 11 features;
-evaluating, using the at least one processor, the test data by a classifier, wherein said classifier has been trained using an electronically stored set of training data vectors, each training data vector of the set of training data vectors representing a respective human individual embryo and comprising a magnitude for each of said 11 features extracted on a broth and/or blastocoel fluid of the respective embryo, and each training data vector further comprising a classification label regarding whether the respective embryo is euploid or aneuploid;
-outputting, using the at least one processor, a classification regarding the human test embryo based on the evaluating step, wherein the classification is a euploid probability of the embryo.
In a further aspect, the present invention also provides a combination of biological characteristics and its use for predicting embryo euploid probability and/or for determining embryo transfer priority, wherein the biological characteristics are one or more (preferably 7, 8, 9, 10 or 11) biological characteristics selected from: 10M CNV, 10M_mos CNV, arm CNV, arm_mos CNV, chrom CNV, chrom_mos CNV, NormalNum, Ab_ChromNum, mos_rate, mos_type, Ab_SexChrom.
In a further aspect, the present invention also provides the use of an agent for detecting CNV in the culture fluid and/or blastocoel fluid of a preimplantation embryo in the manufacture of a kit, device or system for predicting the euploid probability of an embryo and/or for determining embryo transfer priority, wherein said agent is useful for extracting the following biological feature or combination of biological features (preferably a combination of 7, 8, 9, 10 or 11) on an embryo culture fluid: 10M CNV, 10M_mos CNV, arm CNV, arm_mos CNV, chrom CNV, chrom_mos CNV, NormalNum, Ab_ChromNum, mos_rate, mos_type, Ab_SexChrom. In a preferred embodiment, said agent is used for said use in combination with a computer product and/or system of the invention for data classification, or a computer product and/or system of the invention for embryo euploid prediction or rating.
Examples
General overview
Design of research
The objective of this study was to establish a non-invasive embryo transfer priority assessment strategy as a reference for embryo selection.
First, a training data set was constructed from 345 culture medium samples and the chromosome CNV information of the matched whole embryos, and a machine learning random forest model for predicting embryo euploid probability from the culture medium was established. An embryo grading strategy was then determined according to the euploid probability predicted by the model. In this strategy, embryos are classified into 3 grades: A, B and C.
Second, the sample size required for the validation study was determined using PASS software (version 11.0; NCSS statistical software, Kaysville, UT, USA). The minimum total sample size required to achieve 80% statistical power at a significance level of 0.05 was estimated using the "chi-square test for multiple proportions" and "logistic regression" procedures.
Third, based on the sample size determined in the second step, 160 patients were enrolled, and the embryo grading strategy was validated in a blinded prospective observational study. In this validation study, the primary endpoint was the sustained pregnancy rate, and the secondary endpoints were the miscarriage rate and the clinical pregnancy rate. The results confirmed that the clinical outcome of IVF could be successfully predicted using the culture medium.
Statistical analysis method
The following statistical analysis methods were applied in prospective clinical studies in the following examples for data processing and analysis.
The normality of continuous variables was examined using the Shapiro-Wilk test. Continuous variables are expressed as mean ± standard deviation (normal distribution) or median (interquartile range, IQR) (non-normal distribution); categorical variables are described as frequency and percentage.
When the assumptions of normality and homogeneity of variance were satisfied, continuous variables were compared using one-way analysis of variance; otherwise, the Kruskal-Wallis test (H test) was applied. For categorical variables, differences between groups were compared using the chi-square test or Fisher's exact test. In addition, the influence of each embryo rank group (groups A, B and C) on clinical outcome was evaluated using a logistic regression model after adjustment for confounding variables. All p values were two-sided, with p < 0.05 considered statistically significant. The SPSS software program (version 20.0; SPSS Inc., Chicago, IL, USA) was used for the analysis.
Example 1
Training data set construction and random forest model establishment
Embryo culture fluid collection
In 2017, culture medium from a total of 345 embryos was collected at the reproductive medicine centers of Nanjing Jinling Hospital, Changsheng Hospital and Nanjing Tuolong Hospital. Institutional review board (IRB) approval [Reference numbers: 2016NJKY-028, CZEC (2017-06), 2017-09-02] and informed consent were obtained before the study began. Female patients were 20-45 years of age. All embryos were ICSI (intracytoplasmic sperm injection) embryos and were cultured in vitro to the cleavage stage (D3); after washing, each embryo was transferred to a drop of fresh blastocyst medium (SAGE, CooperSurgical Fertility Co., Denmark). D5/6 embryos were scored by the morphological criteria described by Gardner et al. (see, e.g., C. Racowsky et al., Standardization of grading embryo morphology. Fertil. Steril. 94, 1152-1153 (2010), and L. Scott, A. Finn et al., Morphologic parameters of early cleavage-stage embryos that correlate with fetal development and delivery: prospective and applied data for increased pregnancy rates. Hum. Reprod. 22, 230-240 (2007)). Blastocyst fluid from donated embryos that were morphologically unsuitable for transfer was collected for subsequent analysis.
Treatment of blastocyst Medium
Blastocyst medium (20-25 µL) from each embryo was transferred to RNase-free PCR tubes containing 5 µL of cell lysis buffer (Yikon Genomics, China) and stored at -80 °C until use. The cell lysis treatment was carried out according to the manufacturer's instructions (Yikon Genomics, China).
Whole genome amplification and NGS sequencing (next generation sequencing)
Whole genome amplification was performed on the culture medium lysates. Thereafter, a library was prepared using ChromInst (Yikon Genomics, EK100100724, NICSInst™ library preparation kit). NGS sequencing was performed on the Illumina MiSeq platform, producing approximately 2 million sequence reads per sample. For the amplification and sequencing methods, see J. Xu et al., Noninvasive chromosome screening of human embryos by genome sequencing of embryo culture medium for in vitro fertilization. Proc. Natl. Acad. Sci. U.S.A. 113, 11907-11912 (2016); Jiao et al., Minimally invasive preimplantation genetic testing using blastocyst culture medium. Hum. Reprod. 34, 1369-1379 (2019).
NGS sequencing data analysis
High-quality reads were extracted and mapped to the human hg19 genome. After removing duplicate reads, reads were counted along the whole genome using a bin size of 1 Mb, and the relative copy number was obtained after normalization by GC content and a reference data set. Thereafter, the per-bin copy numbers were segmented by the Circular Binary Segmentation (CBS) algorithm, which merges bins with similar trends, and the final copy number of each segment was calculated. The coefficient of variation (CV), calculated as the ratio of the standard deviation of the read density to its mean, was used to assess whether amplification was successful; CV values < 0.2 were considered to indicate successful amplification. Using an R program, the copy number of each bin was plotted and the CNV profiles of all 24 chromosomes were visualized. With this method, the minimum resolution for CNVs is 10 Mb.
In the ploidy analysis of this study, embryos with 5 or more abnormal chromosomes were defined as carrying MACs (multiple abnormal chromosomes); insufficient data for analysis due to failed amplification was defined as unavailable (N/A).
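As a non-limiting illustration, the amplification quality check and the MAC and N/A definitions above can be sketched in Python as follows. The function names, the use of the population standard deviation, and the status string "analyzable" are assumptions of this sketch; the text does not state which form of the standard deviation was used.

```python
import statistics


def amplification_cv(read_density):
    """Coefficient of variation: standard deviation of read density / mean."""
    mean = statistics.fmean(read_density)
    if mean == 0:
        raise ValueError("mean read density is zero")
    # Population standard deviation is an assumption of this sketch.
    return statistics.pstdev(read_density) / mean


def sample_status(read_density, abnormal_chromosomes, cv_cutoff=0.2, mac_min=5):
    """Return "N/A", "MAC" or "analyzable" for one sample.

    "N/A": amplification failed (CV not below the cutoff).
    "MAC": 5 or more distinct abnormal chromosomes.
    """
    if amplification_cv(read_density) >= cv_cutoff:
        return "N/A"
    if len(set(abnormal_chromosomes)) >= mac_min:
        return "MAC"
    return "analyzable"
```

Here `read_density` stands for the per-bin read densities of one sample and `abnormal_chromosomes` for the names of the chromosomes on which a CNV was called.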
Machine learning classifier
A machine learning random forest algorithm, combined with the chromosome CNV information of the whole embryos, was used to learn the copy number patterns in blastocyst culture medium that are associated with chromosomal euploidy or aneuploidy.
Constructing a training data set
Using a prescribed CNV chimerism ratio (e.g., a chimerism ratio > 50%) as the cutoff for reporting aneuploidy, each whole embryo was reported as euploid/aneuploid (i.e., "normal"/"abnormal") according to whether its chromosomal CNV information showed a CNV at the 10M level or above that exceeded the prescribed chimerism ratio. Taking the whole-embryo report as the gold standard, the corresponding class label (embryo_label), i.e., normal or abnormal, was assigned to the corresponding culture medium sample.
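As a non-limiting illustration, the gold-standard labeling rule above can be sketched in Python as follows. The function name `embryo_label` and the representation of each whole-embryo CNV as a pair (size in Mb, chimerism ratio as a fraction) are assumptions of this sketch.

```python
def embryo_label(cnvs, threshold=0.50, min_size_mb=10.0):
    """Label a whole embryo "abnormal" or "normal".

    cnvs: iterable of (size_mb, chimerism_ratio) pairs for the whole embryo.
    The embryo is "abnormal" if any CNV of at least min_size_mb has a
    chimerism ratio above the threshold.
    """
    for size_mb, chimerism_ratio in cnvs:
        if size_mb >= min_size_mb and chimerism_ratio > threshold:
            return "abnormal"
    return "normal"
```

The label obtained for the whole embryo would then be attached to the culture medium sample of the same embryo.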
The following 11 feature values are extracted from each culture medium sample to build the training data set:
feature 1: 10M-resolution CNV result (10M CNV); designated "normal"/"abnormal" (N/A) according to the absence or presence of a CNV of 10M or larger;
feature 2: 10M-resolution CNV result redefined at a prescribed chimerism ratio (10M_mos CNV); designated "normal"/"abnormal" (N/A) according to the absence or presence of a 10M-resolution CNV above the prescribed chimerism ratio;
feature 3: arm-resolution CNV result (Arm CNV); designated "normal"/"abnormal" (N/A) according to the absence or presence of an arm-level CNV;
feature 4: arm-resolution CNV result redefined at a prescribed chimerism ratio (Arm_mos CNV); designated "normal"/"abnormal" (N/A) according to the absence or presence of an arm-resolution CNV above the prescribed chimerism ratio;
feature 5: chromosome-resolution CNV result (Chrom CNV); designated "normal"/"abnormal" (N/A) according to the absence or presence of a whole-chromosome CNV;
feature 6: chromosome-resolution CNV result redefined at a prescribed chimerism ratio (Chrom_mos CNV); designated "normal"/"abnormal" (N/A) according to the absence or presence of a chromosome-resolution CNV above the prescribed chimerism ratio;
feature 7: count of "normal" results across CNV resolutions (NormalNum); the total number of times the sample is designated "normal", i.e., the number of features among features 1-6 whose value is "normal", based on the 10M, arm and chromosome resolution CNV results and the redefined CNV results above;
feature 8: number of abnormal chromosomes (Ab_ChromNum); the number of chromosomes on which a CNV is detected;
feature 9: maximum CNV chimerism ratio (Mos_rate); the chimerism ratio of the CNV with the highest chimerism ratio among all CNVs detected;
feature 10: type of the CNV with the maximum chimerism ratio (Mos_Type); for the CNV with the highest chimerism ratio among all CNVs detected, designated "whole chromosome" / "short arm" (p) / "long arm" (q) / CNV fragment size (e.g., 45 Mb);
feature 11: sex chromosome abnormality (Ab_SexChrom); designated "yes"/"no" (YES/NO) according to whether the only CNV abnormalities present are on the sex chromosomes.
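The 11 features above can be illustrated with a minimal Python sketch. It is not the implementation used in the embodiment: the `CnvCall` record, the `has_cnv` and `extract_features` helpers, the dictionary keys and the "none" placeholder are hypothetical, and the sketch assumes that an arm-level or whole-chromosome CNV also counts as a CNV at the 10M level.

```python
from dataclasses import dataclass

# Assumed ordering of CNV levels: a larger level also counts at the smaller ones.
LEVEL_RANK = {"10M": 1, "arm": 2, "chrom": 3}

@dataclass
class CnvCall:
    chrom: str     # chromosome name, e.g. "7" or "X"
    level: str     # "10M", "arm" or "chrom"
    mosaic: float  # chimerism (mosaic) ratio, 0 < mosaic <= 1
    kind: str      # "whole chromosome", "p", "q" or a size such as "45Mb"

def has_cnv(calls, level, min_mosaic=0.0):
    """True if any call at or above `level` has a chimerism ratio > min_mosaic."""
    return any(LEVEL_RANK[c.level] >= LEVEL_RANK[level] and c.mosaic > min_mosaic
               for c in calls)

def extract_features(calls, cutoff=0.5):
    """Build the 11 feature values for one culture medium sample."""
    feats = {}
    for level in ("10M", "arm", "chrom"):
        feats[f"{level}_CNV"] = "A" if has_cnv(calls, level) else "N"
        feats[f"{level}_mos_CNV"] = "A" if has_cnv(calls, level, cutoff) else "N"
    # Feature 7: number of "normal" values among features 1-6 (the only keys so far).
    feats["NormalNum"] = sum(v == "N" for v in feats.values())
    chroms = {c.chrom for c in calls}
    feats["Ab_ChromNum"] = len(chroms)
    top = max(calls, key=lambda c: c.mosaic, default=None)
    feats["Mos_rate"] = top.mosaic if top is not None else 0.0
    feats["Mos_Type"] = top.kind if top is not None else "none"  # placeholder assumption
    # Feature 11: "YES" only when every detected CNV lies on a sex chromosome.
    feats["Ab_SexChrom"] = "YES" if chroms and chroms <= {"X", "Y"} else "NO"
    return feats

# Invented example: a 60% long-arm CNV on chromosome 7 and a 30%, 45 Mb CNV on X.
sample = [CnvCall("7", "arm", 0.6, "q"), CnvCall("X", "10M", 0.3, "45Mb")]
features = extract_features(sample)
```

Applying `has_cnv(calls, "10M", cutoff)` to the CNV calls of the whole embryo would give the "normal"/"abnormal" class label in the same way.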
An illustrative example of training set construction is shown in fig. 1, where a > 50% chimerism ratio is used as the cutoff value reported for an "abnormal" (i.e., aneuploid) embryo.
Generation of random forest models
A random forest is an ensemble learner built from multiple decision trees using a bagging strategy.
The random forest model is generated in the following steps: 1) random subsets of the training data are drawn by bootstrap resampling; 2) decision trees (500 in total) are grown from the subsets of the training data and subsets of the features; 3) the euploid probabilities predicted by the 500 decision trees are averaged to give the model prediction; 4) the model is evaluated using the out-of-bag (OOB) error rate; and 5) the model is trained iteratively until an optimal model is obtained.
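Steps 1) to 4) can be sketched in plain Python as follows. This is a simplified stand-in, not the R randomForest procedure of the embodiment: each "tree" is a single-split decision stump, and the function names, the numeric encoding of the features and the toy data are all assumptions made for illustration.

```python
import random

def gini(labels):
    """Gini impurity of a non-empty list of 0/1 labels."""
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def fit_stump(X, y, feature_ids):
    """Best single split over the given features: (feature, threshold, p_left, p_right)."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X})[:-1]:  # keeps both sides non-empty
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t, sum(left) / len(left), sum(right) / len(right))
    if best is None:  # no feature varies in this sample: predict the base rate
        base = sum(y) / len(y)
        return (feature_ids[0], float("inf"), base, base)
    return best[1:]

def stump_predict(stump, row):
    f, t, p_left, p_right = stump
    return p_left if row[f] <= t else p_right

def fit_forest(X, y, n_trees=500, n_feats=2, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # step 1: bootstrap
        feats = rng.sample(range(len(X[0])), n_feats)         # random feature subset
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx], feats)  # step 2
        forest.append((stump, set(idx)))
    return forest

def predict_proba(forest, row):
    """Step 3: average the euploid probability over all trees."""
    return sum(stump_predict(s, row) for s, _ in forest) / len(forest)

def oob_error(forest, X, y):
    """Step 4: error rate using, for each sample, only trees that did not see it."""
    wrong = counted = 0
    for i, (row, yi) in enumerate(zip(X, y)):
        votes = [stump_predict(s, row) for s, bag in forest if i not in bag]
        if votes:
            counted += 1
            wrong += (sum(votes) / len(votes) >= 0.5) != bool(yi)
    return wrong / counted

# Toy data (invented): columns are Mos_rate, Ab_ChromNum, NormalNum; label 1 = euploid.
X = [[0.0, 0, 6], [0.1, 0, 6], [0.2, 1, 5], [0.0, 0, 6], [0.1, 1, 5],
     [0.9, 2, 0], [1.0, 3, 0], [0.8, 2, 1], [1.0, 2, 0], [0.9, 3, 1]]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
forest = fit_forest(X, y)
```

Step 5), iterative retraining until an optimal model is found, would wrap `fit_forest` in a loop over settings such as `n_feats` and keep the forest with the lowest `oob_error`.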
The 345 embryos were ranked by the euploid probabilities obtained from the best model. The ROC curve of the model was plotted and the area under the curve was calculated to evaluate model performance.
In this embodiment, the random forest model was built with the randomForest package in R 3.6.1, and the ROC curve was generated with the ROCR package in R 3.6.1.
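As an illustration of the ranking and AUC evaluation, the following Python sketch computes the AUC as the fraction of (euploid, aneuploid) pairs in which the euploid sample receives the higher predicted probability. It is not the ROCR code used in the embodiment; the function names and example values are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC: probability that a euploid sample (label 1) outscores an aneuploid one.

    Ties between a euploid and an aneuploid score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def rank_embryos(ids, scores):
    """Embryo ids ordered from highest to lowest predicted euploid probability."""
    ordered = sorted(zip(ids, scores), key=lambda pair: pair[1], reverse=True)
    return [embryo_id for embryo_id, _ in ordered]

# Invented example: two euploid and two aneuploid embryos.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
ranking = rank_embryos(["e1", "e2", "e3"], [0.2, 0.9, 0.5])
```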
Comparing the efficacy of models generated by using different CNV chimeric ratios as aneuploidy cutoff values
The embryo CNV chimerism ratio used as the cutoff for reporting aneuploidy was varied, while other factors (different resolutions, multiple chromosomal abnormalities, whether chromosomal abnormalities are present, and so on) were fully taken into account. The aim was to determine which chimerism ratio, used as the aneuploidy reporting standard, gives the highest concordance between the culture medium prediction and the whole-embryo result, i.e., the highest concordance rate of the reported chromosomal ploidy result (normal/abnormal).
A chimerism ratio of > 30%, > 40%, > 50%, > 60% or > 70% was specified as the cutoff for reporting abnormal (aneuploid) embryos. For each cutoff, training data were constructed and a random forest model was generated, as described above, from the 345 paired culture medium samples and corresponding whole-embryo CNV information, and the corresponding ROC curve and area under the curve were obtained. FIG. 2 shows the ROC curves of the prediction models built with the different chimerism ratio cutoffs. As shown in fig. 2, the best model performance was obtained with a 50% chimerism ratio as the aneuploidy (i.e., "abnormal") reporting threshold: the area under the ROC curve (AUC) was 92.3%, and the concordance rate between the culture medium model predictions and the whole-embryo results was 86.4%.
In addition, a model was generated as described above using different chimerism ratio cutoffs at different resolutions for reporting abnormal (aneuploid) embryos, and the corresponding ROC curve was obtained. Specifically, a cutoff of > 70% was used at the 10M resolution level, > 50% at the arm level, and > 40% at the whole-chromosome level. The performance of this model was compared with that of the model generated with a 50% chimerism ratio cutoff at every resolution. As shown in fig. 3, using a 50% chimerism ratio as the aneuploidy reporting cutoff at every resolution gave better model performance.
This optimal random forest prediction model, generated with an optimal chimerism value of > 50%, was used for the prospective observation study in subsequent example 3.
Example 2
Embryo rating system established based on euploid probability of model prediction
Model-based prediction of euploid probability
With the advent of Comprehensive Chromosome Screening (CCS), this approach has been used to select euploid embryos for clinical embryo transfer. The chromosomal ploidy of an embryo can be represented by the copy number variation (CNV) pattern of the embryo. Traditionally, thresholds (cutoff values) for chromosome copy number gain/loss have been set to report embryos as euploid, aneuploid or chimeric (A.R. Victor et al., One hundred mosaic embryos transferred prospectively in a single clinic: exploring when and why they result in healthy pregnancies. Fertil. Steril. 111, 280-293 (2019)). However, DNA molecules in embryo culture medium are highly fragmented and degraded, so this relatively arbitrary way of classifying embryos, when applied to chromosome screening of embryo culture medium samples, may lead to a high proportion of aneuploid/chimeric calls (H. Bolton et al., Mouse model of chromosome mosaicism reveals lineage-specific depletion of aneuploid cells and normal developmental potential. Nat. Commun. 7, 11165 (2016)), which in turn leads to euploid embryos being wrongly screened out and to increased IVF-ET cycle cancellation rates.
Thus, instead of arbitrarily setting a threshold to determine euploid vs. aneuploid embryos, the present inventors developed a strategy for embryo priority ranking based on the model predicted embryo euploid probabilities.
Fig. 4A summarizes the prediction model building process of example 1, wherein a paired blastocyst fluid sample and corresponding whole embryo CNV information are used, and a random forest machine learning method is applied to study the ploidy pattern of the fluid associated with euploid/aneuploidy, with the chromosome ploidy state confirmed by the whole blastocyst CNV information as the gold standard. In the model building process, a number of features are considered, including CNV chimerism ratio, CNV chromosome fragment size, number of abnormal chromosomes, and sex chromosome information.
As described in example 1, a random forest prediction model was generated from the training set constructed from 345 culture medium samples, using a chimerism ratio of > 50% as the cutoff for reporting "abnormal" embryos. The 345 samples were ranked by the euploid probability predicted by the model. As shown in fig. 4B, the higher the euploid probability predicted from the culture medium by the NICS method, the higher the probability that the whole embryo is euploid. For culture medium samples with a predicted probability below 0.3, most of the paired blastocysts were aneuploid; in contrast, for culture medium samples with a predicted probability of 0.8 or more, most of the paired blastocysts were euploid.
The receiver operating characteristic curve (ROC) of this model is shown in fig. 5A, which yields an area under the ROC curve (AUC) of 92.3%, indicating that the model performs well in distinguishing euploids from aneuploidies.
Establishing embryo rating policy
Morphological grading systems are widely used in clinical practice to prioritize embryos for transfer in IVF-ET. Each embryo is assigned a morphological grade according to number, size, and other morphological measures (C. Racowsky et al., Standardization of grading embryo morphology. Fertil. Steril. 94, 1152-1153 (2010); L. Scott et al., Morphologic parameters of early cleavage-stage embryos that correlate with fetal development and delivery: prospective and applied data for increased pregnancy rates. Hum. Reprod. 22, 230-240 (2007)). Although easy to implement, this conventional morphological grading does not account for the probability of embryo aneuploidy, which is one of the most important factors leading to early miscarriage (S. Munné et al., Technology requirements for preimplantation genetic diagnosis to improve assisted reproduction outcomes. Fertil. Steril. 94, 408-430 (2010)). For this reason, an embryo transfer priority ranking strategy was established based on the embryo euploid probabilities predicted by the model described above.
Embryos are classified into grades A, B and C according to the euploid probability predicted by the model, corresponding to predicted euploid probabilities of greater than or equal to 0.94, 0.7-0.94, and less than or equal to 0.7, respectively. As shown in fig. 5A, this rating strategy balances euploid prediction against specificity well; these are two important indicators when selecting an embryo for transfer.
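The grading rule can be written as a short Python function. This is only a sketch: the function name `grade_embryo` is hypothetical, and because the text lists 0.7 as a boundary of both grade B and grade C, the sketch assumes that a probability of exactly 0.7 falls in grade C.

```python
def grade_embryo(p_euploid, a_cut=0.94, c_cut=0.7):
    """Map a predicted euploid probability to grade "A", "B" or "C".

    Assumption: a probability exactly equal to c_cut is graded C.
    """
    if p_euploid >= a_cut:
        return "A"
    if p_euploid > c_cut:
        return "B"
    return "C"

# Invented probabilities for four embryos.
grades = [grade_embryo(p) for p in (0.97, 0.94, 0.80, 0.70)]
```

Other cutoffs can be passed in the same way, for example `grade_embryo(p, a_cut=0.95, c_cut=0.6)` for the cutoffs reported on the expanded training set.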
For the 345 paired training samples, 10-fold cross-validation was used so that each sample was predicted exactly once. Applying the above grading criteria, 37% of the embryos were classified as grade A, and 29% and 34% as grades B and C, respectively (fig. 5B). According to the chromosomal ploidy information of the corresponding whole blastocysts, 91.4% of grade A embryos were euploid and 8.6% were not; the proportions of euploid embryos in grades B and C were 76.4% and 34.6%, respectively (fig. 5C). Thus, for 91.4% of the embryos graded A from culture medium, the predicted euploidy agreed with the corresponding whole embryo.
Chromosome ploidy patterns of graded embryos
Further, the culture medium chromosome ploidy patterns of the differently graded embryos (grade A, B and C), i.e., the CNV patterns measured in culture medium by NICS, were explored.
Compared with grade B and C embryos, grade A embryos show the following chromosome copy number patterns: 100% euploid, or a low chimerism ratio, or small aneuploid chromosome fragments (see FIG. 6). Grade C embryos, in contrast, show the following chromosome copy number patterns: CNVs spanning large chromosome segments, e.g., whole-chromosome CNVs, together with high CNV chimerism ratios or a 100% aneuploid pattern (fig. 6).
Extended training set
The training data set was expanded to 504 paired training samples. The sampling and modeling procedures were unchanged.
The influence of different CNV chimerism ratio cutoffs on model performance was examined on the expanded training set. As shown in fig. 7, consistent with the previous results, the random forest prediction model generated with a 50% chimerism ratio cutoff performed well (AUC of 90.1%), and its predictions showed a high concordance rate (86.8%) with the whole-embryo results.
The model generated with the 50% chimerism ratio cutoff was used to obtain predicted euploid probabilities for the culture medium samples in the expanded training set. The samples were ranked, an ROC curve was generated, and the cutoffs for A/B/C grading were determined. As shown in FIG. 8, grade A, B and C embryos were defined by predicted euploid probabilities of ≥ 0.95, 0.6-0.94 and ≤ 0.6, respectively. Grade A embryos defined in this way made up 51.8% of all samples, with an NPV (negative predictive value) of 92%.
The expanded training set thus further confirms the effectiveness of the prediction model building method and the embryo grading strategy.
Example 3
Validation of the rating-system-based embryo screening strategy in a prospective observational study
Patient selection
All oocyte retrieval cycles and single frozen-thawed blastocyst transfer cycles were performed between 2017 and 2018 at the reproductive medicine center of Nanjing Jinling Hospital, with NICS screening. The study was approved by the ethics committee of Nanjing Jinling Hospital (No. 2016NJKY-028), and written informed consent was obtained before the study.
The inclusion criteria were as follows: maternal age above 20 and below 45 years; embryos derived from ICSI (intracytoplasmic sperm injection) and cultured to the blastocyst stage (D5/6), with blastocyst culture fluid collected in all cycles; and at least one frozen-thawed blastocyst transfer cycle performed between July 2017 and December 2018.
Exclusion criteria included: the patients had abnormal uterine cavity morphology, endometrial lesions, intrauterine fluid accumulation, untreated fallopian tube fluid accumulation, and any situation deemed by the investigator to be inappropriate for participation in the study.
Oocyte retrieval, embryo culture, NICS and FET
All patients received controlled ovarian stimulation according to a routine protocol: a luteal-phase long protocol, a follicular-phase long protocol, or a micro-stimulation protocol. Ovarian stimulation was performed using recombinant FSH (Gonal-F, Merck-Serono, Germany) and menotrophin for injection (HMG, Lizhu Pharmaceuticals, China). The dose was set individually according to each patient's age, weight and ovarian reserve. Ovulation was triggered with 5,000-10,000 IU of human chorionic gonadotropin (hCG, Lizhu Pharmaceuticals) when the lead follicle reached 18 mm or two follicles reached 17 mm in diameter. Oocytes were retrieved 36 hours later. Intracytoplasmic sperm injection (ICSI) was used in all cycles. Embryos were cultured under standard culture conditions (5% O2 and 6% CO2, Labotect C200, Germany). On the morning of D5/6, the embryos were evaluated using the morphological grading criteria described previously by Gardner et al. At the same time, the spent culture medium was collected and subjected to NICS analysis (by the same method as described in example 1). All embryos were vitrified individually.
Hormone replacement was used for endometrial preparation in the FET cycle. Estradiol tablets were given orally for 10-18 days, and progesterone was administered when the endometrial thickness reached 8 mm. A single blastocyst was selected based on morphological evaluation; one blastocyst was transferred per patient per cycle, under abdominal ultrasound guidance.
Serum hCG levels were measured 14 days after embryo transfer. Vaginal ultrasonography was performed at 7-8 weeks of gestation, and patients were followed up with a second vaginal ultrasound at 12 weeks of gestation.
Clinical outcome variable evaluation
As shown in the study flow chart of fig. 9, all FET patients were divided into three groups, transferred with grade A, B or C embryos, based on the model-predicted ploidy pattern, and the primary and secondary clinical endpoints were studied.
The primary clinical outcomes were sustained pregnancy rate, miscarriage rate, and clinical pregnancy rate. The clinical pregnancy rate is defined as the number of cycles in which a gestational sac was observed by vaginal ultrasound at 7-8 weeks of gestation, divided by the total number of transfer cycles. The miscarriage rate is calculated as the number of pregnancy losses after a gestational sac has been recorded by vaginal ultrasound, divided by the total number of clinical pregnancies. Any pregnancy lasting more than 12 weeks is considered a sustained pregnancy. An ectopic pregnancy is counted as a clinical pregnancy rather than as a miscarriage.
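The outcome definitions can be checked with a small Python sketch using the grade A counts reported in the Results (79 transfers, 54 clinical pregnancies, 3 miscarriages, 50 sustained pregnancies). The function names are hypothetical, and taking the number of transfer cycles as the denominator of the sustained pregnancy rate is inferred from the reported 50/79.

```python
def clinical_pregnancy_rate(cycles_with_sac, transfer_cycles):
    """Cycles with a gestational sac at 7-8 weeks / all transfer cycles."""
    return cycles_with_sac / transfer_cycles

def miscarriage_rate(losses_after_sac, clinical_pregnancies):
    """Pregnancy losses after a recorded gestational sac / all clinical pregnancies."""
    return losses_after_sac / clinical_pregnancies

def sustained_pregnancy_rate(pregnancies_past_12_weeks, transfer_cycles):
    """Pregnancies lasting more than 12 weeks / all transfer cycles (assumed denominator)."""
    return pregnancies_past_12_weeks / transfer_cycles

# Grade A group counts as reported in the Results section.
cpr_a = round(100 * clinical_pregnancy_rate(54, 79), 2)   # percent
mr_a = round(100 * miscarriage_rate(3, 54), 2)            # percent
spr_a = round(100 * sustained_pregnancy_rate(50, 79), 2)  # percent
```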
Results
Use of NICS rating system as predictor to reduce spontaneous abortion
To clinically validate the rating system of the present invention, 160 patients were recruited and a blinded prospective observational study was performed to investigate the clinical outcomes of embryo transfer.
All patients (n = 160) were infertility patients requiring intracytoplasmic sperm injection (ICSI). As shown in fig. 9, all patients underwent oocyte retrieval, embryo culture, blastocyst freezing, and single frozen-thawed embryo transfer based on morphological evaluation. When the blastocysts were frozen, the blastocyst culture medium was collected and subjected to NICS analysis, and the embryos were graded (A, B or C) using the random forest model and grading strategy previously established on the 345-pair training data set. FET patients were divided into three groups according to the grade. The demographic characteristics of the three groups are shown in table 1 below.
Table 1 baseline demographics for all patients undergoing FET
[Table 1: provided as an image in the original publication]
Table 2 and figure 10 show the observed clinical outcomes of patients and a comparison thereof.
TABLE 2 clinical outcome of pregnancy for patients with different NICS screening results
[Table 2: provided as an image in the original publication]
For each blastocyst transfer performed as described above, the corresponding blastocyst culture fluid was collected and subjected to NICS. Clinical researchers were not aware of the NICS results until the clinical outcome was obtained and revealed. The results obtained on 160 transferred embryos were studied and a correlation between embryo grade predicted by the NICS rating system of the present invention and improvement in clinical outcome was observed.
Specifically, as shown in FIG. 9, the transferred embryos were classified into three grades according to the embryo grade evaluated by the results of the culture solution NICS. Of the 160 embryos with conclusive results, 79 were classified as class a, 40 as class B, and 41 as class C.
As shown in table 2, clinical pregnancy was achieved with 54 of 79 grade A embryos (68.35%), 23 of 40 grade B embryos (57.50%), and 24 of 41 grade C embryos (58.54%). There was no significant difference in clinical pregnancy rate among the three grades (χ² = 1.843, p = 0.398) (table 2). However, the sustained pregnancy rate differed significantly among the three groups (χ² = 10.860, p = 0.004) (table 2). The sustained pregnancy rate was 31.71% (13/41) in the grade C group versus 63.29% (50/79) in the grade A group (p = 0.004), and 31.71% (13/41) in the grade C group versus 50% (20/40) in the grade B group (p = 0.043). The difference in sustained pregnancy rate between the grade A and grade B groups was not significant (63.29% vs. 50.0%, p = 0.370), which may be due to the limited number of embryo transfers observed. See table 3 below for pairwise comparisons of clinical outcomes between the groups.
As shown in Table 2, the miscarriage rate also differed significantly among the three groups (χ² = 19.561, p < 0.001). The miscarriage rate was highest in the grade C group, 45.83% (11/24), followed by the grade B group, 13.04% (3/23), and lowest in the grade A group, 5.56% (3/54). The difference in miscarriage rate was significant between the grade C and grade A groups (p < 0.001) and between the grade C and grade B groups (p = 0.019). The difference between the grade B and grade A groups did not reach statistical significance (p = 0.443), which may be due to the limited number of embryo transfers observed. See table 3 below for pairwise comparisons.
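The three-group statistics above are consistent with a Pearson chi-square test on a 3 x 2 table of counts (event / no event per grade). The text does not name the test, so the following Python sketch is an assumption, and the function names are hypothetical. For 2 degrees of freedom the p-value has the closed form exp(-χ²/2); the pairwise p-values are not reproduced here because the method used for them is not stated.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof

def p_value_two_dof(stat):
    """Upper-tail probability of a chi-square variable with exactly 2 degrees of freedom."""
    return math.exp(-stat / 2)

# Rows are grades A, B, C; columns are (event, no event), from the counts in the text.
clinical = [[54, 25], [23, 17], [24, 17]]     # clinical pregnancy per transfer
sustained = [[50, 29], [20, 20], [13, 28]]    # sustained pregnancy per transfer
miscarriage = [[3, 51], [3, 20], [11, 13]]    # miscarriage per clinical pregnancy

chi2_clinical, dof = chi_square(clinical)
chi2_sustained, _ = chi_square(sustained)
chi2_miscarriage, _ = chi_square(miscarriage)
```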
TABLE 3 comparison of clinical outcomes of different cohorts in pairs
[Table 3: provided as an image in the original publication]
Discussion
Currently, typical embryo selection methods rely on morphological scoring. Although convenient to implement, morphology is not a good indicator of embryo chromosome composition and is highly subjective. PGT-A has therefore been widely proposed for clinical IVF to avoid the transfer of aneuploid embryos. However, the method remains controversial in the field because of its invasive nature and because of embryo chimerism (see, e.g., L. Pagliardini et al., Shooting STAR: reinterpreting the data from the 'Single Embryo TrAnsfeR of Euploid Embryo' randomized clinical trial. Reprod. Biomed. Online 40, 475-478 (2020), and M. Popovic et al., Chromosomal mosaicism in human blastocysts: the ultimate challenge of preimplantation genetic testing? Hum. Reprod. 33, 1342-1354 (2018)). On the one hand, a TE (trophectoderm) biopsy may not truly represent the chromosomal composition of the inner cell mass (ICM), which develops into the fetus. On the other hand, a chimeric embryo is defined by the presence of two or more cell lines with different chromosome compositions in the embryo, and in embryos formed by in vitro fertilization the reported incidence of chimerism ranges from 2.0-2.9% up to 14.0-17.3% (A. Capalbo et al., Correlation between standard blastocyst morphology, euploidy and implantation: an observational study in two centers involving 956 screened blastocysts. Hum. Reprod. 29, 1173-1181 (2014); D.S. Johnson et al., Comprehensive analysis of karyotypic mosaicism between trophectoderm and inner cell mass. Mol. Hum. Reprod. 16, 944-949 (2010)). All of this somewhat limits the clinical use of PGT-A.
Using whole embryos as the gold standard, Huang et al. (Noninvasive preimplantation genetic testing for aneuploidy in spent medium may be more reliable than trophectoderm biopsy. Proc. Natl. Acad. Sci. U.S.A. 116, 14105-14112 (2019)) observed that culture medium DNA had higher concordance with whole embryos than TE biopsy samples did, suggesting that culture-medium-based PGT-A may be more accurate than TE-based PGT-A.
However, given the chimeric nature of human embryos, DNA in the culture medium may be a mixture originating from the ICM and from apoptotic cells cleared from the embryo; H. Bolton et al. indicate that this may result in a high false positive rate in chromosome screening (Mouse model of chromosome mosaicism reveals lineage-specific depletion of aneuploid cells and normal developmental potential. Nat. Commun. 7, 11165 (2016)). Labeling embryos as "aneuploid" or "euploid" on the basis of a dichotomous screening result may therefore cause some embryos to be wrongly screened out, increasing the cancellation rate of IVF cycles.
Previous PGT-A studies defined euploidy on the basis of the percentage of chimerism. Typically, samples with 20-80% abnormal cells are defined as "chimeric" (A.R. Victor et al., One hundred mosaic embryos transferred prospectively in a single clinic: exploring when and why they result in healthy pregnancies. Fertil. Steril. 111, 280-293 (2019)). For non-invasive PGT-A, Huang et al. defined a chimerism ratio of more than 60% as aneuploid (Noninvasive preimplantation genetic testing for aneuploidy in spent medium may be more reliable than trophectoderm biopsy. Proc. Natl. Acad. Sci. U.S.A. 116, 14105-14112 (2019)), whereas other studies set this ratio at 40% (J. Jiao et al., Minimally invasive preimplantation genetic testing using blastocyst culture medium. Hum. Reprod. 34, 1369-1379 (2019)) or at more than 50% (C. Rubio et al., Embryonic cell-free DNA versus trophectoderm biopsy for aneuploidy testing: concordance rate and clinical implications. Fertil. Steril. 112, 510-519 (2019)).
Unlike previous studies, in the present invention, in addition to the abnormal chromosome percentage, characteristics such as chromosome gain/loss, CNV fragment size, genomic position, and abnormal pattern are also taken into account when calculating the normal probability of an embryo; and thus proposes a new strategy for embryo ranking based on the euploid probability and implantation potential of the embryo.
In this new strategy, the inventors added features to the random forest model, including the chromosome chimerism ratio, the size of aneuploid fragments, and multiple chromosomal abnormalities. The results show that the model can effectively predict the euploid probability of an embryo from its culture medium. Unlike the previous simple binary screening of embryos, the present invention uses the euploid probability predicted by the model to provide a priority list of embryos for transfer, in which embryos with a higher predicted euploid probability rank higher, allowing greater robustness and flexibility in the selection of embryos for transfer.
Clinical trial studies have demonstrated the correlation between the rating system of the present invention and the clinical outcome of embryo transfer. Based on this correlation, the present invention proposes an embryo transfer sequence based on the NICS results, as shown in FIG. 11. Grade a embryos have a euploid probability > 90%, with the highest priority for transplantation. If the patient does not have a grade A embryo, a grade B embryo may be considered. In the above experiment using 160 transferred embryos, no significant difference in sustained pregnancy rate and miscarriage rate was observed between class a and B embryos. However, if the patient has only grade C embryos, the patient is informed of the associated risk of miscarriage, and is advised to start a new cycle to re-pick embryos for ranking.
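The transfer sequence described above can be sketched in Python as follows. The function name, the advice strings and the use of the example 2 cutoffs (0.94 and 0.7) are assumptions made for illustration; the sketch only orders embryos by predicted euploid probability and flags the case in which a patient has grade C embryos only.

```python
def transfer_order(probabilities, a_cut=0.94, c_cut=0.7):
    """Return (embryo ids in transfer priority order, advice string).

    `probabilities` maps a hypothetical embryo id to its predicted euploid probability.
    """
    def grade(p):
        return "A" if p >= a_cut else "B" if p > c_cut else "C"

    order = sorted(probabilities, key=probabilities.get, reverse=True)
    grades = {embryo: grade(probabilities[embryo]) for embryo in order}
    if all(g == "C" for g in grades.values()):
        advice = "only grade C: counsel on miscarriage risk, consider a new cycle"
    else:
        advice = f"transfer {order[0]} first (grade {grades[order[0]]})"
    return order, advice

# Invented example: one patient with three embryos.
order, advice = transfer_order({"e1": 0.65, "e2": 0.97, "e3": 0.80})
```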
With such a screening strategy, unnecessary overuse of embryo biopsy can be avoided, making most IVF cycles safer and easier to carry out. It also avoids the high IVF cycle cancellation rates caused by false positive embryo chromosome screening results (E. Greco et al., Healthy Babies after Intrauterine Transfer of Mosaic Aneuploid Blastocysts. N. Engl. J. Med. 373, 2089-2090 (2015)). Furthermore, chromosomal abnormality is the leading cause of miscarriage (especially in women of advanced age), and the present strategy may help to avoid early miscarriage caused by chromosomal abnormalities.

Claims (62)

1. A method of classifying test data, wherein the test data comprises magnitudes for a set of biological characteristics about a human test embryo, the method comprising:
-receiving, on at least one processor, the test data comprising quantities of the set of features, wherein the quantities are a combination of quantities extracted for each feature of the set of features on a broth and/or blastocoel fluid sample from the human test embryo;
-evaluating, on the at least one processor, the test data using a classifier, wherein said classifier has been trained using an electronically stored set of training data vectors, each training data vector of the set of training data vectors representing a respective human individual embryo and comprising a magnitude of each feature of said set of features extracted on a broth and/or blastocoel fluid sample of the respective embryo, and each training data vector further comprising a classification label as to whether the respective embryo is euploid or aneuploid;
outputting, using the at least one processor, a classification relating to the human test embryo, wherein said classification is a euploid probability of said embryo,
wherein the set of biological features comprises the following 11 features: 10M CNV, 10M_mos CNV, arm CNV, arm_mos CNV, chrom CNV, chrom_mos CNV, NormalNum, Ab_ChromNum, mos_rate, mos_type, Ab_SexChrom.
2. A method of classifying test data, wherein the test data comprises magnitudes for a set of biological characteristics about a human test embryo, the method comprising:
-accessing, using at least one processor, an electronically stored set of training data vectors, each training data vector of the set of training data vectors representing a respective human individual embryo and comprising a magnitude of each feature of the set of features determined on a culture fluid and/or a blastocoel fluid of the respective embryo, and each training data vector further comprising a classification label as to whether the respective embryo is euploid or aneuploid;
-training a classifier using the electronically stored training data vector;
-receiving, on the at least one processor, said test data, wherein said test data comprises a combination of the quantities determined for each feature of the set of features in a culture fluid and/or blastocoel fluid sample from a human test embryo;
-evaluating, on the at least one processor, the test data using said classifier;
outputting, using the at least one processor, a classification relating to the human test embryo, wherein said classification is a euploid probability of said embryo,
wherein the set of biological features comprises the following 11 features: 10M CNV, 10M_mos CNV, arm CNV, arm_mos CNV, chrom CNV, chrom_mos CNV, NormalNum, Ab_ChromNum, mos_rate, mos_type, Ab_SexChrom.
3. The method of claim 1 or 2, wherein the method is an ex vivo computer-implemented method.
4. The method of claim 1 or 2, wherein the classification label included in the training data vector reports as "normal" or "abnormal" based on whether an embryo has CNVs greater than a specified chimerism ratio.
5. The method of claim 4, wherein the CNV is at a level of 1M or more or 10M or more.
6. The method of claim 4, wherein a CNV chimerism ratio selected from the following is used as the threshold for reporting the class label and the features 10M_mos CNV, arm_mos CNV and chrom_mos CNV as "normal" or "abnormal": 20%, 30%, 40%, 50%, 60%, 70%, 80%; wherein a value greater than the threshold is reported as "abnormal" and a value less than or equal to the threshold is reported as "normal".
7. The method of claim 6, wherein a CNV chimerism ratio of 50% is used as the threshold.
8. The method of claim 1 or 2, wherein the classifier is selected from random forest, AdaBoost, naive Bayes, support vector machine, neural network, genetic algorithm, elastic net, gradient boosting tree, Bayesian neural network, k-nearest neighbor, or combinations thereof.
9. The method of claim 1 or 2, wherein the classifier comprises a random forest.
10. The method of claim 9, wherein the classifier is a random forest classifier comprising 100-600 trees.
11. The method of claim 10, wherein the random forest classifier comprises 100 or more than 100 trees, or comprises 200 or more than 200 trees, or comprises 300 or more than 300 trees, or comprises 400 or more than 400 trees, or comprises 500 trees.
12. The method of claim 1 or 2, wherein the training data set comprises at least 100, 200, 300, 400 or 500 training data vectors.
13. The method of claim 1 or 2, wherein the method further comprises: and drawing an ROC curve of the trained classifier model, and evaluating the efficiency of the model.
14. The method of claim 13, wherein the ROC curve of the model is plotted by cross-validation using a training data set.
15. The method of claim 13, wherein the accuracy of the model is above 0.9.
16. The method of claim 1 or 2, wherein prior to test data classification using the trained classifier, the method further comprises:
using different chimerism ratios as the threshold for reporting an embryo as "abnormal" in the classification label and in the features 10M_mos CNV, arm_mos CNV and chrom_mos CNV,
-evaluating and comparing the effectiveness of classifier models generated using different thresholds by means of ROC curves,
-selecting a chimerism ratio threshold that achieves optimal model performance for generating the classifier model and the test data classification.
17. The method of claim 1 or 2, wherein the extraction of feature magnitudes comprises detecting CNVs at 10M, arm, and chromosome resolution, and their chimerism ratios, in a culture fluid and/or blastocoel fluid sample.
18. The method of claim 17, wherein the CNV detection comprises: performing whole-genome amplification, NGS sequencing, and sequence copy number analysis on nucleic acids from the culture fluid and/or blastocoel fluid, thereby identifying CNVs and their chimerism ratios.
19. The method of claim 1 or 2, wherein the culture fluid used for extracting the biological feature magnitudes is D1-D6 embryo culture fluid.
20. The method of claim 19, wherein the embryo culture fluid is an embryo culture fluid of D1-D3 or D3-D5 or D4-D5 or D4-D6 or D5-D6.
21. The method of claim 19, wherein the embryo culture fluid is a blastocyst culture fluid.
22. The method of claim 21, wherein the blastocyst culture fluid is D3-D5, D4-D6, D4-D5, or D5-D6 blastocyst culture fluid.
23. The method of claim 1 or 2, wherein the blastocoel fluid used for extracting the biological feature magnitudes is D3-D5, D4-D6, D4-D5, or D5-D6 blastocoel fluid.
24. The method of claim 1 or 2, wherein the method further comprises a step of extracting biological feature magnitudes from the culture fluid and/or blastocoel fluid of the test embryo, the step comprising:
obtaining the culture fluid and/or blastocoel fluid, and detecting CNVs at 10M, arm, and chromosome resolution and their chimerism ratios.
25. The method of claim 24, wherein 10M, arm and chromosome resolution CNVs and chimerism ratios thereof are detected using NGS.
26. The method of claim 1 or 2, wherein the embryo is from IVF.
27. The method of claim 1 or 2, wherein the embryo is an ICSI embryo.
28. A method for predicting embryo euploid probability based on embryo culture fluid, comprising the following steps:
(a) obtaining a culture fluid and/or blastocoel fluid sample of a test embryo;
(b) measuring CNVs at 10M, arm, and chromosome resolution and the corresponding chimerism ratios in the sample, and extracting the following 11 features of the sample: 10M CNV, 10M_mos CNV, arm CNV, arm_mos CNV, chrom CNV, chrom_mos CNV, NormalNum, Ab_ChromNum, mos_rate, mos_type, Ab_SexChrom, to obtain test data containing the magnitudes of the 11 features;
(c) processing the test data obtained in step (b) by the method of classifying test data according to any one of claims 1 to 27, thereby obtaining a predicted embryo euploid probability.
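The test data of step (b) can be pictured as a fixed-order vector of 11 magnitudes. The following sketch shows one possible container; the class name `EmbryoFeatures`, the Python field names, and the choice of `float` for every feature are assumptions of this example, since the claims name the features but do not fix their encoding.

```python
from dataclasses import astuple, dataclass
from typing import Tuple

# Feature names as listed in step (b), in the order used for the vector.
FEATURE_NAMES = (
    "10M CNV", "10M_mos CNV", "arm CNV", "arm_mos CNV",
    "chrom CNV", "chrom_mos CNV", "NormalNum", "Ab_ChromNum",
    "mos_rate", "mos_type", "Ab_SexChrom",
)


@dataclass(frozen=True)
class EmbryoFeatures:
    """Magnitudes of the 11 features for one sample (numeric encoding assumed)."""
    cnv_10m: float        # 10M CNV
    cnv_10m_mos: float    # 10M_mos CNV
    cnv_arm: float        # arm CNV
    cnv_arm_mos: float    # arm_mos CNV
    cnv_chrom: float      # chrom CNV
    cnv_chrom_mos: float  # chrom_mos CNV
    normal_num: float     # NormalNum
    ab_chrom_num: float   # Ab_ChromNum
    mos_rate: float       # mos_rate
    mos_type: float       # mos_type
    ab_sex_chrom: float   # Ab_SexChrom

    def as_vector(self) -> Tuple[float, ...]:
        """Return the magnitudes in the same order as FEATURE_NAMES."""
        return astuple(self)
```

A frozen dataclass keeps the feature order fixed, so that training data vectors and test data are guaranteed to share the same layout.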
29. The method of claim 28, wherein step (c) comprises:
(c1) receiving, on at least one processor, test data comprising magnitudes of the 11 features;
(c2) evaluating, using the at least one processor, the test data by a classifier, wherein the classifier has been trained using an electronically stored set of training data vectors, each training data vector of the set of training data vectors representing a respective human embryo and comprising a magnitude for each of the 11 features extracted from a culture fluid and/or blastocoel fluid sample of the respective embryo, and each training data vector further comprising a classification label for whether the respective embryo is euploid or aneuploid;
(c3) outputting, using the at least one processor, a classification relating to the human test embryo based on the evaluating step (c2), wherein the classification is a euploid probability of the embryo.
30. The method of claim 28 or 29, wherein the culture fluid of the test embryo is D1-D6 embryo culture fluid.
31. The method of claim 30, wherein the embryo culture fluid is a culture fluid selected from the group consisting of D1-D3, D3-D5, D4-D5, D4-D6, and D5-D6 embryo culture fluid.
32. The method of claim 28 or 29, wherein, in step (a), a blastocyst culture fluid and/or blastocoel fluid sample of the test embryo is obtained.
33. The method of claim 32, wherein the sample of blastocyst culture fluid and/or blastocoel fluid is blastocyst culture fluid and/or blastocoel fluid from D3-D5, D4-D6, D4-D5, or D5-D6.
34. A method of determining embryo transfer priority based on embryo culture fluid and/or blastocoel fluid, comprising:
-predicting the euploid probability of an embryo from a culture fluid and/or blastocoel fluid sample from said embryo according to the method of any one of claims 1 to 27;
-determining embryo transfer priority based on the obtained probability values.
35. The method of claim 34, wherein when the probability value is greater than 0.90, the embryo is determined to be a first transfer priority;
when the probability value is between 0.60 and 0.90, the embryo is determined to be a second transfer priority;
when the probability value is less than 0.60, the embryo is determined to be a third transfer priority.
36. The method of claim 34, wherein
when the probability value is greater than 0.94, the embryo is determined to be a first transfer priority;
when the probability value is between 0.70 and 0.94, the embryo is determined to be a second transfer priority;
when the probability value is less than 0.70, the embryo is determined to be a third transfer priority.
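The tiering of claims 35 and 36 can be sketched as a small function. The names `transfer_priority`, `CLAIM_35_CUTOFFS` and `CLAIM_36_CUTOFFS` are illustrative only. Because the first tier requires a value greater than the upper cutoff and the third tier a value less than the lower cutoff, values exactly at either cutoff are placed in the second tier here.

```python
from typing import Tuple

# (upper cutoff, lower cutoff) as recited in claims 35 and 36.
CLAIM_35_CUTOFFS: Tuple[float, float] = (0.90, 0.60)
CLAIM_36_CUTOFFS: Tuple[float, float] = (0.94, 0.70)


def transfer_priority(euploid_probability: float,
                      cutoffs: Tuple[float, float] = CLAIM_35_CUTOFFS) -> int:
    """Map a predicted euploid probability to transfer priority 1, 2 or 3."""
    if not 0.0 <= euploid_probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    upper, lower = cutoffs
    if euploid_probability > upper:
        return 1  # first transfer priority
    if euploid_probability >= lower:
        return 2  # second transfer priority (includes both cutoff values)
    return 3      # third transfer priority
```

For example, a probability of 0.92 falls in the first tier under the claim 35 cutoffs but in the second tier under the stricter claim 36 cutoffs.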
37. The method of claim 35 or 36, wherein the first priority embryo has a chromosomal ploidy pattern selected from the group consisting of: 100% euploid, low chimerism ratio CNV, and small-segment CNV.
38. The method of claim 35 or 36, wherein the third priority embryo has a chromosomal ploidy pattern selected from the group consisting of: high chimerism ratio CNV and large-segment CNV, where the large-segment CNV is a CNV at the arm or chromosome level.
39. The method of claim 38, wherein the large-segment CNV is a CNV at the chromosome level.
40. The method of claim 34, wherein the method further comprises: giving suggestions on the transfer order of the embryos according to the determined embryo priority.
41. The method of claim 34, wherein the method further comprises: giving suggestions on the transfer order of the embryos according to the determined embryo priority and the morphological rating of the embryos.
42. A method for improving embryo transfer, the method comprising:
determining, by the method according to any one of claims 34 to 41, the transfer priorities of a plurality of test embryos from one IVF cycle;
sorting the plurality of test embryos according to the determined priorities;
providing embryo transfer suggestions.
43. The method of claim 42, wherein the suggesting comprises:
if there is an embryo of the first transfer priority, suggesting that the embryo be transferred preferentially or, among such embryos, suggesting transfer of the embryo having the higher predicted euploid probability;
if there is no embryo of the first transfer priority, suggesting transfer of an embryo of the second transfer priority;
if only embryos of the third transfer priority are available, informing the patient of the associated risk of miscarriage and advising the patient to start a new cycle to obtain new embryos for rating.
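The suggestion rules of claim 43 can be sketched as follows. The function name `suggest_transfer`, the tuple layout and the message wording are assumptions of this example. Picking the highest-probability embryo within the second tier is also an assumption, since the claim only specifies that preference for the first tier.

```python
from typing import List, Optional, Tuple

# (embryo id, transfer priority 1-3, predicted euploid probability)
Embryo = Tuple[str, int, float]


def suggest_transfer(embryos: List[Embryo]) -> Tuple[Optional[str], str]:
    """Return (id of the embryo suggested for transfer or None, suggestion text)."""
    if not embryos:
        raise ValueError("no embryos to rank")
    # Prefer the first tier; fall back to the second tier if it is empty.
    for tier in (1, 2):
        candidates = [e for e in embryos if e[1] == tier]
        if candidates:
            best = max(candidates, key=lambda e: e[2])
            return best[0], f"suggest transferring embryo {best[0]} (priority {tier})"
    # Only third-priority embryos remain.
    return None, ("only third-priority embryos available: inform the patient of "
                  "the miscarriage risk and advise starting a new cycle")
```

Returning `None` for the embryo id in the last case makes it explicit that no transfer is suggested and that counseling is required instead.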
44. The method of claim 42, wherein said embryo transfer is a single thawed blastocyst transfer.
45. The method of any one of claims 42 to 44, wherein the method is for improving the clinical outcome of IVF-ET, and/or for reducing the embryo transfer cycle cancellation rate.
46. The method of claim 45, wherein the clinical outcome of IVF-ET to be improved is the ongoing pregnancy rate and the miscarriage rate.
47. A method of extracting a set of biological features of a pre-implantation embryo, wherein the method comprises:
measuring, on a sample of in vitro culture fluid and/or blastocoel fluid from an embryo, CNVs at 10M, arm, and chromosome resolution and their corresponding chimerism ratios, and extracting the magnitudes of the following 11 features of the sample: 10M CNV, 10M_mos CNV, arm CNV, arm_mos CNV, chrom CNV, chrom_mos CNV, NormalNum, Ab_ChromNum, mos_rate, mos_type, Ab_SexChrom,
wherein the method further comprises predicting, using a classifier, a euploid probability of the embryo based on the magnitude of the extracted biological feature.
48. The method of claim 47, wherein the method further comprises:
-receiving, on at least one processor, test data comprising magnitudes of the 11 features;
-evaluating, using the at least one processor, the test data by a classifier, wherein said classifier has been trained using an electronically stored set of training data vectors, each training data vector of the set of training data vectors representing a respective individual human embryo and comprising a magnitude for each of said 11 features extracted from the culture fluid of the respective embryo, and each training data vector further comprising a classification label as to whether the respective embryo is euploid or aneuploid;
-outputting, using the at least one processor, a classification regarding the human test embryo based on the evaluating step, wherein the classification is a euploid probability of the embryo.
49. The method of claim 47, wherein the sample is a blastocyst culture fluid sample.
50. The method of claim 47, wherein the classifier is a random forest classifier.
51. Use of a reagent for detecting CNVs in a culture fluid and/or blastocoel fluid sample of a pre-implantation embryo, to extract the magnitudes of 11 features of the sample, in the preparation of a kit, device or system for predicting the euploid probability of an embryo in a method according to any one of claims 28-33 and/or for determining the embryo transfer priority in a method according to any one of claims 34-46, wherein the 11 features are: 10M CNV, 10M_mos CNV, arm CNV, arm_mos CNV, chrom CNV, chrom_mos CNV, NormalNum, Ab_ChromNum, mos_rate, mos_type, Ab_SexChrom.
52. A system for classifying test data or for predicting embryo euploidy probability, the system comprising:
-at least one processor coupled to an electronic storage means containing an electronic representation of a classifier, the classifier being a trained classifier,
wherein the classifier is trained using an electronically stored set of training data vectors, wherein each training data vector of the set of training data vectors represents a respective individual human embryo and comprises a magnitude for each of 11 features extracted from the culture fluid of the respective embryo, and each training data vector further comprises a classification label for whether the respective embryo is euploid or aneuploid, wherein the 11 features are: 10M CNV, 10M_mos CNV, arm CNV, arm_mos CNV, chrom CNV, chrom_mos CNV, NormalNum, Ab_ChromNum, mos_rate, mos_type, Ab_SexChrom;
wherein the processor is configured to receive test data comprising a magnitude for each of the 11 features extracted from the culture fluid of a test embryo,
wherein the processor is further configured to evaluate the test data using the electronic representation of the classifier and output a classification of a corresponding embryo in the culture fluid from which the test data originated based on the evaluation, wherein the classification is a predicted embryo euploid probability value.
53. The system of claim 52, further comprising: a sequencing module for receiving a nucleic acid sample from the culture fluid and/or blastocoel fluid and providing nucleic acid sequence information of the sample, and a CNV analysis and output module for identifying CNVs from the nucleic acid sequence information and outputting CNV information on CNV type and chimerism.
54. The system of claim 53, wherein the sequencing module is a sequencer.
55. The system of claim 53, wherein the sequencing module is an NGS sequencer.
56. The system of claim 52, wherein the processor is further configured to receive training data vectors and train a random forest classifier using the training data vectors, and generate a random forest model for predicting embryo euploidy probabilities.
57. A system for embryo transfer priority assessment, comprising:
-an embryo euploid probability prediction module capable of performing the steps of the classification method according to any one of claims 1 to 27;
-an embryo transfer prioritization module capable of performing the steps of the embryo transfer prioritization method of any one of claims 34 to 41.
58. The system of claim 57, further comprising:
-a presentation module capable of presenting the transfer priority of the test embryo.
59. The system of claim 58, wherein said presentation module is further capable of giving embryo transfer recommendations based on said priorities.
60. A non-transitory computer readable medium having stored thereon computer program instructions for execution by a computer or computer system to carry out the steps of the classification method according to any one of claims 1 to 27 or the prediction method according to any one of claims 28 to 33 or the embryo prioritization method according to any one of claims 34 to 41.
61. Use of a system according to any one of claims 52-59 or a non-transitory computer readable medium of claim 60 in the preparation of a product for classifying test data in the method of any one of claims 1-27, or for predicting embryo euploid probability in the method of any one of claims 28-33, or for determining embryo transfer priority in the method of any one of claims 34-41, or for improving embryo transfer in the method of any one of claims 42-46.
62. The use of claim 61, wherein the system or non-transitory computer readable medium is used in combination with a reagent for detecting CNVs in a culture fluid and/or blastocoel fluid sample of a pre-implantation embryo to extract the magnitudes of 11 features of the sample, wherein the 11 features are: 10M CNV, 10M_mos CNV, arm CNV, arm_mos CNV, chrom CNV, chrom_mos CNV, NormalNum, Ab_ChromNum, mos_rate, mos_type, Ab_SexChrom.

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010705776.9A CN112582022B (en) 2020-07-21 2020-07-21 System and method for non-invasive embryo transfer priority rating
PCT/CN2021/107600 WO2022017414A1 (en) 2020-07-21 2021-07-21 System and method for grading non-invasive embryo transplantation priorities

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010705776.9A CN112582022B (en) 2020-07-21 2020-07-21 System and method for non-invasive embryo transfer priority rating

Publications (2)

Publication Number Publication Date
CN112582022A (en) 2021-03-30
CN112582022B (en) 2021-11-23

Family

ID=75119411

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010705776.9A Active CN112582022B (en) 2020-07-21 2020-07-21 System and method for non-invasive embryo transfer priority rating

Country Status (2)

Country Link
CN (1) CN112582022B (en)
WO (1) WO2022017414A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112582022B (en) * 2020-07-21 2021-11-23 序康医疗科技(苏州)有限公司 System and method for non-invasive embryo transfer priority rating
KR20240033181A (en) * 2021-11-23 2024-03-12 써니 쑨 Method and system for predicting the results of embryo transfer in artificial reproduction using artificial intelligence
CN114752552A (en) * 2022-05-07 2022-07-15 序康医疗科技(苏州)有限公司 Recovery, culture, optimization and single-capsule embryo transplantation method of frozen embryo and noninvasive detection method for recovering frozen embryo
CN115098740B (en) * 2022-07-25 2022-11-04 广州市海捷计算机科技有限公司 Data quality detection method and device based on multi-source heterogeneous data source
CN117230175A (en) * 2023-06-21 2023-12-15 广州序源医学科技有限公司 Embryo preimplantation genetics detection method based on third generation sequencing
CN117237324B (en) * 2023-10-09 2024-03-29 苏州博致医疗科技有限公司 Non-invasive euploid prediction method and system

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2003903417A0 (en) * 2003-07-04 2003-07-17 Genera Biosystems Pty Ltd Multiplex detection
CA2641132A1 (en) * 2008-10-03 2010-04-03 Richard T. Scott, Jr. Improvements in in vitro fertilization
WO2012116185A1 (en) * 2011-02-23 2012-08-30 The Board Of Trustees Of The Leland Stanford Junior University Methods of detecting aneuploidy in human embryos
CA3205430A1 (en) * 2013-10-04 2015-04-09 Sequenom, Inc. Methods and processes for non-invasive assessment of genetic variations
CN104745718B (en) * 2015-04-23 2018-02-16 北京中仪康卫医疗器械有限公司 A kind of method for detecting human embryos microdeletion and micro- repetition
CN105368936B (en) * 2015-11-05 2021-07-30 序康医疗科技(苏州)有限公司 Method for detecting embryo chromosome abnormality by using blastocyst culture solution
CN105861658B (en) * 2016-04-12 2020-07-28 中国科学院北京基因组研究所 Noninvasive detection method for screening excellent-development blastocysts
CN106086199A (en) * 2016-07-05 2016-11-09 上海序康医疗科技有限公司 A kind of method that blastocyst culture liquid detection embryo chromosome utilized without zona pellucida is abnormal
KR20200060410A (en) * 2017-09-07 2020-05-29 쿠퍼제노믹스, 인크. Methods and systems for non-invasive pre-implantation genetic diagnosis (SYSTEMS AND METHODS FOR NON-INVASIVE PREIMPLANTATION GENETIC DIAGNOSIS)
CN108763859B (en) * 2018-05-17 2020-11-24 北京博奥医学检验所有限公司 Method for establishing analog data set required for providing CNV detection based on unknown CNV sample
CN109536581B (en) * 2018-12-29 2022-02-01 序康医疗科技(苏州)有限公司 Method for detecting health condition of embryo by using blastocyst culture solution and product
CN110423804B (en) * 2019-08-12 2022-09-20 中国福利会国际和平妇幼保健院 Biomarker set for screening remaining abortion risk and screening method
CN111154851A (en) * 2020-01-19 2020-05-15 苏州贝康医疗器械有限公司 Embryo implantation pre-chromosome aneuploidy detection reference product based on high-throughput sequencing and preparation method thereof
CN111172259A (en) * 2020-03-09 2020-05-19 云南省第一人民医院 Embryo chromosome detection method for blastomere period of embryo culture solution
CN112582022B (en) * 2020-07-21 2021-11-23 序康医疗科技(苏州)有限公司 System and method for non-invasive embryo transfer priority rating

Also Published As

Publication number Publication date
WO2022017414A1 (en) 2022-01-27
CN112582022A (en) 2021-03-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information
Inventor after: Zou Yangyun, Yao Yaxin, Lu Sijia, Bao Shiping, Xia Yingying, Yao Bing, Chen Li
Inventor before: Zou Yangyun, Yao Yaxin, Lu Sijia, Bao Shiping, Xia Yingying