WO2008115497A2 - System and method for cleaning noisy genetic data and determining chromosome copy number - Google Patents

System and method for cleaning noisy genetic data and determining chromosome copy number

Info

Publication number
WO2008115497A2
Authority
WO
WIPO (PCT)
Prior art keywords
genetic
target individual
individual
genetic data
data
Application number
PCT/US2008/003547
Other languages
English (en)
Other versions
WO2008115497A3 (fr)
Inventor
Matthew Rabinowitz
Josh Sweetkind-Singer
Milena Banjevic
David Scott Johnson
Dusan Kijacic
Dimitri Petrov
Jing Xu
Zachary P. Demko
Original Assignee
Gene Security Network
Application filed by Gene Security Network
Priority to EP08742125A (EP2140386A2/fr)
Priority to CN2008800161237A (CN101790731B/zh)
Publication of WO2008115497A2 (fr)
Publication of WO2008115497A3 (fr)


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/10 Ploidy or copy number detection
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G16B20/40 Population genetics; Linkage disequilibrium
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Definitions

  • the invention relates generally to the field of acquiring, manipulating and using genetic data for medically predictive purposes, and specifically to a system in which imperfectly measured genetic data of a target individual are made more accurate by using known genetic data of genetically related individuals, thereby allowing more effective identification of genetic variations, specifically aneuploidy and disease-linked genes, that could result in various phenotypic outcomes.
  • PGD Preimplantation Genetic Diagnosis
  • one main focus of PGD is screening for chromosomal abnormalities such as aneuploidy and balanced translocations.
  • the other main focus of PGD is genetic disease screening, with the primary outcome being a healthy baby not afflicted with a genetically heritable disease for which one or both parents are carriers.
  • the likelihood of the desired outcome is enhanced by excluding genetically suboptimal embryos from transfer and implantation in the mother.
  • the process of PGD during IVF currently involves extracting a single cell from the roughly eight cells of an early-stage embryo for analysis. Isolation of single cells from human embryos, while highly technical, is now routine in IVF clinics. Both polar bodies and blastomeres have been isolated with success.
  • the most common technique is to remove single blastomeres from day 3 embryos (6- or 8-cell stage). Embryos are transferred to a special cell culture medium (standard culture medium lacking calcium and magnesium), and a hole is introduced into the zona pellucida using an acidic solution, laser, or mechanical techniques. The technician then uses a biopsy pipette to remove a single blastomere with a visible nucleus.
  • Features of the DNA of the single (or occasionally multiple) blastomere are measured using a variety of techniques. Since only a single copy of the DNA is available from one cell, direct measurements of the DNA are highly error-prone, or noisy. There is a great need for a technique that can correct, or make more accurate, these noisy genetic measurements.
  • normal humans have two sets of 23 chromosomes in every diploid cell, with one set coming from each parent.
  • Aneuploidy, the state of a cell with extra or missing chromosome(s), and uniparental disomy, the state of a cell with two copies of a given chromosome that both originate from one parent, are believed to be responsible for a large percentage of failed implantations and miscarriages, and for some genetic diseases.
  • Detection of chromosomal abnormalities can identify individuals or embryos with conditions such as Down syndrome, Klinefelter's syndrome, and Turner syndrome, among others, in addition to increasing the chances of a successful pregnancy.
  • detection of chromosomal abnormalities is especially important as the age of a potential mother increases: between the ages of 35 and 40 it is estimated that between 40% and 50% of embryos are abnormal, and above the age of 40, more than half of embryos are likely to be abnormal.
  • the main cause of aneuploidy is nondisjunction during meiosis. Maternal nondisjunction constitutes 88% of all nondisjunction, of which 65% occurs in meiosis I and 23% in meiosis II.
  • Common types of human aneuploidy include trisomy from meiosis I nondisjunction, monosomy, and uniparental disomy.
  • M2 trisomy: in a particular type of trisomy that arises from meiosis II nondisjunction, or M2 trisomy, an extra chromosome is identical to one of the two normal chromosomes. M2 trisomy is particularly difficult to detect. There is a great need for a better method that can detect many or all types of aneuploidy at most or all of the chromosomes efficiently and with high accuracy.
  • FISH fluorescent in situ hybridization
  • the system disclosed enables the cleaning of incomplete or noisy genetic data using secondary genetic data as a source of information, and also the determination of chromosome copy number using said genetic data. While the disclosure focuses on genetic data from human subjects, and more specifically on embryos not yet implanted or developing fetuses, as well as related individuals, it should be noted that the methods disclosed apply to the genetic data of a range of organisms, in a range of contexts.
  • the techniques described for cleaning genetic data are most relevant in the context of pre-implantation diagnosis during in-vitro fertilization, prenatal diagnosis in conjunction with amniocentesis, chorionic villus biopsy, fetal tissue sampling, and non-invasive prenatal diagnosis, where a small quantity of fetal genetic material is isolated from maternal blood.
  • the use of this method may facilitate diagnoses focusing on inheritable diseases, chromosome copy number predictions, increased likelihoods of defects or abnormalities, as well as making predictions of susceptibility to various disease- and non-disease phenotypes for individuals to enhance clinical and lifestyle decisions.
  • the invention addresses the shortcomings of prior art that are discussed above.
  • methods make use of knowledge of the genetic data of the mother and the father such as diploid tissue samples, sperm from the father, haploid samples from the mother or other embryos derived from the mother's and father's gametes, together with the knowledge of the mechanism of meiosis and the imperfect measurement of the embryonic DNA, in order to reconstruct, in silico, the embryonic DNA at the location of key loci with a high degree of confidence.
  • genetic data derived from other related individuals such as other embryos, brothers and sisters, grandparents or other relatives can also be used to increase the fidelity of the reconstructed embryonic DNA. It is important to note that the parental and other secondary genetic data allows the reconstruction not only of SNPs that were measured poorly, but also of insertions, deletions, and of SNPs or whole regions of DNA that were not measured at all.
  • the fetal or embryonic genomic data that has been reconstructed, with or without the use of genetic data from related individuals, can be used to detect whether the cell is aneuploid, that is, whether fewer or more than two copies of a particular chromosome are present in the cell.
  • the reconstructed data can also be used to detect uniparental disomy, a condition in which two copies of a given chromosome are present, both of which originate from one parent. This is done by creating a set of hypotheses about the potential states of the DNA, and testing to see which hypothesis has the highest probability of being true given the measured data.
  • using high-throughput genotyping data to screen for aneuploidy enables a single blastomere from each embryo to be used both to measure multiple disease-linked loci and to screen for aneuploidy.
  • direct measurements of the amount of genetic material, amplified or unamplified, present at a plurality of loci can be used to detect monosomy, uniparental disomy, trisomy and other aneuploidy states. The idea behind this method is that, although the measurement at any single locus is noisy, measuring the amount of genetic material at multiple loci will give a statistically significant result.
  • measurements, direct or indirect, of a particular subset of SNPs can be used to detect chromosomal abnormalities by looking at the ratio of maternally versus paternally miscalled homozygous loci on the embryo.
  • Allele dropouts at those loci are random, so a shift in the ratio of loci miscalled as homozygous can only be due to incorrect chromosome number.
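A minimal sketch of counting maternally versus paternally miscalled homozygous loci. It rests on assumptions not spelled out in the text: only loci where the two parents are homozygous for different alleles (mother AA and father CC, or the reverse) are treated as informative, because the embryo should then be heterozygous; calls are unordered genotype strings such as "AA", "AC", "CC"; and the function names are hypothetical. An embryo called homozygous for the mother's allele has lost the paternal allele, and vice versa, so with random allele dropout in a euploid embryo the two counts should be roughly balanced.

```python
from collections import Counter


def miscalled_homozygous_counts(loci):
    """Count embryo miscalls at loci where the parents are opposite homozygotes.

    loci: iterable of (mother_call, father_call, embryo_call) genotype strings
    (hypothetical input format). Returns a Counter with keys "maternal"
    (embryo called homozygous for the mother's allele) and "paternal".
    """
    counts = Counter(maternal=0, paternal=0)
    for mother, father, embryo in loci:
        # Informative only if each parent is homozygous and they differ.
        if mother[0] != mother[1] or father[0] != father[1] or mother == father:
            continue
        if embryo == mother:
            counts["maternal"] += 1  # paternal allele missing from the call
        elif embryo == father:
            counts["paternal"] += 1  # maternal allele missing from the call
    return counts


def maternal_fraction(counts):
    """Fraction of informative miscalls that are maternal homozygotes."""
    total = counts["maternal"] + counts["paternal"]
    return counts["maternal"] / total if total else float("nan")
```

A fraction far from one half is the kind of shift the text attributes to incorrect chromosome number; how far is "far" would need a statistical test that this sketch does not include.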
  • the goal of the disclosed system is to provide highly accurate genomic data for the purpose of genetic diagnoses.
  • the disclosed system makes use of the expected similarities between the genetic data of the target individual and the genetic data of related individuals, to clean the noise in the target genome. This is done by determining which segments of chromosomes of related individuals were involved in gamete formation and, when necessary, where crossovers may have occurred during meiosis, and therefore which segments of the genomes of related individuals are expected to be nearly identical to sections of the target genome. In certain situations this method can be used to clean noisy base pair measurements on the target individual, but it can also be used to infer the identity of individual base pairs or whole regions of DNA that were not measured.
  • the target individual is an embryo, and the purpose of applying the disclosed method to the genetic data of the embryo is to allow a doctor or other agent to make an informed choice of which embryo(s) should be implanted during IVF.
  • the target individual is a fetus, and the purpose of applying the disclosed method to genetic data of the fetus is to allow a doctor or other agent to make an informed choice about possible clinical decisions or other actions to be taken with respect to the fetus.
  • SNP Single Nucleotide Polymorphism
  • Locus a particular region of interest on the DNA of an individual, which may refer to a SNP, the site of a possible insertion or deletion, or the site of some other relevant genetic variation.
  • Disease-linked SNPs may also refer to disease-linked loci.
  • To call an allele to determine the state of a particular locus of DNA. This may involve calling a SNP, or determining whether or not an insertion or deletion is present at that locus, or determining the number of insertions that may be present at that locus, or determining whether some other genetic variant is present at that locus.
  • Correct allele call An allele call that correctly reflects the true state of the actual genetic material of an individual.
  • To clean genetic data to take imperfect genetic data and correct some or all of the errors or fill in missing data at one or more loci. In the context of this disclosure, this involves using genetic data of related individuals and the method described herein. To increase the fidelity of allele calls: to clean genetic data.
  • Imperfect genetic data genetic data with any of the following: allele dropouts, uncertain base pair measurements, incorrect base pair measurements, missing base pair measurements, uncertain measurements of insertions or deletions, uncertain measurements of chromosome segment copy numbers, spurious signals, missing measurements, other errors, or combinations thereof.
  • noisy genetic data imperfect genetic data, also called incomplete genetic data.
  • Uncleaned genetic data: genetic data as measured, that is, where no method has been used to correct for the presence of noise or errors in the raw genetic data; also called crude genetic data.
  • Confidence: the statistical likelihood that the called SNP, allele, set of alleles, or determined number of chromosome segment copies correctly represents the real genetic state of the individual.
  • PS Parental Support: a name sometimes used for any of the methods disclosed herein, where the genetic information of related individuals is used to determine the genetic state of target individuals. In some cases, it refers specifically to the allele calling method, sometimes to the method used for cleaning genetic data, sometimes to the method to determine the number of copies of a segment of a chromosome, and sometimes to some or all of these methods used in combination.
  • CNC Copy Number Calling
  • Qualitative CNC also qCNC: the name given to the method in this disclosure used to determine chromosome copy number in a cell that makes use of qualitative measured genetic data of the target individual and of related individuals.
  • Multigenic: affected by multiple genes, or alleles.
  • Direct relation: mother, father, son, or daughter.
  • Chromosomal Region a segment of a chromosome, or a full chromosome.
  • Segment of a Chromosome a section of a chromosome that can range in size from one base pair to the entire chromosome.
  • Chromosome may refer to either a full chromosome, or also a segment or section of a chromosome.
  • the number of copies of a chromosome segment may refer to identical copies, or it may refer to non-identical copies of a chromosome segment wherein the different copies of the chromosome segment contain a substantially similar set of loci, and where one or more of the alleles are different. Note that in some cases of aneuploidy, such as the M2 copy error, it is possible to have some copies of the given chromosome segment that are identical as well as some copies of the same chromosome segment that are not identical.
  • Haplotypic data: also called 'phased data' or 'ordered genetic data'; data from a single chromosome in a diploid or polyploid genome, i.e., either the segregated maternal or paternal copy of a chromosome in a diploid genome.
  • Unordered Genetic Data pooled data derived from measurements on two or more chromosomes in a diploid or polyploid genome, i.e., both the maternal and paternal copies of a chromosome in a diploid genome.
  • Genetic data 'in', 'of', 'at' or 'on' an individual: these phrases all refer to the data describing aspects of the genome of an individual. They may refer to one or a set of loci, partial or entire sequences, partial or entire chromosomes, or the entire genome.
  • Hypothesis: a set of possible copy numbers of a given set of chromosomes, or a set of possible genotypes at a given set of loci. The set of possibilities may contain one or more elements.
  • Target Individual the individual whose genetic data is being determined. Typically, only a limited amount of DNA is available from the target individual. In one context, the target individual is an embryo or a fetus.
  • Related Individual any individual who is genetically related, and thus shares haplotype blocks, with the target individual.
  • Platform response: a mathematical characterization of the input/output characteristics of a genetic measurement platform, such as TaqMan or Infinium.
  • the input to the channel is the true underlying genotypes of the genetic loci being measured.
  • the channel output could be allele calls (qualitative) or raw numerical measurements
  • the platform response consists of an error transition matrix that describes the conditional probability of seeing a particular output genotype call given a particular true genotype input.
  • the platform response is a conditional probability density function that describes the probability of the numeric outputs given a particular true genotype input.
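To make the error transition matrix concrete, here is a sketch for a biallelic SNP with unordered calls AA, AC and CC. The two-parameter dropout/drop-in model and the names `p_d`, `p_a`, `error_transition_matrix` and `posterior_true_genotype` are illustrative assumptions, not the measured response of any particular platform: a heterozygote loses one allele with probability `p_d` (either allele equally likely), and a homozygote shows a spurious second allele with probability `p_a`. Each row is a conditional distribution over calls, so it sums to one, and Bayes' rule turns one column of the matrix into a posterior over the true genotype.

```python
import numpy as np

GENOTYPES = ("AA", "AC", "CC")


def error_transition_matrix(p_d: float, p_a: float) -> np.ndarray:
    """P(call | true genotype); rows and columns ordered as GENOTYPES.

    Hypothetical model: a heterozygote loses one allele with probability p_d
    (either allele equally likely); a homozygote gains a spurious second
    allele with probability p_a.
    """
    return np.array([
        [1 - p_a, p_a, 0.0],          # true AA
        [p_d / 2, 1 - p_d, p_d / 2],  # true AC
        [0.0, p_a, 1 - p_a],          # true CC
    ])


def posterior_true_genotype(call: str, prior, M: np.ndarray) -> np.ndarray:
    """P(true genotype | call) by Bayes' rule, given a prior over GENOTYPES."""
    likelihood = M[:, GENOTYPES.index(call)]
    joint = np.asarray(prior, dtype=float) * likelihood
    return joint / joint.sum()
```

For example, with `p_d = 0.3`, `p_a = 0.02` and a prior of (0.25, 0.5, 0.25), an AA call gives a posterior of (0.765625, 0.234375, 0) over (AA, AC, CC): under these made-up numbers, almost a quarter of AA calls would be heterozygotes with a dropout.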
  • Copy number hypothesis a hypothesis about how many copies of a particular chromosome segment are in the embryo. In a preferred embodiment, this hypothesis consists of a set of sub-hypotheses about how many copies of this chromosome segment were contributed by each related individual to the target individual.
  • let $e = (e_1, e_2)$ be the true, unknown, ordered SNP information on the embryo, $e_1, e_2 \in A^n$.
  • define $e_1$ to be the haploid genetic information inherited from the father and $e_2$ to be the haploid genetic information inherited from the mother.
  • use $e_i = (e_{1i}, e_{2i})$ to denote the ordered pair of alleles at the i-th position of $e$.
  • let $g$ be the true, unknown, haploid information on a single sperm from the father.
  • let $\hat{e} = (\hat{e}_1, \hat{e}_2)$ be the estimate of $e$ that is sought, $\hat{e}_1, \hat{e}_2 \in A^n$.
  • by a crossover map is meant an n-tuple $\theta \in \{1,2\}^n$ that specifies how a haploid pair such as $(f_1, f_2)$ recombines to form a gamete such as $e_1$.
  • for example, with $f_1 = \mathrm{ACAAACCC}$, $f_2 = \mathrm{CAACCACA}$ and $\theta = (1,1,1,1,1,2,2,2)$, the resulting gamete is $\theta(f_1, f_2) = \mathrm{ACAAAACA}$: the first five alleles come from $f_1$ and the last three from $f_2$.
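The crossover-map notation can be checked in a few lines of Python. This is a sketch: `apply_crossover_map` is a hypothetical helper, and haplotypes are assumed to be plain strings of allele letters of equal length.

```python
def apply_crossover_map(theta, f1: str, f2: str) -> str:
    """Build a gamete from the haploid pair (f1, f2) using crossover map theta.

    theta[i] == 1 takes position i from f1; theta[i] == 2 takes it from f2.
    """
    if not len(theta) == len(f1) == len(f2):
        raise ValueError("theta, f1 and f2 must have the same length n")
    if any(t not in (1, 2) for t in theta):
        raise ValueError("crossover map entries must be 1 or 2")
    return "".join(a if t == 1 else b for t, a, b in zip(theta, f1, f2))


gamete = apply_crossover_map((1, 1, 1, 1, 1, 2, 2, 2), "ACAAACCC", "CAACCACA")
```

A map of all ones returns `f1` unchanged; each change of value along `theta` marks a crossover between adjacent positions.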
  • the aneuploidy calling method could be first employed to ensure that the embryo is indeed euploid and only then would the allele calling method be employed, or the aneuploidy calling method could be used to determine how many chromosome copies were derived from each parent and only then would the allele calling method be employed. It should also be obvious to one skilled in the art how this method could be modified in the case of a sex chromosome where there is only one copy of a chromosome present.
  • MAP maximum a posteriori
  • MCMC reversible-jump Markov Chain Monte Carlo
  • conditional probability distributions mentioned above can vary widely from experiment to experiment, depending on various factors in the lab such as variations in the quality of genetic samples, or variations in the efficiency of whole genome amplification, or small variations in protocols used.
  • these conditional probability distributions are estimated on a per-experiment basis.
  • in later sections, this disclosure focuses on estimating the conditional distribution $P(\hat{e}_1 \mid e_1)$.
  • the distributions can each be modeled as belonging to a parametric family of distributions whose particular parameter values vary from experiment to experiment. As one example among many, it is possible to model the conditional probability distribution $P(\hat{e}_1 \mid e_1)$ implicitly as being parameterized by an allele dropout parameter $p_d$ and an allele drop-in parameter $p_a$.
  • the values of these parameters might vary widely from experiment to experiment, and it is possible to use standard techniques such as maximum likelihood estimation, MAP estimation, or Bayesian inference, whose application is illustrated at various places in this document, to estimate the values that these parameters take on in any individual experiment.
  • the key is to find the set of parameter values that maximizes the joint probability of the parameters and the data, by considering all possible tuples of parameter values within a region of interest in the parameter space. As described elsewhere in the document, this approach can be implemented when one knows the chromosome copy number of the target genome, or when one doesn't know the copy number call but is exploring different hypotheses.
  • non-parametric methods: it is also possible to use non-parametric methods to estimate the above conditional probability distributions on a per-experiment basis. Nearest-neighbor methods, smoothing kernels, and similar non-parametric methods familiar to those skilled in the art are some possibilities. Although this disclosure focuses on parametric estimation methods, using non-parametric methods to estimate these conditional probability distributions would not change the fundamental concept of the invention. The usual caveats apply: parametric methods may suffer from model bias but have lower variance, while non-parametric methods tend to have lower bias but higher variance.
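A minimal sketch of the grid search described above, for a hypothetical two-parameter dropout/drop-in model (a heterozygote loses one allele with probability `p_d`; a homozygote gains a spurious allele with probability `p_a`). It assumes a calibration set in which the true genotypes are known, so the data reduce to a 3-by-3 table of (true genotype, call) counts, and it uses a flat prior, under which maximizing the joint probability of parameters and data is the same as maximum likelihood. All names are illustrative.

```python
import numpy as np


def transition_matrix(p_d: float, p_a: float) -> np.ndarray:
    """Rows: true AA, AC, CC; columns: called AA, AC, CC (hypothetical model)."""
    return np.array([
        [1 - p_a, p_a, 0.0],
        [p_d / 2, 1 - p_d, p_d / 2],
        [0.0, p_a, 1 - p_a],
    ])


def log_likelihood(counts: np.ndarray, p_d: float, p_a: float) -> float:
    """Log-probability of a (true, called) count table, up to a constant."""
    M = transition_matrix(p_d, p_a)
    mask = counts > 0                  # cells with no data contribute nothing
    if np.any(M[mask] == 0):
        return -np.inf                 # data contain an impossible transition
    return float(np.sum(counts[mask] * np.log(M[mask])))


def grid_mle(counts: np.ndarray, grid: np.ndarray = np.arange(1, 100) / 100):
    """Exhaustive search over all (p_d, p_a) pairs on the grid; flat prior."""
    best_ll, best = -np.inf, (None, None)
    for p_d in grid:
        for p_a in grid:
            ll = log_likelihood(counts, p_d, p_a)
            if ll > best_ll:
                best_ll, best = ll, (float(p_d), float(p_a))
    return best
```

For this particular model the maximum is also available in closed form (the fraction of heterozygotes called homozygous, and the fraction of homozygotes called heterozygous); the grid search is shown because it carries over unchanged to parametric models without a closed form.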
  • the algorithm for allele calling can be structured so that it can be executed in a more computationally efficient fashion.
  • the equations are re-derived for allele-calling via the MAP method, this time reformulating the equations so that they reflect such a computationally efficient method of calculating the result.
  • $X^*, Y^*, Z^* \in \{A,C\}^{n \times 2}$ are the true ordered values on the mother, father, and embryo, respectively.
  • $H^* \in \{A,C\}^{n \times h}$ are the true values on $h$ sperm samples.
  • $B^* \in \{A,C\}^{n \times b \times 2}$ are the true ordered values on $b$ blastomeres.
  • $D = \{x, y, z, B, H\}$ is the set of unordered measurement data on the mother, father, embryo, $b$ blastomeres and $h$ sperm samples.
  • $D_i = \{x_i, y_i, z_i, H_i, B_i\}$ is the data set restricted to the i-th SNP.
  • $r \in \{A,C\}^4$ represents a candidate 4-tuple of ordered values on both the mother and father at a particular locus.
  • $\hat{Z}_i \in \{A,C\}^2$ is the estimated ordered embryo value at SNP $i$.
  • $Q = 2 + 2b + h$ is the effective number of haploid chromosomes being measured, excluding the parents. Any hypothesis about the parental origin of all measured data (excluding the parents themselves) requires that $Q$ crossover maps be specified.
  • $\theta \in \{1,2\}^{n \times Q}$ is a crossover map matrix, representing a hypothesis about the parental origin of all measured data, excluding the parents. Note that there are $2^{nQ}$ different crossover matrices.
  • $\theta_{i,:}$ is the matrix restricted to the i-th row. Note that there are $2^Q$ vector values that the i-th row can take on, from the set $\{1,2\}^Q$.
  • $f(x; y, z)$ is a function of $(x, y, z)$ that is being treated as a function of just $x$.
  • the values behind the semi-colon are constants in the context in which the function is being evaluated.
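The sizes of the hypothesis spaces defined above are easy to check numerically. The sketch below, with hypothetical helper names, computes $Q = 2 + 2b + h$, enumerates the $2^Q$ possible rows $\theta_{i,:}$ for one SNP, and compares the work of visiting every row at each of $n$ SNPs ($n \cdot 2^Q$) with the number of full crossover matrices ($2^{nQ}$). It only illustrates the scale of the savings that motivates a computationally efficient formulation; it is not the allele-calling computation itself.

```python
from itertools import product


def effective_haploid_count(b: int, h: int) -> int:
    """Q: two embryo chromosomes, two per blastomere, one per sperm sample."""
    return 2 + 2 * b + h


def row_hypotheses(Q: int):
    """All 2**Q possible rows theta_{i,:} in {1,2}^Q for a single SNP."""
    return list(product((1, 2), repeat=Q))


def hypothesis_space_sizes(n: int, b: int, h: int):
    """Return (rows visited SNP by SNP, number of full crossover matrices)."""
    Q = effective_haploid_count(b, h)
    return n * 2 ** Q, 2 ** (n * Q)
```

With 10 SNPs, two blastomeres and one sperm sample, $Q = 7$, giving 1,280 row evaluations against $2^{70}$ (about $1.2 \times 10^{21}$) full matrices.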
  • aneuploidy can be detected using the quantitative data output from the PS method discussed in this patent.
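  • the ct values and chromosomal copy number are modeled as follows: $\alpha_{ij}\, n_j\, Q\, \beta_{ij}^{t_{ij}} = Q_T$, i.e., the pre-amplified material at SNP $i$ of chromosome $j$, multiplied by the PCR doubling rate once per cycle, reaches the threshold amount at cycle $t_{ij}$. In this expression, $n_j$ is the copy number of chromosome $j$.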
  • $Q$ is an abstract quantity representing a baseline amount of pre-amplified genetic material from which the actual amount of pre-amplified genetic material at SNP $i$, chromosome $j$ can be calculated as $\alpha_{ij} n_j Q$.
  • $\alpha_{ij}$ is a preferential amplification factor that specifies how much more SNP $i$ on chromosome $j$ will be pre-amplified via MDA than SNP 1 on chromosome 1.
  • $\beta_{ij}$ is the doubling rate for SNP $i$, chromosome $j$ under PCR.
  • $t_{ij}$ is the ct value.
  • $Q_T$ is the amount of genetic material at which the ct value is determined.
  • T is a symbol, not an index, and merely stands for threshold.
  • $\alpha_{ij}$, $\beta_{ij}$ and $Q_T$ are constants of the model that do not change from experiment to experiment.
  • $n_j$ and $Q$ are variables that change from experiment to experiment.
  • $Q$ is the amount of material there would be at SNP 1 of chromosome 1, if chromosome 1 were monosomic.
  • the original equation above does not contain a noise term. This can be included by solving the model for the ct value and adding noise: $t_{ij} = \dfrac{\log Q_T - \log(\alpha_{ij} n_j Q)}{\log \beta_{ij}} + Z_{ij}$. This equation indicates that the ct value is corrupted by additive Gaussian noise $Z_{ij}$. Let the variance of this noise term be $\sigma^2$.
  • maximum likelihood estimation is used, with respect to the model described above, to determine $n_j$.
  • because $n_j$ and $Q$ enter the model only through the product $n_j Q$, the parameter $Q$ makes this difficult unless another constraint is added:
  • the first equation can be interpreted as a log estimate of the quantity of chromosome j.
  • the second equation can be interpreted as saying that the average of the Q j is the average of a diploid quantity; subtracting one from its log gives the desired monosome quantity.
  • the third equation can be interpreted as saying that the copy number is just the ratio of the quantity of chromosome $j$ to the monosome quantity. Note that $\hat{n}_j$ is a 'double difference' (on the log scale), since it is a difference of Q-values, each of which is itself a difference of values.
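The three equations referred to above are not reproduced in this text, so the following is only a sketch of the estimator their descriptions suggest, under assumptions made for this sketch: the noise-free model is taken to be $\alpha_{ij} n_j Q \beta_{ij}^{t_{ij}} = Q_T$ and is inverted locus by locus; the doubling rate is taken to be the same constant $\beta$ at every locus, so that with equal-variance Gaussian noise on the ct values the maximum likelihood estimate of $\log_2(n_j Q)$ is a plain average over loci; and the added constraint is taken to be that the average chromosome is diploid. The function name `estimate_copy_numbers` is hypothetical.

```python
import numpy as np


def estimate_copy_numbers(t, alpha, beta: float, Q_T: float) -> np.ndarray:
    """Relative copy numbers from ct values (sketch under stated assumptions).

    t, alpha: arrays of shape (chromosomes, loci); beta: common doubling rate;
    Q_T: threshold amount. Assumes alpha*n_j*Q*beta**t = Q_T plus Gaussian noise.
    """
    t = np.asarray(t, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    # Per-locus log2 estimate of n_j * Q, obtained by inverting the model.
    q = np.log2(Q_T) - np.log2(alpha) - t * np.log2(beta)
    Q_j = q.mean(axis=1)        # log2 quantity of chromosome j (MLE for common beta)
    log2_Q = Q_j.mean() - 1.0   # constraint: the average chromosome is diploid
    return 2.0 ** (Q_j - log2_Q)  # ratio of chromosome quantity to monosome quantity
```

Because only the product $n_j Q$ enters the model, the returned values are copy numbers relative to the diploid average; if that assumption fails (for example, many aneuploid chromosomes in one cell), every estimate shifts by the same factor.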
  • let $\{t_{ij}\}$ be the regularized ct values obtained from MDA pre-amplification followed by PCR on the genetic sample.
  • $t_{ij}$ is the ct value on the i-th SNP of the j-th chromosome.
  • $t_j$ is the vector of ct values associated with the j-th chromosome.
  • the i-th component of the matched filter $f$ is given by:
  • the ideal matched filter is $-\mathbf{1}$.
  • the vector $\mathbf{1}$ can be used as the matched filter. This is equivalent to simply taking the average of the components of $t_j$.
  • the matched filter paradigm is not necessary if the underlying biochemistry follows the simple model.
  • the i-th component of the matched filter f is given by:
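A sketch of the matched-filter statistic, assuming (this is not stated above) that after regularization the ct values on chromosome $j$ behave like $t_{ij} \approx c - \log_2 n_j + \text{noise}$ with a common offset $c$, so that the ideal filter is $-\mathbf{1}$. The least-squares amplitude $f^\top t_j / f^\top f$ then estimates $\log_2 n_j$ up to that offset, and differences between chromosomes cancel the offset. The function name is hypothetical.

```python
import numpy as np


def matched_filter_score(t_j, f=None) -> float:
    """Least-squares amplitude of filter f in the ct vector t_j.

    Defaults to f = -1 (every entry -1), so the score is minus the mean ct value;
    with f = 1 it is the plain average of the components of t_j.
    """
    t_j = np.asarray(t_j, dtype=float)
    f = -np.ones_like(t_j) if f is None else np.asarray(f, dtype=float)
    return float(f @ t_j / (f @ f))


score_a = matched_filter_score([20.0, 21.0, 19.0])
score_b = matched_filter_score([21.0, 22.0, 20.0])
```

Here chromosome b crosses the threshold one cycle later on average, so `score_a - score_b` is 1; under the assumed model that corresponds to chromosome a carrying twice as much material.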
  • the given data consists of (i) the data about the parental SNP states, measured with a high degree of accuracy, and (ii) measurements on all of the SNPs in a specific blastomere, measured poorly.
  • $U$ is any specific homozygote.
  • $\bar{U}$ is the other homozygote at that SNP.
  • $H$ is the heterozygote.
  • the goal is to determine the probabilities $p_{ij}$ shown in Table 2. For instance, $p_{11}$ is the probability of the embryonic DNA being $U$ and the readout being $U$ as well. There are three conditions that these probabilities have to satisfy:
  • probabilities $p_{3j}$ and $p_{4j}$ can be written out in terms of $p_{1j}$ and $p_{2j}$.
  • $A$ is shown in Table 4, where empty cells contain zeros.
  • the observations come in the same 16 types as $p_{ij}$. These are shown in Table 5.
  • the likelihood of making a set of these 16 observations $n_{ij}$ is defined by a multinomial distribution with the probabilities $p_{ij}$ and is proportional to $L = \prod_{i,j} p_{ij}^{n_{ij}}$. Note that the full likelihood function contains multinomial coefficients that are not written out, because these coefficients do not depend on $P$ and thus do not change the values within $P$ at which $L$ is maximized. The problem is then to find $\hat{P} = \arg\max_P L$, subject to the conditions above.
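The likelihood above can be evaluated directly. The sketch below, with hypothetical names, computes the log-likelihood $\sum_{i,j} n_{ij} \log p_{ij}$ over the 16 observation types, omitting the multinomial coefficient exactly as the text does, and returns the unconstrained maximizer $\hat{p}_{ij} = n_{ij}/N$. The three conditions the probabilities must satisfy (Tables 2 and 4) are not reproduced in this text, so the unconstrained maximizer serves only as a sanity check and as an upper bound on the constrained optimum; solving the actual problem would need a constrained optimizer.

```python
import numpy as np


def multinomial_log_likelihood(counts, p) -> float:
    """Sum of n_ij * log(p_ij), dropping the multinomial coefficient."""
    counts = np.asarray(counts, dtype=float)
    p = np.asarray(p, dtype=float)
    mask = counts > 0  # cells with zero count contribute nothing
    return float(np.sum(counts[mask] * np.log(p[mask])))


def unconstrained_mle(counts) -> np.ndarray:
    """Maximizer of the log-likelihood when only sum(p) == 1 is imposed."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()
```

Any candidate $P$ that satisfies the extra conditions can be scored with the same function and compared against the unconstrained value.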
  • the PS method can be applied to determine the number of copies of a given chromosome segment in a target without using parental genetic information.
  • a maximum a-posteriori (MAP) method is described that enables the classification of genetic allele information as aneuploid or euploid.
  • the method does not require parental data, though when parental data are available the classification power is enhanced.
  • the method does not require regularization of channel values.
  • the method will be applied to ct values from TaqMan measurements; it should be obvious to one skilled in the art how to apply this method to any kind of measurement from any platform.
  • the description will focus on the case in which there are measurements on just chromosomes X and 7; again, it should be obvious to one skilled in the art how to apply the method to any number of chromosomes and sections of chromosomes.
  • the given measurements are from triploid blastomeres, on chromosomes X and 7, and the goal is to successfully make aneuploidy calls on these.
  • the only "truth" known about these blastomeres is that there must be three copies of chromosome 7.
  • the number of copies of chromosome X is not known.
  • the strategy here is to use MAP estimation to classify the copy number $N_7$ of chromosome 7 from among the choices $\{1,2,3\}$ given the measurements $D$. Formally: $\hat{n}_7 = \arg\max_{n_7 \in \{1,2,3\}} P(N_7 = n_7 \mid D)$.
  • $N_7$ is the copy number of chromosome 7. It is a random variable, and $n_7$ denotes a potential value for $N_7$.
  • $N_X$ is the copy number of chromosome X.
  • $n_X$ denotes a potential value for $N_X$.
  • $N_j$ is the copy number of chromosome $j$, where for the purposes here $j \in \{7, X\}$.
  • $n_j$ denotes a potential value for $N_j$.
  • $D$ is the set of all measurements. In one case, these are TaqMan measurements on chromosomes X and 7, so $D$ is the set of ct values $\{t^A_{ij}, t^C_{ij}\}$ over all measured loci $i$ on chromosomes $j \in \{7, X\}$.
  • $t^A_{ij}$ is the ct value on channel A of locus $i$ of chromosome $j$. Similarly, $t^C_{ij}$ is the ct value on channel C of locus $i$ of chromosome $j$.
  • A is just a logical name and denotes the major allele value at the locus, while C denotes the minor allele value at the locus.
  • $Q$ represents a unit amount of genetic material such that, if the copy number of chromosome $j$ is $n_j$, then the total amount of genetic material at any locus of chromosome $j$ is $n_j Q$.
  • $(n_A, n_C)$ denotes an unordered allele pattern at a locus when the copy number for the associated chromosome is $n$.
  • $n_A$ is the number of times allele A appears at the locus and $n_C$ is the number of times allele C appears at the locus.
  • Each can take on values in $0, \ldots, n$, and it must be the case that $n_A + n_C = n$.
  • for $n = 3$, the set of allele patterns is $\{(0,3), (1,2), (2,1), (3,0)\}$.
  • the allele pattern $(2,1)$, for example, corresponds to a locus value of $A^2C$, i.e., two chromosomes have allele value A and the third has allele value C at the locus.
  • for $n = 2$, the set of allele patterns is $\{(0,2), (1,1), (2,0)\}$.
  • for $n = 1$, the set of allele patterns is $\{(0,1), (1,0)\}$.
  • $N_7$ and $N_X$ are independent.
  • the goal is to classify the copy number of a designated chromosome (here, chromosome 7).
  • Equation (*) depends on being able to calculate values for $P((n_A, n_C) \mid n_7, i)$ and $P((n_A, n_C) \mid n_X, i)$. These values may be calculated by assuming that the allele pattern $(n_A, n_C)$ is drawn i.i.d. (independent and identically distributed) according to the allele frequencies for its letters at locus $i$. An example should suffice to illustrate this: calculate $P((2,1) \mid n_7 = 3)$ under the assumption that the allele frequency for A is 60% and the minor allele frequency for C is 40%. (As an aside, note that $P((2,1)$ …
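Under the i.i.d. assumption just described, the probability of an unordered pattern is a binomial term, so the worked example evaluates to $\binom{3}{2}(0.6)^2(0.4) = 0.432$. The sketch below computes this for the two-allele (A/C) case; the function name `allele_pattern_prob` is hypothetical and not taken from the disclosure.

```python
from math import comb


def allele_pattern_prob(n_a: int, n_c: int, freq_a: float) -> float:
    """P((n_A, n_C) | n) for an unordered allele pattern with n = n_A + n_C.

    Assumes each of the n copies independently carries allele A with
    probability freq_a and allele C otherwise (i.i.d. draws at the locus).
    """
    # comb(n, n_a) counts the ordered arrangements that collapse to the
    # same unordered pattern, e.g. AAC, ACA and CAA for (2, 1).
    return comb(n_a + n_c, n_a) * freq_a ** n_a * (1.0 - freq_a) ** n_c


# Worked example: A at 60%, C at 40%, copy number 3.
p_21 = allele_pattern_prob(2, 1, 0.6)  # about 0.432
# The four patterns for n = 3 exhaust all outcomes, so they sum to 1.
total = sum(allele_pattern_prob(k, 3 - k, 0.6) for k in range(4))
```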
  • Equation (*) depends on being able to calculate values for $P(t^A \mid Q, n_A)$ and $P(t^C \mid Q, n_C)$.
  • the variable $D$ is not really a variable. It is always a constant set to the value of the data set actually in question, so it does not introduce another array dimension when represented in Matlab. However, the variables $D_j$ do introduce an array dimension, due to the presence of the index $j$.
  • the disclosed method enables one to make aneuploidy calls on each chromosome of each blastomere, given multiple blastomeres with measurements at some loci on all chromosomes, where it is not known how many copies of each chromosome there are.
  • MAP estimation is used to classify the copy number $N_j$ of chromosome $j$, where $j \in \{1, 2, \dots, 22, X, Y\}$, from among the choices $\{0,1,2,3\}$ given the measurements $D$, which include genotyping information on both the blastomeres and the parents.
  • $N_j$ is the copy number of chromosome $j$, where $j \in \{1, 2, \dots, 22, X, Y\}$.
  • $D$ denotes the measurements, which include genotyping information on both the blastomeres and the parents.
  • $N_a$ is the copy number of autosomal chromosome $a$, where $a \in \{1, 2, \dots, 22\}$. It is a random variable.
  • $n_a$ denotes a potential value for $N_a$.
  • $N_X$ is the copy number of chromosome X.
  • $n_X$ denotes a potential value for $N_X$.
  • $N_j$ is the copy number of chromosome $j$, where for the purposes here $j \in \{1, 2, \dots, m\}$.
  • $n_j$ denotes a potential value for $N_j$.
  • $H$ is the set of aneuploidy states, $h \in H$.
  • Paternal monosomy means the only existing chromosome came from the father; paternal trisomy means there is one additional chromosome coming from father.
  • Type 1 (t1) paternal trisomy is such that the two paternal chromosomes are sister chromosomes (exact copies of each other), except in the case of crossover, when only a section of the two chromosomes are exact copies.
  • Type 2 (t2) paternal trisomy is such that the two paternal chromosomes are complementary chromosomes
  • $D$ is the set of all measurements, including measurements on the embryo, $D_E$, and on the parents, $D_F$ and $D_M$.
  • these are TaqMan measurements on all chromosomes
  • $D_j = (D_{E,j}, D_{F,j}, D_{M,j})$, where $D_{E,j}$ is the set of TaqMan measurements on chromosome $j$ of the embryo.
  • $t^A_{E,ij}$ is the ct value on channel A of locus $i$ of chromosome $j$.
  • $t^C_{E,ij}$ is the ct value on channel C of locus $i$ of chromosome $j$.
  • A is just a logical name and denotes the major allele value at the locus, while C denotes the minor allele value at the locus.
  • $Q$ represents a unit amount of genetic material after MDA of a single cell's genomic DNA such that, if the copy number of chromosome $j$ is $n_j$, then the total amount of genetic material at any locus of chromosome $j$ is $n_j Q$.
  • For example, if a locus were AAC, then the amount of A-material at this locus is $2Q$, the amount of C-material at this locus is $Q$, and the total combined amount of genetic material at this locus is $3Q$.
  • N is the number of SNPs per chromosome that will be measured.
  • For $n = 3$, the set of allele patterns is $\{(0,3), (1,2), (2,1), (3,0)\}$.
  • the allele pattern (2,1), for example, corresponds to a locus value of $A^2C$, i.e., two chromosomes have allele value A and the third has allele value C at the locus.
  • For $n = 2$, the set of allele patterns is $\{(0,2), (1,1), (2,0)\}$.
  • For $n = 1$, the set of allele patterns is $\{(0,1), (1,0)\}$.
  • the goal is to classify the copy number of a designated chromosome.
  • the MAP solution for chromosome $a$ is given by $\hat{n}_a = \arg\max_{n_a \in \{0,1,2,3\}} P(N_a = n_a \mid D)$.
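A sketch of the MAP step, assuming the prior $P(n_a)$ and the data likelihood $P(D \mid n_a)$ are already available as numbers for each candidate copy number. The function name `map_copy_number` and the toy values are hypothetical; in the disclosure the likelihood would come from Equation (*).

```python
def map_copy_number(prior: dict[int, float],
                    likelihood: dict[int, float]) -> tuple[int, float]:
    """Return the MAP copy number and its posterior probability.

    prior[n] is P(N = n); likelihood[n] is P(D | N = n).
    """
    # Unnormalised posterior P(N = n) * P(D | N = n) for each hypothesis.
    joint = {n: prior[n] * likelihood.get(n, 0.0) for n in prior}
    evidence = sum(joint.values())  # P(D), the normalising constant
    if evidence == 0.0:
        raise ValueError("data have zero probability under every hypothesis")
    best = max(joint, key=joint.get)
    return best, joint[best] / evidence


# Illustrative numbers only; they are not taken from the disclosure.
prior = {0: 0.02, 1: 0.08, 2: 0.80, 3: 0.10}
likelihood = {0: 1e-9, 1: 1e-3, 2: 1e-2, 3: 5e-2}
call, confidence = map_copy_number(prior, likelihood)
```

In this toy case the strong prior on disomy outweighs the larger likelihood for trisomy, giving a call of 2 with a posterior of roughly 0.61. A posterior share like this is one natural reading of the confidence attached to each copy number call.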
  • Equation (*) depends on being able to calculate values for $P(n_a)$ and $P(n_X)$, the prior probability distributions of chromosome copy number, which differ depending on whether the chromosome is autosomal or chromosome X. If these numbers are readily available for each chromosome, they may be used as is. If they are not available for all chromosomes, or are not reliable, some distributions may be assumed. Let the prior probability for autosomal chromosomes be …, and let the probability of the sex chromosomes being XY or XX be ….
  • Here, … is the probability of the monosomic chromosome being X (as opposed to Y), … is the probability of being XX for two chromosomes, and … is the probability of the third chromosome being Y.
  • Equation (*) also depends on being able to calculate the values shown in Table 6.
  • the symbols used in Table 6 are explained below.
  • Equation (*) depends on being able to calculate values for $P((n_A, n_C) \mid n_a, i)$ and $P((n_A, n_C) \mid n_X, i)$. These values may be calculated by assuming that the allele pattern $(n_A, n_C)$ is drawn i.i.d. according to the allele frequencies for its letters at locus $i$. An illustrative example is given here: calculate $P((2,1) \mid n_7 = 3)$ under the assumption that the allele frequency for A is 60% and the minor allele frequency for C is 40%. (As an aside, note that $P((2,1)$ …
  • $p_{ij}$ is the minor allele frequency at locus $i$ of chromosome $j$.
  • Equation (*) also depends on being able to calculate the values listed in Table 7.
  • If LDO are known in one of the parents, the table would need to be augmented. If LDO are known in both parents, one can use the model described in the Allele Distribution Model without Parents section.
  • Population frequency for parental truth: Equation (*) depends on being able to calculate $P(T_{F,ij}, T_{M,ij})$.
  • Equation (*) depends on being able to calculate values for $P(t^A \mid Q, n_A)$ and $P(t^C \mid Q, n_C)$.
  • the overall integral may be calculated, which takes time on the order of $(2 + t_x + 2t_y) \cdot 9Nmq$. In the end, it takes $2m$ comparisons to determine the best estimate for $n_j$. Therefore, overall the computational complexity is $O(Nmq)$.
  • the variable $D$ is not really a variable. It is always a constant set to the value of the data set actually in question, so it does not introduce another array dimension when represented in Matlab. However, the variables $D_j$ do introduce an array dimension, due to the presence of the index $j$.
  • the aneuploidy calling method may be modified to use purely qualitative data. There are many approaches to solving this problem, and several of them are presented here. It should be obvious to one skilled in the art how to use other methods to accomplish the same end, and these will not change the essence of the disclosure.
  • N is the total number of SNPs on the chromosome.
  • n is the chromosome copy number.
  • n M is the number of copies supplied to the embryo by the mother: 0, 1, or 2.
  • n F is the number of copies supplied to the embryo by the father: 0, 1, or 2.
  • $p_d$ is the dropout rate, and $f(p_d)$ is a prior on this rate.
  • $p_a$ is the dropin rate, and $f(p_a)$ is a prior on this rate.
  • $D_k = (x_k, y_k)$ is the platform response on channels X and Y for SNP $k$.
  • $G$ is the set of genotype calls on the chromosome. Note that the genotype calls depend on the no-call cutoff threshold $c$.
  • the variables $(\delta_X, \delta_Y)$ are indicator variables (1 or 0) indicating whether the genotype $g$ implies that channel X or Y has "lit up".
  • $\delta_X(g) = 1$ just in case $g$ contains the allele A.
  • $\delta_Y(g) = 1$ just in case $g$ contains the allele B.
  • $G^M$ is the known true sequence of genotype calls on the mother, and $g^M$ refers to the genotype value at some particular locus. $G^F$ is the known true sequence of genotype calls on the father, and $g^F$ refers to the genotype value at some particular locus.
  • $n_A, n_B$ are the true numbers of copies of A and B on the embryo (implicitly at locus $k$), respectively. Values must be in $\{0,1,2,3,4\}$.
  • $c^M_A, c^M_B$ are the numbers of A alleles and B alleles, respectively, supplied by the mother to the embryo (implicitly at locus $k$). The values must be in $\{0, 1, 2\}$ and must not sum to more than 2. Similarly, $c^F_A, c^F_B$ are the numbers of A alleles and B alleles, respectively, supplied by the father to the embryo (implicitly at locus $k$). Altogether, these four values exactly determine the true genotype of the embryo. For example, if the values were (1,0) and (1,1), then the embryo would have type AAB.
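The bookkeeping in the last item can be sketched as follows: the embryo's allele counts are the sums of the maternal and paternal contributions. The function name `embryo_genotype` is hypothetical, and genotypes are written as strings of A's followed by B's.

```python
def embryo_genotype(c_m: tuple[int, int], c_f: tuple[int, int]) -> str:
    """Embryo genotype at one locus from the (A, B) allele counts supplied
    by the mother (c_m) and by the father (c_f)."""
    for parent, (a, b) in (("mother", c_m), ("father", c_f)):
        # Each parent supplies 0, 1 or 2 alleles in total.
        if a < 0 or b < 0 or a + b > 2:
            raise ValueError(f"invalid contribution from {parent}: {(a, b)}")
    n_a = c_m[0] + c_f[0]  # true number of A copies, in 0..4
    n_b = c_m[1] + c_f[1]  # true number of B copies, in 0..4
    return "A" * n_a + "B" * n_b


print(embryo_genotype((1, 0), (1, 1)))  # AAB
```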
  • Solution 1: Integrate over dropout and dropin rates.
  • the derivation for the other channel is the same, except applied to channel Y.
  • Solution 2: Use ML to estimate the optimal cutoff threshold $c$. Solution 2, variation A:
  • one first uses ML estimation to get the best estimate of the cutoff threshold based on the data, and then uses this $c$ to do the standard Bayesian inference as in Solution 1. Note that, as written, the estimate of $c$ would still involve integrating over all dropout and dropin rates. However, since it is known that the dropout and dropin parameters tend to peak sharply in probability when they are "tuned" to their proper values with respect to $c$, one may save computation time by doing the following instead:
  • $D_j(c)$ is the genotype data on chromosome $j$ using $c$ as the no-call threshold.
  • $M_j, F_j$ are the genotype data on the parents on chromosome $j$.
  • Solution 3: Use all data to estimate the cutoff threshold and the dropout/dropin rates.
  • Because dropout and dropin rates are so important for the algorithm, it may be beneficial to analyze data with a known truth model to find out what the true dropout/dropin rates are. Note that there is no single true dropout rate: it is a function of the cutoff threshold. That said, if highly reliable genomic data exists that can be used as a truth model, then it is possible to plot the dropout/dropin rates of MDA experiments as a function of the cutoff threshold. Here a maximum likelihood estimation is used.
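A minimal sketch of that maximum likelihood step, under simplifying assumptions that are not from the source: only loci known from the truth model to be heterozygous (AB) in a diploid cell are used, each allele drops out independently with probability $p_d$, and dropin is ignored. The likelihood is then proportional to $p_d^{\,k}(1-p_d)^{2N-k}$, where $N$ is the number of loci and $k$ counts lost alleles (one per AA or BB call, two per NC), so the estimate is $k/2N$. Repeating it on calls made at several cutoff thresholds traces dropout as a function of the threshold. The function names are hypothetical.

```python
from collections import Counter


def dropout_mle(calls: list[str]) -> float:
    """ML estimate of the per-allele dropout rate from calls made at loci
    known to be heterozygous AB in a diploid cell (no dropin assumed)."""
    counts = Counter(calls)
    n_loci = sum(counts[g] for g in ("AB", "AA", "BB", "NC"))
    if n_loci == 0:
        raise ValueError("no usable calls")
    # Each AA or BB call reflects one lost allele; each NC reflects two.
    lost = counts["AA"] + counts["BB"] + 2 * counts["NC"]
    # Likelihood is proportional to p^lost * (1 - p)^(2*n_loci - lost),
    # which is maximised at the observed fraction of lost alleles.
    return lost / (2 * n_loci)


def dropout_curve(calls_by_threshold: dict[float, list[str]]) -> dict[float, float]:
    """Dropout estimate for each no-call cutoff threshold, in threshold order."""
    return {c: dropout_mle(calls) for c, calls in sorted(calls_by_threshold.items())}
```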
  • n is the chromosome copy number.
  • $n_M$ is the number of copies supplied to the embryo by the mother: 0, 1, or 2.
  • $n_F$ is the number of copies supplied to the embryo by the father: 0, 1, or 2.
  • $p_d$ is the dropout rate, and $f(p_d)$ is a prior on this rate.
  • $p_a$ is the dropin rate, and $f(p_a)$ is a prior on this rate.
  • $D_E$ is the set of genotype measurements on the chromosome of the embryo.
  • $g_k$ is the genotype call on the k-th SNP (as opposed to the true value): one of AA, AB, BB, or NC (no-call).
  • the embryo may be aneuploid, in which case the true genotype at a SNP may be, for example, AAB, or even AAAB, but the genotype measurements will always be one of the four listed.
  • Elsewhere in this disclosure, 'B' has been used to indicate a heterozygous locus. That is not the sense in which it is being used here.
  • 'A' and 'B' are used to denote the two possible allele values that could occur at a given SNP.
  • $M$ is the known true sequence of genotypes on the mother, and $M_k$ is the genotype value at the k-th SNP. $F$ is the known true sequence of genotypes on the father, and $F_k$ is the genotype value at the k-th SNP. A further set of genotype measurements is taken on a sperm from the father, with one genotype call at each k-th SNP.
  • $(m_1, m_2)$ is the true but unknown ordered pair of phased haplotype information on the mother.
  • $m_{1k}$ is the allele value at SNP $k$ of the first haploid sequence.
  • $m_{2k}$ is the allele value at SNP $k$ of the second haploid sequence.
  • $(m_1, m_2) \in M$ is used to indicate the set of phased pairs $(m_1, m_2)$ that are consistent with the known genotype $M$.
  • $(m_{1k}, m_{2k}) \in M_k$ is used to indicate the set of phased pairs that are consistent with the known genotype of the mother at SNP $k$.
  • $(f_1, f_2)$ is the true but unknown ordered pair of phased haplotype information on the father.
  • $f_{1k}$ is the allele value at SNP $k$ of the first haploid sequence.
  • $f_{2k}$ is the allele value at SNP $k$ of the second haploid sequence.
  • $(f_1, f_2) \in F$ is used to indicate the set of phased pairs $(f_1, f_2)$ that are consistent with the known genotype $F$.
  • $(f_{1k}, f_{2k}) \in F_k$ is used to indicate the set of phased pairs that are consistent with the known genotype of the father at SNP $k$.
  • $s_1$ is the true but unknown phased haplotype information on the measured sperm from the father.
  • $s_{1k}$ is the allele value at SNP $k$ of this haploid sequence. It can be guaranteed that this sperm is euploid by measuring several sperm and selecting one that is euploid.
  • $\theta^M = (\theta^M_1, \dots, \theta^M_{n_M})$ is the multiset of crossover maps that resulted in the maternal contribution to the embryo on this chromosome.
  • $\theta^F = (\theta^F_1, \dots, \theta^F_{n_F})$ is the multiset of crossover maps that resulted in the paternal contribution to the embryo on this chromosome.
  • the possibility that the chromosome may be aneuploid is explicitly modeled.
  • Each parent can contribute zero, one, or two copies of the chromosome to the embryo. If the chromosome is an autosome, then euploidy is the case in which each parent contributes exactly one copy, i.e., $n_M = n_F = 1$.
  • $\theta^M(k)$ will be used to denote $(\theta^M_1(k), \dots, \theta^M_{n_M}(k))$, the multiset of crossover map values restricted to the k-th SNP, and similarly for $\theta^F(k)$.
  • Applying $\theta^M(k)$ to $(m_1, m_2)$ is used to mean the multiset of allele values $\{ m_{uk} : u \in \theta^M(k) \}$.
  • Keep in mind that each crossover map value $u$ lies in $\{1, 2\}$.
  • $\theta^S$ is the crossover map that resulted in the measured sperm from the father.
  • $s_1$ is obtained by applying $\theta^S$ to $(f_1, f_2)$, i.e., $s_{1k} = f_{\theta^S(k), k}$. Note that it is not necessary to consider a crossover multiset because it is assumed that the measured sperm is euploid.
  • $\theta^S(k)$ will be used to denote the value of this crossover map at the k-th SNP.
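To make the crossover-map notation concrete, here is a sketch assuming a crossover map is stored as a list with one entry in {1, 2} per SNP, naming which of the parent's two phased haplotypes supplies that SNP. A parent's contribution is then the per-SNP multiset obtained by applying each of its maps (zero, one, or two of them). The representation and the function names are hypothetical.

```python
def apply_crossover_map(h1: list[str], h2: list[str], theta: list[int]) -> list[str]:
    """Haploid sequence produced by a crossover map over phased haplotypes.

    theta[k] is 1 or 2 and names the haplotype that supplies SNP k.
    """
    if not len(h1) == len(h2) == len(theta):
        raise ValueError("haplotypes and crossover map must have equal length")
    if any(u not in (1, 2) for u in theta):
        raise ValueError("crossover map values must be 1 or 2")
    return [h1[k] if u == 1 else h2[k] for k, u in enumerate(theta)]


def contributed_alleles(h1: list[str], h2: list[str],
                        maps: list[list[int]]) -> list[list[str]]:
    """Per-SNP multiset (as a sorted list) of alleles supplied by one parent.

    One map per contributed copy: zero maps for nullisomy, two for a
    disomic contribution such as the extra copy in a trisomy.
    """
    seqs = [apply_crossover_map(h1, h2, theta) for theta in maps]
    return [sorted(seq[k] for seq in seqs) for k in range(len(h1))]


# A single crossover between SNP 2 and SNP 3 of the father's haplotypes.
sperm = apply_crossover_map(["A", "B", "B"], ["B", "A", "A"], [1, 1, 2])
```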
  • the measurements on the mother and father are treated as known truth, while in other places in this disclosure they are treated simply as measurements. Since the measurements on the parents are very precise, treating them as though they are known truth is a reasonable approximation to reality. They are treated as known truth here in order to demonstrate how such an assumption is handled, although it should be clear to one skilled in the art how the more precise method, used elsewhere in the patent, could equally well be used.
  • a similar method to determine the number of copies of a chromosome can be implemented using a limited subset of SNPs in a simplified approach.
  • the method is purely qualitative, uses parental data, and focuses exclusively on a subset of SNPs, the so-called polar homozygotes (described below).
  • Polar homozygotic denotes the situation in which the mother and father are both homozygous at a SNP, but the homozygotes are opposite, that is, they have different allele values.
  • For example, the mother could be AA and the father BB, or vice versa. Since the actual allele values are not important, only their relationship to each other, the mother's alleles are denoted MM and the father's FF.
  • the focus is solely on those loci on a particular chromosome that are polar homozygotes and at which the embryo, which is therefore known to be heterozygous, is nonetheless called homozygous. It is possible to form the statistic
  • If the embryo is aneuploid, the statistic will not have a mean of 1/2. If, for example, the embryo has MMF trisomy, then the homozygous calls in the embryo will lean toward MM and away from FF, and vice versa. Note that because only loci where the parents are homozygous are under consideration, there is no need to distinguish M1 and M2 copy errors. In all cases, if the mother contributes 2 chromosomes instead of 1, they will be MM regardless of the underlying cause, and similarly for the father. The exact mean under trisomy will depend upon the dropout rate, $p$, but in no case will the mean be greater than 1/3, which is the limit of the mean as $p$ goes to 1. Under monosomy, the mean would be precisely 0, except for noise induced by allele dropins.
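The formula for the statistic did not survive extraction. As a hedged sketch, assume it is the share of homozygous embryo calls that match the parent contributing fewer copies, with each allele copy dropping out independently with probability $p$ and no dropins. Under these assumptions the expected share is exactly 1/2 for euploidy, $p/(1+2p)$ for MMF trisomy (which tends to 1/3 as $p \to 1$), and 0 for monosomy, matching the properties stated above. The function names are hypothetical.

```python
def homozygous_call_probs(n_m: int, n_f: int, p: float) -> tuple[float, float]:
    """P(embryo called MM) and P(embryo called FF) at a polar homozygous locus.

    Assumption: the mother supplies n_m copies and the father n_f copies,
    every copy drops out independently with probability p, and there are
    no dropins.
    """
    p_mm = (1 - p ** n_m) * p ** n_f  # a maternal copy survives, all paternal lost
    p_ff = (1 - p ** n_f) * p ** n_m  # a paternal copy survives, all maternal lost
    return p_mm, p_ff


def expected_minority_fraction(n_m: int, n_f: int, p: float) -> float:
    """Expected share of homozygous calls that match the parent supplying
    fewer copies (the mother, by convention, when n_m == n_f)."""
    p_mm, p_ff = homozygous_call_probs(n_m, n_f, p)
    if p_mm + p_ff == 0.0:
        raise ValueError("no homozygous calls are possible under these inputs")
    minority = p_mm if n_m <= n_f else p_ff
    return minority / (p_mm + p_ff)


# Euploid versus MMF trisomy across a few dropout rates.
for rate in (0.1, 0.5, 0.9):
    print(rate, expected_minority_fraction(1, 1, rate),
          expected_minority_fraction(2, 1, rate))
```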
  • Under the null hypothesis of euploidy, the distribution of the statistic is completely known, so it is not necessary to model the distribution under aneuploidy, but only to reject the null hypothesis of euploidy. Any embryo for which the null hypothesis cannot be rejected at a predetermined significance level would be deemed normal.
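Assuming independent dropouts with no dropins, a euploid embryo is equally likely to lose its single maternal or its single paternal copy, so each homozygous call at a polar homozygous locus is MM or FF with probability 1/2. An exact two-sided binomial test is therefore one way to carry out the rejection step. The function names and the default `alpha` are hypothetical; the source specifies only that the significance level is predetermined.

```python
from math import comb


def euploidy_pvalue(n_mm: int, n_ff: int) -> float:
    """Two-sided exact binomial p-value for H0: each homozygous call at a
    polar homozygous locus is MM or FF with probability 1/2."""
    n = n_mm + n_ff
    if n == 0:
        raise ValueError("no homozygous calls to test")
    k = min(n_mm, n_ff)
    # P(X <= k) for X ~ Binomial(n, 1/2), computed with exact integers.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # The null distribution is symmetric, so the two-sided p-value doubles it.
    return min(1.0, 2 * tail)


def deemed_normal(n_mm: int, n_ff: int, alpha: float = 0.01) -> bool:
    """True when euploidy cannot be rejected at significance level alpha."""
    return euploidy_pvalue(n_mm, n_ff) >= alpha
```

For instance, 9 MM calls against 1 FF call give a p-value of 22/1024, about 0.021, which rejects euploidy at a 5% level but not at 1%.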
  • loci that result in a no-call (NC) on the embryo also contain information and can be included in the calculations, yielding more loci for consideration.
  • AB can also be included in the calculations, yielding more loci for consideration. It should be obvious to one skilled in the art how to modify the method to include these additional loci into the calculation.
  • the TaqMan assay was used to measure single cell genotype data consisting of diploid measurements of a large buccal sample from the father (columns p1, p2), diploid measurements of a buccal sample from the mother (m1, m2), haploid measurements on three isolated sperm from the father (h1, h2, h3), and diploid measurements of four single cells from a buccal sample from the born child of the triad. Note that all diploid data are unordered. All SNPs are from chromosome 7 and within 2 megabases of the CFTR gene, in which a defect causes cystic fibrosis.
  • the true allele values (T1,T2) on the child are determined by taking three buccal samples of several thousand cells, genotyping them independently, and only choosing SNPs on which the results were concordant across all three samples.
  • This process yielded 94 concordant SNPs. Those loci that had a valid genotype call, according to the ABI 7900 reader, on the child cell that represented the embryo, were then selected. For each of these 69 SNPs, the disclosed method determined de-noised allele calls on the embryo (E1, E2), as well as the confidence associated with each genotype call. Twenty (29%) of the 69 raw allele calls in uncleaned genetic data from the child cell were incorrect (shaded in columns e1 and e2, Table 8).
  • the disclosed method may achieve a higher level of accuracy at loci of interest by: i) continuing to measure single sperm until multiple haploid allele calls have been made at the locus of interest; ii) including additional blastomere measurements; iii) incorporating maternal haploid data from extruded polar bodies, which are commonly biopsied in pre-implantation genetic diagnosis today. It should be obvious to one skilled in the art that there exist other modifications to the method that can also increase the level of accuracy, as well as how to implement these, without changing the essential concept of the disclosure.
  • the method was used to call aneuploidy on several sets of single cells.
  • only selected data from the genotyping platform was used: the genotype information from the parents and the embryo.
  • a simple genotyping algorithm called "pie slice" was used, and it showed itself to be about 99.9% accurate on genomic data. It is less accurate on MDA data, due to the noise inherent in MDA. It is more accurate when there is a fairly high "dropout" rate in MDA. It also depends, crucially, on being able to model the probabilities of various genotyping errors in terms of parameters known as dropout rate and dropin rate.
  • the unknown chromosome copy numbers are inferred because different copy numbers interact differently with the dropout rate, dropin rate, and the genotyping algorithm.
  • By creating a statistical model that specifies how the dropout rate, dropin rate, chromosome copy numbers, and genotype cutoff threshold all interact, it is possible to use standard statistical inference methods to tease out the unknown chromosome copy numbers.
  • $X_1, \dots, X_n \sim f(x; \theta)$.
  • the $X_i$ are independent, identically distributed random variables, drawn according to a probability distribution that belongs to a family of distributions parameterized by the vector $\theta$.
  • the problem is as follows: $\theta$ is unknown, and the goal is to get a good estimate of it based solely on the observations $X_1, \dots, X_n$. The maximum likelihood solution is given by $\hat{\theta} = \arg\max_{\theta} \prod_{i=1}^{n} f(X_i; \theta)$.
  • the data may come from Infinium platform measurements, where $x_{jk}$ is the platform response on channel X to SNP $k$ of chromosome $j$, and $y_{jk}$ is the platform response on channel Y to SNP $k$ of chromosome $j$.
  • $x_{jk}$ is the platform response on channel X to SNP $k$ of chromosome $j$.
  • $y_{jk}$ is the platform response on channel Y to SNP $k$ of chromosome $j$.
  • These parameters are responsible for describing things such as probe efficiency, platform noise, MDA characteristics such as dropout, dropin, and overall amplification mean, and, finally, the genetic parameters: the genotypes of the parents, the true but unknown genotype of the embryo, and, of course, the parameters of interest: the chromosome copy numbers supplied by the mother and father to the embryo.
  • a good deal of information is discarded before data processing.
  • the advantage of doing this is that it is possible to model the data that remains in a more robust manner.
  • $p_d$ and $p_a$ are the dropout and dropin rates for genotyping, respectively. These reflect some of the modeling assumptions. It is known that in single-cell amplification, some SNPs "drop out", which is to say that they are not amplified and, as a consequence, do not show up when the SNP genotyping is attempted on the Infinium platform. This phenomenon is modeled by saying that each allele at each SNP "drops out" independently with probability $p_d$ during the MDA phase. Similarly, the platform is not a perfect measurement instrument.
  • $M_j, F_j$ are the true genotypes on the mother and father, respectively.
  • the true genotypes are not known perfectly, but because large samples from the parents are genotyped, one may make the assumption that the truth on the parents is essentially known.
  • platform response models, or error models that vary from one probe to another can be used without changing the essential nature of the invention.
  • the amplification efficiency and error rates caused by allele dropouts, allele dropins, or other factors, may vary between different probes.
  • an error transition matrix can be made that is particular to a given probe.
  • Platform response models, or error models can be relevant to a particular probe or can be parameterized according to the quantitative measurements that are performed, so that the response model or error model is therefore specific to that particular probe and measurement.
  • Genotyping also requires an algorithm with some built-in assumptions. Going from a platform response $(x, y)$ to a genotype $g$ requires significant calculation. It essentially requires that the positive quadrant of the x/y plane be divided into those regions where AA, AB, BB, and NC will be called. Furthermore, in the most general case, it may be useful to have regions where AAA, AAB, etc., could be called for trisomies.
  • a particular genotyping algorithm is used here, called the pie-slice algorithm because it divides the positive quadrant of the x/y plane into three triangles, or "pie slices". Those $(x, y)$ points that fall in the pie slice that hugs the X axis are called AA, those that fall in the slice that hugs the Y axis are called BB, and those in the middle slice are called AB. In addition, a small square is superimposed whose lower-left corner touches the origin. $(x, y)$ points falling in this square are designated NC, because both the x and y components have small values and hence are unreliable.
  • the width of that small square is called the no-call threshold and it is a parameter of the genotyping algorithm.
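The pie-slice rule can be sketched as follows. The source gives only the geometry (three angular slices plus a no-call square of width $c$ at the origin); the slice boundaries at 30° and 60° below are assumptions chosen so that the three slices are equal, and the function name `pie_slice_call` is hypothetical.

```python
import math


def pie_slice_call(x: float, y: float, c: float,
                   low_angle: float = math.pi / 6,
                   high_angle: float = math.pi / 3) -> str:
    """Genotype call from channel responses (x, y) with a pie-slice rule.

    Points inside the c-by-c square at the origin are no-calls; otherwise
    the angle from the X axis selects AA (near X), AB (middle) or BB
    (near Y). The default boundaries are assumptions, not source values.
    """
    if x < 0 or y < 0:
        raise ValueError("responses are expected in the positive quadrant")
    if x < c and y < c:
        return "NC"  # both channels weak, so the call is unreliable
    angle = math.atan2(y, x)
    if angle < low_angle:
        return "AA"
    if angle > high_angle:
        return "BB"
    return "AB"
```

Raising `c` moves more weak points into NC, which is why the no-call threshold and the fitted dropout rate must be tuned together.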
  • In order for the dropin/dropout model to correctly model the error transition matrix associated with the genotyping algorithm, the cutoff threshold must be tuned properly.
  • the error transition matrix indicates, for each true-genotype/called-genotype pair, the probability of seeing the called genotype given the true genotype. This matrix depends on the dropout rate of the MDA and upon the no-call threshold set for the genotyping algorithm.
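As a sketch of how such a matrix can be built from the dropout/dropin model, assume each true allele copy drops out independently with probability $p_d$ (as described for MDA), that a channel whose allele is entirely absent after MDA still "lights up" through dropin with probability $p_a$, and that the two channels behave independently. The dropin mechanism and channel independence are assumptions of this sketch, and the no-call threshold enters only implicitly, through the values of $p_d$ and $p_a$ tuned to it. The function names are hypothetical.

```python
def allele_seen_prob(copies: int, p_d: float, p_a: float) -> float:
    """P(an allele's channel lights up) given its true copy number."""
    survives = 1.0 - p_d ** copies  # at least one copy escapes dropout
    # Assumption: if every copy is lost (or none existed), dropin may
    # still light the channel with probability p_a.
    return survives + (1.0 - survives) * p_a


def call_distribution(n_a: int, n_b: int, p_d: float, p_a: float) -> dict[str, float]:
    """P(called genotype | true allele counts n_a, n_b), assuming the two
    channels light up independently."""
    s_a = allele_seen_prob(n_a, p_d, p_a)
    s_b = allele_seen_prob(n_b, p_d, p_a)
    return {
        "AA": s_a * (1 - s_b),
        "AB": s_a * s_b,
        "BB": (1 - s_a) * s_b,
        "NC": (1 - s_a) * (1 - s_b),
    }


def error_transition_matrix(p_d: float, p_a: float,
                            truths=((2, 0), (1, 1), (0, 2))) -> dict:
    """Rows keyed by true genotype as (n_A, n_B); columns by called genotype."""
    return {t: call_distribution(*t, p_d, p_a) for t in truths}
```

Passing trisomic rows such as `(2, 1)` in `truths` gives the corresponding transition probabilities for three copies under the same assumptions.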
  • the no-call region could be defined by many different shapes besides a square, such as a quarter circle, and the no-call thresholds may vary greatly for different genotyping algorithms.
  • the Illumina Infinium II platform which allows measurement of hundreds of thousands of SNPs was used.
  • the standard Infinium II protocol was reduced from three days to 20 hours. Single cell measurements were compared between the full and accelerated Infinium II protocols, and showed ~85% concordance.
  • the accelerated protocol showed an increase in locus drop-out (LDO) rate from ~1% to 5-10%; however, because hundreds of thousands of SNPs are measured and because PS accommodates allele dropouts, this increase in LDO rate does not have a significant negative impact on the results.
  • the entire aneuploidy calling method was performed on eight known-euploid buccal cells isolated from two healthy children from different families, ten known-trisomic cells isolated from a human immortalized trisomic cell line, and six blastomeres with an unknown number of chromosomes isolated from three embryos donated to research. Half of each set of cells was analyzed by the accelerated 20-hour protocol, and the other half by the standard protocol. Note that for the immortalized trisomic cells, no parent data was available. Consequently, for these cells, a pair of pseudo-parental genomes was generated by drawing their genotypes from the conditional distribution induced by observation of a large tissue sample of the trisomic genotype at each locus.
  • each table shows the chromosome number in the first column, and each pair of color-matched columns represents the analysis of one cell with the copy number call on the left and the confidence with which the call is made on the right.
  • Each row corresponds to one particular chromosome. Note that these tables contain the ploidy information of the chromosomes in a format that could be used for the report that is provided to the doctor to help in the determination of which embryos are to be selected for transfer to the prospective mother.
  • Table 9 shows the results for eight known-euploid buccal cells; all were correctly found to be euploid with high confidences (>0.99).
  • Table 10 shows the results for ten known-trisomic cells (trisomic at chromosome 21); all were correctly found to be trisomic at chromosome 21 and disomic at all other chromosomes with high confidences (>0.92).
  • Table 11 shows the results for six blastomeres isolated from three different embryos.
  • While no truth models exist for donated blastomeres, it is possible to look for concordance between blastomeres originating from a single embryo. However, the frequency and characteristics of mosaicism in human embryos are not currently known, and thus the presence or lack of concordance between blastomeres from a common embryo is not necessarily indicative of correct ploidy determination.
  • the first three blastomeres are from one embryo (e1), and of those, the first two (e1b1 and e1b3) have the same ploidy state at all chromosomes except one.
  • the third cell (e1b…) is complex aneuploid. Both blastomeres from the second embryo were found to be monosomic at all chromosomes.
  • the blastomere from the third embryo was found to be complex aneuploid. Note that some confidences are below 90%, however, if the confidences of all aneuploid hypotheses are combined, all chromosomes are called either euploid or aneuploid with confidence exceeding 92.8%.
  • Adult diploid cells can be obtained from bulk tissue or blood samples.
  • Adult diploid single cells can be obtained from whole blood samples using FACS, or fluorescence activated cell sorting.
  • Adult haploid single sperm cells can also be isolated from a sperm sample using FACS.
  • Adult haploid single egg cells can be isolated in the context of egg harvesting during IVF procedures.
  • Isolation of the target single cell blastomeres from human embryos can be done using techniques common in in vitro fertilization clinics, such as embryo biopsy. Isolation of target fetal cells in maternal blood can be accomplished using monoclonal antibodies, or other techniques such as FACS or density gradient centrifugation.
  • Amplification of the genome can be accomplished by multiple methods including: ligation-mediated PCR (LM-PCR), degenerate oligonucleotide primer PCR (DOP-PCR), and multiple displacement amplification (MDA).
  • the genotyping of the amplified DNA can be done by many methods, including molecular inversion probes (MIPs) such as Affymetrix's GenFlex Tag Array, microarrays such as Affymetrix's 500K array or the Illumina Bead Arrays, or SNP genotyping assays such as Applied Biosystems' TaqMan assay.
  • the Affymetrix 500K array, MIPs/GenFlex, TaqMan and Illumina assay all require microgram quantities of DNA, so genotyping a single cell with any of these workflows would require some kind of amplification.
  • Each of these techniques has various tradeoffs in terms of cost, quality of data, and quantitative vs. qualitative data.
  • An advantage of the 500K and Illumina arrays is the large number of SNPs on which they can gather data, roughly 250,000, as opposed to MIPs, which can detect on the order of 10,000 SNPs, and the TaqMan assay, which can detect even fewer.
  • An advantage of the MIPs, TaqMan and Illumina assay over the 500K arrays is that they are inherently customizable, allowing the user to choose SNPs, whereas the 500K arrays do not permit such customization.
  • the standard MIPs assay protocol is a relatively time-intensive process that typically takes 2.5 to three days to complete.
  • annealing of probes to target DNA and post-amplification hybridization are particularly time-intensive, and any deviation from these times results in degradation in data quality.
  • Probes anneal overnight (12-16 hours) to DNA sample.
  • Post-amplification hybridization anneals to the arrays overnight (12-16 hours). A number of other steps before and after both annealing and amplification bring the total standard timeline of the protocol to 2.5 days.
  • dropouts of loci occur randomly and unavoidably. It is often desirable to amplify the whole genome nonspecifically, but to ensure that a particular locus is amplified with greater certainty. It is possible to perform simultaneous locus targeting and whole genome amplification.
  • the basis for this method is to combine standard targeted polymerase chain reaction (PCR) to amplify particular loci of interest with any generalized whole genome amplification method.
  • This may include, but is not limited to: preamplification of particular loci before generalized amplification by MDA or LM-PCR, the addition of targeted PCR primers to universal primers in the generalized PCR step of LM-PCR, and the addition of targeted PCR primers to degenerate primers in MDA.
  • a variety of computational languages could be used to encode the algorithms described in this disclosure, and a variety of computational platforms could be used to execute the calculations.
  • the calculations could be executed using personal computers, supercomputers, a massively parallel computing platform, or even non-silicon based computational platforms such as a sufficiently large number of people armed with abacuses.
  • Some of the math in this disclosure makes hypotheses concerning a limited number of states of aneuploidy. In some cases, for example, only monosomy, disomy and trisomy are explicitly treated by the math.
  • When this disclosure discusses a chromosome, this may refer to a segment of a chromosome, and when a segment of a chromosome is discussed, this may refer to a full chromosome. It is important to note that the math to handle a segment of a chromosome is the same as that needed to handle a full chromosome. It should be obvious to one skilled in the art how to modify the method accordingly.
  • A related individual may refer to any individual who is genetically related to, and thus shares haplotype blocks with, the target individual.
  • Related individuals include: biological father, biological mother, son, daughter, brother, sister, half-brother, half-sister, grandfather, grandmother, uncle, aunt, nephew, niece, grandson, granddaughter, cousin, clone, the target individual himself/herself/itself, and other individuals with a known genetic relationship to the target.
  • The term 'related individual' also encompasses any embryo, fetus, sperm, egg, blastomere, blastocyst, or polar body derived from a related individual.
  • The target individual may refer to an adult, a juvenile, a fetus, an embryo, a blastocyst, a blastomere, a cell or set of cells from an individual or from a cell line, or any set of genetic material.
  • The target individual may be alive, dead, frozen, or in stasis.
  • In some cases, the target individual is a blastomere that is used to diagnose an embryo.
  • The genome of the analyzed blastomere does not necessarily correspond exactly to the genomes of all the other cells in the embryo.
  • The method disclosed herein may also be used in the context of cancer genotyping and/or karyotyping, where one or more cancer cells are considered the target individual, and the non-cancerous tissue of the individual afflicted with cancer is considered to be the related individual.
  • The non-cancerous tissue of the individual afflicted with cancer could provide the set of genotype calls of the related individual that would allow chromosome copy number determination of the cancerous cell or cells using the methods disclosed herein.
  • The method described herein concerns the cleaning of genetic data. Since all living or once-living creatures contain genetic data, the methods are equally applicable to any live or dead human, animal, or plant that inherits or inherited chromosomes from other individuals.
  • The algorithms described herein make use of prior probabilities and/or initial values. In some cases the choice of these prior probabilities may have an impact on the efficiency and/or effectiveness of the algorithm. There are many ways that one skilled in the art, after reading this disclosure, could assign or estimate appropriate prior probabilities without changing the essential concept of the patent. The embryonic genetic data generated by measuring the amplified DNA from one blastomere can also be used for multiple purposes. For example, these data can be used for detecting aneuploidy or uniparental disomy, for sexing the individual, and for making a plurality of phenotypic predictions based on phenotype-associated alleles.
  • One advantage of identifying particular conditions to screen for before genotyping the blastomere is that, if certain loci are deemed especially relevant, a more appropriate set of SNPs that are more likely to cosegregate with the locus of interest can be selected. This increases the confidence of the allele calls of interest. It is also possible to perform haplotype phasing by molecular haplotyping methods. Separating the genetic material into haplotypes is challenging, so most genotyping methods can only measure both haplotypes simultaneously, yielding diploid data. As a result, the sequence of each haploid genome cannot be deciphered from such data alone.
  • Molecular haplotyping methods may include, but are not limited to: cloning amplified DNA fragments from a genome into recombinant DNA constructs and sequencing them by traditional dye-terminator methods, isolation and sequencing of single molecules in colonies, and direct sequencing of single DNA molecules or clonal DNA populations using next-generation sequencing methods.
  • The systems, methods, and techniques of the present invention may be used in conjunction with embryo screening or prenatal testing procedures.
  • The systems, methods, and techniques of the present invention may be employed in methods of increasing the probability that embryos and fetuses obtained by in vitro fertilization are successfully implanted and carried through the full gestation period. Further, they may be employed in methods of decreasing the probability that embryos and fetuses obtained by in vitro fertilization, implanted and gestated are specifically at risk for a congenital disorder.
  • The present invention extends to the use of the systems, methods, and techniques of the invention in conjunction with pre-implantation diagnosis procedures.
  • The present invention extends to the use of the systems, methods, and techniques of the invention in conjunction with prenatal testing procedures.
  • The systems, methods, and techniques of the invention are used in methods to decrease the probability of implanting an embryo specifically at risk for a congenital disorder. These methods test at least one cell removed from early embryos conceived by in vitro fertilization and transfer to the mother's uterus only those embryos determined not to have inherited the congenital disorder.
  • The systems, methods, and techniques of the invention are used in methods to decrease the probability of implanting an embryo specifically at risk for a chromosome abnormality. These methods test at least one cell removed from early embryos conceived by in vitro fertilization and transfer to the mother's uterus only those embryos determined not to have chromosome abnormalities.
  • The systems, methods, and techniques of the invention are used in methods to increase the probability of implanting an embryo, obtained by in vitro fertilization, that is at a reduced risk of carrying a congenital disorder.
  • The systems, methods, and techniques of the invention are used in methods to increase the probability of gestating a fetus.
  • The congenital disorder may be a malformation, neural tube defect, chromosome abnormality, Down syndrome (trisomy 21), trisomy 18, spina bifida, cleft palate, Tay-Sachs disease, sickle cell anemia, thalassemia, cystic fibrosis, Huntington's disease, and/or fragile X syndrome.
  • Chromosome abnormalities include, but are not limited to, Down syndrome (an extra chromosome 21), Turner syndrome (45,X) and Klinefelter syndrome (a male with two X chromosomes).
  • The malformation may be a limb malformation.
  • Limb malformations include, but are not limited to, amelia, ectrodactyly, phocomelia, polymelia, polydactyly, syndactyly, polysyndactyly, oligodactyly, brachydactyly, achondroplasia, congenital aplasia or hypoplasia, amniotic band syndrome, and cleidocranial dysostosis.
  • The malformation may be a congenital malformation of the heart.
  • Congenital malformations of the heart include, but are not limited to, patent ductus arteriosus, atrial septal defect, ventricular septal defect, and tetralogy of Fallot.
  • The malformation may be a congenital malformation of the nervous system.
  • Congenital malformations of the nervous system include, but are not limited to, neural tube defects (e.g., spina bifida, meningocele, meningomyelocele, encephalocele and anencephaly), Arnold-Chiari malformation, the Dandy-Walker malformation, hydrocephalus, microencephaly, megencephaly, lissencephaly, polymicrogyria, holoprosencephaly, and agenesis of the corpus callosum.
  • The malformation may be a congenital malformation of the gastrointestinal system.
  • Congenital malformations of the gastrointestinal system include, but are not limited to, stenosis, atresia, and imperforate anus.
  • The systems, methods, and techniques of the invention are used in methods to increase the probability of implanting an embryo, obtained by in vitro fertilization, that is at a reduced risk of carrying a predisposition for a genetic disease.
  • The genetic disease may be either monogenic or multigenic.
  • Genetic diseases include, but are not limited to, Bloom syndrome, Canavan disease, cystic fibrosis, familial dysautonomia, Riley-Day syndrome, Fanconi anemia (group C), Gaucher disease, glycogen storage disease Ia, maple syrup urine disease, mucolipidosis IV, Niemann-Pick disease, Tay-Sachs disease, beta thalassemia, sickle cell anemia, alpha thalassemia, factor XI deficiency, Friedreich's ataxia, MCAD, Parkinson disease (juvenile), Connexin 26, SMA, Rett syndrome, phenylketonuria, Becker muscular dystrophy, Duchenne muscular dystrophy, fragile X syndrome, hemophilia A, Alzheimer dementia (early onset), breast/ovarian cancer, colon cancer, diabetes/MODY, Huntington disease, myotonic muscular dystrophy, Parkinson
  • The disclosed method may be employed to determine the genetic state of one or more embryos for the purpose of embryo selection in the context of IVF.
  • This may include harvesting eggs from the prospective mother and fertilizing those eggs with sperm from the prospective father to create one or more embryos. It may involve performing embryo biopsy to isolate a blastomere from each of the embryos, and then amplifying and genotyping the genetic material from each blastomere. It may include obtaining, amplifying and genotyping a sample of diploid genetic material from each of the parents, as well as one or more individual sperm from the father. It may involve incorporating the measured diploid and haploid data of both the mother and the father, along with the measured genetic data of the embryo of interest, into a dataset.
  • In one example, a couple arranges to have the woman's eggs harvested and fertilized with sperm from the man, producing nine viable embryos.
  • A blastomere is harvested from each embryo, and the genetic data from the blastomeres are measured using an Illumina Infinium Bead Array. Meanwhile, the diploid data are measured from tissue taken from both parents, also using the Illumina Infinium Bead Array.
  • Haploid data from the father's sperm are measured using the same method.
  • The method disclosed herein is applied to the genetic data of the blastomere and the diploid maternal genetic data to phase the maternal genetic data, providing the maternal haplotype.
  • Another example may involve a woman who has been artificially inseminated with sperm from a donor and is pregnant. She wants to minimize the risk that the fetus she is carrying has a genetic disease. She undergoes amniocentesis. Fetal cells are isolated from the withdrawn sample, and a tissue sample is also collected from the mother. Since there are no other embryos, her data are phased using molecular haplotyping methods. The genetic material from the fetus and from the mother is amplified as appropriate and genotyped using the Illumina Infinium Bead Array. The methods described herein then reconstruct the fetal genotype as accurately as possible. Phenotypic susceptibilities are predicted from the reconstructed fetal genetic data, and a report is generated and sent to the mother's physician so that they can decide what actions may be best.
  • Table 1. Probability distribution of measured allele calls given the true genotype.
  • Table 2. Probabilities of specific allele calls in the embryo using the U and H notation.
  • Table 3. Conditional probabilities of specific allele calls in the embryo given all possible parental states.
  • Table 7. Probability of aneuploidy hypothesis (H) conditional on parent genotype.
  • Table 8. Results of PS algorithm applied to 69 SNPs on chromosome 7.
  • Table 9. Aneuploidy calls on eight known euploid cells.
  • Table 10. Aneuploidy calls on ten known trisomic cells.
  • Table 11. Aneuploidy calls for six blastomeres.
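Table 1 above is titled as the probability distribution of measured allele calls given the true genotype. The description also notes that dropouts of loci occur randomly and unavoidably. The table's actual values are not reproduced here. The following is a minimal sketch of how such a distribution could be computed. It assumes a biallelic SNP whose only measurement error is independent dropout of each allele copy, at a hypothetical rate `p_ado`. The names `call_distribution`, `GENOTYPES` and `CALLS` are illustrative and do not come from the patent.

```python
from itertools import product

GENOTYPES = ("AA", "AB", "BB")
CALLS = ("AA", "AB", "BB", "NC")  # "NC" = no call (both allele copies dropped out)


def call_distribution(true_genotype: str, p_ado: float) -> dict:
    """Return P(measured call | true genotype) for a biallelic SNP.

    Assumed (hypothetical) error model: each of the two allele copies drops
    out independently with probability p_ado; no other errors occur.
    """
    if true_genotype not in GENOTYPES:
        raise ValueError(f"unknown genotype: {true_genotype}")
    if not 0.0 <= p_ado <= 1.0:
        raise ValueError("p_ado must lie in [0, 1]")
    probs = {call: 0.0 for call in CALLS}
    # Enumerate which of the two allele copies survive amplification.
    for survived in product((True, False), repeat=2):
        weight = 1.0
        for kept in survived:
            weight *= (1.0 - p_ado) if kept else p_ado
        alleles = sorted(a for a, kept in zip(true_genotype, survived) if kept)
        if not alleles:
            call = "NC"
        elif len(alleles) == 1:
            call = alleles[0] * 2  # a lone surviving allele reads as homozygous
        else:
            call = "".join(alleles)
        probs[call] += weight
    return probs


if __name__ == "__main__":
    for genotype in GENOTYPES:
        row = call_distribution(genotype, p_ado=0.1)
        print(genotype, {call: round(p, 4) for call, p in row.items()})
```

Under this model, a true heterozygote is miscalled as a homozygote with probability 2 × p_ado × (1 - p_ado). This illustrates how allele dropout turns heterozygous sites into apparent homozygous ones. A fuller model would add terms for allele drop-in and outright miscalls.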
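In the IVF example above, the blastomere data and the diploid maternal data are used to phase the maternal genotype. The patent's own method is probabilistic and accounts for measurement noise. The sketch below only illustrates the underlying Mendelian idea with a textbook trio rule. It assumes:

- error-free biallelic calls;
- a disomic embryo;
- a known paternal genotype;
- no recombination within the region.

The functions `maternal_transmitted_allele` and `phase_mother` are hypothetical names.

```python
from typing import List, Optional, Tuple


def maternal_transmitted_allele(mother: str, father: str, child: str) -> Optional[str]:
    """Return the allele the mother passed to the child at one SNP, or None.

    Genotypes are two-letter strings such as "AB". None means the SNP is
    uninformative (several explanations fit) or Mendelian-inconsistent.
    """
    child_sorted = sorted(child)
    candidates = set()
    for maternal in set(mother):
        for paternal in set(father):
            if sorted(maternal + paternal) == child_sorted:
                candidates.add(maternal)
    return candidates.pop() if len(candidates) == 1 else None


def phase_mother(mother: List[str], father: List[str], child: List[str]) -> Tuple[str, str]:
    """Split the mother's genotypes into (transmitted, untransmitted) haplotypes.

    Heterozygous positions that cannot be resolved are marked "?".
    """
    transmitted, untransmitted = [], []
    for m, f, c in zip(mother, father, child):
        if m[0] == m[1]:
            # Homozygous site: both maternal haplotypes carry the same allele.
            transmitted.append(m[0])
            untransmitted.append(m[0])
            continue
        t = maternal_transmitted_allele(m, f, c)
        if t is None:
            transmitted.append("?")
            untransmitted.append("?")
        else:
            transmitted.append(t)
            untransmitted.append(m[1] if t == m[0] else m[0])
    return "".join(transmitted), "".join(untransmitted)


if __name__ == "__main__":
    mother = ["AB", "AA", "AB", "AB"]
    father = ["AA", "AB", "BB", "AB"]
    embryo = ["AB", "AB", "AB", "AB"]
    print(phase_mother(mother, father, embryo))  # ('BAA?', 'AAB?')
```

The last SNP shows why a single embryo can leave gaps. When both parents and the embryo are heterozygous, either parent could have contributed either allele. Data from additional sibling embryos or other related individuals could help resolve such positions.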

Abstract

A system and method are disclosed for increasing the fidelity of measured genetic data, for making allele calls, and for determining the state of aneuploidy in one cell or a small set of cells, or from fragmentary DNA, where a limited quantity of genetic data is available. Genetic material from the target individual is acquired and amplified, and the genetic data are measured using known methods. Poorly or incorrectly measured base pairs, missing alleles and missing regions are reconstructed using expected similarities between the target genome and the genomes of genetically related individuals. In one embodiment of the invention, incomplete genetic data from an embryonic cell are reconstructed at a plurality of loci. The reconstruction uses the more complete genetic data from a larger sample of diploid cells from one or both parents, with or without haploid genetic data from one or both parents. In another embodiment of the invention, the chromosome copy number can be determined from the measured genetic data of a single cell or a small number of cells, with or without genetic information from one or both parents. In another embodiment of the invention, these determinations are made for the purpose of embryo selection in the context of in vitro fertilization. In another embodiment of the invention, the genetic data can be reconstructed for the purpose of making phenotypic predictions.
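The abstract states that chromosome copy number can be determined from the measured data of a single cell. The description adds that the math often treats only monosomy, disomy and trisomy, and that it relies on prior probabilities. The sketch below illustrates that kind of Bayesian hypothesis comparison under simplifying assumptions of its own, which are not the patent's:

- the input is hypothetical per-SNP B-allele fractions;
- each chromosome copy carries the B allele independently, with a known population frequency;
- measurement noise is Gaussian with a hypothetical standard deviation `sd`.

Unlike the patent's method, it does not use parental genotypes. All function names are illustrative.

```python
import math
from typing import Dict, Sequence


def normal_pdf(x: float, mean: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def snp_likelihood(x: float, n_copies: int, p_b: float, sd: float) -> float:
    """P(x | n copies): mix over k ~ Binomial(n, p_b) B-allele copies, x ~ N(k/n, sd)."""
    total = 0.0
    for k in range(n_copies + 1):
        weight = math.comb(n_copies, k) * p_b**k * (1.0 - p_b) ** (n_copies - k)
        total += weight * normal_pdf(x, k / n_copies, sd)
    return total


def copy_number_posterior(
    b_fractions: Sequence[float],
    allele_freqs: Sequence[float],
    priors: Dict[int, float],  # e.g. {1: monosomy, 2: disomy, 3: trisomy}; must be > 0
    sd: float = 0.05,
) -> Dict[int, float]:
    """Posterior probability of each copy-number hypothesis given all SNPs."""
    log_post = {}
    for n, prior in priors.items():
        log_lik = sum(
            math.log(max(snp_likelihood(x, n, p, sd), 1e-300))  # guard against log(0)
            for x, p in zip(b_fractions, allele_freqs)
        )
        log_post[n] = math.log(prior) + log_lik
    # Normalize in log space so that combining many SNPs does not underflow.
    top = max(log_post.values())
    unnorm = {n: math.exp(lp - top) for n, lp in log_post.items()}
    total = sum(unnorm.values())
    return {n: v / total for n, v in unnorm.items()}


if __name__ == "__main__":
    priors = {1: 0.1, 2: 0.8, 3: 0.1}  # hypothetical priors favoring disomy
    trisomic_like = [0.0, 0.33, 0.67, 1.0, 0.34, 0.66, 0.02, 0.31]
    post = copy_number_posterior(trisomic_like, [0.5] * len(trisomic_like), priors)
    print({n: round(p, 4) for n, p in post.items()})
```

With these illustrative numbers, the measurements clustered near 1/3 and 2/3 outweigh the 0.8 prior on disomy, and the posterior concentrates on trisomy. When fewer informative SNPs are available, the choice of priors has a larger influence on the result.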
PCT/US2008/003547 2007-03-16 2008-03-17 System and method for cleaning noisy genetic data and determining chromosome copy number WO2008115497A2 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP08742125A EP2140386A2 (fr) 2007-03-16 2008-03-17 System and method for cleaning noisy genetic data and determining chromosome copy number
CN2008800161237A CN101790731B (zh) 2007-03-16 2008-03-17 System and method for removing interference from genetic data and determining chromosome copy number

Applications Claiming Priority (12)

Application Number Priority Date Filing Date Title
US91829207P 2007-03-16 2007-03-16
US60/918,292 2007-03-16
US92619807P 2007-04-25 2007-04-25
US60/926,198 2007-04-25
US93245607P 2007-05-31 2007-05-31
US60/932,456 2007-05-31
US93444007P 2007-06-13 2007-06-13
US60/934,440 2007-06-13
US310107P 2007-11-13 2007-11-13
US61/003,101 2007-11-13
US863707P 2007-12-21 2007-12-21
US61/008,637 2007-12-21

Publications (2)

Publication Number Publication Date
WO2008115497A2 WO2008115497A2 (fr) 2008-09-25
WO2008115497A3 WO2008115497A3 (fr) 2009-05-28

Family

ID=39735264

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2008/003547 WO2008115497A2 (fr) 2007-03-16 2008-03-17 System and method for cleaning noisy genetic data and determining chromosome copy number

Country Status (3)

Country Link
EP (1) EP2140386A2 (fr)
CN (1) CN101790731B (fr)
WO (1) WO2008115497A2 (fr)

Cited By (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8024128B2 (en) 2004-09-07 2011-09-20 Gene Security Network, Inc. System and method for improving clinical decisions by aggregating, validating and analysing genetic and phenotypic data
US8515679B2 (en) 2005-12-06 2013-08-20 Natera, Inc. System and method for cleaning noisy genetic data and determining chromosome copy number
US8532930B2 (en) 2005-11-26 2013-09-10 Natera, Inc. Method for determining the number of copies of a chromosome in the genome of a target individual using genetic data from genetically related individuals
US8949036B2 (en) 2010-05-18 2015-02-03 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US9163282B2 (en) 2010-05-18 2015-10-20 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US9228234B2 (en) 2009-09-30 2016-01-05 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US9424392B2 (en) 2005-11-26 2016-08-23 Natera, Inc. System and method for cleaning noisy genetic data from target individuals using genetic data from genetically related individuals
US9499870B2 (en) 2013-09-27 2016-11-22 Natera, Inc. Cell free DNA diagnostic testing standards
US9639657B2 (en) 2008-08-04 2017-05-02 Natera, Inc. Methods for allele calling and ploidy calling
US9677118B2 (en) 2014-04-21 2017-06-13 Natera, Inc. Methods for simultaneous amplification of target loci
US10011870B2 (en) 2016-12-07 2018-07-03 Natera, Inc. Compositions and methods for identifying nucleic acid molecules
US10083273B2 (en) 2005-07-29 2018-09-25 Natera, Inc. System and method for cleaning noisy genetic data and determining chromosome copy number
US10081839B2 (en) 2005-07-29 2018-09-25 Natera, Inc System and method for cleaning noisy genetic data and determining chromosome copy number
US10113196B2 (en) 2010-05-18 2018-10-30 Natera, Inc. Prenatal paternity testing using maternal blood, free floating fetal DNA and SNP genotyping
US10179937B2 (en) 2014-04-21 2019-01-15 Natera, Inc. Detecting mutations and ploidy in chromosomal segments
CN109493919A (zh) * 2018-10-31 2019-03-19 中国石油大学(华东) Genotype assignment method based on conditional probability
US10262755B2 (en) 2014-04-21 2019-04-16 Natera, Inc. Detecting cancer mutations and aneuploidy in chromosomal segments
US10316362B2 (en) 2010-05-18 2019-06-11 Natera, Inc. Methods for simultaneous amplification of target loci
CN110444251A (zh) * 2019-07-23 2019-11-12 中国石油大学(华东) Haplotype configuration generation method based on branch and bound
US10526658B2 (en) 2010-05-18 2020-01-07 Natera, Inc. Methods for simultaneous amplification of target loci
US10577655B2 (en) 2013-09-27 2020-03-03 Natera, Inc. Cell free DNA diagnostic testing standards
US10894976B2 (en) 2017-02-21 2021-01-19 Natera, Inc. Compositions, methods, and kits for isolating nucleic acids
US11111544B2 (en) 2005-07-29 2021-09-07 Natera, Inc. System and method for cleaning noisy genetic data and determining chromosome copy number
US11111543B2 (en) 2005-07-29 2021-09-07 Natera, Inc. System and method for cleaning noisy genetic data and determining chromosome copy number
US11186863B2 (en) 2019-04-02 2021-11-30 Progenity, Inc. Methods, systems, and compositions for counting nucleic acid molecules
US11230731B2 (en) 2018-04-02 2022-01-25 Progenity, Inc. Methods, systems, and compositions for counting nucleic acid molecules
US11322224B2 (en) 2010-05-18 2022-05-03 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US11326208B2 (en) 2010-05-18 2022-05-10 Natera, Inc. Methods for nested PCR amplification of cell-free DNA
US11332785B2 (en) 2010-05-18 2022-05-17 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US11332793B2 (en) 2010-05-18 2022-05-17 Natera, Inc. Methods for simultaneous amplification of target loci
US11339429B2 (en) 2010-05-18 2022-05-24 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US11408031B2 (en) 2010-05-18 2022-08-09 Natera, Inc. Methods for non-invasive prenatal paternity testing
CN115064210A (zh) * 2022-07-27 2022-09-16 北京大学第三医院(北京大学第三临床医学院) Method for identifying chromosome crossover positions in diploid embryonic cells, and application thereof
US11479812B2 (en) 2015-05-11 2022-10-25 Natera, Inc. Methods and compositions for determining ploidy
US11485996B2 (en) 2016-10-04 2022-11-01 Natera, Inc. Methods for characterizing copy number variation using proximity-litigation sequencing
US11525159B2 (en) 2018-07-03 2022-12-13 Natera, Inc. Methods for detection of donor-derived cell-free DNA
US11939634B2 (en) 2010-05-18 2024-03-26 Natera, Inc. Methods for simultaneous amplification of target loci
US11959129B2 (en) 2021-10-13 2024-04-16 Enumera Molecular, Inc. Methods, systems, and compositions for counting nucleic acid molecules

Families Citing this family (9)

Publication number Priority date Publication date Assignee Title
KR101903547B1 (ko) 2010-04-29 2018-10-04 더 리젠츠 오브 더 유니버시티 오브 캘리포니아 Pathway recognition algorithm using data integration on genomic models (PARADIGM)
US10192641B2 (en) * 2010-04-29 2019-01-29 The Regents Of The University Of California Method of generating a dynamic pathway map
WO2016000267A1 (fr) * 2014-07-04 2016-01-07 深圳华大基因股份有限公司 Method for determining the sequence of a probe and method for detecting genomic structural variation
KR101817785B1 (ko) * 2015-08-06 2018-01-11 이원다이애그노믹스(주) Novel method capable of distinguishing fetal sex and sex chromosome abnormalities on various platforms
CN109390039B (zh) * 2017-08-11 2020-10-16 深圳华大基因股份有限公司 Method, device and storage medium for compiling statistics on DNA copy number information
WO2019237230A1 (fr) * 2018-06-11 2019-12-19 深圳华大生命科学研究院 Method and system for determining the type of a sample to be tested
CN109754845B (zh) * 2018-12-29 2020-02-28 浙江安诺优达生物科技有限公司 Method for simulating a sequencing library of a target disease, and application thereof
CN112840404A (zh) * 2019-10-18 2021-05-25 苏州亿康医学检验有限公司 Methods and systems for cleaning noisy genetic data, haplotype phasing and reconstructing offspring genomes, and uses thereof
CN112375829B (zh) * 2020-11-25 2022-07-05 苏州赛美科基因科技有限公司 Method, apparatus and electronic device for identifying UPD using pedigree WES data

Citations (2)

Publication number Priority date Publication date Assignee Title
US20060134662A1 (en) * 2004-10-25 2006-06-22 Pratt Mark R Method and system for genotyping samples in a normalized allelic space
WO2007062164A2 (fr) * 2005-11-26 2007-05-31 Gene Security Network Llc System and method for cleaning noisy genetic data and using genetic, phenotypic and clinical data to make predictions

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US6351712B1 (en) * 1998-12-28 2002-02-26 Rosetta Inpharmatics, Inc. Statistical combining of cell expression profiles
JP5180478B2 (ja) * 2004-02-10 2013-04-10 コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ Genetic algorithm for optimizing genome-based medical diagnostic tests

Non-Patent Citations (4)

Title
BEAUMONT MARK A ET AL: "The Bayesian revolution in genetics." NATURE REVIEWS. GENETICS APR 2004, vol. 5, no. 4, April 2004 (2004-04), pages 251-261, XP002521825 ISSN: 1471-0056 *
COLELLA STEFANO ET AL: "QuantiSNP: an Objective Bayes Hidden-Markov Model to detect and accurately map copy number variation using SNP genotyping data." NUCLEIC ACIDS RESEARCH 2007, vol. 35, no. 6, 6 March 2007 (2007-03-06), pages 2013-2025, XP002506459 ISSN: 1362-4962 *
FREEMAN JENNIFER L ET AL: "Copy number variation: new insights in genome diversity." GENOME RESEARCH AUG 2006, vol. 16, no. 8, August 2006 (2006-08), pages 949-961, XP002506460 ISSN: 1088-9051 *
OGINO SHUJI ET AL: "Bayesian analysis and risk assessment in genetic counseling and testing." THE JOURNAL OF MOLECULAR DIAGNOSTICS : JMD FEB 2004, vol. 6, no. 1, February 2004 (2004-02), pages 1-9, XP002521826 ISSN: 1525-1578 *

Cited By (90)

Publication number Priority date Publication date Assignee Title
US8024128B2 (en) 2004-09-07 2011-09-20 Gene Security Network, Inc. System and method for improving clinical decisions by aggregating, validating and analysing genetic and phenotypic data
US10081839B2 (en) 2005-07-29 2018-09-25 Natera, Inc System and method for cleaning noisy genetic data and determining chromosome copy number
US10083273B2 (en) 2005-07-29 2018-09-25 Natera, Inc. System and method for cleaning noisy genetic data and determining chromosome copy number
US10260096B2 (en) 2005-07-29 2019-04-16 Natera, Inc. System and method for cleaning noisy genetic data and determining chromosome copy number
US11111543B2 (en) 2005-07-29 2021-09-07 Natera, Inc. System and method for cleaning noisy genetic data and determining chromosome copy number
US10266893B2 (en) 2005-07-29 2019-04-23 Natera, Inc. System and method for cleaning noisy genetic data and determining chromosome copy number
US11111544B2 (en) 2005-07-29 2021-09-07 Natera, Inc. System and method for cleaning noisy genetic data and determining chromosome copy number
US10392664B2 (en) 2005-07-29 2019-08-27 Natera, Inc. System and method for cleaning noisy genetic data and determining chromosome copy number
US10227652B2 (en) 2005-07-29 2019-03-12 Natera, Inc. System and method for cleaning noisy genetic data from target individuals using genetic data from genetically related individuals
US9424392B2 (en) 2005-11-26 2016-08-23 Natera, Inc. System and method for cleaning noisy genetic data from target individuals using genetic data from genetically related individuals
US11306359B2 (en) 2005-11-26 2022-04-19 Natera, Inc. System and method for cleaning noisy genetic data from target individuals using genetic data from genetically related individuals
US9695477B2 (en) 2005-11-26 2017-07-04 Natera, Inc. System and method for cleaning noisy genetic data from target individuals using genetic data from genetically related individuals
US10711309B2 (en) 2005-11-26 2020-07-14 Natera, Inc. System and method for cleaning noisy genetic data from target individuals using genetic data from genetically related individuals
US10597724B2 (en) 2005-11-26 2020-03-24 Natera, Inc. System and method for cleaning noisy genetic data from target individuals using genetic data from genetically related individuals
US9430611B2 (en) 2005-11-26 2016-08-30 Natera, Inc. System and method for cleaning noisy genetic data from target individuals using genetic data from genetically related individuals
US8532930B2 (en) 2005-11-26 2013-09-10 Natera, Inc. Method for determining the number of copies of a chromosome in the genome of a target individual using genetic data from genetically related individuals
US10240202B2 (en) 2005-11-26 2019-03-26 Natera, Inc. System and method for cleaning noisy genetic data from target individuals using genetic data from genetically related individuals
US8515679B2 (en) 2005-12-06 2013-08-20 Natera, Inc. System and method for cleaning noisy genetic data and determining chromosome copy number
US9639657B2 (en) 2008-08-04 2017-05-02 Natera, Inc. Methods for allele calling and ploidy calling
US10061890B2 (en) 2009-09-30 2018-08-28 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US9228234B2 (en) 2009-09-30 2016-01-05 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US10522242B2 (en) 2009-09-30 2019-12-31 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US10216896B2 (en) 2009-09-30 2019-02-26 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US10061889B2 (en) 2009-09-30 2018-08-28 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US10597723B2 (en) 2010-05-18 2020-03-24 Natera, Inc. Methods for simultaneous amplification of target loci
US11339429B2 (en) 2010-05-18 2022-05-24 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US11939634B2 (en) 2010-05-18 2024-03-26 Natera, Inc. Methods for simultaneous amplification of target loci
US11746376B2 (en) 2010-05-18 2023-09-05 Natera, Inc. Methods for amplification of cell-free DNA using ligated adaptors and universal and inner target-specific primers for multiplexed nested PCR
US10174369B2 (en) 2010-05-18 2019-01-08 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US10316362B2 (en) 2010-05-18 2019-06-11 Natera, Inc. Methods for simultaneous amplification of target loci
US11525162B2 (en) 2010-05-18 2022-12-13 Natera, Inc. Methods for simultaneous amplification of target loci
US10113196B2 (en) 2010-05-18 2018-10-30 Natera, Inc. Prenatal paternity testing using maternal blood, free floating fetal DNA and SNP genotyping
US11519035B2 (en) 2010-05-18 2022-12-06 Natera, Inc. Methods for simultaneous amplification of target loci
US10017812B2 (en) 2010-05-18 2018-07-10 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US10526658B2 (en) 2010-05-18 2020-01-07 Natera, Inc. Methods for simultaneous amplification of target loci
US11482300B2 (en) 2010-05-18 2022-10-25 Natera, Inc. Methods for preparing a DNA fraction from a biological sample for analyzing genotypes of cell-free DNA
US10538814B2 (en) 2010-05-18 2020-01-21 Natera, Inc. Methods for simultaneous amplification of target loci
US10557172B2 (en) 2010-05-18 2020-02-11 Natera, Inc. Methods for simultaneous amplification of target loci
US11286530B2 (en) 2010-05-18 2022-03-29 Natera, Inc. Methods for simultaneous amplification of target loci
US11408031B2 (en) 2010-05-18 2022-08-09 Natera, Inc. Methods for non-invasive prenatal paternity testing
US10590482B2 (en) 2010-05-18 2020-03-17 Natera, Inc. Amplification of cell-free DNA using nested PCR
US8949036B2 (en) 2010-05-18 2015-02-03 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US11332793B2 (en) 2010-05-18 2022-05-17 Natera, Inc. Methods for simultaneous amplification of target loci
US11332785B2 (en) 2010-05-18 2022-05-17 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US11326208B2 (en) 2010-05-18 2022-05-10 Natera, Inc. Methods for nested PCR amplification of cell-free DNA
US10655180B2 (en) 2010-05-18 2020-05-19 Natera, Inc. Methods for simultaneous amplification of target loci
US11306357B2 (en) 2010-05-18 2022-04-19 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US10731220B2 (en) 2010-05-18 2020-08-04 Natera, Inc. Methods for simultaneous amplification of target loci
US10774380B2 (en) 2010-05-18 2020-09-15 Natera, Inc. Methods for multiplex PCR amplification of target loci in a nucleic acid sample
US10793912B2 (en) 2010-05-18 2020-10-06 Natera, Inc. Methods for simultaneous amplification of target loci
US11322224B2 (en) 2010-05-18 2022-05-03 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US9334541B2 (en) 2010-05-18 2016-05-10 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US11111545B2 (en) 2010-05-18 2021-09-07 Natera, Inc. Methods for simultaneous amplification of target loci
US9163282B2 (en) 2010-05-18 2015-10-20 Natera, Inc. Methods for non-invasive prenatal ploidy calling
US11312996B2 (en) 2010-05-18 2022-04-26 Natera, Inc. Methods for simultaneous amplification of target loci
US9499870B2 (en) 2013-09-27 2016-11-22 Natera, Inc. Cell free DNA diagnostic testing standards
US10577655B2 (en) 2013-09-27 2020-03-03 Natera, Inc. Cell free DNA diagnostic testing standards
US10597708B2 (en) 2014-04-21 2020-03-24 Natera, Inc. Methods for simultaneous amplifications of target loci
US11530454B2 (en) 2014-04-21 2022-12-20 Natera, Inc. Detecting mutations and ploidy in chromosomal segments
US11486008B2 (en) 2014-04-21 2022-11-01 Natera, Inc. Detecting mutations and ploidy in chromosomal segments
US11319596B2 (en) 2014-04-21 2022-05-03 Natera, Inc. Detecting mutations and ploidy in chromosomal segments
US10179937B2 (en) 2014-04-21 2019-01-15 Natera, Inc. Detecting mutations and ploidy in chromosomal segments
US11319595B2 (en) 2014-04-21 2022-05-03 Natera, Inc. Detecting mutations and ploidy in chromosomal segments
US9677118B2 (en) 2014-04-21 2017-06-13 Natera, Inc. Methods for simultaneous amplification of target loci
US10262755B2 (en) 2014-04-21 2019-04-16 Natera, Inc. Detecting cancer mutations and aneuploidy in chromosomal segments
US10597709B2 (en) 2014-04-21 2020-03-24 Natera, Inc. Methods for simultaneous amplification of target loci
US11414709B2 (en) 2014-04-21 2022-08-16 Natera, Inc. Detecting mutations and ploidy in chromosomal segments
US11371100B2 (en) 2014-04-21 2022-06-28 Natera, Inc. Detecting mutations and ploidy in chromosomal segments
US11390916B2 (en) 2014-04-21 2022-07-19 Natera, Inc. Methods for simultaneous amplification of target loci
US11408037B2 (en) 2014-04-21 2022-08-09 Natera, Inc. Detecting mutations and ploidy in chromosomal segments
US10351906B2 (en) 2014-04-21 2019-07-16 Natera, Inc. Methods for simultaneous amplification of target loci
US11946101B2 (en) 2015-05-11 2024-04-02 Natera, Inc. Methods and compositions for determining ploidy
US11479812B2 (en) 2015-05-11 2022-10-25 Natera, Inc. Methods and compositions for determining ploidy
US11485996B2 (en) 2016-10-04 2022-11-01 Natera, Inc. Methods for characterizing copy number variation using proximity-litigation sequencing
US10533219B2 (en) 2016-12-07 2020-01-14 Natera, Inc. Compositions and methods for identifying nucleic acid molecules
US10011870B2 (en) 2016-12-07 2018-07-03 Natera, Inc. Compositions and methods for identifying nucleic acid molecules
US11530442B2 (en) 2016-12-07 2022-12-20 Natera, Inc. Compositions and methods for identifying nucleic acid molecules
US11519028B2 (en) 2016-12-07 2022-12-06 Natera, Inc. Compositions and methods for identifying nucleic acid molecules
US10577650B2 (en) 2016-12-07 2020-03-03 Natera, Inc. Compositions and methods for identifying nucleic acid molecules
US10894976B2 (en) 2017-02-21 2021-01-19 Natera, Inc. Compositions, methods, and kits for isolating nucleic acids
US11230731B2 (en) 2018-04-02 2022-01-25 Progenity, Inc. Methods, systems, and compositions for counting nucleic acid molecules
US11788121B2 (en) 2018-04-02 2023-10-17 Enumera Molecular, Inc. Methods, systems, and compositions for counting nucleic acid molecules
US11525159B2 (en) 2018-07-03 2022-12-13 Natera, Inc. Methods for detection of donor-derived cell-free DNA
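CN109493919A (zh) * 2018-10-31 2019-03-19 China University of Petroleum (East China) Genotype assignment method based on conditional probability
CN109493919B (zh) * 2018-10-31 2023-04-14 China University of Petroleum (East China) Genotype assignment method based on conditional probability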
US11186863B2 (en) 2019-04-02 2021-11-30 Progenity, Inc. Methods, systems, and compositions for counting nucleic acid molecules
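CN110444251A (zh) * 2019-07-23 2019-11-12 China University of Petroleum (East China) Haplotype configuration generation method based on branch and bound
CN110444251B (zh) * 2019-07-23 2023-09-22 China University of Petroleum (East China) Haplotype configuration generation method based on branch and bound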
US11959129B2 (en) 2021-10-13 2024-04-16 Enumera Molecular, Inc. Methods, systems, and compositions for counting nucleic acid molecules
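CN115064210A (zh) * 2022-07-27 2022-09-16 Peking University Third Hospital (Peking University Third Clinical Medical College) Method for identifying chromosomal crossover positions in diploid embryonic cells, and application thereof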

Also Published As

Publication number Publication date
CN101790731B (zh) 2013-11-06
WO2008115497A3 (fr) 2009-05-28
CN101790731A (zh) 2010-07-28
EP2140386A2 (fr) 2010-01-06

Similar Documents

Publication Publication Date Title
US11111543B2 (en) System and method for cleaning noisy genetic data and determining chromosome copy number
US10266893B2 (en) System and method for cleaning noisy genetic data and determining chromosome copy number
US11111544B2 (en) System and method for cleaning noisy genetic data and determining chromosome copy number
US8515679B2 (en) System and method for cleaning noisy genetic data and determining chromosome copy number
US20180300448A1 (en) System and method for cleaning noisy genetic data and determining chromosome copy number
WO2008115497A2 (fr) System and method for cleaning noisy genetic data and determining chromosome copy number
US20200172977A1 (en) System and method for cleaning noisy genetic data from target individuals using genetic data from genetically related individuals
US10522242B2 (en) Methods for non-invasive prenatal ploidy calling
EP2321642B1 (fr) Methods for allele calling and ploidy calling
EP2437191B1 (fr) Method and system for detecting chromosomal abnormalities
US20160371432A1 (en) Methods for allele calling and ploidy calling

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase

Ref document number: 200880016123.7

Country of ref document: CN

121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 08742125

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

REEP Request for entry into the european phase

Ref document number: 2008742125

Country of ref document: EP

WWE WIPO information: entry into national phase

Ref document number: 2008742125

Country of ref document: EP