CN114214425A - Method or device for identifying parent tendentiousness of nucleic acid sample - Google Patents

Publication number
CN114214425A
Authority
CN
China
Prior art keywords
sample
parental
predisposition
threshold
site
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111536093.6A
Other languages
Chinese (zh)
Other versions
CN114214425B (en)
Inventor
邹央云
万成
姚雅馨
陆思嘉
任军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yikon Genomics Shanghai Co ltd
Original Assignee
Yikon Genomics Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yikon Genomics Shanghai Co ltd
Publication of CN114214425A
Application granted
Publication of CN114214425B
Legal status: Active
Anticipated expiration

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6888: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/10: Ploidy or copy number detection
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20: Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30: Detection of binding sites or motifs
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/40: Population genetics; Linkage disequilibrium
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/156: Polymorphic or mutational markers

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Genetics & Genomics (AREA)
  • Biotechnology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Organic Chemistry (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Physiology (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Ecology (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to a method or device for parental predisposition detection of a nucleic acid sample, in particular for detecting parental predisposition in a sample comprising only trace amounts of progeny DNA. The invention also relates to the use of the method or device for identifying parental contamination of a sample, or for identifying ploidy abnormalities of progeny DNA in a sample, by detecting parental predisposition of the sample.

Description

Method or device for identifying parent tendentiousness of nucleic acid sample
Technical Field
The present invention relates to a method or device for parental predisposition detection of a nucleic acid sample, in particular for detecting parental predisposition in a sample comprising only trace amounts of progeny DNA. The invention also relates to the use of the method or device for identifying parental contamination of a sample, or for identifying ploidy abnormalities of progeny DNA in a sample, by detecting parental predisposition of the sample.
Background
Preimplantation genetic testing (PGT) refers to performing genetic analysis on the embryos of patients at high genetic risk before implantation, during in vitro fertilization and embryo transfer, and selecting embryos with normal genetic material for transfer into the maternal uterine cavity so as to obtain healthy offspring. In current clinical practice, PGT mainly relies on genetic testing of cells obtained by embryo biopsy. However, a growing number of studies show that this invasive cell biopsy can adversely affect the developmental potential of the embryo and its subsequent development. In recent years, several studies have shown that embryo culture medium contains cell-free DNA (cfDNA) fragments derived from the embryo, which makes noninvasive genetic testing of embryos before implantation possible. The successful application of cfDNA from embryo culture medium in PGT-A (chromosome aneuploidy detection), PGT-M (monogenic disease detection) and PGT-SR (chromosome structural abnormality detection) shows that the approach has good prospects in preimplantation genetic testing. However, both invasive embryo biopsy and noninvasive testing of embryo culture medium are susceptible to interference from paternal (sperm) and maternal (cumulus cell) DNA.
Clinically, in vitro fertilization is usually performed by conventional IVF (In Vitro Fertilization) or by ICSI (Intracytoplasmic Sperm Injection). First-generation test-tube baby technology (conventional IVF) co-cultures a large number of sperm with the ova to achieve fertilization in vitro. Only one sperm can effectively fertilize each ovum and turn it into a fertilized ovum; the remaining sperm can adhere to the surface of the fertilized ovum and are released into the culture medium after they die. Therefore, when genetic testing is performed on biopsied cells of IVF embryos, the results are often affected by paternal sperm. ICSI injects a single sperm into the ovum with a micromanipulation system, which avoids paternal DNA interference from sperm, but the granulosa cells on the surface of the ovum introduce maternal DNA interference. Although methods exist for reducing parental interference in embryo biopsy cells or culture medium samples by washing the surface of the ova or zygotes (patent No. HK1229368A1), the washing is sometimes incomplete and leaves potential contamination, which may lead to false negative results in PGT. Therefore, it is necessary to identify whether parental contamination is present before genetic analysis is performed on embryo biopsy cells or culture medium samples.
Whole-genome triploid or uniparental diploid embryos are also common in PGT assays, particularly in some patients with a family history of hydatidiform mole. Methods such as specific PCR or methylation PCR analysis have been proposed to determine the parental origin of embryo ploidy variation, such as the parental origin (paternal or maternal) of a triploid or of a uniparental diploid. However, these assays are limited by the requirements they place on the assay sites. Conventional next-generation sequencing platforms can provide sequence information across the entire genome of a PGT biopsy sample. However, this technique is difficult to apply to the detection of whole-genome triploid or uniparental diploid embryos.
For gDNA samples, STRs (Short Tandem Repeats) are currently used for parental predisposition identification. However, trace DNA samples, such as embryo biopsy cells or culture medium cfDNA, require single cell whole genome amplification (WGA) to yield enough material for genetic testing. WGA often introduces interfering information such as preferential allele amplification (one of the two alleles is amplified preferentially) or allele dropout (ADO, only one of the two alleles is amplified). This makes parental predisposition detection difficult in trace DNA samples.
Therefore, there is a need in the art for a method for detecting parental tendencies in a trace DNA sample, and a method that can be used to determine the likelihood of parental contamination of a sample, and that can be used to identify ploidy abnormalities of progeny chromosomes in a sample.
Summary of The Invention
Using polymorphic genetic variation sites, such as single nucleotide polymorphism (SNP) molecular markers, the inventors constructed a parental predisposition statistic that eliminates interfering information (such as ADO interference) introduced by single cell whole genome amplification, and thereby provide a method for identifying whether a test sample shows a predisposition toward parental DNA. The method can rapidly determine, with high sensitivity and specificity, paternal and maternal contamination in a trace DNA sample (especially embryo culture medium); it can likewise determine ploidy variation of offspring chromosomes in a trace DNA sample (for example from a PGT biopsy), such as paternal trisomy, paternal triploidy (one extra set of paternal chromosomes), maternal trisomy, maternal triploidy (one extra set of maternal chromosomes), and uniparental diploidy (paternal or maternal).
Thus, in one aspect, the invention provides a method for identifying the parental predisposition of a test sample.
In yet another aspect, the present invention provides a method for detecting the presence or absence of parental contamination in a test sample by parental predisposition identification.
In yet another aspect, the invention provides a method for identifying ploidy abnormalities of progeny DNA in a test sample by parental predisposition identification.
In yet another aspect, the invention also provides products, including but not limited to, devices, systems and apparatus, that can be used in any of the above methods of the invention or combinations thereof.
In a further aspect, the invention also provides the use of the device, system and apparatus of the invention for identifying a predisposition of a parent of a test sample, or for detecting a parental contamination of a test sample, or for identifying a ploidy abnormality of progeny DNA in a test sample; and in the preparation of products for said use.
Drawings
FIG. 1 shows the LRR and BAF distribution of 30% maternal-contaminated gDNA samples.
FIG. 2A shows a comparison of BAF distribution of the gDNA sample and single cell amplification products at selected sites of unequal homozygosity in the parents.
FIG. 2B shows a comparison of BAF distribution for a non-contaminated SEM sample and a sample with 30% parental contamination.
FIG. 3 schematically shows an example of an analysis method for parental predisposition detection on a chip platform.
FIG. 4 schematically shows an example of an analysis method for parental predisposition detection on a next-generation sequencing platform.
FIG. 5 shows the density distribution of S_MC for SEM samples known to have no parental predisposition.
FIG. 6 shows the distribution of S_MC under different parental mixing ratios.
FIG. 7 schematically shows an example of an analysis method for detecting heteroploidy in offspring.
FIG. 8 shows the distribution of S_uneven obtained in a reference frame without parental predisposition.
FIG. 9 shows the distribution of S_loh obtained in a reference frame without parental predisposition.
FIG. 10 shows the distributions of S_uneven and S_loh for chromosome monosomy or trisomy samples.
FIG. 11 shows the distributions of S_uneven and S_loh for a maternal triploid embryo sample.
Detailed Description
Before the present invention is described in detail, it is to be understood that this invention is not limited to the particular methodology and experimental conditions set forth herein as such may vary. In addition, the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting.
Definitions
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art. For the purposes of the present invention, the following terms are defined below.
The term "about," when used in conjunction with a numerical value, is intended to encompass a numerical value within a range having a lower limit that is 5% less than the stated numerical value and an upper limit that is 5% greater than the stated numerical value.
The term "and/or" when used to connect two or more selectable items should be understood to mean either one of the selectable items or any two or more of the selectable items.
As used herein, the term "comprising" or "comprises" is intended to mean including the stated elements, integers or steps, but not excluding any other elements, integers or steps. When the term "comprising" or "includes" is used herein, unless otherwise specified, it also encompasses the case consisting of the stated elements, integers or steps. For example, when referring to an antibody variable region "comprising" a particular sequence, it is also intended to encompass antibody variable regions consisting of that particular sequence.
In the present invention, the test sample and the reference sample are both progeny trace DNA samples. The terms "progeny trace DNA sample" and "trace DNA sample" are used interchangeably herein to refer to a sample comprising DNA from progeny, wherein the DNA from progeny is present in the sample in trace amounts, e.g., in an amount of less than 1 ng, 500 pg, 100 pg, or 10 pg or even less (e.g., about 1-6 pg). As used herein, "DNA from progeny" and "progeny-derived DNA" are used interchangeably to refer to DNA from, for example, a progeny cell, progeny bodily fluid, or culture of progeny cells, whose genotype is substantially equivalent to the genotype of the progeny genome. Thus, in some embodiments, the trace DNA sample may be an IVF spent embryo culture medium (SEM) sample, which contains a trace amount of genomic DNA released by the cultured embryo. In other embodiments, the sample is an embryonic or fetal biopsy cell sample, e.g., IVF blastocyst trophectoderm cells. In still other embodiments, the sample is blood and/or plasma from a pregnant female, which contains minute amounts of cell-free DNA (cfDNA) from the fetus.
The term "progeny" as used herein includes, but is not limited to, progeny of a mammal, e.g., a human, and means either born or unborn progeny. Unborn progeny comprise the embryo and the fetus. An embryo generally refers to the product of the division of a fertilized egg up to the end of the embryonic period, i.e., the eighth week after fertilization. The cleavage stage of the embryo occurs during the first three days of culture. "Embryo transfer" is the process of placing one or more embryos and/or blastocysts into the uterus or fallopian tubes. A fetus generally refers to an unborn offspring of a mammal, particularly an unborn human infant after the eighth week of pregnancy.
The term "blastocyst" refers to an embryo 5 or 6 days after fertilization, which has an inner cell mass, an outer cell layer called the trophectoderm, and a fluid-filled blastocoel that contains the inner cell mass, from which the embryo proper is derived in its entirety. The trophectoderm is a precursor of the placenta.
The terms "related individuals" and "pedigree individuals" of a progeny are used interchangeably and refer to any individual that is genetically related to the target progeny individual and therefore shares a haplotype with it. In one instance, the related individual may be a genetic parent of the target individual or any genetic material derived from the parents, such as sperm, polar bodies, other embryos, or fetuses. The term may also refer to a sibling, a parent, or a paternal or maternal grandparent. In this application, a parent refers to the genetic father or mother of an individual. An individual offspring typically has two parents (a female parent and a male parent). A sibling refers to any individual whose genetic parents are the same as those of the offspring individual in question. In some embodiments, a sibling may refer to a born child, an embryo or a fetus, or one or more cells derived from an embryo, fetus or born child; a sibling may also refer to a haploid individual derived from one of the parents, such as sperm, polar bodies or any other haploid genetic material. In one embodiment, the methods of the invention comprise determining genomic sequence information, e.g., genetic variation information (e.g., SNP information), of the genetic parents of the offspring. In some embodiments, parental genomic sequence information can be determined from large amounts of genomic DNA extracted from parental tissues (e.g., peripheral blood). In other embodiments, the genomic sequence information of the parents may also be inferred from related individuals or pedigree individuals.
The term "SNP (single nucleotide polymorphism)" refers to a polymorphism at a site in a chromosomal DNA sequence due to a single nucleotide change, with the frequency of SNPs in a population generally being > 1%. On average, there is one SNP every 300-1000 bp across the human genome. SNP data are currently available from a variety of public databases, including, for example, http://cgap.ncbi.nih.gov/GAI; http://www.ncbi.nlm.nih.gov/SNP; and the human SNP database http://hgbase.cgr.ki.sei or http://hgbase.interactiva.de/.
The term "genotype" refers to the type of allele an individual possesses at a locus, referred to as the individual's genotype at that locus. For humans, in addition to sex chromosomes, each pair of homologous chromosomes has a pair of allelic types at the same locus, referred to as the genotype of the locus. Genotyping refers to the process of determining the genotype of an individual.
The term "Mendelian laws of inheritance" relates to two basic laws of genetics, namely the law of segregation and the law of independent assortment. According to these laws, during meiosis the two alleles of a gene separate as the homologous chromosomes separate, enter different gametes, and are passed to the offspring independently with those gametes; in addition, while alleles segregate, non-allelic genes on non-homologous chromosomes assort independently.
The term "nucleic acid chip", for example, "SNP chip", is a chip that can determine the genotype of a certain site using a signal (usually, a fluorescent signal) obtained after hybridization of the chip. In actual research, the SNP chip will contain different SNP sites according to the chip manufacturer, model, etc. For example, the human chips manufactured by Affymetrix and Illumina contain different sets of SNPs.
In this context, the term "parental predisposition" means that the progeny genotype detected in a trace DNA sample shows an increased or decreased frequency of paternal (or maternal) alleles that deviates from the expected Mendelian inheritance pattern. Such a deviation of the allele frequency towards one of the parents can be caused, for example, by contamination of the sample with parental DNA, or by ploidy variation of the progeny DNA itself in the sample. For example, in trace DNA samples from TE cells, the deviation may be caused by a genetic abnormality, such as duplication and/or deletion of a paternal or maternal chromosome or chromosome fragment. Thus, in some embodiments of the invention, parental predisposition reflects the likelihood that the progeny carries a ploidy variation, such as paternal triploidy, maternal triploidy, or uniparental diploidy. In other cases, the genotype deviation detected in a trace DNA sample may be due to contaminating DNA of parental origin present in the sample, for example the deviation seen in an SEM sample contaminated with paternal or maternal DNA. Thus, in other embodiments of the invention, parental predisposition may reflect the likelihood that a sample is paternally or maternally contaminated.
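As a worked illustration of how contamination shifts allele frequencies: at a site where the parents are homozygous for different alleles, a euploid offspring is expected to be heterozygous, so its maternal allele frequency is expected to be 0.5. The sketch below assumes a simple linear mixing model, which is an assumption made here for illustration and not a formula stated in this document; the function name `expected_maf` is hypothetical, and amplification bias is ignored.

```python
def expected_maf(contamination_fraction: float) -> float:
    """Expected maternal allele frequency at a parental unequal homozygous site.

    ASSUMPTION: a linear mixture of offspring DNA (heterozygous, maternal
    allele frequency 0.5) and maternal DNA (homozygous, frequency 1.0).
    """
    if not 0.0 <= contamination_fraction <= 1.0:
        raise ValueError("contamination_fraction must be between 0 and 1")
    offspring_fraction = 1.0 - contamination_fraction
    return offspring_fraction * 0.5 + contamination_fraction * 1.0


if __name__ == "__main__":
    for c in (0.0, 0.1, 0.3, 0.5):
        print(f"maternal fraction {c:.0%}: expected MAF {expected_maf(c):.2f}")
```

Under this assumed model, a sample in which 30% of the DNA is maternal would be expected to show a maternal allele frequency of about 0.65 instead of 0.5 at such sites.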
The term "parental predisposition statistic (S_POR)" herein means a measure constructed according to the method of the invention for quantifying the parental predisposition of a test sample. In the method of the invention for determining sample contamination, the parental predisposition statistic is also denoted "S_MC"; in the method of the invention for detecting heteroploidy in offspring, it is also denoted "S_uneven".
The term "reference frame" herein refers to the set of reference samples used to establish the parental predisposition threshold (i.e., the S_POR threshold). The skilled person knows how to select suitable reference samples based on the test sample. Preferably, the reference samples and the test sample are of the same sample type containing trace amounts of progeny DNA. Thus, in some embodiments, the test sample and the reference samples are SEM samples that are collected in the same manner and contain trace amounts of progeny DNA. In other embodiments, the test sample and the reference samples are biopsy cells of the same type that contain trace progeny DNA, e.g., IVF blastocyst trophectoderm cells. When the threshold is established using the reference frame, each reference sample in the reference frame is preferably subjected to single cell whole genome amplification and determination of the parental predisposition statistic in the same manner as the test sample.
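How a threshold is derived from the reference frame is not specified in this passage. The following is a minimal sketch under the assumption that the threshold is taken as the reference mean plus or minus k sample standard deviations; that rule, the default k = 3, and both function names are hypothetical choices for illustration, and a quantile of the reference distribution would be an equally plausible alternative.

```python
from statistics import mean, stdev


def threshold_from_reference(reference_values, k=3.0):
    """Return (lower, upper) bounds as mean -/+ k sample standard deviations.

    ASSUMPTION: the mean +/- k*SD rule and k=3 are illustrative only.
    """
    if len(reference_values) < 2:
        raise ValueError("need at least two reference samples")
    centre = mean(reference_values)
    spread = stdev(reference_values)
    return centre - k * spread, centre + k * spread


def has_parental_predisposition(sample_value, bounds):
    """True if the sample statistic falls outside the reference bounds."""
    lower, upper = bounds
    return sample_value < lower or sample_value > upper
```

In use, `reference_values` would hold the statistic computed for each reference sample processed in the same way as the test sample, and the test sample's statistic would then be checked against the returned bounds.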
As used herein, the term "biallelic site" refers to a locus at which a pair of homologous chromosomes in a diploid cell carries two alleles. In this context, the letters A and B denote the two alleles. Thus, a homozygous biallelic genotype may be AA or BB, and the heterozygous biallelic genotype is AB. Herein, a biallelic site at which the parents of an embryo or fetus are homozygous for different alleles is referred to as a "parental unequal homozygous allelic site" or "parental non-equal homozygous allelic site". For example, if the father has the AA genotype and the mother the BB genotype, the A allele in the offspring is the paternal allele and the B allele is the maternal allele; conversely, if the father has the BB genotype and the mother the AA genotype, the B allele in the offspring is the paternal allele and the A allele is the maternal allele. In the method of the invention, sites at which the parents are homozygous for different alleles are preferably selected for calculating the parental predisposition value.
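The site selection described above can be sketched as follows. The dictionary-based input format (site identifier mapped to an "AA", "AB" or "BB" genotype string) and the function name are assumptions made for this illustration, not a data format given in the document.

```python
def select_informative_sites(father, mother):
    """Map site id -> (paternal_allele, maternal_allele) for sites where the
    parents are homozygous for different alleles (AA vs BB or BB vs AA).

    ASSUMPTION: genotypes are given as the strings "AA", "AB" or "BB".
    """
    informative = {}
    for site, paternal_gt in father.items():
        maternal_gt = mother.get(site)
        if maternal_gt is None:
            continue  # site not genotyped in the mother
        if {paternal_gt, maternal_gt} == {"AA", "BB"}:
            informative[site] = (paternal_gt[0], maternal_gt[0])
    return informative


if __name__ == "__main__":
    father = {"rs1": "AA", "rs2": "AB", "rs3": "BB", "rs4": "AA"}
    mother = {"rs1": "BB", "rs2": "BB", "rs3": "AA", "rs4": "AA"}
    print(select_informative_sites(father, mother))
```

Only `rs1` and `rs3` are kept in this example: at `rs2` the father is heterozygous, and at `rs4` both parents carry the same homozygous genotype, so neither site identifies the parental origin of an allele in the offspring.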
Herein, the term "SEM" (spent medium) refers to used embryo culture medium collected during in vitro embryo culture in IVF. The medium may or may not have been changed during the in vitro culture before the collection, and it may have been changed one or more times. For example, the SEM may be collected on D5 after a medium change on D3, or collected several hours (e.g., 6-7 hours) after a medium change on D3 or D5, and so on.
In this context, terms such as D1, D3, D4, D5 and D6, when referring to SEM culture medium, refer to culture medium in which IVF embryos were cultured in vitro on day 1, day 3, day 4, day 5 and day 6, respectively. Similarly, SEM culture medium of D1-D6 refers to SEM culture medium of any period between day 1 and day 6 of in vitro culture, including, for example, but not limited to, SEM culture medium of day 1 to day 3 (D1-D3), day 3 to day 5 (D3-D5), day 3 to day 6 (D3-D6), day 4 to day 5 (D4-D5), day 4 to day 6 (D4-D6), and day 5 to day 6 (D5-D6) of in vitro culture.
As used herein, the term "module" refers to software objects or routines (e.g., as separate threads) that may be executed centrally on a single computing system (e.g., a computer program, a PAD, one or more processors). Programs embodying the methods of the present invention may be stored on computer-readable media having computer program logic or code portions embodied therein for implementing the described system modules and methods. While the system modules and methods described herein are preferably implemented in software, implementations in hardware or in a combination of software and hardware are also possible and contemplated by those skilled in the art.
The following describes aspects of the present invention in detail.
The method of the invention
Trace DNA samples, especially those on the picogram scale, usually require single cell whole genome amplification (WGA) before genotyping, and this amplification can introduce a significant number of artifactual genetic variations. As a result, Mendelian inheritance errors appear in the genotyping data; they may reflect data quality problems caused by WGA, sample contamination, or chromosomal DNA copy number abnormalities present in the sample. The inventors found that, when identifying contamination of a progeny trace DNA sample or copy number abnormalities of the progeny in such a sample, a parental predisposition statistic for the sample can be constructed from specific genetic variation sites in the sample, which effectively eliminates the background noise caused by WGA. On this basis, the invention provides a method for identifying contamination of a trace DNA sample (especially SEM), and a method for identifying copy number abnormalities of the progeny (especially heteroploidy of the progeny) in a trace DNA sample.
Preferably, the method of the invention comprises the steps of:
(1) analyzing genetic variation sites (preferably SNP sites) in the single cell whole genome amplification product of the progeny trace DNA sample;
(2) obtaining the maternal allele frequency (MAF) and/or paternal allele frequency (PAF) of the sample at the parental unequal homozygous allelic sites of a selected DNA segment (e.g., at the whole genome level or the chromosome segment level);
(3) classifying each site as a paternally predisposed site, a maternally predisposed site, or a site with neither paternal nor maternal predisposition, based on the MAF and/or PAF determined in step (2) and classification thresholds a and b; preferably, allele frequency thresholds of 0.4 and 0.6 are used for the classification;
(4) counting the number of maternally predisposed sites (N_MAF) and the number of paternally predisposed sites (N_PAF) of the sample on the selected DNA segment;
(5) determining the parental predisposition statistic (S_POR) of the sample at the level of the DNA segment based on the number of maternally predisposed sites (N_MAF) and the number of paternally predisposed sites (N_PAF);
(6) comparing the parental predisposition statistic determined in step (5) with the S_POR threshold to determine the parental predisposition of the sample at the level of the DNA segment.
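Steps (2) to (5) above can be sketched as follows. Two points in this sketch are assumptions rather than statements from this passage: first, a site is treated as maternally predisposed when its MAF exceeds b and as paternally predisposed when its MAF falls below a (equivalently, PAF = 1 - MAF exceeds b); second, because the formula for the statistic is not given here, S_POR is taken as the normalised difference (N_MAF - N_PAF) / (N_MAF + N_PAF). All function names and the input format are hypothetical.

```python
from collections import Counter


def maternal_allele_frequency(baf, maternal_allele):
    """Convert a B-allele frequency into a maternal allele frequency."""
    return baf if maternal_allele == "B" else 1.0 - baf


def classify_site(maf, a=0.4, b=0.6):
    """Classify one site by its maternal allele frequency (PAF = 1 - MAF)."""
    if maf > b:
        return "maternal"
    if maf < a:
        return "paternal"
    return "neither"


def parental_predisposition(sites, a=0.4, b=0.6):
    """Return (N_MAF, N_PAF, S_POR) for a list of (baf, maternal_allele) sites.

    ASSUMPTION: S_POR is taken here as (N_MAF - N_PAF) / (N_MAF + N_PAF);
    the document does not give the formula in this passage.
    """
    counts = Counter(
        classify_site(maternal_allele_frequency(baf, allele), a, b)
        for baf, allele in sites
    )
    n_maf, n_paf = counts["maternal"], counts["paternal"]
    biased = n_maf + n_paf
    s_por = (n_maf - n_paf) / biased if biased else 0.0
    return n_maf, n_paf, s_por


if __name__ == "__main__":
    example = [(0.9, "B"), (0.1, "A"), (0.5, "B"), (0.2, "B"), (0.7, "B")]
    print(parental_predisposition(example))
```

The intuition behind a difference-based form, under these assumptions, is that random allele dropout would be expected to push sites towards the paternal and the maternal side about equally often, so its contribution tends to cancel, whereas contamination or a ploidy change shifts sites in one direction.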
In one embodiment, the detection on the selected DNA segment may be a whole genome level, a chromosome level, or a chromosome fragment level detection.
In one embodiment, the method of the invention is used to identify parental DNA contamination of progeny trace DNA samples. The sample may be of any sample type for which parental DNA contamination needs to be identified. Preferably, the sample is an SEM, and the parental predisposition of the sample indicates whether the SEM sample is likely to have paternal or maternal contamination. Also preferably, the sample is a biopsy of an IVF embryo, and the parental predisposition of the sample indicates whether the IVF embryo sample is likely to have paternal or maternal contamination.
In yet another embodiment, the method of the present invention is used to identify chromosomal ploidy abnormalities in progeny in a trace DNA sample from the progeny, wherein the sample is a biopsy, such as an IVF blastocyst trophoblast biopsy, associated with the progeny (e.g., an embryo or fetus), and wherein the parental predisposition of the sample indicates whether ploidy abnormalities exist in the progeny.
In some embodiments, the methods of the invention for identifying ploidy abnormalities of progeny chromosomes further comprise:
(i) counting, on the selected DNA segment, the number of loci at which the MAF and/or PAF of the sample lies between classification thresholds a and b, i.e., loci with neither paternal nor maternal predisposition (N_LOH);
(ii) determining the heterozygous site rate (S_loh) of the sample at the level of said DNA segment based on N_LOH and the total number of parentally unequal homozygous loci in the segment (N_total);
(iii) comparing the heterozygous site rate determined in step (ii) with an S_loh threshold, wherein preferably the S_loh threshold is predetermined based on a reference frame; and
(iv) in combination with the parental predisposition of the sample at the level of the DNA segment determined in step (6), judging whether the ploidy abnormality (i.e., CNV) is caused by duplication (dup) or deletion (del).
In some preferred embodiments of the method, the S_POR threshold and the S_loh threshold are determined based on a reference frame without parental predisposition, and the degree to which the S_POR value and the S_loh value of the sample deviate from the thresholds indicates the likelihood that the progeny is a uniparental diploid or haploid.
In one embodiment, the method of the present invention further comprises amplifying the trace DNA sample using a single cell whole genome amplification technique. The amplification adopts a single cell amplification strategy; the specific method is not limited and includes, but is not limited to, primer extension preamplification PCR (PEP-PCR), degenerate oligonucleotide primer PCR (DOP-PCR), Multiple Displacement Amplification (MDA), and Multiple Annealing and Looping Based Amplification Cycles (MALBAC). In a preferred embodiment, the trace DNA sample is amplified using MALBAC. More preferably, whole genome amplification of the sample is performed using [Figure BDA0003412626970000071] (US20190106738A1).
In the method of the present invention, the genetic variation sites and the corresponding allele frequencies can be obtained from the amplification product of the trace DNA sample and/or from pedigree gDNA, such as that of the parents, by any technique known in the art suitable for obtaining sequence information of genetic variations. For example, techniques selected from nucleic acid chips and sequencing may be used. The nucleic acid chip and sequencing technique may be a single nucleotide polymorphism microarray chip, a MassARRAY time-of-flight mass spectrometry chip, second-generation sequencing, third-generation sequencing, or a combination thereof; for example, the single nucleotide polymorphism microarray chip is an SNP genotyping chip; for example, the second-generation sequencing includes whole genome sequencing, whole exome sequencing, and sequencing of targeted genomic regions, preferably whole genome sequencing, the depth of which may be 1X, 2X, 5X, etc.
In one embodiment, the site of genetic variation is a Single Nucleotide Polymorphism (SNP) site. In a preferred embodiment, the SNP sites are detected on the basis of a nucleic acid chip. In another preferred embodiment, the SNP site is detected based on sequencing.
In one embodiment, the trace DNA sample may be any sample that can be used to detect genetic information of the progeny, for example, fetal cell-free DNA (cfDNA) in embryo culture fluid, blastocyst fluid, blastocoel fluid, maternal plasma, or other types of body fluids, and/or fetal cells among blastocyst trophoblast cells, blastomere cells, maternal blood, or other types of body fluids. In some embodiments, preferably, the trace DNA sample is a non-invasively obtained fetal nucleic acid sample. For example, the embryonic nucleic acid sample may be an embryo culture fluid or cell-free DNA derived or obtained from an embryo culture fluid.
In some embodiments, the trace DNA sample comprises about 0.1 pg to 40 ng of progeny genomic DNA, e.g., 1 to 40 ng or 20 to 40 ng, preferably 0.1 to 40 pg, 1 to 40 pg, 5 to 10 pg, 10 to 40 pg, or 40 to 100 pg. More preferably, the trace DNA sample comprises 5-10 pg or 10-40 pg of progeny genomic DNA. In some embodiments, the progeny is an embryo or a fetus. In some preferred embodiments, the progeny are IVF (in vitro fertilization) embryos. For example, in some embodiments, preferably, the methods of the invention are used for parental contamination identification of biopsies or SEM culture fluids from IVF embryos. In still other preferred embodiments, the progeny are ICSI (intracytoplasmic sperm injection) embryos. For example, in still other embodiments, preferably, the methods of the invention are used to identify ploidy abnormalities of ICSI embryos.
In one embodiment, the method of the present invention further comprises determining the genetic variation sites used for calculating the parental predisposition statistic based on the pedigree genetic information corresponding to the trace DNA sample. In one embodiment, the pedigree genetic information is parental genetic information. For example, the information can be obtained by genetic testing of parental genomic DNA, such as a large amount (e.g., at least 100 ng) of genomic DNA extracted from peripheral blood. In some embodiments, pedigree allele information may be obtained from a nucleic acid sample of a pedigree individual (particularly a parent) comprising at least about 100 ng DNA (e.g., 100 ng-1000 ng DNA). For example, the nucleic acid sample of the pedigree individual is a nucleic acid sample from blood, saliva, buccal swab, urine, nails, hair follicles, dander, cells, tissue, or body fluid of the pedigree individual.
Thus, in one aspect, the present invention provides a method for determining the parental predisposition of a test sample, wherein said test sample comprises progeny genomic DNA in an amount of no more than 1ng (and preferably about 1-500pg, more preferably 1-100pg), wherein said method comprises the steps of:
(1) performing gene variation site (preferably SNP site) analysis on the single-cell whole genome amplification product of the tested sample;
(2) obtaining a frequency of Maternal Alleles (MAF) and/or a frequency of Paternal Alleles (PAF) of the sample at parentally unequal homozygous loci over a selected DNA segment (e.g., at the whole genome level, or chromosome segment level);
(3) classifying each locus as a paternally-predisposed locus, a maternally-predisposed locus, or a locus with neither paternal nor maternal predisposition by comparing the MAF and/or PAF determined in step (2) with classification thresholds a and b, wherein classification threshold a is a value from 0.1 to 0.4 (e.g., 0.1, 0.2, 0.3, 0.4, or any other value between 0.1 and 0.4), classification threshold b is a value from 0.6 to 0.9 (e.g., 0.9, 0.8, 0.7, 0.6, or any other value between 0.6 and 0.9), and a + b = 1;
wherein,
-classifying said locus as a paternally-predisposed locus if MAF ≤ a and/or PAF ≥ b;
-classifying said locus as a maternally-predisposed locus if PAF ≤ a and/or MAF ≥ b;
-classifying said locus as a locus with neither paternal nor maternal predisposition if MAF and/or PAF has a value > a and < b;
(4) counting the number of maternally-predisposed loci (N_MAF) and the number of paternally-predisposed loci (N_PAF) of said test sample on said selected DNA segment;
(5) determining the parental predisposition statistic (S_POR) of the sample at the level of said DNA segment based on the number of maternally-predisposed loci (N_MAF) and the number of paternally-predisposed loci (N_PAF);
(6) comparing the S_POR value of the sample determined in step (5) with a parental predisposition threshold (i.e., the S_POR threshold) to determine the parental predisposition of the test sample at the level of said DNA segment.
In some embodiments, preferably, in step (3), classification threshold a is any value between 0.2 and 0.4, classification threshold b is any value between 0.6 and 0.8, and a + b = 1; more preferably, classification threshold a is any value between 0.3 and 0.4, classification threshold b is any value between 0.6 and 0.7, and a + b = 1.
More preferably, in step (3), classification threshold a is 0.4 and classification threshold b is 0.6, and
-classifying said locus as a paternally-predisposed locus if MAF ≤ 0.4 and/or PAF ≥ 0.6;
-classifying said locus as a maternally-predisposed locus if PAF ≤ 0.4 and/or MAF ≥ 0.6;
-classifying said locus as a locus with neither paternal nor maternal predisposition if MAF and/or PAF has a value > 0.4 and < 0.6.
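By way of non-limiting illustration, the locus classification rule above may be sketched as follows. The function name `classify_locus` is hypothetical, and the sketch assumes a biallelic locus at which PAF = 1 - MAF, so that only the MAF needs to be inspected; it is not asserted to be the implementation used by the applicant.

```python
def classify_locus(maf, a=0.4, b=0.6):
    """Classify one parentally unequal homozygous locus from its maternal allele frequency.

    Assumption: the locus is biallelic, so PAF = 1 - MAF and the MAF alone decides the class.
    """
    if not 0.0 <= maf <= 1.0:
        raise ValueError("MAF must lie in [0, 1]")
    if maf >= b:          # equivalent to PAF <= a when a + b = 1
        return "maternal"
    if maf <= a:          # equivalent to PAF >= b when a + b = 1
        return "paternal"
    return "neither"      # a < MAF < b


labels = [classify_locus(m) for m in (0.05, 0.40, 0.50, 0.60, 0.93)]
print(labels)  # ['paternal', 'paternal', 'neither', 'maternal', 'maternal']
```

Other threshold pairs within the stated ranges (for example a = 0.3, b = 0.7) can be passed as arguments, provided that a + b = 1.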
In some embodiments of the method, the S_POR threshold is established using a reference frame without parental predisposition,
preferably, the reference frame consists of 1-40 or more reference samples without parental predisposition;
preferably, the reference S_POR thresholds are set as the mean of the parental predisposition statistic S_POR of the reference frame ± 1-5 standard deviations (preferably 2-3, especially 3 standard deviations);
if the S_POR of the sample is greater than the upper reference S_POR threshold, a maternal predisposition of the test sample on the DNA segment is indicated; if the S_POR of the sample is less than the lower threshold, a paternal predisposition of the test sample on the DNA segment is indicated.
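As a hedged sketch of how such reference thresholds might be derived, the following computes mean ± k standard deviations from a set of reference S_POR values and applies the upper/lower comparison. The function names and the reference values are hypothetical; the use of the sample standard deviation (rather than the population standard deviation) is an assumption, since the text does not specify which is meant, and at least two reference samples are required for it to be defined.

```python
from statistics import mean, stdev


def reference_thresholds(reference_values, k=3.0):
    """Return (lower, upper) = mean -/+ k standard deviations of the reference S_POR values."""
    if len(reference_values) < 2:
        raise ValueError("at least two reference samples are needed to estimate a standard deviation")
    m = mean(reference_values)
    s = stdev(reference_values)  # assumption: sample standard deviation
    return m - k * s, m + k * s


def call_predisposition(s_por, lower, upper):
    """Compare a sample S_POR with the reference thresholds."""
    if s_por > upper:
        return "maternal"
    if s_por < lower:
        return "paternal"
    return "none"


# Hypothetical reference frame of five samples without parental predisposition.
lower, upper = reference_thresholds([0.95, 1.00, 1.05, 1.02, 0.98], k=3.0)
print(call_predisposition(1.50, lower, upper))  # maternal
print(call_predisposition(0.50, lower, upper))  # paternal
```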
In some embodiments of the method, the single cell whole genome amplification is performed using a method selected from the group consisting of: primer extension preamplification PCR, degenerate oligonucleotide primer PCR, multiple displacement amplification, multiple annealing and looping based amplification cycles (MALBAC), blunt-end or cohesive-end ligation library construction, and [Figure BDA0003412626970000091] (US20190106738A1); wherein, preferably, the single cell whole genome amplification is performed using MALBAC, more preferably using [Figure BDA0003412626970000092]. In some cases, the quality and/or quantity of the amplification product may be affected by the choice of single cell whole genome amplification method. In these cases, as understood by the skilled person, it is preferred to determine the threshold using reference sample amplification products obtained under the same amplification method as the test sample.
In some embodiments of the method, in step (2), BAF values of the test sample at the parentally unequal homozygous loci are obtained based on, for example, nucleic acid chip analysis, and the paternal allele frequency and/or maternal allele frequency of the sample at each locus is determined based on the BAF values.
In some embodiments, the method comprises:
-determining, on said selected DNA segment, the loci at which the father has the AA genotype and the mother has the BB genotype, calculating the BAF value of the test sample at each such locus, and counting the number of loci with BAF ≥ classification threshold b (preferably 0.6) (N_MB) and the number of loci with BAF ≤ classification threshold a (preferably 0.4) (N_PA); and determining the loci at which the father has the BB genotype and the mother has the AA genotype, calculating the BAF value of the sample at each such locus, and counting the number of loci with BAF ≤ classification threshold a (preferably 0.4) (N_MA) and the number of loci with BAF ≥ classification threshold b (preferably 0.6) (N_PB);
-calculating the number of maternally-predisposed loci (N_MAF) and the number of paternally-predisposed loci (N_PAF) of the test sample on said DNA segment, wherein
N_MAF = N_MB + N_MA; N_PAF = N_PA + N_PB;
-constructing the parental predisposition statistic S_POR of the test sample, wherein
S_POR = N_MAF / N_PAF = (N_MB + N_MA) / (N_PA + N_PB).
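The BAF-based counting described above may be illustrated, purely as a sketch, as follows. The function name `s_por_from_baf`, the input layout, and the example values are hypothetical; returning infinity when N_PAF is zero is an assumption, as the text does not define S_POR for that case.

```python
def s_por_from_baf(loci, a=0.4, b=0.6):
    """Count predisposed loci from BAF values and return (N_MAF, N_PAF, S_POR).

    loci: iterable of (father_genotype, mother_genotype, baf), genotypes given as 'AA' or 'BB'.
    """
    n_mb = n_ma = n_pa = n_pb = 0
    for father, mother, baf in loci:
        if (father, mother) == ("AA", "BB"):      # B allele is maternal, so MAF = BAF
            if baf >= b:
                n_mb += 1
            elif baf <= a:
                n_pa += 1
        elif (father, mother) == ("BB", "AA"):    # B allele is paternal, so PAF = BAF
            if baf <= a:
                n_ma += 1
            elif baf >= b:
                n_pb += 1
        # any other parental genotype combination is uninformative and is skipped
    n_maf = n_mb + n_ma
    n_paf = n_pa + n_pb
    # Assumption: S_POR is reported as infinity when no paternally-predisposed locus is found.
    s_por = n_maf / n_paf if n_paf else float("inf")
    return n_maf, n_paf, s_por


example = [
    ("AA", "BB", 0.9),  # maternal (N_MB)
    ("AA", "BB", 0.1),  # paternal (N_PA)
    ("BB", "AA", 0.2),  # maternal (N_MA)
    ("BB", "AA", 0.5),  # neither
    ("AA", "AA", 0.0),  # parents not unequal homozygous: skipped
]
print(s_por_from_baf(example))  # (2, 1, 2.0)
```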
In some embodiments, in step (2), allele depth (AD) values of the paternal and maternal alleles of the test sample at the parentally unequal homozygous loci on the selected DNA segment, i.e., AD_father and AD_mother, are obtained based on, for example, NGS sequencing analysis, and the paternal allele frequency and/or maternal allele frequency of the test sample at each locus is determined based on the AD values:
wherein the maternal allele frequency of the sample at the locus is MAF = AD_mother / (AD_father + AD_mother);
wherein the paternal allele frequency of the sample at the locus is PAF = AD_father / (AD_father + AD_mother).
Preferably, based on the MAF and/or PAF obtained by AD analysis, the number of loci with MAF ≥ classification threshold b (preferably 0.6) (N_MAF) and the number of loci with PAF ≥ classification threshold b (preferably 0.6) (N_PAF) of the test sample on the selected DNA segment are counted to construct the S_POR of the test sample, wherein
S_POR = N_MAF / N_PAF.
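For sequencing data, the allele-depth formulas above may be sketched as follows. The function names and the read counts are hypothetical; skipping loci with zero total depth, and returning infinity when N_PAF is zero, are assumptions not stated in the text.

```python
def allele_frequencies(ad_father, ad_mother):
    """Return (MAF, PAF) from allele depths, or None when the locus has no reads."""
    total = ad_father + ad_mother
    if total == 0:
        return None  # assumption: uncovered loci are simply ignored
    return ad_mother / total, ad_father / total


def s_por_from_depths(depths, b=0.6):
    """depths: iterable of (AD_father, AD_mother). Returns (N_MAF, N_PAF, S_POR)."""
    n_maf = n_paf = 0
    for ad_father, ad_mother in depths:
        freqs = allele_frequencies(ad_father, ad_mother)
        if freqs is None:
            continue
        maf, paf = freqs
        if maf >= b:
            n_maf += 1
        elif paf >= b:
            n_paf += 1
    # Assumption: S_POR is reported as infinity when N_PAF is zero.
    s_por = n_maf / n_paf if n_paf else float("inf")
    return n_maf, n_paf, s_por


# Hypothetical depths: maternal, paternal, neither, uncovered, maternal.
depths = [(1, 9), (8, 2), (5, 5), (0, 0), (0, 4)]
print(s_por_from_depths(depths))  # (2, 1, 2.0)
```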
In preferred aspects, the invention provides methods for identifying the parental predisposition of DNA contamination in trace DNA samples of progeny using the method for identifying parental predisposition of the invention, wherein the parental predisposition of a sample indicates the likelihood of paternal or maternal contamination of said sample,
preferably, the sample is an SEM embryo culture fluid, and the method comprises: obtaining, at the whole genome level, the maternal allele frequency (MAF) and/or the paternal allele frequency (PAF) of the test sample at all parentally unequal homozygous loci; and constructing a parental contamination predisposition value S_MC of the sample, wherein
S_MC = S_POR = N_MAF / N_PAF.
In some embodiments of the method for the identification of parent contamination according to the invention, the reference frame consists of SEM samples without parental predisposition, e.g. 1-40 reference SEM samples,
preferably, a parental predisposition statistic S_MC is established for each SEM sample of the reference frame, and the parental predisposition S_MC thresholds are set as the reference frame S_MC mean ± 1-5 standard deviations (preferably 2-3, especially 3 standard deviations), wherein,
if the S_MC of the tested SEM sample is greater than the upper threshold, the tested SEM sample is indicated to have a maternal contamination tendency;
if the S_MC of the tested SEM sample is less than the lower threshold, the tested SEM sample is indicated to have a paternal contamination tendency.
In some embodiments of the method for identifying parental contamination according to the invention, preferably, the upper S_MC threshold is 1.26 and the lower S_MC threshold is 0.80; an SEM sample with S_MC > 1.26 is indicated to have a maternal contamination tendency; an SEM sample with S_MC < 0.80 is indicated to have a paternal contamination tendency.
In some embodiments of the method for identifying parental contamination according to the invention, the tested SEM sample has less than 50% or 40%, more preferably less than 10%, or less than 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2% or 1%, or 0.1% or less, of parental DNA contamination (e.g., paternal or maternal DNA). In another embodiment, the ratio of fetal DNA to contaminating parental DNA in the tested SEM sample is 1:9 to 9:1.
In further preferred aspects, the invention provides methods for identifying chromosomal ploidy abnormalities of progeny in trace DNA samples using the method for identifying parental predisposition of the invention, wherein the parental predisposition of a sample indicates the likelihood of chromosomal ploidy abnormalities in the progeny, wherein preferably the sample is a biopsy cell sample, such as IVF blastocyst trophoblast cells,
preferably, the abnormality indicated by said parental predisposition is selected from: a paternal trisomy, a paternal triploid, a maternal trisomy, a maternal triploid, a paternal uniparental diploid or haploid, a maternal uniparental diploid or haploid, and chromosomal ploidy variations of any combination thereof.
In some embodiments of the method for identifying abnormal ploidy in progeny according to the invention, the maternal allele frequency (MAF) and/or paternal allele frequency (PAF) of the test sample at the parentally unequal homozygous loci of a selected DNA segment (in particular, at the chromosome level) are obtained; and a parental predisposition value S_uneven of the sample is constructed, wherein
S_uneven = S_POR = N_MAF / N_PAF.
In some embodiments of the method for identifying abnormal ploidy in progeny according to the invention, the method further comprises,
-counting the total number of parentally unequal homozygous loci (N_total) and the number of loci with neither paternal nor maternal predisposition (N_LOH) of the test sample on said selected DNA segment;
-determining the heterozygous site rate (S_loh) of the test sample at the level of said DNA segment based on N_LOH and N_total; and
-comparing the determined heterozygous site rate of the test sample (i.e., the sample S_loh value) with a heterozygous site rate threshold (i.e., the S_loh threshold).
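A minimal sketch of the heterozygous site rate follows. It assumes that S_loh is the simple ratio N_LOH / N_total; the text states only that S_loh is determined "based on" these two counts, so this ratio is an interpretation. The function name and example values are hypothetical.

```python
def heterozygous_site_rate(mafs, a=0.4, b=0.6):
    """Return S_loh for a segment, given the MAF at each parentally unequal homozygous locus.

    Assumption: S_loh = N_LOH / N_total, where N_LOH counts loci with a < MAF < b.
    """
    n_total = len(mafs)
    if n_total == 0:
        raise ValueError("no informative loci on the segment")
    n_loh = sum(1 for m in mafs if a < m < b)
    return n_loh / n_total


print(heterozygous_site_rate([0.10, 0.45, 0.50, 0.55, 0.90]))  # 0.6
```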
In some embodiments, the degree of loss of heterozygosity in the selected DNA segment can be reflected by the heterozygous site rate. Thus, in some embodiments, a uniparental diploid or haploid can be distinguished from a chromosomal trisomy or triploid by the extent to which the heterozygous site rate deviates from the threshold. For example, in some cases, a uniparental diploid or haploid may exhibit a heterozygous site rate of less than 0.2, preferably less than 0.15, 0.1, 0.07, 0.05, 0.02 or less.
In some embodiments of the method for identifying abnormal ploidy in progeny according to the invention, the method further comprises:
establishing the S_uneven threshold and optionally the S_loh threshold using a reference frame without parental predisposition; preferably, the reference frame without parental predisposition consists of 1-40 or more euploid biopsy samples.
Preferably, the parental predisposition S_uneven thresholds are set as the reference frame S_uneven mean ± 1-5 standard deviations (preferably 2-3, especially 3 standard deviations). Preferably, the S_loh thresholds are set as the reference frame S_loh mean ± 1-5 standard deviations (preferably 2-3, especially 3 standard deviations).
In some embodiments of the method for identifying abnormal ploidy in progeny according to the invention, the upper S_uneven threshold is about 1.75 and the lower S_uneven threshold is about 0.53; and/or the upper S_loh threshold is about 0.76 and the lower S_loh threshold is about 0.22.
In some embodiments of the method for identifying abnormal ploidy in progeny according to the invention, the method comprises:
constructing the S_uneven statistic of the test sample at the chromosome level and comparing it with the S_uneven thresholds predetermined based on a reference frame without parental predisposition;
constructing the S_loh value of the test sample at the chromosome level and comparing it with the S_loh thresholds predetermined based on a reference frame without parental predisposition;
if the S_loh of the test sample is within the reference frame S_loh thresholds, an S_uneven greater than the upper reference frame S_uneven threshold indicates a maternal trisomy or maternal triploid; conversely, an S_uneven less than the lower reference frame S_uneven threshold indicates a paternal trisomy or paternal triploid;
if the S_loh is less than the lower reference frame S_loh threshold, an S_uneven greater than the upper reference frame S_uneven threshold, such as greater than 10, 12, 15, 17, 19 or 20, indicates a maternal uniparental diploid or haploid; conversely, an S_uneven less than the lower reference frame S_uneven threshold, such as less than 0.2, 0.15, 0.1, 0.05, 0.02, or 0.01, indicates a paternal uniparental diploid or haploid. Preferably, an S_loh significantly less than the lower threshold, e.g., any value from 0 to 0.2, or less than 0.15, or less than 0.1, indicates a uniparental diploid or haploid.
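The chromosome-level decision rules above may be sketched as follows, using the approximate thresholds given in the text (S_uneven 0.53 to 1.75; S_loh 0.22 to 0.76) as defaults. The function name and return strings are hypothetical, and the "unclassified" outcome for an S_loh above its upper threshold is an assumption, since the text does not address that case.

```python
def interpret_chromosome(s_uneven, s_loh,
                         uneven_bounds=(0.53, 1.75),
                         loh_bounds=(0.22, 0.76)):
    """Combine S_uneven and S_loh of one chromosome into a suggested interpretation."""
    uneven_lower, uneven_upper = uneven_bounds
    loh_lower, loh_upper = loh_bounds

    if s_uneven > uneven_upper:
        origin = "maternal"
    elif s_uneven < uneven_lower:
        origin = "paternal"
    else:
        return "no parental predisposition"

    if loh_lower <= s_loh <= loh_upper:
        return origin + " trisomy or triploid"
    if s_loh < loh_lower:
        return origin + " uniparental diploid or haploid"
    return "unclassified"  # assumption: S_loh above the upper threshold is not covered by the text


print(interpret_chromosome(2.5, 0.50))    # maternal trisomy or triploid
print(interpret_chromosome(0.02, 0.05))   # paternal uniparental diploid or haploid
```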
In some embodiments of the method for identifying abnormal ploidy of progeny according to the present invention, the parental predisposition value and the heterozygous site rate of the test sample are constructed for each of a plurality of autosomes, and the likelihood that the test sample is a maternal or paternal triploid is determined; preferably,
if the test sample shows, on more than 15 chromosomes, a parental predisposition and a heterozygous site rate indicative of a maternal trisomy, the test sample is judged to be possibly a maternal triploid;
and if the test sample shows, on more than 15 chromosomes, a parental predisposition and a heterozygous site rate indicative of a paternal trisomy, the test sample is judged to be possibly a paternal triploid.
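As an illustrative sketch of the whole-genome triploid judgment, per-chromosome calls can be tallied and compared against the "more than 15 chromosomes" criterion. The function name, the label strings, and the example calls are hypothetical.

```python
from collections import Counter


def call_triploid(chromosome_calls, min_chromosomes=15):
    """chromosome_calls maps an autosome name to 'maternal trisomy', 'paternal trisomy' or 'normal'."""
    counts = Counter(chromosome_calls.values())
    for origin in ("maternal", "paternal"):
        # "more than 15 chromosomes" is read as a strict inequality
        if counts[origin + " trisomy"] > min_chromosomes:
            return "possible " + origin + " triploid"
    return "no triploid call"


# Hypothetical sample: 18 of 22 autosomes show a maternal-trisomy pattern.
calls = {f"chr{i}": "maternal trisomy" for i in range(1, 19)}
calls.update({f"chr{i}": "normal" for i in range(19, 23)})
print(call_triploid(calls))  # possible maternal triploid
```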
In a further aspect, the invention also relates to the use of the device, apparatus or system of the invention for carrying out the method of the invention. Preferably, the apparatus, device or system of the invention is characterized in that,
-optionally, being able to perform a single cell whole genome amplification of the test sample and optionally of the reference sample;
-optionally, being able to detect genetic variation (preferably SNP) information of the progeny genome in the obtained single cell whole genome amplification product, e.g., wherein said information is determined by nucleic acid chip or NGS sequencing;
-being able to perform any embodiment of the methods according to the invention based on the genetic variation information of the test sample and optionally the genetic variation information of the reference frame, to identify the parental predisposition of the test sample, or to detect parental contamination of the test sample, or to identify ploidy abnormalities of progeny DNA in the test sample.
In a further aspect, the invention also provides the use of a device, apparatus or system of the invention in a method of the invention, preferably,
-for identifying parental predisposition of a test sample, or for detecting parental contamination of a test sample, or for identifying ploidy abnormalities in progeny DNA in a test sample, or
-for use in the preparation of a product for identifying the parental predisposition of a test sample, detecting parental contamination of a test sample, or identifying ploidy abnormalities of progeny DNA in a test sample.
Hereinafter, the method for identifying parental contamination and the method for identifying progeny ploidy abnormalities of the present invention are further described. Aspects and advantages of the invention will become apparent from the description.
SEM contamination identification method
Analysis of cfDNA (cell-free DNA) in SEM collected at the IVF blastocyst stage has been proposed as a more promising sampling approach than invasive sampling. Recently, blastocoel fluid and SEM samples were compared with trophectoderm (TE) cells as PGT (preimplantation genetic testing) samples. The TE samples showed a 100% amplification rate and high genotype concordance (99.8%). Blastocoel fluid samples showed a high amplification failure rate (72.6%) and low genotype concordance (13.3%). SEM samples performed better than blastocoel fluid but worse than TE samples, with a low amplification failure rate (10.3%) and moderate genotype concordance (59.5%). It has been suggested that one factor leading to the low diagnostic accuracy of SEM is contamination of the SEM culture fluid. Sources of contamination include, for example, maternal DNA contamination (e.g., cumulus cells not completely removed prior to ICSI), paternal DNA contamination, and contamination from exogenous DNA already present in the media supplement. Because contamination is a major risk factor for genetic diagnostic errors, there is a great need to optimize current SEM contamination identification protocols to determine whether the collected sample reflects the true genetic status of the embryo, to facilitate embryo-specific allele analysis, and to differentiate between embryonic and non-embryonic DNA.
SNP microarrays and NGS allow simultaneous acquisition of SNP genotype and chromosome copy number information. Thus, in principle, both techniques can provide aneuploidy, polyploidy, and uniparental diploid information. In conventional multicellular genomic DNA analysis, two parameters are used to reflect copy number status: the logR ratio (LRR, the log2 transformation of normalized SNP intensity) and the B allele frequency (BAF, i.e., the ratio of B allele signal intensity to total SNP signal intensity). BAF values of 0, 0.5 and 1 represent normal copy number (n = 2), whereas a sample with an abnormal copy number shows an increase or decrease in total intensity and in allele frequency, with BAF deviating from the above values. For example, when a copy number deletion occurs, BAF takes only the values 0 and 1 (genotypes A and B), with no values around 0.5 (genotype AB), and the LRR value decreases; when the copy number increases, BAF takes values around 0, 0.33, 0.67 and 1.0 (genotypes AAA, AAB, ABB and BBB), and the LRR value increases. Based on this regular change in BAF and LRR values, BAF values have been proposed for estimating abnormal copy number variation and the proportion of foreign contaminating DNA in conventional genomic DNA (gDNA) samples, as well as the proportion of fetal DNA in maternal plasma cfDNA. For example, CN104640997A discloses methods of using BAF values to calculate the percentage of fetal DNA in plasma DNA and to diagnose the risk of fetal trisomy 21 in NIPD (non-invasive prenatal diagnosis) assays with plasma DNA of pregnant women. That document points out that, for the method described, a small amount of original sample raises a problem of statistical accuracy, and that to avoid this problem, target region capture combined with targeted amplification should be used to obtain the target region sample of chromosome 21 to be detected.
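The relationship between copy number and expected BAF values described above can be illustrated with a small idealized calculation: for a segment present in n copies, the noise-free BAF of a biallelic SNP is k/n, where k is the number of B alleles. The function name is hypothetical, and the sketch ignores the measurement noise and amplification bias discussed in this description.

```python
from fractions import Fraction


def expected_baf_clusters(copy_number):
    """Return the idealized BAF values k/n for k = 0..n B alleles at copy number n."""
    if copy_number < 1:
        raise ValueError("copy number must be at least 1")
    return [Fraction(k, copy_number) for k in range(copy_number + 1)]


for n in (1, 2, 3):
    print(n, [float(v) for v in expected_baf_clusters(n)])
```

For n = 1 only 0 and 1 occur, for n = 2 the values are 0, 0.5 and 1, and for n = 3 they are 0, 1/3, 2/3 and 1, matching the deletion, normal, and duplication patterns described in the text.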
SEM contamination identification is complicated by multiple factors, such as the low quality and very small amount of embryonic DNA in SEM. Only a trace amount of cell-free embryonic DNA (about 10 to several tens of picograms) is present in the SEM; therefore, the cfDNA in the SEM needs to be amplified using single cell whole genome amplification (WGA) techniques such as MALBAC and MDA before SNP microarray/NGS data analysis can be performed. It is known in the art that WGA at the single cell level introduces a significant amount of technical noise, causing significant BAF distortion. The same problem exists with whole genome amplification of SEM. Furthermore, the presence of contaminating DNA in the SEM may further exacerbate the problem. For example, it has been reported that amplification of SEM using MDA, although resulting in a 97% amplification rate, yielded reliable PGT-A results in only 2% of the amplified samples. Furthermore, it has also been reported that contamination by maternal DNA may occur in the SEM even after cumulus removal and washing. As a result, contamination of SEM samples is difficult to identify from conventional BAF values.
To solve the above problems, the method of the invention uses the paternal allele frequency (PAF) and/or maternal allele frequency (MAF) values of an SEM sample at loci where the father and the mother carry different homozygous alleles, classifies and counts the loci by paternal and maternal predisposition, and uses the ratio of the number of maternally-predisposed loci to the number of paternally-predisposed loci to construct a parental contamination predisposition statistic S_MC that effectively eliminates the interfering information introduced by single cell whole genome amplification. By comparison with a predetermined S_MC threshold (e.g., a threshold established from a reference frame), the S_MC value constructed according to the invention can specifically and sensitively identify the paternal/maternal contamination tendency of a sample.
In one embodiment, the genetic variation sites used for calculating S_MC are biallelic sites at which the father and the mother carry different homozygous alleles, i.e., sites where the father has the AA genotype and the mother has the BB genotype, and sites where the father has the BB genotype and the mother has the AA genotype. In a further preferred embodiment, the paternal allele frequency and/or maternal allele frequency of the SEM sample is determined at the parentally different homozygous allele sites on the selected DNA segment.
In a preferred embodiment, BAF values of the SEM sample at the parentally different homozygous allele sites on the selected DNA segment are obtained based on, for example, SNP array analysis, and the paternal and/or maternal allele frequencies of the sample at said sites are determined based on said BAF values:
-at a site where father is AA genotype and mother is BB genotype, the maternal allele frequency MAF of SEM samples at said site is the BAF value of said site;
-at a site where the father is the BB genotype and the mother is the AA genotype, the paternal allele frequency PAF of the SEM sample at said site is the BAF value of said site.
In yet another preferred embodiment, allele depth (AD) values of the paternal and maternal alleles of the SEM sample at the parentally different homozygous allele sites on the selected DNA segment, i.e., AD_father and AD_mother, are obtained based on, for example, NGS sequencing analysis, and the paternal allele frequency and/or maternal allele frequency of the sample at each site is determined based on the AD values:
wherein the maternal allele frequency of the SEM sample at the site is MAF = AD_mother / (AD_father + AD_mother);
wherein the paternal allele frequency of the SEM sample at the site is PAF = AD_father / (AD_father + AD_mother).
The loci may be classified according to the "method of the invention" section above, based on MAF and/or PAF allele frequencies determined from BAF values and/or AD values. In a preferred embodiment, in the SEM sample, at the parentally different homozygous allele sites of said selected DNA segment, sites with a paternal allele frequency (PAF) ≥ b (e.g., 0.6) and/or a maternal allele frequency (MAF) ≤ a (e.g., 0.4) are classified as paternally-predisposed sites by comparison with classification thresholds a and b; sites with a paternal allele frequency (PAF) ≤ a (e.g., 0.4) and/or a maternal allele frequency (MAF) ≥ b (e.g., 0.6) are classified as maternally-predisposed sites. The classification thresholds a and b may be defined as described in the "method of the invention" section above. Preferably, classification threshold a may be any value between 0.1 and 0.4, preferably between 0.2 and 0.4, more preferably between 0.3 and 0.4, and classification threshold b may be any value between 0.6 and 0.9, preferably between 0.6 and 0.8, more preferably between 0.6 and 0.7, provided that a + b = 1.
In a preferred embodiment, the parental predisposition statistic is the parental contamination predisposition value S_MC, calculated as the ratio of the number of maternally-predisposed sites in the SEM sample to the number of paternally-predisposed sites in the SEM sample.
In one embodiment, SMCThe threshold was established using a SEM reference frame without parental predisposition. In one embodiment, the reference frame consists of 1-40 or more parental predisposition-free SEM samples, e.g., 5, 10, 15, 20, 25, 30, 35, 40, 45, 50 or more. Preferably, the reference SEM sample is collected at the same or similar time as the subject SEM sample, in the same or similar manner. Preferably, the reference SEM sample and the test SEM sample employ the same single cell whole genome amplification method to obtain amplification products for analysis.
In one embodiment, S of the SEM reference frameMCThe values are approximately normally distributed. In one embodiment, genomic allele information of the corresponding parent in conjunction with a reference SEM is used to establish a reference frame SEM sampleCalculating parent tendency statistic S of SEM reference system sample at different homozygous allele sites of parentsMC. In yet another embodiment, a reference system S is utilizedMCMean. + -. 1-5 standard deviations (preferably 2-3, especially 3) to set the parental predisposition SMCAnd (4) a threshold value. S of SEM sampleMCIf the sample is larger than the upper threshold, the SEM sample is prompted to have a parent source pollution tendency; s of SEM sampleMCAnd if the value is less than the upper threshold, indicating that the SEM sample has the father source pollution tendency.
In a preferred embodiment, the upper S_MC threshold is 1.26 and the lower S_MC threshold is 0.80; S_MC > 1.26 indicates that the SEM sample has a maternal contamination tendency, and S_MC < 0.80 indicates that the SEM sample has a paternal contamination tendency.
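The threshold setting and the resulting call can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the sample standard deviation is used (the description does not say whether the sample or population form is intended), and the reference values shown are invented solely to exercise the code.

```python
from statistics import mean, stdev


def reference_thresholds(reference_values, k=3.0):
    """Return (lower, upper) thresholds as mean -/+ k standard deviations.

    Assumption: sample standard deviation; at least two reference values.
    """
    if len(reference_values) < 2:
        raise ValueError("at least two reference values are needed")
    mu = mean(reference_values)
    sd = stdev(reference_values)
    return mu - k * sd, mu + k * sd


def contamination_call(s_mc, lower, upper):
    """Compare a sample S_MC value with the lower and upper thresholds."""
    if s_mc > upper:
        return "maternal contamination tendency"
    if s_mc < lower:
        return "paternal contamination tendency"
    return "no contamination tendency"


# Invented reference values, for illustration only.
lower, upper = reference_thresholds([0.9, 1.0, 1.1, 1.0], k=3.0)
print(round(lower, 3), round(upper, 3))

# Thresholds of the preferred embodiment (0.80 and 1.26).
print(contamination_call(1.5, 0.80, 1.26))
```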
In some embodiments, the methods of the invention can detect 1-90% parental DNA contamination in SEM samples. Preferably, the parental DNA contamination is less than 50% or 40%, more preferably less than 10%, or less than 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2% or 1%, or 0.1% or less. In a preferred embodiment, the parental DNA contamination in the SEM sample is maternal DNA contamination, and preferably the level of maternal contamination is no more than 10% or no more than 5%. In another preferred embodiment, the parental DNA contamination in the SEM sample is paternal DNA contamination, and preferably the contamination level is no more than 10% or no more than 5%.
In some embodiments, the methods of the invention can detect parental DNA contamination in SEM samples containing trace amounts of fetal DNA. In some embodiments, the ratio of fetal DNA to contaminating parental DNA in the SEM sample is, for example, 1:9 to 9:1. In other embodiments, the ratio of fetal DNA to contaminating parental DNA in the SEM sample is less than 1:5, less than 1:6, less than 1:7, less than 1:8, or less than 1:9.
In one embodiment, the trace fetal DNA present in the SEM culture fluid sample is subjected to whole genome amplification prior to SEM sample data analysis. The amplification uses a single-cell amplification strategy, and the specific method is not limited. In a preferred embodiment, SEM amplification is performed using MALBAC. More preferably, whole genome amplification is carried out using the product shown in Figure BDA0003412626970000151.
In one embodiment, genomic DNA (gDNA), e.g., 200 ng or more, such as 1 µg, of the parents or family members of the embryo corresponding to the SEM is extracted to determine sequence information of the parental genomes (e.g., genetic variation site information such as SNP site information). In one embodiment, gDNA is extracted from the peripheral blood of the parents for genetic information analysis.
In one embodiment, the SEM sample is SEM collected on any day of culture from D1 to D6. In one embodiment, the culture medium of the IVF embryo is not changed during in vitro culture prior to collection. In yet another embodiment, the culture medium of the IVF embryo is changed during in vitro culture prior to collection, for example once on day D3 or D5, or twice on days D3 and D5. In one embodiment, the SEM is collected about 2-3 days, such as two days, or 1-24 hours, such as 5-10 hours or 6-7 hours, after the last medium change. In yet another embodiment, the SEM culture fluid used to derive the parental predisposition statistic S_MC is D1-D6, e.g., D1-D3, D3-D5, D4-D5, D4-D6 or D5-D6, embryo culture fluid, e.g., blastocyst culture fluid, particularly D3-D5, D4-D6, D4-D5 or D5-D6 blastocyst culture fluid. In some embodiments, the SEM samples of the invention are a mixture of IVF blastocyst culture fluid and blastocoel fluid.
In some embodiments, the SEM samples of the present invention are SEM culture fluids of ICSI (intracytoplasmic sperm injection) embryos. The culture fluid may be aspirated from the culture of an embryo fertilized by ICSI, preferably on days 3-10, preferably day 5, of culture, as an SEM sample of the invention.
In some embodiments, after removing the zona pellucida, the embryos are cultured in 0.1 µl to 1 ml of culture medium using a single embryo culture system, and a small amount of culture medium (e.g., about 0.1 µl to 1 ml, e.g., about 0.1 µl, 10 µl, 20 µl, 30 µl, 40 µl, 50 µl, 100 µl, 200 µl, 500 µl, 800 µl or 1 ml) is isolated from the culture as the SEM sample of the present invention. Preferably, the reference SEM sample is collected in the same or a similar manner.
In other embodiments, a small amount of culture fluid (e.g., about 0.1 µl to 1 ml, e.g., about 0.1 µl, 10 µl, 20 µl, 30 µl, 40 µl, 50 µl, 100 µl, 200 µl, 500 µl, 800 µl or 1 ml) is isolated as the SEM sample of the present invention from the culture of an embryo derived from an ovum or fertilized egg that was surface-washed prior to culturing. Preferably, the reference SEM sample is collected in the same or a similar manner.
Exemplary S_MC calculation based on genotyping detection
In the method of the present invention, embryo-related genetic variation information can be obtained from single-cell whole genome amplification products of a test sample (e.g., SEM culture fluid or TE cells) by any genotyping detection method known in the art. As examples, the following describes nucleic acid chip-based genetic variation detection and the corresponding S_MC calculation, and NGS sequencing-based genetic variation detection and the corresponding S_MC calculation. While these exemplary methods are preferred, they should not be construed as limiting the invention. Those skilled in the art will appreciate that other genotyping detection methods may be used to obtain the genetic variation information used for the S_MC calculation of the invention. Those skilled in the art will also appreciate that the genotyping detection methods and parental predisposition value calculation methods described herein, suitably adapted, are also applicable to obtaining the parental predisposition values S_POR and S_uneven of the present invention.
Nucleic acid chip-based genetic variation detection and S_MC calculation
Any nucleic acid chip known in the art can be used for genetic variation detection.
After genotyping of polymorphic sites on a nucleic acid chip platform, the BAF (B allele frequency) information for each polymorphic site can be obtained and analyzed using polymorphic site analysis software or algorithms known in the art, e.g., Illumina's GenomeStudio software. Preferably, BAF is a continuous value, e.g., from 0 to 1, wherein BAF = 1 indicates that the site of the test sample is homozygous for the B allele (BB); BAF = 0 indicates that it is homozygous for the A allele (AA); BAF = 0.5 indicates a heterozygous site (AB); and a BAF at or below the classification threshold a (e.g., 0.4) or at or above the classification threshold b (e.g., 0.6) indicates a parental predisposition at the site.
For example, as shown in FIG. 3, sites at which the father has the AA genotype and the mother has the BB genotype are selected, the BAF value of the sample at each site is calculated, and the number of sites with BAF ≥ 0.6 (N_MB) and the number of sites with BAF ≤ 0.4 (N_PA) are counted; similarly, sites at which the father has the BB genotype and the mother has the AA genotype are selected, the BAF value of the sample at each site is calculated, and the number of sites with BAF ≤ 0.4 (N_MA) and the number of sites with BAF ≥ 0.6 (N_PB) are counted. The parental predisposition statistic S_MC is constructed as follows:
S_MC = (N_MB + N_MA)/(N_PA + N_PB).
If S_MC is significantly greater than 1, a maternal contamination tendency is indicated; if S_MC is significantly less than 1, a paternal contamination tendency is indicated; if S_MC does not differ significantly from 1, no contamination tendency is indicated.
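The chip-based counting described above can be sketched in Python as follows. This is an illustrative sketch, not the implementation of the invention: the function name `s_mc_from_baf`, the tuple layout of the input and the example values are all assumptions.

```python
def s_mc_from_baf(sites, a=0.4, b=0.6):
    """Compute S_MC = (N_MB + N_MA) / (N_PA + N_PB) from chip BAF values.

    sites: iterable of (father_genotype, mother_genotype, baf) tuples,
    with genotypes given as "AA" or "BB"; other sites are ignored.
    """
    n_mb = n_pa = n_ma = n_pb = 0
    for father, mother, baf in sites:
        if father == "AA" and mother == "BB":
            # The B allele is maternal at these sites.
            if baf >= b:
                n_mb += 1
            elif baf <= a:
                n_pa += 1
        elif father == "BB" and mother == "AA":
            # The B allele is paternal at these sites.
            if baf <= a:
                n_ma += 1
            elif baf >= b:
                n_pb += 1
    paternal = n_pa + n_pb
    if paternal == 0:
        raise ValueError("no paternal predisposition sites; S_MC is undefined")
    return (n_mb + n_ma) / paternal


# Invented example sites, for illustration only.
example_sites = [
    ("AA", "BB", 0.9),  # counted in N_MB
    ("AA", "BB", 0.1),  # counted in N_PA
    ("BB", "AA", 0.2),  # counted in N_MA
    ("BB", "AA", 0.5),  # between thresholds, not counted
    ("AA", "AA", 0.0),  # parents not differently homozygous, ignored
]
print(s_mc_from_baf(example_sites))
```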
Preferably, an S_MC threshold is established using a reference set without parental predisposition, and the S_MC value of the sample is compared with the S_MC threshold to determine the paternal/maternal contamination tendency of the sample.
NGS sequencing-based genetic variation detection and S_MC calculation
NGS sequencing can be performed according to methods known in the art, including using various available commercial platforms for NGS sequencing.
After NGS sequencing data are obtained, data analysis can be performed using any genetic variation detection method known in the art. In a preferred embodiment, the optimized Genome Analysis Toolkit (GATK) strategy is used for genetic variation detection analysis. More preferably, the analysis comprises the following steps: (1) performing quality-control filtering on the raw fastq files using fastp software; (2) aligning the sequences to a reference genome using the BWA-MEM algorithm (the hg19 reference genome is used in the examples below); (3) sorting and indexing the aligned files using the Picard SortSam command and Samtools software to obtain the final bam file; (4) removing duplicates using the Picard MarkDuplicates command; (5) performing Base Quality Score Recalibration (BQSR) using the GATK BaseRecalibrator and ApplyBQSR commands; (6) detecting genetic variation in single samples using the GATK HaplotypeCaller method; (7) performing multi-sample joint genetic variation detection using the GATK CombineGVCFs and GenotypeGVCFs methods; and (8) performing Variant Quality Score Recalibration (VQSR) using the VariantRecalibrator and ApplyVQSR methods.
In some embodiments, SNP genetic variation detection data are obtained from single-cell whole genome amplification products of SEM samples, and from the parental gDNA samples of the corresponding embryos, using NGS-based genetic variation detection methods.
After obtaining the SNP detection data, as shown in FIG. 4, biallelic sites at which the father is homozygous for one allele and the mother is homozygous for the other allele are selected, and the allelic depths (AD) of the two alleles at each such site in the SEM sample of the corresponding embryo, i.e., AD_father and AD_mother, are determined. The maternal allele frequency (MAF) and the paternal allele frequency (PAF) of each selected site are calculated: MAF = AD_mother/(AD_father + AD_mother); PAF = AD_father/(AD_father + AD_mother). The number of sites in the test sample with MAF ≥ classification threshold b (e.g., 0.6) (N_MAF) and the number of sites with PAF ≥ classification threshold b (e.g., 0.6) (N_PAF) are counted, and the parental predisposition statistic S_MC is constructed using the following formula:
S_MC = N_MAF/N_PAF
Based on the calculated S_MC value, the paternal and maternal contamination tendencies of the SEM sample are determined. If S_MC is approximately equal to 1, no parental contamination tendency is indicated; if S_MC is greater than 1, a maternal contamination tendency is indicated; if S_MC is less than 1, a paternal contamination tendency is indicated.
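For the sequencing-based variant, the MAF, PAF and S_MC calculation can be sketched as follows. The function name, the input layout (one pair of allelic depths per selected site) and the choice to skip sites with zero total depth are assumptions made for illustration.

```python
def s_mc_from_allele_depths(depths, b=0.6):
    """Compute S_MC = N_MAF / N_PAF from per-site allelic depths.

    depths: iterable of (ad_father, ad_mother) read depths of the paternal
    and maternal alleles at parentally different homozygous sites.
    Sites with zero total depth are skipped (an assumption of this sketch).
    """
    n_maf = n_paf = 0
    for ad_father, ad_mother in depths:
        total = ad_father + ad_mother
        if total == 0:
            continue
        maf = ad_mother / total  # maternal allele frequency
        paf = ad_father / total  # paternal allele frequency
        if maf >= b:
            n_maf += 1
        elif paf >= b:
            n_paf += 1
    if n_paf == 0:
        raise ValueError("no sites with PAF >= b; S_MC is undefined")
    return n_maf / n_paf


# Invented depths, for illustration only.
print(s_mc_from_allele_depths([(2, 8), (9, 1), (5, 5), (0, 0), (1, 9)]))
```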
Preferably, an S_MC threshold is established using a reference set without parental predisposition, and the S_MC value of the sample is compared with the S_MC threshold to determine the paternal/maternal contamination tendency of the sample.
Method for detecting progeny heteroploids
At present, next-generation sequencing platforms cannot detect whole-genome ploidy abnormalities. For single-cell whole genome amplification products, such as MALBAC amplification products, chip platforms also cannot obtain this information directly, owing to factors such as preferential allele amplification or allele dropout caused by single-cell amplification.
With the parental predisposition statistic analysis strategy of the present invention, parental predisposition can be detected at the chromosome level (or even at lower levels, e.g., chromosome fragments of 10-100 Mb, such as 10 Mb, 40 Mb or 100 Mb), thereby indicating copy number variation (CNV); and whole-genome ploidy abnormality of the sample can be determined from the information on the 22 chromosomes. Thus, the present invention provides methods that can be used to analyze whether progeny embryos have abnormal ploidy, such as a hydatidiform mole.
In one embodiment, the method of the invention is used for progeny heteroploid detection, wherein the progeny heteroploid detection is detection of a triploid and/or a uniparental diploid or haploid. In yet another embodiment, the method of the invention is used to detect ploidy variations in progeny chromosomes selected from the group consisting of paternal trisomy (chromosome level), paternal triploid (extra paternal chromosome set), maternal trisomy (chromosome level), maternal triploid (extra maternal chromosome set), paternal uniparental diploid, paternal haploid, maternal haploid, and any combination thereof.
In the progeny ploidy analysis method of the present invention, a strategy similar to the S_MC analysis described above is used, but preferably at the chromosome level: using the paternal allele frequency (PAF) and/or maternal allele frequency (MAF) values at the parentally different homozygous allele sites of the sample, the sites are classified as paternally or maternally predisposed and counted; the parental predisposition statistic S_uneven is constructed as the ratio of the number of maternal predisposition sites to the number of paternal predisposition sites, and is compared with a predetermined S_uneven threshold to determine the paternal or maternal predisposition of the progeny. Preferably, the S_uneven threshold is determined using a reference set without parental predisposition. In some embodiments, a significant deviation of the sample S_uneven from the threshold indicates that the progeny is likely to have an abnormal copy number (CNV) of paternally or maternally derived genetic material on the DNA segment. For example, a sample S_uneven value significantly above the threshold is likely caused by increased maternal genetic material or decreased paternal genetic material; conversely, a sample S_uneven value significantly below the threshold is likely caused by increased paternal genetic material or decreased maternal genetic material.
To further distinguish whether the detected CNV is caused by a duplication (dup) or a deletion (del), the present inventors further propose that the ploidy analysis method include the following steps: counting the total number of parentally different homozygous allele sites of the sample (N_total) and the total number of those sites with neither paternal nor maternal predisposition (N_LOH), and calculating the proportion of N_LOH in N_total to construct the heterozygous site rate S_loh. By comparing the sample S_uneven value with the predetermined S_uneven threshold, and the sample S_loh value with the S_loh threshold, the method can specifically and sensitively identify ploidy abnormality of the sample and determine the paternal/maternal origin of the progeny ploidy abnormality; it can also distinguish uniparental diploids or haploids that involve loss of the maternal or paternal chromosomes.
In a preferred embodiment, the process of the invention comprises the steps of:
constructing the S_uneven statistic of the test sample at the chromosome level and comparing it with the S_uneven threshold predetermined from a reference set without parental predisposition;
constructing the S_loh value of the test sample at the chromosome level and comparing it with the S_loh threshold predetermined from the reference set without parental predisposition; if the sample S_loh is within the reference set S_loh thresholds and S_uneven is greater than the upper reference set S_uneven threshold, a maternal trisomy or maternal triploid is indicated; conversely, if S_uneven is smaller than the lower reference set S_uneven threshold, a paternal trisomy or paternal triploid is indicated;
if S_loh is smaller than the lower reference set S_loh threshold and S_uneven is greater than the upper reference set S_uneven threshold, e.g., greater than 10, 12, 15, 17, 19 or 20, a maternal uniparental diploid or haploid is indicated; conversely, if S_uneven is smaller than the lower reference set S_uneven threshold, e.g., less than 0.2, 0.15, 0.1, 0.05, 0.02 or 0.01, a paternal uniparental diploid or haploid is indicated.
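The decision steps above can be summarized in a short Python sketch. The function name and the returned labels are hypothetical. The default thresholds are the approximate values given in this description for a preferred embodiment (S_uneven about 0.53 and 1.75, S_loh about 0.22 and 0.76). The fallback label "undetermined", used for cases the steps do not cover (for example S_loh above its upper threshold), is an assumption of this sketch.

```python
def ploidy_call(s_uneven, s_loh,
                uneven_lower=0.53, uneven_upper=1.75,
                loh_lower=0.22, loh_upper=0.76):
    """Combine S_uneven and S_loh of one chromosome into a ploidy indication."""
    if loh_lower <= s_loh <= loh_upper:
        # Heterozygous site rate within the reference range.
        if s_uneven > uneven_upper:
            return "maternal trisomy or triploid"
        if s_uneven < uneven_lower:
            return "paternal trisomy or triploid"
        return "no parental predisposition detected"
    if s_loh < loh_lower:
        # Few sites between the thresholds: one parental contribution is missing.
        if s_uneven > uneven_upper:
            return "maternal uniparental diploid or haploid"
        if s_uneven < uneven_lower:
            return "paternal uniparental diploid or haploid"
    return "undetermined"


print(ploidy_call(2.5, 0.50))
print(ploidy_call(15.0, 0.05))
```

In practice the thresholds would be derived from a reference set processed with the same amplification and genotyping workflow as the test sample, as the description recommends.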
In a preferred embodiment, BAF values at the parentally different homozygous allele sites of the selected DNA segment are obtained based on, for example, SNP array analysis, and the paternal and/or maternal allele frequencies of the sample at said sites are determined from said BAF values:
-at a site where the father is of AA genotype and the mother is of BB genotype, the maternal allele frequency MAF of the sample at said site is the BAF value of said site;
-at a site where the father is the BB genotype and the mother is the AA genotype, the paternal allele frequency PAF of the sample at said site is the BAF value of said site.
In a further preferred embodiment, the allelic depth (AD) values of the paternal allele (P) and the maternal allele (M) of the sample, i.e., AD_father and AD_mother, are obtained at the parentally different homozygous allele sites of the selected DNA segment based on, for example, NGS sequencing analysis, and the paternal and/or maternal allele frequency of the sample at each site is determined from the AD values:
wherein the maternal allele frequency of the sample at the site is MAF = AD_mother/(AD_father + AD_mother);
wherein the paternal allele frequency of the sample at the site is PAF = AD_father/(AD_father + AD_mother).
The sites may be classified as described in the "method of the invention" section above, based on the MAF and/or PAF determined from BAF values and/or AD values. In a preferred embodiment, in a progeny sample, the allele frequencies at the selected parentally different homozygous allele sites are compared with classification thresholds a and b: sites with a paternal allele frequency (PAF) ≥ b (e.g., 0.6) and/or a maternal allele frequency (MAF) ≤ a (e.g., 0.4) are classified as paternal predisposition sites; sites with PAF ≤ a (e.g., 0.4) and/or MAF ≥ b (e.g., 0.6) are classified as maternal predisposition sites. The classification thresholds a and b may be defined as described in the "method of the invention" section above. Preferably, the classification threshold a may be any value between 0.1 and 0.4, preferably between 0.2 and 0.4, more preferably between 0.3 and 0.4, and the classification threshold b may be any value between 0.6 and 0.9, preferably between 0.6 and 0.8, more preferably between 0.6 and 0.7, provided that a + b = 1.
In a preferred embodiment, the parental predisposition statistic S_uneven is calculated as the ratio of the number of maternal predisposition sites to the number of paternal predisposition sites in the progeny sample.
In some preferred embodiments, genomic polymorphic sites and their BAF values are obtained for the father and mother and for biopsy samples. In a further preferred embodiment, as shown in fig. 5, the method of the present invention may comprise the steps of:
-selecting SNP sites at which the father has the AA genotype and the mother has the BB genotype, determining the BAF value (between 0 and 1) of the corresponding biopsy sample at each site, and counting the number of sites with BAF ≥ 0.6 (N_MB) and the number of sites with BAF ≤ 0.4 (N_PA);
-selecting SNP sites at which the father has the BB genotype and the mother has the AA genotype, determining the BAF value (between 0 and 1) of the biopsy sample at each site, and counting the number of sites with BAF ≤ classification threshold a (e.g., 0.4) (N_MA) and the number of sites with BAF ≥ classification threshold b (e.g., 0.6) (N_PB);
-counting the total number of all selected SNP sites (N_total) and, among these sites, the total number of sites with BAF between classification thresholds a and b (e.g., between 0.4 and 0.6) (N_LOH);
-using the formula S_uneven = (N_MB + N_MA)/(N_PA + N_PB) to calculate the parental predisposition statistic S_uneven of the sample; and optionally
-using the formula S_loh = N_LOH/N_total to calculate the heterozygous site rate of the sample.
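The counting steps above can be sketched per chromosome as follows. This is a minimal sketch under stated assumptions: the function name and the input layout are hypothetical, the example sites are invented, and returning infinity for S_uneven when no paternal predisposition site is found is a choice made here, not something the description specifies.

```python
from collections import defaultdict


def per_chromosome_statistics(sites, a=0.4, b=0.6):
    """Return {chromosome: (S_uneven, S_loh)} from biopsy BAF values.

    sites: iterable of (chromosome, father_genotype, mother_genotype, baf),
    with genotypes "AA" or "BB". Only sites where the parents are homozygous
    for different alleles are used.
    """
    counts = defaultdict(lambda: {"maternal": 0, "paternal": 0, "between": 0, "total": 0})
    for chrom, father, mother, baf in sites:
        if {father, mother} != {"AA", "BB"}:
            continue
        c = counts[chrom]
        c["total"] += 1  # contributes to N_total
        if mother == "BB":
            maternal, paternal = baf >= b, baf <= a  # N_MB, N_PA
        else:
            maternal, paternal = baf <= a, baf >= b  # N_MA, N_PB
        if maternal:
            c["maternal"] += 1
        elif paternal:
            c["paternal"] += 1
        else:
            c["between"] += 1  # contributes to N_LOH
    result = {}
    for chrom, c in counts.items():
        # Assumption: S_uneven is reported as infinity when the denominator is zero.
        s_uneven = c["maternal"] / c["paternal"] if c["paternal"] else float("inf")
        s_loh = c["between"] / c["total"]
        result[chrom] = (s_uneven, s_loh)
    return result


# Invented example sites, for illustration only.
example = [
    ("chr1", "AA", "BB", 0.9),
    ("chr1", "AA", "BB", 0.5),
    ("chr1", "BB", "AA", 0.9),
    ("chr1", "BB", "AA", 0.1),
    ("chr2", "AA", "BB", 0.5),
]
print(per_chromosome_statistics(example))
```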
In one embodiment, the present invention comprises: creation of S Using a reference frame without parental predispositionunevenThreshold value and optionally SlohAnd (4) a threshold value. In one embodiment, the reference frame consists of 1-40 or more euploid biopsy cell samples without parental predisposition, e.g., 5, 10, 15, 20, 25, 30, 35, 40, 45, 50 or more. Preferably, the reference sample is collected in the same or similar manner at the same or similar time as the test sample. Preferably, the amplification product of the reference sample and the amplification product of the test sample are obtained by the same single cell whole genome amplification method.
In one embodiment, the S_uneven values of the reference set are approximately normally distributed. In one embodiment, the genomic allele information of the parents corresponding to each reference biopsy cell sample is used to select parentally different homozygous allele sites for calculating the parental predisposition statistic S_uneven, and optionally the heterozygous site rate S_loh, of the reference set. In yet another embodiment, the S_uneven thresholds are set to the mean of the reference set S_uneven values ± 1-5 standard deviations (preferably 2-3, especially 3). In yet another embodiment, the S_loh thresholds are set to the mean of the reference set S_loh values ± 1-5 standard deviations (preferably 2-3, especially 3). In a preferred embodiment, the upper S_uneven threshold is about 1.75 and the lower S_uneven threshold is about 0.53. In a preferred embodiment, the upper S_loh threshold is about 0.76 and the lower S_loh threshold is about 0.22.
Products, systems and apparatus for carrying out the method of the invention
The invention also provides products, systems and devices for performing a test sample parental predisposition assay, for performing a test sample parental contamination predisposition assay, and/or for detecting progeny heteroploids in a test sample.
Product(s)
In one aspect, the present invention provides an apparatus for detecting the parental predisposition of a test sample, said apparatus comprising:
at least one processor and at least one memory, the at least one memory stored with code thereon, which when executed by the at least one processor, causes the apparatus to perform a parental predisposition detection method of the invention, or a parental contamination identification method of the invention, or a progeny ploidy abnormality detection method of the invention.
Preferably, the code, when executed by the at least one processor, causes the apparatus at least to perform:
-receiving sequence information data, e.g., genetic variation information (e.g., SNP information), of the test sample (and optionally of the reference sample),
-analyzing the parental predisposition of the test sample based on the received sequence information data.
In a preferred embodiment, the code, when executed by the at least one processor, causes the apparatus to further perform:
-determining whether there is parental contamination of the test sample based on the analyzed parental predisposition of the test sample.
In yet another preferred embodiment, the code, when executed by the at least one processor, causes the apparatus to further perform:
-determining whether a ploidy abnormality exists in the progeny DNA in the test sample based on the analyzed parental predisposition of the test sample.
In a preferred embodiment, the present invention also provides a non-transitory computer readable storage medium having instructions thereon for performing the parental predisposition detection method of the present invention, comprising:
-one or more instructions for receiving input comprising genetic variation information (e.g., SNP information) of a test sample (and optionally a reference sample),
-one or more instructions for analyzing the inputted genetic variant sequence information to determine the parental predisposition of the subject sample;
-optionally, one or more instructions for outputting a parental predisposition to a test sample.
In a preferred embodiment, the medium further comprises:
-one or more instructions for determining whether there is parental contamination of the test sample based on the analyzed parental predisposition of the test sample.
In a further preferred embodiment, the medium further comprises:
-one or more instructions for determining whether a ploidy abnormality exists in the progeny DNA in the test sample based on the analyzed parental predisposition of the test sample.
Preferably, the apparatus of the present invention comprises the following modules, or the computer readable storage medium of the present invention carries instructions for executing the following modules:
(1) a sequence information data acquisition module: for obtaining gene variation sequence information (e.g., SNP information) data of a test sample and/or a reference sample;
(2) allele frequency analysis module: for analyzing the sequence information data from module (1) to determine the MAF and/or PAF of the sample at the parentally different homozygous allele sites of the selected DNA segment;
(3) a parental predisposition statistic determination module: for analyzing the MAF and/or PAF data obtained by module (2) to determine the parental predisposition statistic S_POR of the sample on the selected DNA segment;
(4) optionally, a parental predisposition S_POR threshold determination module: for determining the parental predisposition S_POR threshold from the S_POR of the reference sample obtained from module (3);
(5) parental predisposition determination module: for comparing the S_POR of the test sample determined by module (3) with the parental predisposition S_POR threshold to determine the parental predisposition of the test sample;
(6) optionally, a report output module: for processing and integrating the data obtained by modules (1) to (5) to generate a report.
In a preferred embodiment, the device is used to report the parental contamination tendency of a sample (e.g., an SEM sample), e.g., paternal or maternal contamination.
In yet another preferred embodiment, the apparatus is used for reporting ploidy abnormality of progeny DNA in a test sample, the apparatus further comprising the following modules:
(7) heterozygous site rate determination module: for analyzing the sequence information data from module (1) to determine the total number of parentally different homozygous allele sites of the sample in the selected DNA segment and the number of those sites with neither paternal nor maternal predisposition, thereby determining the heterozygous site rate S_loh of the sample in said segment for comparison with the S_loh threshold;
(8) optionally, a heterozygous site rate S_loh threshold determination module: for determining the S_loh threshold from the S_loh of the reference sample obtained from module (7);
(9) heterozygous site rate comparison module: for comparing the heterozygous site rate of the sample determined by module (7) with the S_loh threshold;
(10) a progeny heteroploid determination module: for determining progeny heteroploids from the analysis data of modules (5) and (7),
wherein, if the S_loh of the test sample containing progeny DNA is equal to or slightly smaller than the reference set S_loh and S_uneven is significantly greater than the reference set S_uneven, the progeny is indicated to be a maternal trisomy or maternal triploid; conversely, if S_uneven is significantly smaller than the reference set S_uneven, the progeny is indicated to be a paternal trisomy or paternal triploid;
if the S_loh of the test sample containing progeny DNA is significantly smaller than the reference set S_loh and S_uneven >> the reference set S_uneven, the progeny is indicated to be a maternal uniparental diploid or haploid; conversely, if S_uneven << the reference set S_uneven, the progeny is indicated to be a paternal uniparental diploid or haploid.
System
In one aspect, the present invention provides a system for detecting the parental predisposition of a test sample, said system comprising a device configured to carry out the parental predisposition detection method, the parental contamination identification method, or the progeny ploidy abnormality detection method of the present invention. In one embodiment, the device is configured to:
-receiving an input comprising genetic variant sequence information of a test sample and/or a reference sample,
-performing a parental predisposition detection of the test sample based on the entered sequence information.
In a preferred embodiment, the apparatus is further configured to:
-determining whether there is parental contamination of the test sample based on the analyzed parental predisposition of the test sample; or
-determining whether a ploidy abnormality exists in the progeny DNA in the test sample based on the analyzed parental predisposition of the test sample.
In a preferred embodiment, the device is the device of the present invention described above for detecting the parental predisposition of a test sample, identifying parental contamination, and/or detecting progeny heteroploids.
In the system of the present invention, it may further include:
-amplification means for single cell whole genome amplification, preferably MALBAC amplification, of a test sample and/or a reference sample;
-sequence information detection means for performing genetic variation sequence information detection on the amplification product, including but not limited to polymorphic site (e.g., SNP) detection and sequencing detection.
In a preferred embodiment, the system is used to determine the parental contamination tendency, e.g., paternal or maternal contamination, of a test sample (e.g., an SEM sample).
In yet another preferred embodiment, the system is used to determine ploidy abnormalities in progeny DNA in a test sample.
In yet another aspect, the system of the present invention may comprise an apparatus comprising:
-an amplification unit for performing single-cell whole genome amplification on the test sample and/or the reference sample;
-a detection and analysis unit for detecting and analyzing the genetic variation information of the amplification product obtained by the amplification unit; and
-a sample parental predisposition determination unit for determining the parental predisposition of the sample from the genetic variation information obtained by the detection and analysis unit.
Preferably, the system of the invention comprises a tool for genomic sequence information query, and a programmed memory or medium allowing a computer to analyze the resulting data. Sequence information query data (including, for example, sequencing data sets, SNP data sets, gene mutation site data sets and genotyping data sets) may be a stored data set or data generated "on the fly". As used herein, "data set" encompasses both types of data sources.
The tool for the genomic sequence information query is not particularly limited. In a preferred embodiment, a high density SNP chip is used. In another preferred embodiment, high throughput sequencing equipment is used to obtain higher depth sequencing data for related individuals of progeny.
The present invention may be carried out by a computer. The invention therefore also provides a computer programmed to carry out the above method. The computer typically includes: a central processing unit (CPU) communicatively connected to system memory (RAM), non-transitory memory (ROM), and one or more other storage devices such as a hard drive, a floppy drive and a CD-ROM drive. The computer may also include an output device, such as a printer, CRT monitor or LCD display, and an input device, such as a keyboard, mouse, pen, touch screen or voice-activated system. The input device may receive data, for example directly from a sequence information query tool through an interface.
Application of the computer product, system and device of the invention
In one aspect, the present invention also provides the use of a device, system and apparatus according to the present invention for performing a test sample parental predisposition detection, for a test sample parental contamination predisposition detection, and/or for detecting progeny heteroploids in a test sample.
In a further aspect, the present invention also provides the use of a device, system and apparatus according to the present invention in the preparation of a product for performing a test sample parental predisposition detection, for a test sample parental contamination predisposition detection, and/or for the detection of progeny heteroploids in a test sample.
Examples
Materials and methods
Materials:
Aborted fetal tissue and peripheral blood from both parents of the aborted fetus were obtained for extraction of fetal gDNA and parental gDNA; IVF blastocyst trophoblast cell biopsies and embryo culture fluid (SEM) were obtained for single-cell genome amplification. All studies were approved by the institutional ethics committee, and written informed consent was obtained before they were performed.
Blastocyst biopsy
Discarded embryos at day 5/6 after in vitro fertilization were used for biopsy. First, a laser zona-drilling instrument was used to open a hole 10-15 μm in diameter in the zona pellucida on the side opposite the inner cell mass of the blastocyst for assisted hatching, and culture was continued for 4-6 h. Once trophoblast cells had hatched out, some of the cells were obtained by biopsy using laser cutting combined with aspiration; if the blastocyst had not hatched, some trophoblast cells were aspirated under negative pressure through the opening in the zona pellucida with a biopsy needle. Culture of the blastocyst was continued after biopsy.
Obtaining of SEM culture solution
1) Surplus discarded embryos from intracytoplasmic sperm injection (ICSI) were obtained under a microscope.
2) The embryos, stripped as cleanly as possible of the granulosa cells surrounding the oocyte, were cultured to Day 3. The medium was then changed, and each embryo was placed in a 25 μL droplet of blastocyst medium and cultured in an incubator at 37 ℃ with 5% CO2 and 5% O2. If a small number of granulosa cells remained on the zona pellucida of the D3 embryo, they were removed again with a drawn-out Pasteur pipette by repeated pipetting in the blastocyst culture droplet, so as to remove as many granulosa cells as possible.
3) The embryos free of granulosa cells were transferred into new blastocyst culture droplets for continued culture. On the afternoon of D4, the medium was changed and the embryos were washed 2-3 times:
each embryo was washed through 3 droplets, then transferred into a labeled blastocyst dish for continued culture.
During washing, each embryo had its own dedicated glass capillary and 3 dedicated washing droplets, which were never shared between embryos.
4) After 2-3 washes, each embryo was placed into a 25 μL microdroplet of blastocyst medium. All embryos were cultured individually, one per microdroplet, in an incubator at 37 ℃ with 5% CO2 and 5% O2.
5) When the blastocyst had developed to stage 4 (generally D5/D6), the culture medium of embryos graded 4BC or higher with fully expanded blastocysts was collected.
Tissue gDNA preparation
Genomic DNA was extracted from at least 0.5 g of aborted fetal tissue and at least 2 mL of paternal/maternal peripheral blood using a genomic DNA extraction kit (Tiangen Biochemical, general genomic DNA extraction kit, DP304) according to the product instructions.
The extracted genomic DNA was quantified on a Qubit 3.0 using the Qubit dsDNA HS Assay kit (the DNA concentration should be 30 ng/μL or higher). gDNA fragmentation was then performed using the Yikang DNA fragmentation kit (KT100804248) according to the manufacturer's instructions.
The following DNA fragmentation reaction system was prepared, mixed thoroughly, and kept on ice:
Figure BDA0003412626970000231
the following reaction conditions were set on the PCR instrument and the sample was placed in the instrument to start the reaction:
Figure BDA0003412626970000241
After the reaction was complete, the sample was removed, 60 μL of AMPure XP DNA purification beads (Agencourt) was added, and the fragmented gDNA was purified according to the manufacturer's instructions. The purified product was quantified using the Qubit dsDNA HS Assay kit.
Preparation of simulated maternal contamination mixed samples
According to the quantification results, the purified fragmented fetal gDNA and maternal gDNA were mixed in proportion, with or without prior dilution, to prepare simulated mixed samples containing a specified proportion of maternal DNA.
Single cell whole genome amplification preparation
3-5 IVF blastocyst trophoblast cells or 20 μL of SEM culture medium was transferred to 5 μL of lysis buffer (30 mM Tris-Cl, 2 mM EDTA, 20 mM KCl, 0.2% Triton X-100, pH 7.8) and amplified by the two-step MALBAC method using a ChromInst amplification kit (cat. No. XK-005-96) from Yikang Gene Co., Ltd. The operation steps are as follows:
the lysis system was prepared as follows:
Figure BDA0003412626970000242
the following reaction conditions were set on the PCR instrument and the sample was placed in the instrument to start the lysis reaction:
Figure BDA0003412626970000243
after the reaction was complete, pre-amplification reagents were added to the PCR tubes as follows:
Figure BDA0003412626970000244
the following reaction conditions were set on the PCR instrument and the sample was placed in the instrument to start the reaction:
Figure BDA0003412626970000245
Figure BDA0003412626970000251
after the reaction was complete, a second round of amplification reagents were added to the PCR tube as follows:
Figure BDA0003412626970000252
the following reaction conditions were set on the PCR instrument and the sample was placed in the instrument to start the reaction:
Figure BDA0003412626970000253
After the reaction was completed, 65 μL of AMPure XP magnetic beads was added to the PCR tube and mixed well, and the amplified product was purified with the magnetic beads according to the manufacturer's instructions. μL of the purified amplification product was taken and quantified using the Qubit dsDNA HS Assay Kit (Invitrogen, Q32584).
Genotyping assay
Sample processing and signal detection of Illumina Infinium ASA chips were performed using 200ng of purified fragmented gDNA and 100ng of single cell whole genome amplification product according to Standard Operating Protocol (SOP) provided by the manufacturer.
SNP genotyping was performed using the Illumina Infinium ASA chip (Illumina, product number: 20016317). After the chip scan data were obtained, they were analyzed with Illumina's command-line program iaap-cli and the GenCall algorithm to obtain the BAF (a value between 0 and 1) of each polymorphic site. The following genotype data quality control criterion was used: only samples with a sample-level call rate greater than 30% were used for subsequent analysis.
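The quality control criterion above can be illustrated with a minimal Python sketch. It assumes the usual meaning of sample-level call rate (the fraction of SNP sites that received a genotype call); the `"NC"` no-call marker and the function names are hypothetical and are not taken from iaap-cli output.

```python
def call_rate(genotype_calls, no_call="NC"):
    """Fraction of SNP sites with a genotype call (assumed definition of call rate)."""
    if not genotype_calls:
        raise ValueError("no genotype calls supplied")
    called = sum(1 for g in genotype_calls if g != no_call)
    return called / len(genotype_calls)


def passes_qc(genotype_calls, min_call_rate=0.30):
    """Keep a sample only if its call rate is strictly greater than the cut-off."""
    return call_rate(genotype_calls) > min_call_rate


good_sample = ["AA", "AB", "NC", "NC", "BB"]   # 3 of 5 sites called
poor_sample = ["NC"] * 8 + ["AA", "AB"]        # 2 of 10 sites called
print(passes_qc(good_sample), passes_qc(poor_sample))
```

The comparison is strict because the text requires a call rate greater than 30%.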
Example 1 comparison of BAF distribution in gDNA samples with Single cell Whole genome amplification products
For identification of parental predisposition or contamination, for gDNA samples that do not require single cell whole genome amplification, such as detection samples from NIPT, it is common to identify whether there is contamination or parental predisposition by allele relative frequency, such as BAF (B allele frequency) of the chip.
The general calculation principle for the proportion of maternal contamination in a gDNA sample from a fetus is shown below.
Figure BDA0003412626970000261
Based on this general calculation principle, 7 different cases can be distinguished according to the genotypes of the fetus and the mother. The calculation rests on the assumption that the maternal and paternal allele counts of the fetus fit a binomial distribution. Thus, assuming the proportion of maternal contamination in the sample is r, the B allele frequency (BAF) in the fetal sample is BAF = (1 - r) × (fraction of B alleles in the fetal genotype) + r × (fraction of B alleles in the maternal genotype). For example, if the genotype of both the fetus and the mother is AA, BAF in the sample is 0; if the genotype of the fetus is AA and that of the mother is AB, BAF in the sample is 0.5r; and so on. Similarly, the same calculation principle can be used to estimate whether a fetal sample has a particular parental predisposition.
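The mixture formula can be checked numerically. The Python sketch below enumerates the fetus/mother genotype pairs that are compatible with Mendelian inheritance and evaluates the expected BAF for a given contamination ratio r. The function names are illustrative only and are not part of the described method.

```python
from itertools import product

# Fraction of B alleles carried by each diploid genotype.
B_FRACTION = {"AA": 0.0, "AB": 0.5, "BB": 1.0}


def is_mendelian(fetus, mother):
    """A fetus shares at least one allele with its mother, so AA/BB pairs are excluded."""
    return {fetus, mother} != {"AA", "BB"}


def expected_baf(fetus, mother, r):
    """BAF = (1 - r) * fetal B fraction + r * maternal B fraction."""
    return (1.0 - r) * B_FRACTION[fetus] + r * B_FRACTION[mother]


def expected_baf_clusters(r):
    """Sorted, de-duplicated expected BAF values over all compatible genotype pairs."""
    values = {
        round(expected_baf(fetus, mother, r), 6)
        for fetus, mother in product(B_FRACTION, repeat=2)
        if is_mendelian(fetus, mother)
    }
    return sorted(values)


clusters = expected_baf_clusters(0.30)
print(clusters)
```

With r = 0.30 the seven compatible genotype pairs give seven distinct expected values; with r = 0 they collapse to the three ordinary genotype clusters 0, 0.5, and 1.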
The BAF patterns of the following different samples were examined to investigate the applicability of the above general calculation principles to the different samples:
fetal gDNA sample: purified fragmented gDNA from aborted fetal tissue, 200 ng;
maternally contaminated gDNA sample: purified fragmented gDNA from aborted fetal tissue with 30% maternal contamination, 200 ng (of which fetal gDNA is 140 ng and maternal gDNA is 60 ng);
single-cell amplification product sample: single-cell amplification product of 3-5 IVF blastocyst trophoblast biopsy cells, 100 ng.
The samples were processed and their signals detected on the Illumina Infinium ASA chip according to the Standard Operating Protocol (SOP) provided by Illumina to obtain chip scan data. The data were then analyzed with Illumina's iaap-cli and GenCall algorithm to obtain the LRR (log R ratio) values and the BAF (B allele frequency) distribution pattern.
As shown in fig. 1, the BAF distribution pattern of the contaminated gDNA sample formed by adding 30% maternal gDNA to fetal gDNA was in substantial agreement with that expected from the general calculation principle described above, and was divided into 7 clusters with BAF center values of 0, 0.15, 0.35, 0.5, 0.65, 0.85, and 1, respectively. This demonstrates that maternal contamination of fetal gDNA can be judged by routine BAF profiling.
However, a comparison of the fetal gDNA sample and the amplification product sample showed that the two exhibit very different BAF distributions. FIG. 2A shows the BAF distribution of the fetal gDNA sample (panel A of FIG. 2A) and the amplification product sample (panel B of FIG. 2A) at selected sites where the parents are homozygous for different alleles, i.e., where the father is AA and the mother is BB, or where the father is BB and the mother is AA. In principle, according to Mendelian inheritance, the genotype of an uncontaminated embryo sample at these selected sites should be AB (BAF = 0.5). As shown in fig. 2A, the BAF distribution of the fetal gDNA sample substantially conformed to this expectation. However, as shown in fig. 2B, for single-cell whole genome amplification products from trophoblast cells, the genotypes AA (BAF = 0) and BB (BAF = 1), as well as BAF values between 0-0.5 and 0.5-1, appeared at many sites. Similar events occur with the amplification of SEM culture medium. FIG. 2B shows the BAF distribution of ChromInst amplification products from embryo blastocyst culture medium, wherein panel (A) shows the BAF distribution without maternal contamination and panel (B) shows the BAF distribution with a maternal contamination ratio of about 30%.
These results indicate that, for single-cell amplification products, the relative allele frequencies do not reflect the original biological state because of problems such as allele amplification bias and even allele dropout (ADO). In this case, the conventional BAF approach clearly has difficulty identifying parental predisposition or parental contamination.
Therefore, there is a need to develop a method that efficiently identifies parental tendency or contamination using single-cell whole genome amplification products.
Example 2 Identification of contamination in SEM culture medium
Determination of the parental predisposition statistic S_MC
Using a nucleic acid chip-based gene variation detection method, SNP detection data were obtained for the single-cell whole genome amplification product of the SEM sample, prepared with the Yikang ChromInst amplification kit, and for the gDNA samples of the father and mother of the corresponding embryo.
After genetic testing, the BAF value of each polymorphic site was obtained for the father, the mother, and the SEM sample using the iaap-cli and GenCall algorithms. The BAF is a value between 0 and 1: for a homozygous B allele site (BB), BAF = 1; for a homozygous A allele site (AA), BAF = 0; for a heterozygous site (AB), BAF = 0.5.
According to chart 1, SNP sites where the father has the AA genotype and the mother has the BB genotype were selected, the BAF value (between 0 and 1) of the SEM sample at each of these sites was determined, and the number of sites with BAF ≥ 0.6 (N_MB) and the number of sites with BAF ≤ 0.4 (N_PA) were counted. Similarly, SNP sites where the father is BB and the mother is AA were selected, the BAF value (between 0 and 1) of the SEM sample at each of these sites was determined, and the number of sites with BAF ≤ 0.4 (N_MA) and the number of sites with BAF ≥ 0.6 (N_PB) were counted.
Using the formula: S_MC = (N_MB + N_MA)/(N_PA + N_PB),
a parental predisposition statistic S_MC was constructed for the SEM sample. If S_MC is significantly greater than 1, a maternal tendency is indicated; if S_MC is significantly less than 1, a paternal tendency is indicated; if S_MC is not significantly different from 1, no parental bias is indicated (fig. 1).
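The counting rule and the S_MC ratio can be sketched in Python as follows. The input layout (a list of father genotype, mother genotype, and sample BAF per site) and the handling of a zero denominator are assumptions made for illustration; the text does not specify either.

```python
def count_sites(sites, hi=0.6, lo=0.4):
    """Count N_MB, N_PA, N_MA, N_PB from (father_gt, mother_gt, sample_baf) tuples.

    Only sites where the parents are homozygous for different alleles are used.
    """
    counts = {"N_MB": 0, "N_PA": 0, "N_MA": 0, "N_PB": 0}
    for father, mother, baf in sites:
        if father == "AA" and mother == "BB":
            if baf >= hi:
                counts["N_MB"] += 1   # sample dominated by the maternal B allele
            elif baf <= lo:
                counts["N_PA"] += 1   # sample dominated by the paternal A allele
        elif father == "BB" and mother == "AA":
            if baf <= lo:
                counts["N_MA"] += 1   # sample dominated by the maternal A allele
            elif baf >= hi:
                counts["N_PB"] += 1   # sample dominated by the paternal B allele
    return counts


def s_mc(counts):
    """S_MC = (N_MB + N_MA) / (N_PA + N_PB)."""
    paternal = counts["N_PA"] + counts["N_PB"]
    if paternal == 0:
        # Assumption: the text does not say how to treat an empty denominator.
        raise ValueError("no paternal-tendency sites; S_MC is undefined")
    return (counts["N_MB"] + counts["N_MA"]) / paternal


example_sites = [
    ("AA", "BB", 0.90), ("AA", "BB", 0.10), ("AA", "BB", 0.50),
    ("BB", "AA", 0.20), ("BB", "AA", 0.70), ("BB", "AA", 0.05),
    ("AB", "AA", 0.90),  # ignored: the father is not homozygous
]
example_counts = count_sites(example_sites)
print(example_counts, s_mc(example_counts))
```

Sites with BAF strictly between 0.4 and 0.6 contribute to neither count, which keeps balanced heterozygous calls out of the ratio.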
Establishing a reference system without contamination tendency
35 embryo culture medium samples that had been specially processed and confirmed to be free of maternal contamination were selected, and peripheral blood samples of the father and mother of the corresponding embryos were obtained at the same time.
The manner and amount of collection of the embryo culture medium and peripheral blood samples were as described in Materials and methods.
The embryo culture medium was collected using the following treatment, and the absence of maternal contamination was confirmed by STR identification:
Collection of culture medium samples confirmed free of contamination:
On day 4 of embryo culture, when the morula stage had been reached, the zona pellucida was removed by the following procedure: the embryos were transferred from the culture medium to incubation medium containing 0.5% pronase, covered with oil, and incubated for 1-2 min at 37 ℃ in 5% CO2, until the zona pellucida was seen to be completely dissolved under an optical microscope. After 2-3 washes with blastocyst culture medium, the embryos were placed in 25 μL of blastocyst culture medium and cultured at 37 ℃ in an incubator containing 5% CO2 and 5% O2.
When the blastocyst had developed to stage 4 (generally D5/D6), was graded 4BC or higher, and was fully expanded, the culture medium was collected.
STR identification:
1) 5 μL of SEM sample was taken, 0.5 μL of proteinase K solution (concentration 20 μg/μL) was added and mixed well, and the mixture was incubated at 55 ℃ for 15 min and then treated at 95 ℃ for 1 min;
2) STR identification methods refer to patents: CN107557481A, CN 110157812A.
Single-cell whole genome amplification was performed on 10 μL of each validated SEM sample using the Yikang ChromInst amplification kit in the manner described in Materials and methods. 100 ng of the amplification product and 200 ng of parental gDNA were taken for nucleic acid chip detection to obtain SNP data. The S_MC statistic was calculated for each of the 35 SEM samples as described above, and a density distribution plot of the reference system S_MC was constructed.
As shown in fig. 5, the reference system S_MC approximately follows a normal distribution. Thus, the upper and lower S_MC thresholds were set using the mean ± 3 × standard deviation, giving 1.26 and 0.80, respectively. If S_MC > 1.26, the genetic material is suggested to have a maternal contamination tendency; if S_MC < 0.80, the genetic material is suggested to have a paternal contamination tendency.
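The threshold construction can be sketched in Python as below. The reference values in the example are invented for illustration and are not the 35 measured samples; the use of the sample standard deviation (`statistics.stdev`) is also an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def reference_thresholds(reference_values, k_lower=3.0, k_upper=3.0):
    """Return (lower, upper) = (mean - k_lower * SD, mean + k_upper * SD)."""
    m = mean(reference_values)
    sd = stdev(reference_values)  # sample standard deviation (assumed estimator)
    return m - k_lower * sd, m + k_upper * sd


def contamination_tendency(s_mc, lower, upper):
    """Compare a sample statistic with the reference thresholds."""
    if s_mc > upper:
        return "maternal"
    if s_mc < lower:
        return "paternal"
    return "none"


# Hypothetical reference statistics, for illustration only.
lower, upper = reference_thresholds([1.0, 1.1, 0.9, 1.05, 0.95])
print(round(lower, 4), round(upper, 4))

# Using the thresholds reported in the text (0.80 and 1.26):
print(contamination_tendency(1.32, 0.80, 1.26))
```

Separate multipliers for the lower and upper side are exposed so that an asymmetric threshold can be built with the same helper.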
Sample blending experiment I:
Samples of aborted fetal tissue and the corresponding paternal and maternal peripheral blood were obtained as described in Materials and methods, and genomic DNA was extracted to obtain purified fragmented gDNA. After quantification with the Qubit dsDNA HS Assay kit, each sample was diluted to 1 ng/μL.
The diluted samples were mixed in the following proportions and shaken for 2-3 s to mix well, giving simulated mixed samples containing a given proportion of maternal or paternal DNA:
Figure BDA0003412626970000281
All simulated mixed samples were diluted 10-fold to a final concentration of 100 pg/μL. Thus, for example, M9 and F9 have a fetal DNA amount of 10 pg/μL and a maternal or paternal DNA amount of 90 pg/μL.
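The dilution arithmetic can be reproduced with a short Python sketch. It assumes, from the concentrations quoted for M9 and F9, that these two samples contain 90% parental DNA; the function name is illustrative.

```python
def mixture_concentrations(stock_pg_per_ul, dilution_factor, parental_percent):
    """Return (fetal, parental) DNA concentrations in pg/uL after dilution."""
    total = stock_pg_per_ul / dilution_factor
    parental = total * parental_percent / 100
    fetal = total - parental
    return fetal, parental


# 1 ng/uL stock (1000 pg/uL), diluted 10-fold, 90% parental DNA (assumed for M9/F9).
fetal, parental = mixture_concentrations(1000, 10, 90)
print(fetal, parental)
```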
1 μL of each diluted simulated mixed sample was taken and subjected to whole genome amplification using the Yikang ChromInst amplification kit, following the procedure described in Materials and methods. After the amplification product was purified with magnetic beads, 2 μL of the purified amplification product was taken and quantified with Qubit. 100 ng of the remaining amplification product and 200 ng of the corresponding parental gDNA were tested on the Illumina Infinium ASA chip, and the data were analyzed to calculate the S_MC of each simulated mixed sample.
The results of the maternal sample blending experiment are shown in fig. 6. As shown, the S_MC statistics of all simulated mixed samples were greater than the maternal contamination threshold of 1.26 determined from the reference system, and correlated with the amount of admixture. With 10%, 30%, 50%, and 70% maternal genetic material mixed in, the S_MC statistics were 1.32, 2.15, 5.64, and 18.16, respectively; with 90% maternal admixture, the S_MC value was 3648.
The blending experiment shows that the constructed parental tendency statistic and reference threshold can effectively eliminate interfering information introduced by single-cell whole genome amplification (such as ADO interference) and effectively determine the parental tendency of a sample; moreover, the S_MC obtained from this blending experiment allows the parental tendency proportion of a sample to be estimated approximately, reflecting the degree of maternal contamination of the sample.
Sample blending experiment II:
According to the method described in sample blending experiment I, simulated mixed samples (concentration 1 ng/μL) containing a given proportion of maternal or paternal DNA were prepared. The simulated mixed samples were diluted 10-fold with blank (i.e., unused) embryo culture medium to a final concentration of 100 pg/μL.
1 μL of each diluted simulated mixed sample was taken and subjected to whole genome amplification using the Yikang ChromInst amplification kit, following the procedure described in Materials and methods. After the amplification product was purified with magnetic beads, 2 μL of the purified amplification product was taken and quantified with Qubit. 100 ng of the remaining amplification product and 200 ng of the corresponding parental gDNA were tested on the Illumina Infinium ASA chip, and the data were analyzed to calculate the S_MC of each simulated mixed sample.
Example 3 identification of embryo ploidy Using ICSI blastocyst trophoblast TE cells
Determination of the parental predisposition statistic S_uneven and the heterozygous site rate S_loh
Using a nucleic acid chip-based gene variation detection method, SNP detection data were obtained for the single-cell whole genome amplification product of the blastocyst trophoblast cell biopsy sample and for the gDNA samples of the father and mother of the corresponding embryo.
After genetic testing, the BAF value (between 0 and 1) of each polymorphic site was obtained for the father, the mother, and the biopsy sample using the iaap-cli and GenCall algorithms: for a homozygous B allele site (BB), BAF = 1; for a homozygous A allele site (AA), BAF = 0; for a heterozygous site (AB), BAF = 0.5.
As shown in FIG. 5, SNP sites where the father has the AA genotype and the mother has the BB genotype were selected, the BAF value (between 0 and 1) of the biopsy sample at each of these sites was determined, and the number of sites with BAF ≥ 0.6 (N_MB) and the number of sites with BAF ≤ 0.4 (N_PA) were counted. Similarly, SNP sites where the father is BB and the mother is AA were selected, the BAF value (between 0 and 1) of the biopsy sample at each of these sites was determined, and the number of sites with BAF ≤ 0.4 (N_MA) and the number of sites with BAF ≥ 0.6 (N_PB) were counted. The total number of selected SNP sites (N_total) and the number of those sites with BAF between 0.4 and 0.6 (N_LOH) were also counted.
Using the formula S_uneven = (N_MB + N_MA)/(N_PA + N_PB), the parental tendency statistic S_uneven of the sample was calculated.
Using the formula S_loh = N_LOH/N_total, the heterozygous site rate of the sample was calculated.
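Because both statistics are evaluated at the chromosome level, a per-chromosome tally is a natural way to compute them. The Python sketch below assumes a simple input layout of (chromosome, father genotype, mother genotype, sample BAF) tuples and returns infinity when no paternal-tendency site is present; both choices are illustrative assumptions rather than details given in the text.

```python
from collections import defaultdict

HI, LO = 0.6, 0.4  # BAF cut-offs used in the text


def chromosome_statistics(sites):
    """Return {chrom: (S_uneven, S_loh)} from (chrom, father_gt, mother_gt, baf) tuples."""
    tally = defaultdict(lambda: {"maternal": 0, "paternal": 0, "het": 0, "total": 0})
    for chrom, father, mother, baf in sites:
        if (father, mother) == ("AA", "BB"):
            maternal, paternal = baf >= HI, baf <= LO   # N_MB / N_PA
        elif (father, mother) == ("BB", "AA"):
            maternal, paternal = baf <= LO, baf >= HI   # N_MA / N_PB
        else:
            continue  # parents are not homozygous for different alleles
        t = tally[chrom]
        t["total"] += 1                                 # N_total
        if maternal:
            t["maternal"] += 1
        elif paternal:
            t["paternal"] += 1
        else:
            t["het"] += 1                               # N_LOH in the text's notation
    result = {}
    for chrom, t in tally.items():
        # Assumption: report infinity when there is no paternal-tendency site.
        s_uneven = t["maternal"] / t["paternal"] if t["paternal"] else float("inf")
        result[chrom] = (s_uneven, t["het"] / t["total"])
    return result


example = [
    ("chr1", "AA", "BB", 0.90), ("chr1", "AA", "BB", 0.50),
    ("chr1", "BB", "AA", 0.10), ("chr1", "BB", "AA", 0.95),
    ("chr2", "AA", "BB", 0.00), ("chr2", "BB", "AA", 0.55),
    ("chr2", "AA", "AB", 0.50),  # ignored: the mother is heterozygous
]
stats = chromosome_statistics(example)
print(stats)
```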
Establishing a parental tendency-free reference system
31 ICSI blastocyst trophoblast cell biopsy samples of known euploidy (46, XX or 46, XY) were obtained. The manner and amount of collection of the TE biopsy samples and the parental peripheral blood samples were as described in Materials and methods.
Whole genome amplification was performed on the euploid biopsy samples using the MALBAC amplification procedure described in Materials and methods. The sample amplification product (100 ng) and the gDNA extracted from the peripheral blood of the corresponding father and mother (200 ng) were tested on the gene chip platform as described in Materials and methods to obtain SNP genotype information. As described above, the S_uneven and S_loh statistics were calculated for each of the 31 biopsy samples, and the chromosome-level distributions of the reference system S_uneven and S_loh were constructed.
FIG. 8 shows the distribution of the reference system S_uneven values on each chromosome (A) and their density distribution (B). Based on the S_uneven density distribution in FIG. 8B, the upper S_uneven threshold was set to 1.75 using the mean + 3 × standard deviation, and the lower S_uneven threshold was set to 0.53 using the mean - 2 × standard deviation.
FIG. 9 shows the distribution of the reference system S_loh values on each chromosome (A) and their density distribution (B). Based on the S_loh density distribution in FIG. 9B, the upper and lower S_loh thresholds were set to 0.76 and 0.22, respectively, using the mean ± 3 × standard deviation.
Ploidy analysis of biopsy cell samples
Ploidy analysis was performed on embryo biopsy cell samples with known CNV chromosomal abnormalities (duplications and deletions).
The biopsy cell samples used in this experiment included:
5 embryo chromosome monosomy biopsy samples;
4 embryo chromosome trisomy biopsy samples;
1 embryo biopsy sample of a triploid of paternal origin.
ICSI blastocyst trophoblast biopsy samples were subjected to whole genome amplification using the MALBAC amplification procedure described in Materials and methods. The amplification product was divided into two parts. One part (not less than 100 ng) was used to construct a library, and copy number variation (CNV) was detected by the CNV-Seq method based on next generation sequencing (NGS); the specific method is disclosed in patent CN105574361B. The other part of the amplification product (100 ng) and the gDNA extracted from the peripheral blood of the corresponding father and mother (200 ng) were tested on the gene chip platform as described in Materials and methods to obtain BAF information for the polymorphic sites.
The sequencing CNV assay results for each sample were as follows:
Sample    Karyotype
dup1      47,XN,+22(×3)
dup2      47,XN,+7(×3)
dup3      47,XN,+15(×3)
dup4      47,XN,+22(×3)
del1      45,XN,-21(×1)
del2      45,XN,-2(×1)
del3      45,XN,-18(×1)
del4      45,XN,-13(×1)
del5      45,XN,-22(×1)
The amplification product (100 ng) of each sample and the corresponding parental peripheral blood gDNA (200 ng) were subjected to gene chip analysis as described in Materials and methods to obtain SNP genetic variation data. In the same manner as for the reference system, the chromosome-level parental predisposition statistic S_uneven and heterozygous site rate S_loh of each biopsy cell sample were calculated and compared with the S_uneven threshold (0.53-1.75) and the S_loh threshold (0.22-0.76) determined from the reference system.
If the sample S_loh is equivalent to or slightly smaller than the reference system S_loh and S_uneven is significantly greater than the reference system S_uneven, the sample is suggested to be a maternal trisomy (chromosome level) or a maternal triploid (one extra set of maternal chromosomes); conversely, if S_uneven is significantly less than the reference system S_uneven, a paternal trisomy (chromosome level) or a paternal triploid (one extra set of paternal chromosomes) is suggested.
If S_loh is significantly less than the reference system S_loh and S_uneven is much greater than the reference system S_uneven, a maternal uniparental diploid or haploid is suggested; conversely, if S_uneven is much less than the reference system S_uneven, a paternal uniparental diploid or haploid is suggested.
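These decision rules can be written as a small Python function. The sketch interprets "significantly greater/less" and "equivalent to or slightly smaller" as falling outside or inside the reference threshold ranges quoted in the text (0.53-1.75 for S_uneven, 0.22-0.76 for S_loh); that interpretation, and the returned labels, are assumptions made for illustration.

```python
# Reference thresholds quoted in the text, as (lower, upper).
S_UNEVEN_RANGE = (0.53, 1.75)
S_LOH_RANGE = (0.22, 0.76)


def interpret_chromosome(s_uneven, s_loh,
                         uneven_range=S_UNEVEN_RANGE, loh_range=S_LOH_RANGE):
    """Map one chromosome's (S_uneven, S_loh) to the interpretation given in the text."""
    uneven_lo, uneven_hi = uneven_range
    loh_lo, loh_hi = loh_range

    if s_uneven > uneven_hi:
        origin = "maternal"
    elif s_uneven < uneven_lo:
        origin = "paternal"
    else:
        return "no parental tendency outside the reference range"

    if s_loh < loh_lo:
        # Heterozygous sites are depleted: only one parent's chromosome is present.
        return f"{origin} uniparental diploid or haploid"
    if s_loh <= loh_hi:
        # Heterozygous sites are retained: an extra copy from one parent.
        return f"{origin} trisomy or triploid"
    return "not covered by the rules in the text"


print(interpret_chromosome(3.0, 0.50))
print(interpret_chromosome(0.1, 0.05))
```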
The ploidy results determined from the parental predisposition statistic S_uneven and the heterozygous site rate S_loh were substantially in line with expectation, which shows that ploidy abnormalities of the embryo can be effectively determined from a small number of biopsy cells using the parental tendency statistic.
As shown in FIG. 10, in the known maternal trisomy embryo biopsy cells (dup1-4), S_loh was comparable to the reference system S_loh, in all cases within the threshold range, and S_uneven was significantly greater than the reference system S_uneven.
As shown in FIG. 10, in the known chromosome monosomy embryo biopsy cells (del1-5), S_loh was in all cases significantly less than the reference system S_loh; the maternal haploid del2 had S_uneven much greater than the reference system S_uneven, and the paternal haploids del1 and del3-5 had S_uneven much less than the reference system S_uneven.
As shown in FIG. 11, in the known triploid embryo biopsy cells of paternal origin, the S_loh of all 22 autosomes was comparable to the reference system S_loh, and S_uneven was significantly less than the reference system S_uneven.
Some embodiments of the invention
1. A method for determining the parental predisposition of a test sample, wherein said test sample comprises progeny genomic DNA in an amount of no more than 1ng (and preferably about 1-500pg, more preferably 1-100pg), wherein said method comprises the steps of:
(1) performing gene variation site (preferably SNP site) analysis on the single-cell whole genome amplification product of the tested sample;
(2) obtaining a frequency of Maternal Alleles (MAF) and/or a frequency of Paternal Alleles (PAF) of the sample at parentally unequal homozygous loci over a selected DNA segment (e.g., at the whole genome level, or chromosome segment level);
(3) classifying the genetic locus as a paternal predisposition locus, a maternal predisposition locus, or a non-paternal and maternal predisposition locus based on the MAF and/or PAF determined in step (2) compared to classification thresholds a and b, wherein classification threshold a is a number from 0.1 to 0.4, classification threshold b is a number from 0.6-0.9, and a + b is 1;
wherein,
-classifying said site as a paternal predisposition site if MAF ≤ a and/or PAF ≥ b;
-classifying said site as a maternal predisposition site if PAF ≤ a and/or MAF ≥ b;
-classifying said site as a non-paternal and non-maternal predisposition site if MAF and/or PAF has a value > a and < b;
preferably, the classification threshold a is 0.4 and b is 0.6, and
-classifying said site as a paternal predisposition site if MAF ≤ 0.4 and/or PAF ≥ 0.6;
-classifying said site as a maternal predisposition site if PAF ≤ 0.4 and/or MAF ≥ 0.6;
-classifying said site as a non-paternal and non-maternal predisposition site if MAF and/or PAF has a value > 0.4 and < 0.6;
(4) counting the number of maternal predisposition sites (N_MAF) and the number of paternal predisposition sites (N_PAF) of said test sample on said selected DNA segment;
(5) determining the parental predisposition statistic (S_POR) of the sample at the level of said DNA segment based on the number of maternal predisposition sites (N_MAF) and the number of paternal predisposition sites (N_PAF);
(6) comparing the S_POR value of the sample determined in step (5) with a parental predisposition threshold (i.e., the S_POR threshold) to determine the parental predisposition of the test sample at the level of said DNA segment.
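Steps (3) to (5) can be sketched in Python as follows. The sketch works from the maternal allele frequency alone (with PAF = 1 - MAF at a biallelic site) and checks the stated constraints on a and b; the function names and the choice to raise an error for an empty denominator are assumptions made for illustration.

```python
import math


def check_thresholds(a, b):
    """Constraints from step (3): 0.1 <= a <= 0.4, 0.6 <= b <= 0.9, and a + b = 1."""
    if not (0.1 <= a <= 0.4 and 0.6 <= b <= 0.9 and math.isclose(a + b, 1.0)):
        raise ValueError("classification thresholds violate the stated constraints")


def classify_site(maf, a=0.4, b=0.6):
    """Classify one site from its maternal allele frequency (PAF = 1 - MAF)."""
    check_thresholds(a, b)
    if maf >= b:
        return "maternal"
    if maf <= a:
        return "paternal"
    return "neither"


def s_por(mafs, a=0.4, b=0.6):
    """S_POR = N_MAF / N_PAF over the sites of the selected DNA segment."""
    labels = [classify_site(m, a, b) for m in mafs]
    n_maf, n_paf = labels.count("maternal"), labels.count("paternal")
    if n_paf == 0:
        # Assumption: the text does not define S_POR when N_PAF is zero.
        raise ValueError("no paternal predisposition sites; S_POR is undefined")
    return n_maf / n_paf


print(classify_site(0.7), classify_site(0.3), classify_site(0.5))
print(s_por([0.9, 0.8, 0.1, 0.5]))
```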
2. The method of embodiment 1, wherein the S_POR threshold is established using a reference system without parental predisposition,
preferably, the reference system consists of 1-40 or more reference samples without parental predisposition;
preferably, the reference S_POR threshold is set based on the mean ± 1-5 standard deviations (preferably 2-3, especially 3 standard deviations) of the parental predisposition statistic S_POR of the reference system;
if the S_POR of the sample is greater than the upper reference S_POR threshold, a maternal predisposition of the test sample on the DNA segment is indicated; if the S_POR of the sample is less than the lower threshold, a paternal predisposition of the test sample on the DNA segment is indicated.
3. The method of embodiment 1, wherein said single cell whole genome amplification is performed using a method selected from the group consisting of: Primer Extension Preamplification PCR (PEP-PCR), Degenerate Oligonucleotide Primed PCR (DOP-PCR), Multiple Displacement Amplification (MDA), Multiple Annealing and Looping-Based Amplification Cycles (MALBAC), blunt-end or cohesive-end ligation library construction followed by PCR amplification, and
Figure BDA0003412626970000321
(US20190106738A1);
Wherein, preferably, the single cell whole genome amplification is performed using MALBAC, more preferably, using
Figure BDA0003412626970000322
to perform the whole genome amplification.
4. The method of embodiment 1, wherein, in step (2), BAF values of said test sample at said parental non-isohomozygous locus are obtained based on, for example, nucleic acid chip analysis, and the paternal allele frequency and/or maternal allele frequency of the sample at said locus is determined based on said BAF values.
5. The method of embodiment 4, wherein the method comprises:
-determining, on the selected DNA segment, the sites where the father has the AA genotype and the mother has the BB genotype, calculating the BAF value of the sample at these sites, and counting the number of sites with BAF ≥ classification threshold b (preferably 0.6) (N_MB) and the number of sites with BAF ≤ classification threshold a (preferably 0.4) (N_PA); and determining the sites where the father has the BB genotype and the mother has the AA genotype, calculating the BAF value of the sample at these sites, and counting the number of sites with BAF ≤ classification threshold a (preferably 0.4) (N_MA) and the number of sites with BAF ≥ classification threshold b (preferably 0.6) (N_PB);
-calculating the number of maternal predisposition sites (N_MAF) and the number of paternal predisposition sites (N_PAF) of the test sample on said DNA segment, wherein
N_MAF = N_MB + N_MA; N_PAF = N_PA + N_PB;
-constructing the parental predisposition statistic S_POR of the test sample, wherein
S_POR = N_MAF/N_PAF = (N_MB + N_MA)/(N_PA + N_PB).
6. The method of embodiment 1, wherein in step (2) the allele depth values (AD) of the paternal and maternal alleles of the test sample at the parentally unequal homozygous sites on said selected DNA segment, i.e., AD_father and AD_mother, are obtained based on, for example, NGS sequencing analysis, and the paternal allele frequency and/or maternal allele frequency of the test sample at the site is determined based on the AD values:
wherein the maternal allele frequency of the sample at the site is MAF = AD_mother/(AD_father + AD_mother);
wherein the paternal allele frequency of the sample at the site is PAF = AD_father/(AD_father + AD_mother).
7. The method of embodiment 6, wherein the MAF ≧ sorting threshold b (preferably 0.6) (N) of the sample to be tested on the selected DNA segment is countedMAF) And PAF ≧ classification threshold b (preferably 0.6) (N)PAF) Number of sites of (a) to construct S of sample specimenPORWherein
SPOR=NMAF/NPAF
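For sequencing data, embodiments 6 and 7 can be illustrated with the sketch below. The function names, the input format (one pair of paternal and maternal allele depths per site), and the handling of zero-coverage sites are assumptions made for illustration only.

```python
from typing import Iterable, Optional, Tuple


def allele_frequencies(ad_father: int, ad_mother: int) -> Optional[Tuple[float, float]]:
    """Return (MAF, PAF) from allele depths, or None for a site with no coverage."""
    total = ad_father + ad_mother
    if total == 0:
        return None  # Assumption: uncovered sites are skipped.
    return ad_mother / total, ad_father / total


def s_por_from_depths(
    depths: Iterable[Tuple[int, int]], b: float = 0.6
) -> Tuple[int, int, float]:
    """Count sites with MAF >= b (N_MAF) and PAF >= b (N_PAF); return S_POR as well."""
    n_maf = n_paf = 0
    for ad_father, ad_mother in depths:
        freqs = allele_frequencies(ad_father, ad_mother)
        if freqs is None:
            continue
        maf, paf = freqs
        if maf >= b:
            n_maf += 1
        elif paf >= b:
            n_paf += 1
    # Assumption: report infinity when no site has PAF >= b.
    s_por = n_maf / n_paf if n_paf else float("inf")
    return n_maf, n_paf, s_por
```

Because MAF + PAF = 1 at every covered site, a site cannot satisfy both MAF ≥ b and PAF ≥ b when b > 0.5.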
8. The method according to any of embodiments 1-7, wherein the method is used to identify parental DNA contamination in a trace DNA sample from progeny, wherein the parental predisposition of the sample indicates the likelihood that the sample has paternal or maternal contamination,
preferably, the sample is SEM embryo culture medium, and the method comprises: obtaining, at the whole genome level, the maternal allele frequency (MAF) and/or the paternal allele frequency (PAF) of the test sample at all parentally unequal homozygous loci of the segment; and constructing the parental contamination predisposition value $S_{MC}$ of the sample, wherein
$S_{MC} = S_{POR} = N_{MAF}/N_{PAF}$.
9. The method of embodiment 8, wherein the reference system consists of SEM samples without parental predisposition, such as 1-40 reference SEM samples,
preferably, the parental predisposition statistic $S_{MC}$ is established for each SEM sample of the reference system, and the parental predisposition $S_{MC}$ thresholds are set based on the mean ± 1-5 standard deviations (preferably 2-3, especially 3 standard deviations) of the reference system $S_{MC}$ values, wherein,
if the $S_{MC}$ of the tested SEM sample is greater than the upper threshold, the tested SEM sample is indicated to have a tendency toward maternal contamination;
if the $S_{MC}$ of the tested SEM sample is less than the lower threshold, the tested SEM sample is indicated to have a tendency toward paternal contamination;
preferably, the upper $S_{MC}$ threshold is 1.26 and the lower $S_{MC}$ threshold is 0.80; an SEM sample with $S_{MC}$ > 1.26 is indicated to have a tendency toward maternal contamination, and an SEM sample with $S_{MC}$ < 0.80 is indicated to have a tendency toward paternal contamination.
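The threshold construction and comparison described in embodiment 9 might be sketched as follows. The function names and result strings are hypothetical, and the use of the sample standard deviation (`statistics.stdev`, which needs at least two reference values) is an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def reference_thresholds(reference_values: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """Return (lower, upper) thresholds as mean -/+ k standard deviations."""
    m = mean(reference_values)
    sd = stdev(reference_values)  # Assumption: sample standard deviation.
    return m - k * sd, m + k * sd


def contamination_call(s_mc: float, lower: float = 0.80, upper: float = 1.26) -> str:
    """Compare S_MC with the thresholds; defaults are the preferred values in the text."""
    if s_mc > upper:
        return "maternal contamination tendency"
    if s_mc < lower:
        return "paternal contamination tendency"
    return "no parental contamination tendency"
```

In practice the pair returned by `reference_thresholds` would be passed as `lower` and `upper` to `contamination_call` in place of the default values.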
10. The method of embodiments 8-9, wherein the tested SEM sample has less than 50% or 40%, more preferably less than 10%, or less than 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2% or 1%, or less than 0.1%, of parental DNA contamination (e.g., paternal or maternal DNA contamination).
11. The method according to embodiments 8-10, wherein the ratio of fetal DNA to parental contaminating DNA in the tested SEM sample is 1:9 to 9:1.
12. The method of embodiment 1, wherein the method is used for identifying chromosomal ploidy abnormalities of the progeny in a trace DNA sample from the progeny, wherein the parental predisposition of the sample indicates the likelihood of a chromosomal ploidy abnormality in the progeny, wherein preferably the sample is a sample obtained by cell biopsy, such as IVF blastocyst trophoblast cells,
preferably, the chromosomal ploidy abnormality indicated by the parental predisposition is selected from: paternal trisomy, paternal triploidy, maternal trisomy, maternal triploidy, paternal uniparental diploidy or haploidy, maternal uniparental diploidy or haploidy, and any combination thereof.
13. The method of embodiment 12, wherein, on the DNA segment (in particular, at the chromosome level), the maternal allele frequency (MAF) and/or the paternal allele frequency (PAF) of the test sample at the parentally unequal homozygous loci of the segment is obtained; and the parental predisposition value $S_{uneven}$ of the sample is constructed, wherein
$S_{uneven} = S_{POR} = N_{MAF}/N_{PAF}$.
14. The method of embodiments 12-13, wherein the method further comprises:
- counting the total number of parentally unequal homozygous sites ($N_{total}$) of the test sample on the selected DNA segment and the number of sites with neither paternal nor maternal predisposition ($N_{LOH}$);
- determining the heterozygous site rate ($S_{loh}$) of the test sample at the level of the DNA segment based on $N_{LOH}$ and $N_{total}$; and
- comparing the determined heterozygous site rate of the test sample (i.e., the sample $S_{loh}$ value) with the heterozygous site rate thresholds (i.e., the $S_{loh}$ thresholds).
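The text states only that the heterozygous site rate is determined from N_LOH and N_total and does not give the formula. The sketch below assumes the simple ratio N_LOH / N_total, which is an assumption, and uses as defaults the preferred threshold values of about 0.22 and 0.76 quoted elsewhere in this document; the function names are hypothetical.

```python
def heterozygous_site_rate(n_loh: int, n_total: int) -> float:
    """Assumed definition: S_loh = N_LOH / N_total (the formula is not given in the text)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_loh / n_total


def compare_s_loh(s_loh: float, lower: float = 0.22, upper: float = 0.76) -> str:
    """Report whether S_loh is below, within, or above the reference thresholds."""
    if s_loh < lower:
        return "below"
    if s_loh > upper:
        return "above"
    return "within"
```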
15. The method of embodiments 12-14, wherein the method further comprises:
establishing the $S_{uneven}$ thresholds and optionally the $S_{loh}$ thresholds using a reference system without parental predisposition, preferably, the reference system without parental predisposition consists of 1-40 or more euploid biopsy samples;
preferably, the parental predisposition $S_{uneven}$ thresholds are set using the mean ± 1-5 standard deviations (preferably 2-3, especially 3 standard deviations) of the reference system $S_{uneven}$ values, preferably, the upper $S_{uneven}$ threshold is about 1.75 and the lower threshold is about 0.53;
preferably, the $S_{loh}$ thresholds are set using the mean ± 1-5 standard deviations (preferably 2-3, especially 3 standard deviations) of the reference system $S_{loh}$ values, preferably, the upper $S_{loh}$ threshold is about 0.76 and the lower threshold is about 0.22.
16. The method of embodiments 12-15, wherein the method comprises:
constructing the $S_{uneven}$ statistic of the test sample at the chromosome level and comparing it with the $S_{uneven}$ thresholds predetermined based on the reference system without parental predisposition;
constructing the $S_{loh}$ value of the test sample at the chromosome level and comparing it with the $S_{loh}$ thresholds predetermined based on the reference system without parental predisposition;
if the $S_{loh}$ of the test sample is within the reference system $S_{loh}$ thresholds and $S_{uneven}$ is greater than the upper reference system $S_{uneven}$ threshold, maternal trisomy or maternal triploidy is indicated; conversely, if $S_{uneven}$ is less than the lower reference system $S_{uneven}$ threshold, paternal trisomy or paternal triploidy is indicated;
if $S_{loh}$ is less than the lower reference system $S_{loh}$ threshold and $S_{uneven}$ is greater than the upper reference system $S_{uneven}$ threshold, such as greater than 10, 12, 15, 17, 19 or 20, maternal uniparental diploidy or haploidy is indicated; conversely, if $S_{uneven}$ is less than the lower reference system $S_{uneven}$ threshold, such as less than 0.2, 0.15, 0.1, 0.05, 0.02 or 0.01, paternal uniparental diploidy or haploidy is indicated.
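The chromosome-level decision rules of embodiment 16 can be summarized in a short sketch. The function name, the result strings, and the fall-through result for combinations the text does not address are assumptions; the default thresholds are the preferred values quoted in embodiment 15. The sketch also assumes that the paternal uniparental case is tested against the lower S_uneven threshold, by symmetry with the trisomy rule.

```python
def chromosome_call(
    s_uneven: float,
    s_loh: float,
    uneven_lower: float = 0.53,
    uneven_upper: float = 1.75,
    loh_lower: float = 0.22,
    loh_upper: float = 0.76,
) -> str:
    """Combine S_uneven and S_loh into a ploidy indication for one chromosome."""
    if loh_lower <= s_loh <= loh_upper:
        # Heterozygous site rate within the reference range: trisomy or triploidy pattern.
        if s_uneven > uneven_upper:
            return "maternal trisomy or maternal triploid"
        if s_uneven < uneven_lower:
            return "paternal trisomy or paternal triploid"
    elif s_loh < loh_lower:
        # Very few heterozygous-like sites: uniparental diploid or haploid pattern.
        if s_uneven > uneven_upper:
            return "maternal uniparental diploid or haploid"
        if s_uneven < uneven_lower:
            return "paternal uniparental diploid or haploid"
    # Assumption: every other combination is reported as no indication.
    return "no parental predisposition indicated"
```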
17. The method of embodiments 12-16, wherein the parental predisposition value and the heterozygous site rate of the test sample are constructed on each of a plurality of autosomes to determine the likelihood that the test sample is a maternal or paternal triploid, preferably,
if the test sample shows, on 15 or more chromosomes (for example, 15, 16, 17, 18, 19, 20, 21 or 22), a parental predisposition and heterozygous site rate indicating maternal trisomy, the test sample is judged to be possibly a maternal triploid;
if the test sample shows, on 15 or more chromosomes (for example, 15, 16, 17, 18, 19, 20, 21 or 22), a parental predisposition and heterozygous site rate indicating paternal trisomy, the test sample is judged to be possibly a paternal triploid.
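The genome-level rule of embodiment 17 amounts to counting autosomes that carry the same trisomy indication. A minimal sketch, assuming per-chromosome results have already been reduced to the hypothetical labels "maternal", "paternal" or "none":

```python
from collections import Counter
from typing import Mapping


def triploid_call(per_chromosome_calls: Mapping[int, str], min_chromosomes: int = 15) -> str:
    """Flag a possible triploid when enough autosomes show the same trisomy indication.

    Values are hypothetical labels: "maternal", "paternal" or "none".
    """
    counts = Counter(per_chromosome_calls.values())
    if counts["maternal"] >= min_chromosomes:
        return "possible maternal triploid"
    if counts["paternal"] >= min_chromosomes:
        return "possible paternal triploid"
    return "triploidy not indicated"
```

The default of 15 chromosomes follows the examples listed in the text (15 to 22 of the 22 autosomes).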
18. An apparatus, device or system, characterized in that,
-optionally, being able to perform a single cell whole genome amplification of the test sample and optionally of the reference sample;
optionally, detection of genetic variation (preferably SNP) information of the progeny genome of the obtained single cell whole genome amplification product can be performed, e.g., wherein said information is determined by nucleic acid chip or NGS sequencing;
-being able to perform the method according to any one of embodiments 1 to 17, based on the genetic variation information of the test sample and optionally of the reference line, to identify parental predisposition of the test sample, or to detect parental contamination of the test sample, or to identify ploidy abnormalities of progeny DNA in the test sample.
19. Use of the device, apparatus or system of embodiment 18,
- for identifying the parental predisposition of a test sample, detecting parental contamination of a test sample, or identifying ploidy abnormalities of progeny DNA in a test sample; or in the preparation of a product for identifying the parental predisposition of a test sample, detecting parental contamination of a test sample, or identifying ploidy abnormalities of progeny DNA in a test sample.

Claims (10)

1. A method for determining the parental predisposition of a test sample, wherein said test sample comprises progeny genomic DNA in an amount of no more than 1ng (and preferably about 1-500pg, more preferably 1-100pg), wherein said method comprises the steps of:
(1) performing genetic variation site (preferably SNP site) analysis on the single-cell whole genome amplification product of the test sample;
(2) obtaining a frequency of Maternal Alleles (MAF) and/or a frequency of Paternal Alleles (PAF) of the sample at parentally unequal homozygous loci over a selected DNA segment (e.g., at the whole genome level, or chromosome segment level);
(3) classifying each locus as a paternal predisposition site, a maternal predisposition site, or a site with neither paternal nor maternal predisposition by comparing the MAF and/or PAF determined in step (2) with classification thresholds a and b, wherein classification threshold a is a number from 0.1 to 0.4, classification threshold b is a number from 0.6 to 0.9, and a + b = 1;
wherein,
- the site is classified as a paternal predisposition site if MAF ≤ a and/or PAF ≥ b;
- the site is classified as a maternal predisposition site if PAF ≤ a and/or MAF ≥ b;
- the site is classified as a site with neither paternal nor maternal predisposition if MAF and/or PAF is > a and < b;
preferably, classification threshold a is 0.4 and b is 0.6, and
- the site is classified as a paternal predisposition site if MAF ≤ 0.4 and/or PAF ≥ 0.6;
- the site is classified as a maternal predisposition site if PAF ≤ 0.4 and/or MAF ≥ 0.6;
- the site is classified as a site with neither paternal nor maternal predisposition if MAF and/or PAF is > 0.4 and < 0.6;
(4) counting the number of maternal predisposition sites ($N_{MAF}$) and the number of paternal predisposition sites ($N_{PAF}$) of the test sample on the selected DNA segment;
(5) determining the parental predisposition statistic ($S_{POR}$) of the sample at the level of the DNA segment based on the number of maternal predisposition sites ($N_{MAF}$) and the number of paternal predisposition sites ($N_{PAF}$);
(6) comparing the sample $S_{POR}$ value determined in step (5) with the parental predisposition thresholds (i.e., the $S_{POR}$ thresholds) to determine the parental predisposition of the test sample at the level of the DNA segment.
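Steps (3) to (5) of claim 1 can be illustrated with the following sketch, which classifies each site from its MAF alone. This relies on the assumption that PAF = 1 − MAF at every site, so that with a + b = 1 the condition MAF ≤ a is equivalent to PAF ≥ b. The function names and the label strings are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def classify_site(maf: float, a: float = 0.4, b: float = 0.6) -> str:
    """Classify one site as "paternal", "maternal" or "neither" from its MAF."""
    # Threshold constraints stated in claim 1: a in [0.1, 0.4], b in [0.6, 0.9], a + b = 1.
    if not (0.1 <= a <= 0.4 and 0.6 <= b <= 0.9 and math.isclose(a + b, 1.0)):
        raise ValueError("classification thresholds outside the claimed ranges")
    if maf <= a:
        return "paternal"  # equivalent to PAF >= b under the PAF = 1 - MAF assumption
    if maf >= b:
        return "maternal"
    return "neither"


def s_por(mafs: Iterable[float], a: float = 0.4, b: float = 0.6) -> float:
    """Count maternal and paternal predisposition sites and return S_POR = N_MAF / N_PAF."""
    counts = Counter(classify_site(m, a, b) for m in mafs)
    n_maf, n_paf = counts["maternal"], counts["paternal"]
    # Assumption: report infinity when no paternal predisposition site is found.
    return n_maf / n_paf if n_paf else float("inf")
```

Step (6) would then compare the returned value with upper and lower thresholds derived from a reference system.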
2. The method of claim 1, wherein the $S_{POR}$ thresholds are established using a reference system without parental predisposition,
preferably, the reference system consists of 1-40 or more reference samples without parental predisposition;
preferably, the reference $S_{POR}$ thresholds are set based on the mean ± 1-5 standard deviations (preferably 2-3, especially 3 standard deviations) of the parental predisposition statistic $S_{POR}$ of the reference system;
if the $S_{POR}$ of the sample is greater than the upper reference $S_{POR}$ threshold, the test sample is indicated to have a maternal predisposition on the DNA segment; if the $S_{POR}$ of the sample is less than the lower threshold, the test sample is indicated to have a paternal predisposition on the DNA segment.
3. The method of claims 1-2, wherein the method comprises:
- on the selected DNA segment, determining the sites at which the father has the AA genotype and the mother has the BB genotype, calculating the BAF value of the test sample at each such site, and counting the number of sites with BAF ≥ classification threshold b (preferably 0.6) ($N_{MB}$) and the number of sites with BAF ≤ classification threshold a (preferably 0.4) ($N_{PA}$); and determining the sites at which the father has the BB genotype and the mother has the AA genotype, calculating the BAF value of the test sample at each such site, and counting the number of sites with BAF ≤ classification threshold a (preferably 0.4) ($N_{MA}$) and the number of sites with BAF ≥ classification threshold b (preferably 0.6) ($N_{PB}$);
- calculating the number of maternal predisposition sites ($N_{MAF}$) and the number of paternal predisposition sites ($N_{PAF}$) of the test sample on the DNA segment, wherein
$N_{MAF} = N_{MB} + N_{MA}$; $N_{PAF} = N_{PA} + N_{PB}$;
- constructing the parental predisposition statistic $S_{POR}$ of the test sample, wherein
$S_{POR} = N_{MAF}/N_{PAF} = (N_{MB} + N_{MA})/(N_{PA} + N_{PB})$.
4. The method of claims 1-2, wherein, in step (2), the allele depth (AD) values of the paternal and maternal alleles of the test sample at the parentally unequal homozygous loci on the selected DNA segment, i.e. $AD_{father}$ and $AD_{mother}$, are obtained based on, for example, NGS sequencing analysis, and the paternal allele frequency and/or maternal allele frequency of the test sample at each locus is determined from the AD values:
wherein the maternal allele frequency of the sample at the locus is $MAF = AD_{mother}/(AD_{father} + AD_{mother})$;
wherein the paternal allele frequency of the sample at the locus is $PAF = AD_{father}/(AD_{father} + AD_{mother})$;
preferably, wherein the number of sites on the selected DNA segment at which the test sample has MAF ≥ classification threshold b (preferably 0.6) ($N_{MAF}$) and the number of sites at which it has PAF ≥ classification threshold b (preferably 0.6) ($N_{PAF}$) are counted to construct the $S_{POR}$ of the test sample, wherein
$S_{POR} = N_{MAF}/N_{PAF}$.
5. The method of any one of claims 1 to 4, wherein the method is used to identify parental DNA contamination in a trace DNA sample from progeny, wherein the parental predisposition of the sample indicates the likelihood that the sample has paternal or maternal contamination,
preferably, the sample is SEM embryo culture medium, and the method comprises: obtaining, on the DNA segment (in particular, at the whole genome level), the maternal allele frequency (MAF) and/or the paternal allele frequency (PAF) of the test sample at all parentally unequal homozygous loci of the segment; and constructing the parental contamination predisposition value $S_{MC}$ of the sample, wherein
$S_{MC} = S_{POR} = N_{MAF}/N_{PAF}$;
preferably, in the method, the reference system consists of SEM samples without parental predisposition, such as 1-40 reference SEM samples,
preferably, the parental predisposition statistic $S_{MC}$ is established for each SEM sample of the reference system, and the parental predisposition $S_{MC}$ thresholds are set based on the mean ± 1-5 standard deviations (preferably 2-3, especially 3 standard deviations) of the reference system $S_{MC}$ values, wherein,
if the $S_{MC}$ of the tested SEM sample is greater than the upper threshold, the tested SEM sample is indicated to have a tendency toward maternal contamination;
if the $S_{MC}$ of the tested SEM sample is less than the lower threshold, the tested SEM sample is indicated to have a tendency toward paternal contamination;
preferably, the upper $S_{MC}$ threshold is 1.26 and the lower $S_{MC}$ threshold is 0.80; an SEM sample with $S_{MC}$ > 1.26 is indicated to have a tendency toward maternal contamination, and an SEM sample with $S_{MC}$ < 0.80 is indicated to have a tendency toward paternal contamination.
6. The method of any one of claims 1 to 4, wherein the method is used for identifying chromosomal ploidy abnormalities of the progeny in a trace DNA sample from the progeny, wherein the parental predisposition of the sample indicates the likelihood of a chromosomal ploidy abnormality in the progeny, wherein preferably the sample is a sample obtained by cell biopsy, such as IVF blastocyst trophoblast cells,
preferably, the chromosomal ploidy abnormality indicated by the parental predisposition is selected from: paternal trisomy, paternal triploidy, maternal trisomy, maternal triploidy, paternal uniparental diploidy or haploidy, maternal uniparental diploidy or haploidy, and any combination thereof;
preferably, in the method, on the DNA segment (in particular, at the chromosome level), the maternal allele frequency (MAF) and/or the paternal allele frequency (PAF) of the test sample at the parentally unequal homozygous loci of the segment is obtained; and the parental predisposition value $S_{uneven}$ of the sample is constructed, wherein
$S_{uneven} = S_{POR} = N_{MAF}/N_{PAF}$;
more preferably, wherein the method further comprises:
- counting the total number of parentally unequal homozygous sites ($N_{total}$) of the test sample on the selected DNA segment and the number of sites with neither paternal nor maternal predisposition ($N_{LOH}$);
- determining the heterozygous site rate ($S_{loh}$) of the test sample at the level of the DNA segment based on $N_{LOH}$ and $N_{total}$; and
- comparing the determined heterozygous site rate of the test sample (i.e., the sample $S_{loh}$ value) with the heterozygous site rate thresholds (i.e., the $S_{loh}$ thresholds).
7. The method of claim 6, wherein the method further comprises:
establishing the $S_{uneven}$ thresholds and optionally the $S_{loh}$ thresholds using a reference system without parental predisposition, preferably, the reference system without parental predisposition consists of 1-40 or more euploid biopsy samples;
preferably, the parental predisposition $S_{uneven}$ thresholds are set using the mean ± 1-5 standard deviations (preferably 2-3, especially 3 standard deviations) of the reference system $S_{uneven}$ values, preferably, the upper $S_{uneven}$ threshold is about 1.75 and the lower threshold is about 0.53;
preferably, the $S_{loh}$ thresholds are set using the mean ± 1-5 standard deviations (preferably 2-3, especially 3 standard deviations) of the reference system $S_{loh}$ values, preferably, the upper $S_{loh}$ threshold is about 0.76 and the lower threshold is about 0.22.
8. The method of claims 6-7, wherein the method comprises:
constructing the $S_{uneven}$ statistic of the test sample at the chromosome level and comparing it with the $S_{uneven}$ thresholds predetermined based on the reference system without parental predisposition;
constructing the $S_{loh}$ value of the test sample at the chromosome level and comparing it with the $S_{loh}$ thresholds predetermined based on the reference system without parental predisposition;
if the $S_{loh}$ of the test sample is within the reference system $S_{loh}$ thresholds and $S_{uneven}$ is greater than the upper reference system $S_{uneven}$ threshold, maternal trisomy or maternal triploidy is indicated; conversely, if $S_{uneven}$ is less than the lower reference system $S_{uneven}$ threshold, paternal trisomy or paternal triploidy is indicated;
if $S_{loh}$ is less than the lower reference system $S_{loh}$ threshold and $S_{uneven}$ is greater than the upper reference system $S_{uneven}$ threshold, such as greater than 10, 12, 15, 17, 19 or 20, maternal uniparental diploidy or haploidy is indicated; conversely, if $S_{uneven}$ is less than the lower reference system $S_{uneven}$ threshold, such as less than 0.2, 0.15, 0.1, 0.05, 0.02 or 0.01, paternal uniparental diploidy or haploidy is indicated.
9. An apparatus, device or system, characterized in that,
-optionally, being able to perform a single cell whole genome amplification of the test sample and optionally of the reference sample;
optionally, detection of genetic variation (preferably SNP) information of the progeny genome of the obtained single cell whole genome amplification product can be performed, e.g., wherein said information is determined by nucleic acid chip or NGS sequencing;
-enabling the method according to any one of claims 1-8 to be performed based on genetic variation information of the test sample and optionally genetic variation information of a reference system, for identifying parental predisposition of the test sample, or for detecting parental contamination of a test sample, or for identifying ploidy abnormalities in progeny DNA in the test sample.
10. Use of the device, apparatus or system of claim 9,
-for identifying parental predisposition of a test sample, or for detecting parental contamination of a test sample, or for identifying ploidy abnormalities in progeny DNA in a test sample, or
- in the preparation of a product for identifying the parental predisposition of a test sample, detecting parental contamination of a test sample, or identifying ploidy abnormalities of progeny DNA in a test sample.
CN202111536093.6A 2020-12-23 2021-12-15 Method and device for identifying parent tendency of nucleic acid sample Active CN114214425B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2020115423682 2020-12-23
CN202011542368 2020-12-23

Publications (2)

Publication Number Publication Date
CN114214425A true CN114214425A (en) 2022-03-22
CN114214425B CN114214425B (en) 2024-01-19

Family

ID=80702562

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111536093.6A Active CN114214425B (en) 2020-12-23 2021-12-15 Method and device for identifying parent tendency of nucleic acid sample

Country Status (1)

Country Link
CN (1) CN114214425B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115433777A (en) * 2022-10-26 2022-12-06 北京中仪康卫医疗器械有限公司 Integrated identification method for CNV, SV and SGD abnormalities and abnormal sources of embryos
TWI807861B (en) * 2022-06-15 2023-07-01 中國醫藥大學 Method for identifying affinity of taiwanese population and system thereof
CN116497106A (en) * 2023-06-30 2023-07-28 北京大学第三医院(北京大学第三临床医学院) Identification method for maternal pollution in prenatal diagnosis

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130059733A1 (en) * 2011-02-24 2013-03-07 The Chinese University Of Hong Kong Molecular testing of multiple pregnancies
US20130196862A1 (en) * 2009-07-17 2013-08-01 Natera, Inc. Informatics Enhanced Analysis of Fetal Samples Subject to Maternal Contamination
CN107557481A (en) * 2017-10-10 2018-01-09 苏州绘真医学检验所有限公司 The detection trace mixing reagent of people's source DNA 13 CODIS str locus seats of sample, kit and its apply
CN110157812A (en) * 2019-05-29 2019-08-23 苏州市公安局刑事科学技术研究所 Composite amplification reagent kit that is a kind of while detecting autosome and Y chromosome str locus seat

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130196862A1 (en) * 2009-07-17 2013-08-01 Natera, Inc. Informatics Enhanced Analysis of Fetal Samples Subject to Maternal Contamination
US20130059733A1 (en) * 2011-02-24 2013-03-07 The Chinese University Of Hong Kong Molecular testing of multiple pregnancies
CN107557481A (en) * 2017-10-10 2018-01-09 苏州绘真医学检验所有限公司 The detection trace mixing reagent of people's source DNA 13 CODIS str locus seats of sample, kit and its apply
CN110157812A (en) * 2019-05-29 2019-08-23 苏州市公安局刑事科学技术研究所 Composite amplification reagent kit that is a kind of while detecting autosome and Y chromosome str locus seat

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KARIN SASAKI等: "Salvage of fetal karyotype information from SNP array data obtained from products of conception with maternal cell contamination", PRENATAL DIAGNOSIS, pages 1 - 7 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI807861B (en) * 2022-06-15 2023-07-01 中國醫藥大學 Method for identifying affinity of taiwanese population and system thereof
CN115433777A (en) * 2022-10-26 2022-12-06 北京中仪康卫医疗器械有限公司 Integrated identification method for CNV, SV and SGD abnormalities and abnormal sources of embryos
CN116497106A (en) * 2023-06-30 2023-07-28 北京大学第三医院(北京大学第三临床医学院) Identification method for maternal pollution in prenatal diagnosis
CN116497106B (en) * 2023-06-30 2024-03-12 北京大学第三医院(北京大学第三临床医学院) Identification method for maternal pollution in prenatal diagnosis

Also Published As

Publication number Publication date
CN114214425B (en) 2024-01-19

Similar Documents

Publication Publication Date Title
CN114214425B (en) Method and device for identifying parent tendency of nucleic acid sample
US9639657B2 (en) Methods for allele calling and ploidy calling
CN113436680B (en) Method for simultaneously identifying chromosome structural abnormality and carrier state of pathogenic gene of embryo
WO2021073604A1 (en) Method and system for clearing noisy genetic data, phasing haplotype, and reconstructing offspring genome, and use thereof
US20100160717A1 (en) In vitro fertilization
AU2020296108B2 (en) Systems and methods for determining pattern of inheritance in embryos
US20160371432A1 (en) Methods for allele calling and ploidy calling
WO2023246949A1 (en) Non-invasive method for determining parentage before birth by using microhaplotypes
Soler et al. Rescuing monopronucleated-derived human blastocysts: a model to study chromosomal topography and fingerprinting
JP7446343B2 (en) Systems, computer programs and methods for determining genome ploidy
Tian et al. Preimplantation genetic testing in the current era, a review
US20240185957A1 (en) Methods for allele calling and ploidy calling
CN114480611A (en) Method for identifying diseased embryo and normal embryo of CNV microdeletion and microdropping syndrome
WO2017124214A1 (en) Method for detecting chromosome robertsonian translocation
CN117925820A (en) Method for detecting variation before embryo implantation
Luo et al. Pre-implantation genetic diagnosis for a family with Usher syndrome through targeted sequencing and haplotype analysis
CN117238375A (en) Detection system, device and method for analyzing chromosome aneuploidy and parental pollution of embryo
CN115287369A (en) Single cell sequencing based non-single sperm determination method
Liebaers et al. Preimplantation genetic diagnosis: risks and complications
Johnson A critical discussion of current preimplantation genetic screening strategies for improving assisted reproduction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant