CN114214425B - Method and device for identifying parent tendency of nucleic acid sample - Google Patents

Publication number
CN114214425B
Authority
CN
China
Prior art keywords
sample
parent
parental
por
predisposition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111536093.6A
Other languages
Chinese (zh)
Other versions
CN114214425A (en
Inventor
邹央云
万成
姚雅馨
陆思嘉
任军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yikon Genomics Shanghai Co ltd
Original Assignee
Yikon Genomics Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yikon Genomics Shanghai Co ltd
Publication of CN114214425A
Application granted
Publication of CN114214425B

Classifications

    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6888Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/10Ploidy or copy number detection
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30Detection of binding sites or motifs
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/40Population genetics; Linkage disequilibrium
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00Oligonucleotides characterized by their use
    • C12Q2600/156Polymorphic or mutational markers

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Genetics & Genomics (AREA)
  • Biotechnology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Organic Chemistry (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Physiology (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Ecology (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to a method and a device for detecting the parental predisposition of a nucleic acid sample, in particular of a sample comprising only trace amounts of progeny DNA. The invention also relates to the use of said method or device for identifying parental contamination of a sample, or for identifying ploidy abnormalities of the progeny DNA in a sample, by detecting the parental predisposition of the sample.

Description

Method and device for identifying parent tendency of nucleic acid sample
Technical Field
The present invention relates to a method and a device for detecting the parental predisposition of a nucleic acid sample, in particular of a sample comprising only trace amounts of progeny DNA. The invention also relates to the use of said method or device for identifying parental contamination of a sample, or for identifying ploidy abnormalities of the progeny DNA in a sample, by detecting the parental predisposition of the sample.
Background
Preimplantation genetic testing (PGT) refers to the genetic analysis, during in vitro fertilization and embryo transfer, of embryos from patients at high genetic risk before implantation, so that embryos with normal genetic material can be selected for transfer into the maternal uterine cavity and healthy offspring obtained. In current clinical practice, PGT mainly relies on genetic testing of cells obtained by embryo biopsy. However, a growing number of studies show that this invasive biopsy procedure can adversely affect the developmental potential of the embryo and subsequent individual development. In recent years, several studies have found that embryo culture medium contains cell-free DNA (cfDNA) fragments of embryonic origin, which makes noninvasive preimplantation genetic testing possible. The successful application of culture-medium cfDNA in PGT-A (chromosomal aneuploidy), PGT-M (monogenic disease) and PGT-SR (chromosomal structural rearrangement) further shows that this approach has good prospects for preimplantation genetic testing. However, both invasive biopsy-based testing and noninvasive culture-medium-based testing are susceptible to interference from paternal sources (sperm) and maternal sources (cumulus cells).
Two options are generally available clinically for in vitro fertilization: IVF (in vitro fertilization) and ICSI (intracytoplasmic sperm injection). In first-generation IVF, a large number of sperm are co-cultured with the oocyte to achieve fertilization in vitro. Only one sperm can effectively fertilize each oocyte; the surplus sperm can adhere to the surface of the fertilized oocyte and are released into the culture medium after they die. Therefore, when genetic testing is performed on biopsied cells of IVF embryos, the results are often affected by paternal sperm. In ICSI, a single sperm is injected into the oocyte with a micromanipulation system, which avoids interference from paternal sperm DNA, but the granulosa cells on the surface of the oocyte introduce interference from maternal DNA. Although some methods reduce parental interference in embryo biopsy cells or culture medium samples by washing the surface of the oocyte or fertilized oocyte (patent number: HK1229368A1), washing is sometimes incomplete, so a risk of contamination remains and false negatives may be introduced into PGT results. It is therefore necessary to identify whether an embryo biopsy cell or culture medium sample is contaminated with parental DNA before genetic analysis.
In PGT, whole-genome triploid or uniparental diploid embryos are also common, particularly in patients with a family history of hydatidiform mole. Methods such as specific PCR or methylation PCR analysis have been proposed to determine the parental origin of embryo ploidy variation, e.g., the parental (paternal or maternal) origin of a triploid or of a uniparental diploid. However, these assays are limited by their requirements on the loci to be tested. Conventional second-generation sequencing platforms can provide genome-wide sequence information for PGT biopsy samples, but it is difficult to detect whole-genome triploid or uniparental diploid embryos with this technique.
For gDNA samples, STRs (short tandem repeats) are currently often used to identify parental predisposition. However, trace DNA samples, such as embryo biopsy cells or culture-medium cfDNA, must first be amplified by single-cell whole genome amplification (WGA) to obtain enough DNA for genetic testing. WGA often introduces interfering information, such as preferential allele amplification (one of the two alleles is amplified preferentially) or allele dropout (ADO, in which only one of the two alleles is amplified). These effects make it difficult to detect parental predisposition in trace DNA samples.
Thus, there is a need in the art for a method of detecting parental predisposition in a trace DNA sample, and for a method that can be used to determine the likelihood of parental contamination of the sample and to identify chromosomal ploidy abnormalities of the progeny in the sample.
Summary of The Invention
After intensive study, the present inventors propose a method that uses polymorphic genetic variation sites, e.g., single nucleotide polymorphism (SNP) molecular markers, to construct a parental predisposition statistic that can eliminate interfering information (e.g., ADO interference) caused by single-cell whole genome amplification, and thereby identify whether a test sample is biased toward parental DNA. The method is suitable for rapidly, sensitively and specifically determining paternal and maternal contamination in a trace DNA sample (especially embryo culture medium). It can also rapidly, sensitively and specifically determine chromosomal ploidy variation of the progeny in trace DNA samples (e.g., from PGT biopsies), such as parental trisomy, paternal triploidy (duplicated paternal chromosome set), maternal triploidy (duplicated maternal chromosome set), and uniparental diploidy (paternal or maternal).
Accordingly, in one aspect, the present invention provides a method for identifying the parental predisposition of a test sample.
In yet another aspect, the invention provides a method for detecting the presence or absence of parental contamination in a test sample by identifying its parental predisposition.
In yet another aspect, the invention provides methods for identifying ploidy abnormalities of progeny DNA in a test sample by identifying its parental predisposition.
In yet another aspect, the invention also provides articles of manufacture, including but not limited to devices, systems, and apparatus, that can be used in any of the above methods of the invention or combinations thereof.
In yet another aspect, the present invention also provides the use of the devices, systems and apparatus of the present invention for identifying the parental predisposition of a test sample, for detecting parental contamination of a test sample, or for identifying ploidy abnormalities of progeny DNA in a test sample, as well as their use in the preparation of products for said purposes.
Drawings
Figure 1 shows the LRR and BAF distributions of gDNA samples with 30% maternal contamination.
Figure 2A compares the BAF distributions of gDNA samples and single-cell amplification products at selected parental unequal homozygous allele sites.
Figure 2B compares the BAF distributions of uncontaminated SEM samples and samples with 30% parental-source contamination.
Figure 3 schematically shows an example of an analysis method for parental predisposition detection on a chip platform.
Figure 4 schematically shows an example of an analysis method for parental predisposition detection on a second-generation sequencing platform.
Figure 5 shows the density distribution of S_MC for SEM samples known to have no parental predisposition.
Figure 6 shows the distribution of S_MC under mixing conditions with different proportions of parental-source DNA.
Figure 7 schematically shows an example of an analysis method for parental-origin detection in progeny.
Figure 8 shows the distribution of S_uneven obtained in a reference frame without parental predisposition.
Figure 9 shows the distribution of S_loh obtained in a reference frame without parental predisposition.
Figure 10 shows the distributions of S_uneven and S_loh for chromosomal monosomy or trisomy samples.
Figure 11 shows the distributions of S_uneven and S_loh for parental triploid embryo samples.
Detailed Description
Before describing the present invention in detail, it is to be understood that this invention is not limited to particular methodology and experimental conditions described herein, as such methods and conditions may vary. In addition, the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting.
Definitions
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art. For the purposes of the present invention, the following terms are defined below.
The term "about" when used in conjunction with a numerical value is intended to encompass numerical values within a range having a lower limit of 5% less than the specified numerical value and an upper limit of 5% greater than the specified numerical value.
The term "and/or" when used in connection with two or more selectable items is understood to mean any one of the selectable items or any two or more of the selectable items.
As used herein, the terms "comprises" or "comprising" are intended to include the stated elements, integers or steps without excluding any other elements, integers or steps. Unless otherwise indicated, these terms also encompass the case of consisting of the recited elements, integers or steps. For example, an antibody variable region "comprising" a particular sequence is also intended to encompass an antibody variable region consisting of that sequence.
In the present invention, both the test sample and the reference sample are progeny trace DNA samples. The terms "progeny trace DNA sample" and "trace DNA sample" are used interchangeably herein to refer to a sample in which DNA from progeny is present only in trace amounts, e.g., less than 1 ng, 500 pg, 100 pg, or 10 pg, or even less (e.g., about 1-6 pg). "DNA from progeny" and "progeny-derived DNA" are used interchangeably herein to refer to DNA from, for example, a progeny cell, a progeny body fluid, or a culture of progeny cells, whose genotype is substantially identical to that of the progeny genome. Thus, in some embodiments, the trace DNA sample may be a spent IVF embryo culture medium (SEM) sample containing trace embryonic genomic DNA released from the cultured embryo. In other embodiments, the sample is an embryonic or fetal biopsy cell sample, e.g., IVF blastocyst trophectoderm cells. In still other embodiments, the sample is blood and/or plasma from a pregnant female, which contains trace amounts of cell-free DNA (cfDNA) from the fetus.
The term "progeny" herein includes, but is not limited to, mammalian progeny, such as human progeny, whether born or unborn. Unborn progeny include embryos and fetuses. An embryo generally refers to the product of cleavage of the fertilized egg, from fertilization until the end of the embryonic period at the eighth week. The cleavage stage of the embryo occurs during the first three days of culture. "Embryo transfer" is the procedure of placing one or more embryos and/or blastocysts into the uterus or oviduct. A fetus generally refers to the unborn offspring of a mammal, particularly an unborn human, after the eighth week of gestation.
The term "blastocyst" refers to an embryo 5 or 6 days after fertilization that has an inner cell mass, an outer cell layer called the trophectoderm, and a fluid-filled blastocoel cavity containing the inner cell mass, from which the whole embryo is derived. The trophectoderm is the precursor of the placenta.
The terms "related individual" and "family individual" of a progeny are used interchangeably to refer to any individual that is genetically related to the subject progeny individual and thus shares haplotypes with it. In one instance, the related individual may be a genetic parent of the target individual, or any genetic material derived from a parent, such as sperm, polar bodies, other embryos or fetuses. The term may also refer to siblings, parents, or paternal or maternal grandparents. In this application, a parent refers to the genetic father or mother of an individual. A progeny individual typically has two parents (a female parent and a male parent). A sibling is any individual whose genetic parents are the same as those of the progeny individual in question. In some embodiments, a sibling may be a born child, an embryo or a fetus, or one or more cells derived from a born child, an embryo or a fetus; a sibling may also be a haploid entity derived from one parent, such as sperm, a polar body, or any other haploid genetic material. In one embodiment, the methods of the invention include determining genomic sequence information, e.g., genetic variation information (e.g., SNP information), of the genetic parents of a progeny. In some embodiments, parental genomic sequence information may be determined from bulk genomic DNA extracted from a parental tissue (e.g., peripheral blood). In other embodiments, the genomic sequence information of a parent may also be inferred from related individuals or family individuals.
The term "SNP (single nucleotide polymorphism)" refers to a polymorphism at a site in a chromosomal DNA sequence caused by a single nucleotide change; the frequency of an SNP in a population is generally >1%. On average, there is one SNP every 300-1000 bp across the human genome. SNP data are currently available from a number of public databases, including, for example, http://cgap.ncbi.nih.gov/GAI; http://www.ncbi.nlm.nih.gov/SNP; and the human SNP database http://hgbas.cgr.ki.sei or http://hgbase.interactiva/.
The term "genotype" refers to the combination of alleles that an individual carries at a locus. In humans, except for the sex chromosomes, each pair of homologous chromosomes carries a pair of alleles at the same locus, which is called the genotype of that locus. Genotyping refers to the process of determining the genotype of an individual.
The term "Mendel's laws of inheritance" refers collectively to the two basic laws of genetics, the law of segregation and the law of independent assortment. According to Mendel's laws, during meiosis the two alleles of a gene separate as the homologous chromosomes segregate, enter different gametes, and are passed on to the offspring independently with those gametes; in addition, non-allelic genes on non-homologous chromosomes assort independently while the alleles segregate.
The term "nucleic acid chip", for example, "SNP chip", is a chip that can determine the genotype of a site by using a signal (usually a fluorescent signal) obtained after hybridization of the chip. In practical studies, SNP chips contain different SNP sites depending on the chip manufacturer, model, etc. For example, human chips produced by Affymetrix and Illumina contain different SNP sets.
Herein, the term "parental predisposition" refers to an increase or decrease in the frequency of the paternal (or maternal) allele in the progeny genotypes detected in a trace DNA sample, relative to the frequency expected under Mendel's laws of inheritance. Such a deviation of the progeny allele frequency toward one of the parents may be caused, for example, by contamination of the sample with parental DNA, or by a ploidy variation of the progeny DNA itself. For example, in a trace DNA sample from TE cells, the deviation may be caused by an abnormality of the progeny genotype, such as duplication and/or deletion of a paternal or maternal chromosome or chromosome fragment. Thus, in some embodiments of the invention, the parental predisposition reflects, for example, the likelihood that the progeny carries a genotype variation such as parental triploidy or uniparental diploidy. In other cases, the genotype deviation detected in a trace DNA sample may be caused by contaminating DNA of parental origin in the sample. For example, deviations occur in SEM samples contaminated with paternal or maternal DNA. Thus, in other embodiments of the invention, the parental predisposition may reflect the likelihood that a sample carries paternal or maternal contamination.
The parental predisposition statistic (S_POR) herein refers to a statistic, constructed according to the methods of the invention, that measures the magnitude of the parental predisposition of a test sample. In the methods of the invention for determining sample contamination, the statistic is also denoted "S_MC"; in the methods of the invention for detecting progeny ploidy abnormalities, it is also denoted "S_uneven".
The term "reference frame" refers herein to a set of reference samples used to establish the threshold of parental predisposition (i.e., the S_POR threshold). A person skilled in the art knows how to select appropriate reference samples based on the sample to be tested. Preferably, the reference samples and the test sample are of the same sample type containing trace amounts of progeny DNA. Thus, in some embodiments, the test sample and the reference samples are SEM samples comprising trace progeny DNA collected in the same manner. In other embodiments, the test sample and the reference samples are the same type of biopsy cells containing trace amounts of progeny DNA, e.g., IVF blastocyst cells. When the threshold is established using the reference frame, each reference sample in the reference frame is preferably subjected to single-cell whole genome amplification and determination of the parental predisposition statistic in the same manner as the test sample.
As used herein, the term "biallelic locus" refers to a locus at which a pair of homologous chromosomes in a diploid cell carries two alleles. Herein, the two alleles are denoted by the letters A and B. Thus, a homozygous biallelic genotype may be AA or BB, and the heterozygous biallelic genotype is AB. Herein, a biallelic locus at which the parents of an embryo or fetus are homozygous for different alleles is referred to as a "parental unequal homozygous allele locus" or a "parental unequal homozygous locus". For example, if the father has the AA genotype and the mother has the BB genotype at such a locus, the A allele in the offspring is the paternal allele and the B allele is the maternal allele; if the father has the BB genotype and the mother has the AA genotype, the B allele in the offspring is the paternal allele and the A allele is the maternal allele. In the method of the invention, loci at which the parents are homozygous for different alleles are preferably selected for calculating the parental predisposition value.
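By way of illustration only, the selection of parental unequal homozygous loci described above might be sketched as follows. The function name, the string encoding of genotypes ("AA", "AB", "BB") and the example loci are assumptions made for this sketch and are not part of the described method.

```python
from typing import Dict, Optional, Tuple


def parental_origin_at_locus(father_gt: str, mother_gt: str) -> Optional[Tuple[str, str]]:
    """Return (paternal_allele, maternal_allele) when the parents are homozygous
    for different alleles at a biallelic locus; return None otherwise.

    Genotypes are encoded as "AA", "AB" or "BB" (an encoding assumed for this sketch).
    """
    informative = {
        ("AA", "BB"): ("A", "B"),  # father AA, mother BB: A is paternal, B is maternal
        ("BB", "AA"): ("B", "A"),  # father BB, mother AA: B is paternal, A is maternal
    }
    return informative.get((father_gt, mother_gt))


# Hypothetical parental genotypes: locus name -> (father genotype, mother genotype)
loci: Dict[str, Tuple[str, str]] = {
    "snp1": ("AA", "BB"),
    "snp2": ("AB", "BB"),  # father heterozygous: not selected
    "snp3": ("BB", "AA"),
    "snp4": ("AA", "AA"),  # parents homozygous for the same allele: not selected
}

selected: Dict[str, Tuple[str, str]] = {}
for name, (father_gt, mother_gt) in loci.items():
    origin = parental_origin_at_locus(father_gt, mother_gt)
    if origin is not None:
        selected[name] = origin

print(selected)
```

At every selected locus a diploid offspring is expected to be AB, so each offspring allele can be traced to one parent.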
Herein, the term "SEM" (spent embryo culture medium) refers to spent culture medium collected during in vitro embryo culture in IVF. The medium may or may not have been changed, one or more times, before collection. For example, the SEM may be collected on D5 after a medium change on D3, or collected several hours (e.g., 6-7 hours) after medium changes on D3 and D5.
Herein, terms such as D1, D3, D4, D5 and D6, when referring to SEM, denote the culture medium of embryos on day 1, day 3, day 4, day 5 and day 6 of IVF in vitro culture. Similarly, D1-D6 SEM refers to SEM from any period between day 1 and day 6 of in vitro culture, including, for example, but not limited to, SEM from day 1 to day 3 (D1-D3), day 3 to day 5 (D3-D5), day 3 to day 6 (D3-D6), day 4 to day 5 (D4-D5), day 4 to day 6 (D4-D6), and day 5 to day 6 (D5-D6) of in vitro culture.
As used herein, the term "module" refers to a software object or routine (e.g., as a separate thread) that may be executed on a single computing system (e.g., a computer program, a tablet (PAD), one or more processors). The program for carrying out the methods of the present invention may be stored on a computer readable medium having computer program logic or code portions embodied therein for carrying out the system modules and methods. While the system modules and methods described herein are preferably implemented in software, implementations in hardware or in a combination of software and hardware are also possible and contemplated by those skilled in the art.
The following describes aspects of the invention in detail.
The method of the invention
For trace DNA samples, especially DNA samples on the order of picograms, single-cell whole genome amplification (WGA) is usually required before genotyping, and this amplification can introduce a large number of genetic variation artefacts. As a result, a site showing a Mendelian inheritance error in such genotyping data may reflect a data quality problem caused by WGA, contamination of the sample, or a chromosomal DNA copy number abnormality present in the sample. The inventors found that, when identifying contamination of a progeny trace DNA sample or copy number abnormalities of the progeny in such a sample, a parental predisposition statistic of the sample can be constructed from specific genetic variation sites in the sample, which effectively eliminates the background noise caused by WGA. On this basis, the present invention provides methods for identifying contamination of trace DNA samples (especially SEM), and methods for identifying copy number abnormalities (especially progeny aneuploidy) in trace DNA samples.
Preferably, the method of the present invention comprises the steps of:
(1) Analyzing the single-cell whole genome amplification product of the progeny trace DNA sample for genetic variation sites (preferably SNP sites);
(2) On a selected DNA segment (e.g., at the whole genome level or at the chromosome fragment level), obtaining the maternal allele frequency (MAF) and/or paternal allele frequency (PAF) of the sample at the parental unequal homozygous allele sites of the segment;
(3) Based on the MAF and/or PAF determined in step (2) and on classification thresholds a and b (preferably allele frequency thresholds of 0.4 and 0.6), classifying each site as a maternal-predisposition site, a paternal-predisposition site, or a site with neither maternal nor paternal predisposition;
(4) On the selected DNA segment, counting the number of maternal-predisposition sites (N_MAF) and the number of paternal-predisposition sites (N_PAF) of the sample;
(5) Based on N_MAF and N_PAF, determining the parental predisposition statistic of the sample at the DNA segment level (S_POR);
(6) Comparing the parental predisposition statistic determined in step (5) with an S_POR threshold to determine the parental predisposition of the sample at the DNA segment level.
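Steps (3) to (6) can be illustrated with the minimal sketch below. It rests on several assumptions that are not stated in the steps above: a site with MAF above b is treated as maternal-predisposed and a site with MAF below a as paternal-predisposed; the statistic is taken to be the normalized difference (N_MAF - N_PAF) / (N_MAF + N_PAF), chosen on the reasoning that random allele dropout should affect both parental alleles about equally and so largely cancel out; and the threshold of 0.3 and the example MAF values are hypothetical. The actual form of S_POR and its threshold may differ.

```python
from typing import Iterable, Tuple


def classify_site(maf: float, a: float = 0.4, b: float = 0.6) -> str:
    """Classify one parental unequal homozygous site by its maternal allele frequency.

    ASSUMPTION: MAF > b means maternal predisposition, MAF < a means paternal.
    """
    if maf > b:
        return "maternal"
    if maf < a:
        return "paternal"
    return "neither"


def count_sites(mafs: Iterable[float], a: float = 0.4, b: float = 0.6) -> Tuple[int, int, int]:
    """Return (N_MAF, N_PAF, number of sites with neither predisposition)."""
    n_maf = n_paf = n_neither = 0
    for maf in mafs:
        label = classify_site(maf, a, b)
        if label == "maternal":
            n_maf += 1
        elif label == "paternal":
            n_paf += 1
        else:
            n_neither += 1
    return n_maf, n_paf, n_neither


def s_por(n_maf: int, n_paf: int) -> float:
    """ASSUMED statistic: normalized difference of the two counts.

    Positive values lean maternal, negative values lean paternal.
    """
    informative = n_maf + n_paf
    if informative == 0:
        return 0.0
    return (n_maf - n_paf) / informative


def call_predisposition(stat: float, threshold: float) -> str:
    """Compare the statistic with a symmetric threshold (hypothetical decision rule)."""
    if stat > threshold:
        return "maternal"
    if stat < -threshold:
        return "paternal"
    return "none"


# Hypothetical MAF values at six parental unequal homozygous sites of one segment
example_mafs = [0.9, 0.8, 0.5, 0.1, 0.95, 0.55]
n_maf, n_paf, n_neither = count_sites(example_mafs)
stat = s_por(n_maf, n_paf)
call = call_predisposition(stat, threshold=0.3)  # 0.3 is a placeholder threshold
print(n_maf, n_paf, n_neither, stat, call)
```

In practice the threshold would be derived from a reference frame of samples without parental predisposition rather than fixed by hand.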
In one embodiment, detection on the selected DNA segment may be performed at the whole genome level, at the chromosome level, or at the chromosome fragment level.
In one embodiment, the method of the invention is used to identify parental DNA contamination of a progeny trace DNA sample. The sample may be any sample type for which parental DNA contamination needs to be identified. Preferably, the sample is an SEM, and the parental predisposition of the sample indicates whether the SEM is likely to be contaminated with paternal or maternal DNA. Also preferably, the sample is a biopsy of an IVF embryo, and the parental predisposition of the sample indicates whether the IVF embryo sample is likely to be contaminated with paternal or maternal DNA.
In yet another embodiment, the methods of the invention are used to identify chromosomal ploidy abnormalities of the progeny in a progeny trace DNA sample, wherein the sample is a biopsy associated with the progeny (e.g., an embryo or fetus), such as an IVF blastocyst trophectoderm biopsy, and wherein the parental predisposition of the sample indicates whether a ploidy abnormality is present in the progeny.
In some embodiments, the methods of the invention for identifying progeny chromosome ploidy abnormalities further comprise:
(i) On the selected DNA segment, counting the number of sites of the sample with neither maternal nor paternal predisposition (N_LOH), i.e., sites whose MAF and/or PAF lies between the classification thresholds a and b;
(ii) Based on N_LOH and the total number of parental unequal homozygous sites in the segment (N_total), determining the heterozygous site rate of the sample at the DNA segment level (S_loh);
(iii) Comparing the heterozygous site rate determined in step (ii) with an S_loh threshold, the S_loh threshold preferably being predetermined based on a reference frame;
(iv) In combination with step (6), determining the parental predisposition of the sample at the DNA segment level and determining whether the ploidy abnormality (i.e., CNV) is caused by a duplication (dup) or a deletion (del).
In some preferred embodiments of the method, S is determined based on a parental-free-tendency reference frame POR Threshold and S loh Threshold value, S of sample POR Value sum S loh The extent to which the value deviates from the threshold value indicates the likelihood that the child generation is a uniparent diploid or haploid.
In one embodiment, the method of the present invention further comprises amplifying the trace DNA sample using a single-cell whole genome amplification technique. The amplification adopts a single-cell amplification strategy; the specific method is not limited and includes, but is not limited to, primer extension preamplification PCR (PEP-PCR), degenerate oligonucleotide primer PCR (DOP-PCR), multiple displacement amplification (MDA), and multiple annealing and looping based amplification cycles (MALBAC). In a preferred embodiment, MALBAC is used for trace DNA sample amplification. More preferably, whole genome amplification of the sample is performed using the technique of US 20190106738 A1.
In the methods of the invention, the genetic variation sites and the corresponding allele frequencies may be obtained from the amplification products of the trace DNA sample and/or from parental or other family gDNA by any technique known in the art that is suitable for acquiring genetic variation sequence information. For example, techniques selected from nucleic acid chips and sequencing may be used. The nucleic acid chip and sequencing technology may be a single nucleotide polymorphism microarray nucleic acid chip, a MassARRAY time-of-flight mass spectrometry chip, second-generation sequencing, third-generation sequencing, or a combination thereof. For example, the single nucleotide polymorphism microarray nucleic acid chip is an SNP genotyping chip. For example, the second-generation sequencing includes whole genome sequencing, whole exome sequencing, and sequencing of targeted genomic regions, preferably whole genome sequencing; the sequencing depth may be 1X, 2X, 5X, and the like.
In one embodiment, the genetic variation site is a Single Nucleotide Polymorphism (SNP) site. In a preferred embodiment, the SNP site is detected based on a nucleic acid chip. In another preferred embodiment, the SNP site is detected based on sequencing.
In one embodiment, the trace DNA sample may be any sample that can be used to detect progeny genetic information, for example, fetal cell-free DNA (cfDNA) in embryo culture medium, blastocoel fluid, maternal plasma, or other types of body fluids, and/or fetal cells in blastocyst cells, maternal blood, or other types of body fluids. In some embodiments, preferably, the trace DNA sample is a non-invasively obtained embryonic nucleic acid sample. For example, the embryonic nucleic acid sample may be embryo culture medium or cell-free DNA obtained from embryo culture medium.
In some embodiments, the trace DNA sample comprises about 0.1 pg to 40 ng of DNA, e.g., 1 to 40 ng or 20 to 40 ng of DNA, preferably 0.1 to 40 pg, 1 to 40 pg, 5 to 10 pg, 10 to 40 pg, or 40 to 100 pg of progeny trace genomic DNA. More preferably, the trace DNA sample comprises 5-10 pg or 10-40 pg of progeny trace genomic DNA. In some embodiments, the progeny is an embryo or a fetus. In some preferred embodiments, the progeny are IVF (in vitro fertilization) embryos. For example, in some embodiments, preferably, the methods of the invention are used for parental contamination identification of biopsies or SEM culture medium from IVF embryos. In still other preferred embodiments, the progeny are ICSI (intracytoplasmic sperm injection) embryos. For example, in still other embodiments, preferably, the methods of the invention are used to identify ploidy abnormalities in ICSI embryos.
In one embodiment, the method of the present invention further comprises determining the genetic variation sites used for the parental-tendency statistic based on the pedigree genetic information corresponding to the trace DNA sample. In one embodiment, the pedigree genetic information is parental genetic information. For example, the information may be obtained by genetic testing of parental genomic DNA, such as a large amount (e.g., 100 ng or more) of genomic DNA extracted from peripheral blood. In some embodiments, pedigree allele information may be obtained from a nucleic acid sample of a family individual (particularly a parent) comprising at least about 100 ng of DNA (e.g., 100 ng-1000 ng of DNA). For example, the family individual nucleic acid sample is a nucleic acid sample from blood, saliva, oral swab, urine, nail, hair follicle, dander, cells, tissue, or body fluid of the family individual.
Accordingly, in one aspect, the present invention provides a method for determining the parental predisposition of a test sample, wherein said test sample comprises progeny genomic DNA in an amount of not more than 1ng (and preferably about 1-500pg, more preferably 1-100 pg), wherein said method comprises the steps of:
(1) Performing a genetic variation site (preferably SNP site) analysis on single-cell whole genome amplification products of the test sample;
(2) Obtaining the maternal allele frequency (MAF) and/or the paternal allele frequency (PAF) of the sample at loci on the selected DNA segment (e.g., at the whole genome level or at the chromosome segment level) at which the parents are homozygous for different alleles (parental different-homozygous loci);
(3) Classifying each locus as a paternal-tendency locus, a maternal-tendency locus, or a locus with neither paternal nor maternal tendency, based on comparison of the MAF and/or PAF determined in step (2) with classification thresholds a and b, wherein classification threshold a is a value from 0.1 to 0.4 (e.g., 0.1, 0.2, 0.3, 0.4, or any other value between 0.1 and 0.4), classification threshold b is a value from 0.6 to 0.9 (e.g., 0.9, 0.8, 0.7, 0.6, or any other value between 0.6 and 0.9), and a + b = 1;
wherein,
-classifying the locus as a paternal-tendency locus if MAF ≤ a and/or PAF ≥ b;
-classifying the locus as a maternal-tendency locus if PAF ≤ a and/or MAF ≥ b;
-classifying the locus as a locus with neither paternal nor maternal tendency if MAF and/or PAF is > a and < b;
(4) Counting the number of maternal-tendency sites (N_MAF) and the number of paternal-tendency sites (N_PAF);
(5) Based on the number of maternal-tendency sites (N_MAF) and the number of paternal-tendency sites (N_PAF), determining the parental-tendency statistic of the sample at the DNA segment level (S_POR);
(6) Comparing the S_POR value of the sample determined in step (5) with a parental-tendency threshold (i.e., the S_POR threshold) to determine the parental tendency of the test sample at the DNA segment level.
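Steps (3) to (5) can be illustrated with a minimal sketch. It assumes that one MAF value per parental different-homozygous locus is already available and that PAF = 1 - MAF at each locus, so MAF alone decides the class. The function names `classify_site` and `s_por` are hypothetical and are not taken from the patent.

```python
from typing import Iterable, Tuple


def classify_site(maf: float, a: float = 0.4, b: float = 0.6) -> str:
    """Classify one locus from its maternal allele frequency (MAF).

    Assumption: PAF = 1 - MAF, so MAF >= b is equivalent to PAF <= a.
    """
    if maf >= b:
        return "maternal"
    if maf <= a:
        return "paternal"
    return "neither"


def s_por(mafs: Iterable[float], a: float = 0.4, b: float = 0.6) -> Tuple[int, int, float]:
    """Return (N_MAF, N_PAF, S_POR) for the loci of one DNA segment."""
    labels = [classify_site(m, a, b) for m in mafs]
    n_maf = labels.count("maternal")   # maternal-tendency sites
    n_paf = labels.count("paternal")   # paternal-tendency sites
    if n_paf == 0:
        # The ratio is undefined without paternal-tendency sites; how to
        # handle this case is not specified in the text.
        raise ValueError("no paternal-tendency sites on this segment")
    return n_maf, n_paf, n_maf / n_paf
```

For example, `s_por([0.9, 0.7, 0.5, 0.3, 0.1, 0.65])` counts three maternal-tendency and two paternal-tendency sites. Step (6) then compares the resulting ratio with the S_POR threshold.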
In some embodiments, preferably, in step (3), the classification threshold a is any value between 0.2 and 0.4, the classification threshold b is any value between 0.6 and 0.8, and a + b = 1; more preferably, the classification threshold a is any value between 0.3 and 0.4, the classification threshold b is any value between 0.6 and 0.7, and a + b = 1.
More preferably, in step (3), the classification threshold a is 0.4 and b is 0.6, and
-classifying the locus as a paternal-tendency locus if MAF ≤ 0.4 and/or PAF ≥ 0.6;
-classifying the locus as a maternal-tendency locus if PAF ≤ 0.4 and/or MAF ≥ 0.6;
-classifying the locus as a locus with neither paternal nor maternal tendency if MAF and/or PAF is > 0.4 and < 0.6.
In some embodiments, in the method, the S_POR threshold is established using a reference frame without parental tendency,
preferably, the reference frame consists of 1-40 or more reference samples without parental tendency;
preferably, the reference S_POR threshold is set as the mean of the reference-frame parental-tendency statistic S_POR ± 1-5 standard deviations (preferably 2-3, especially 3 standard deviations);
if the S_POR of the sample is greater than the upper limit of the reference S_POR threshold, the test sample is indicated to have a maternal tendency on the DNA segment; if the S_POR of the sample is less than the lower limit of the threshold, the test sample is indicated to have a paternal tendency on the DNA segment.
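The threshold construction above can be sketched as follows. The sketch assumes the limits are the mean of the reference S_POR values plus or minus k standard deviations, and it uses the sample standard deviation, which is an assumption because the text does not say whether the sample or population form is meant. The names `reference_thresholds` and `call_tendency` are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def reference_thresholds(ref_s_por: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """Return (lower, upper) limits as mean -/+ k standard deviations.

    Needs at least two reference values, because a standard deviation
    cannot be estimated from a single sample.
    """
    m = mean(ref_s_por)
    sd = stdev(ref_s_por)  # sample SD (assumption, see lead-in)
    return m - k * sd, m + k * sd


def call_tendency(s_por: float, lower: float, upper: float) -> str:
    """Compare a sample S_POR with the reference limits."""
    if s_por > upper:
        return "maternal"
    if s_por < lower:
        return "paternal"
    return "none"
```

With reference values 0.9, 1.0 and 1.1 and k = 3, the limits come out at about 0.7 and 1.3, so a sample S_POR of 1.5 would be called maternal and one of 0.5 paternal.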
In some embodiments, the single-cell whole genome amplification is performed using a method selected from the group consisting of: primer extension preamplification PCR, degenerate oligonucleotide primer PCR, multiple displacement amplification, multiple annealing and looping based amplification cycles (MALBAC), blunt-end or cohesive-end ligation library construction, and the technique of US 20190106738 A1; wherein, preferably, MALBAC is used for the single-cell whole genome amplification, and more preferably the whole genome amplification is performed using the technique of US 20190106738 A1. In some cases, the quality and/or amount of the amplification product may be affected by the choice of single-cell whole genome amplification method. In these cases, as understood by those skilled in the art, the threshold is preferably determined using reference sample amplification products obtained with the same amplification method as the test sample.
In some embodiments, in step (2) of the method, BAF values of the test sample at the parental different-homozygous loci are obtained based on, for example, nucleic acid chip analysis, and the paternal allele frequency and/or maternal allele frequency of the sample at each locus is determined based on the BAF values.
In some embodiments, the method comprises:
-on the selected DNA segment, determining the loci at which the father is of AA genotype and the mother is of BB genotype, calculating the BAF value of the test sample at each such locus, and counting the number of loci with BAF ≥ classification threshold b (preferably 0.6) (N_MB) and the number of loci with BAF ≤ classification threshold a (preferably 0.4) (N_PA); and determining the loci at which the father is of BB genotype and the mother is of AA genotype, calculating the BAF value of the test sample at each such locus, and counting the number of loci with BAF ≤ classification threshold a (preferably 0.4) (N_MA) and the number of loci with BAF ≥ classification threshold b (preferably 0.6) (N_PB);
-calculating the number of maternal-tendency sites (N_MAF) and the number of paternal-tendency sites (N_PAF), wherein
N_MAF = N_MB + N_MA; N_PAF = N_PA + N_PB;
-constructing the parental-tendency statistic S_POR of the test sample, wherein
S_POR = N_MAF / N_PAF = (N_MB + N_MA) / (N_PA + N_PB).
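The BAF-based counting can be sketched as below. The sketch assumes each locus is given as a hypothetical tuple of father genotype, mother genotype and sample BAF, with genotypes written as the strings "AA" and "BB". Loci of any other parental genotype combination are ignored. The function name `count_from_baf` is not from the patent.

```python
from typing import Dict, Iterable, Tuple

Site = Tuple[str, str, float]  # (father genotype, mother genotype, sample BAF)


def count_from_baf(sites: Iterable[Site], a: float = 0.4, b: float = 0.6) -> Tuple[Dict[str, int], int, int]:
    """Return (counts, N_MAF, N_PAF) from BAF values at informative loci."""
    n = {"MB": 0, "PA": 0, "MA": 0, "PB": 0}
    for father, mother, baf in sites:
        if father == "AA" and mother == "BB":
            # The B allele is maternal here, so BAF equals MAF.
            if baf >= b:
                n["MB"] += 1
            elif baf <= a:
                n["PA"] += 1
        elif father == "BB" and mother == "AA":
            # The B allele is paternal here, so BAF equals PAF.
            if baf <= a:
                n["MA"] += 1
            elif baf >= b:
                n["PB"] += 1
    n_maf = n["MB"] + n["MA"]
    n_paf = n["PA"] + n["PB"]
    return n, n_maf, n_paf
```

S_POR is then `n_maf / n_paf`, provided `n_paf` is not zero.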
In some embodiments, in step (2) of the method, allele depth (AD) values for the paternal and maternal alleles of the test sample at the parental different-homozygous loci on the selected DNA segment, i.e., AD_father and AD_mother, are obtained based on, for example, NGS sequencing analysis, and the paternal allele frequency and/or maternal allele frequency of the test sample at each locus is determined based on the AD values:
wherein the maternal allele frequency of the sample at the locus is MAF = AD_mother / (AD_father + AD_mother);
wherein the paternal allele frequency of the sample at the locus is PAF = AD_father / (AD_father + AD_mother).
Preferably, based on the MAF and/or PAF obtained by AD analysis, the number of sites on the selected DNA segment at which the MAF of the test sample is ≥ classification threshold b (preferably 0.6) (N_MAF) and the number of sites at which the PAF is ≥ classification threshold b (preferably 0.6) (N_PAF) are counted, and the S_POR of the test sample is constructed, wherein
S_POR = N_MAF / N_PAF.
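For sequencing data, the same statistic can be sketched from allele depths. The sketch assumes each locus is a hypothetical pair of read depths (paternal allele, maternal allele). Skipping loci with zero total depth is an assumption made here, since the text does not say how uncovered loci are handled.

```python
from typing import Iterable, Optional, Tuple


def allele_frequencies(ad_father: int, ad_mother: int) -> Optional[Tuple[float, float]]:
    """Return (MAF, PAF) from allele depths, or None if the locus has no reads."""
    total = ad_father + ad_mother
    if total == 0:
        return None
    return ad_mother / total, ad_father / total


def s_por_from_depths(depths: Iterable[Tuple[int, int]], b: float = 0.6) -> Tuple[int, int, float]:
    """Return (N_MAF, N_PAF, S_POR) from (AD_father, AD_mother) pairs."""
    n_maf = n_paf = 0
    for ad_father, ad_mother in depths:
        freqs = allele_frequencies(ad_father, ad_mother)
        if freqs is None:
            continue  # uncovered locus (assumption: skip it)
        maf, paf = freqs
        if maf >= b:
            n_maf += 1
        elif paf >= b:
            n_paf += 1
    if n_paf == 0:
        raise ValueError("no paternal-tendency sites on this segment")
    return n_maf, n_paf, n_maf / n_paf
```

At low sequencing depth such as 1X or 2X, most loci carry only one or two reads, so per-locus frequencies are coarse. The statistic relies on counting many loci rather than on any single frequency.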
In preferred aspects, the invention provides methods for identifying parental DNA contamination in a trace DNA sample of progeny using the parental-tendency identification methods of the invention, wherein the parental tendency of the sample indicates the likelihood of paternal or maternal contamination of the sample,
preferably, the sample is SEM embryo culture medium, the method comprising: obtaining, on the DNA segment (in particular, at the whole genome level), the maternal allele frequency (MAF) and/or the paternal allele frequency (PAF) of the test sample at all parental different-homozygous loci of the segment; and constructing a parental contamination tendency value S_MC of the sample, wherein
S_MC = S_POR = N_MAF / N_PAF.
In some embodiments of the parental contamination identification method according to the present invention, the reference frame consists of SEM samples without parental tendency, such as 1-40 reference SEM samples,
preferably, a parental-tendency statistic S_MC is established for each SEM reference frame sample, and the parental-tendency S_MC threshold is set based on the reference-frame S_MC mean ± 1-5 standard deviations (preferably 2-3, especially 3 standard deviations), wherein,
if the S_MC of the tested SEM sample is greater than the upper limit of the threshold, the tested SEM sample is indicated to have a maternal contamination tendency;
if the S_MC of the tested SEM sample is less than the lower limit of the threshold, the tested SEM sample is indicated to have a paternal contamination tendency.
In some embodiments of the parental contamination identification method according to the present invention, preferably, the upper S_MC threshold is 1.26 and the lower S_MC threshold is 0.80; if the S_MC of the SEM sample is > 1.26, the SEM sample is indicated to have a maternal contamination tendency; if the S_MC of the SEM sample is < 0.80, the SEM sample is indicated to have a paternal contamination tendency.
In some embodiments of the parental contamination identification method according to the present invention, the parental DNA contamination (e.g., paternal DNA contamination or maternal DNA contamination) in the test SEM sample is less than 50% or 40%, more preferably less than 10%, or less than 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, or 1%, or 0.1% or less. In another embodiment, the ratio of embryo DNA to parental contaminating DNA in the test SEM sample is 1:9 to 9:1.
In further preferred aspects, the invention provides methods for identifying chromosomal ploidy abnormalities of progeny in a trace DNA sample of the progeny using the parental-tendency identification methods of the invention, wherein the parental tendency of the sample indicates the likelihood that a chromosomal ploidy abnormality is present in the progeny, wherein preferably the sample is isolated from biopsy cells, such as IVF blastocyst trophectoderm cells,
Preferably, the chromosomal ploidy variation indicated by the parental tendency is selected from: trisomy of parental origin, uniparental diploidy or haploidy of parental origin, and any combination thereof.
In some embodiments of the progeny abnormal ploidy identification method according to the invention, the maternal allele frequency (MAF) and/or the paternal allele frequency (PAF) of the test sample at the parental different-homozygous loci of a selected DNA segment (in particular, at the chromosome level) is obtained; and a parental-tendency value S_uneven of the sample is constructed, wherein
S_uneven = S_POR = N_MAF / N_PAF.
In some embodiments of the progeny abnormal ploidy identification method according to the present invention, the method further comprises,
-counting the total number of parental different-homozygous sites on the selected DNA segment (N_total) and the number of sites with neither paternal nor maternal tendency (N_LOH);
-based on N_LOH and N_total, determining the heterozygous site rate (S_loh) of the test sample at the DNA segment level; and
-comparing the heterozygous site rate of the test sample (i.e., the sample S_loh value) with a heterozygous site rate threshold (i.e., the S_loh threshold).
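The heterozygous site rate can be sketched as below. The text only says that S_loh is determined from N_LOH and N_total, so taking it as the plain ratio N_LOH / N_total is an assumption suggested by the word "rate". The sketch also assumes one MAF value per parental different-homozygous locus, and the function name `heterozygous_site_rate` is hypothetical.

```python
from typing import Sequence, Tuple


def heterozygous_site_rate(mafs: Sequence[float], a: float = 0.4, b: float = 0.6) -> Tuple[int, int, float]:
    """Return (N_LOH, N_total, S_loh) for the loci of one DNA segment.

    N_LOH counts loci with neither paternal nor maternal tendency,
    i.e. loci whose MAF lies strictly between a and b.
    """
    n_total = len(mafs)
    if n_total == 0:
        raise ValueError("no informative loci on this segment")
    n_loh = sum(1 for m in mafs if a < m < b)
    return n_loh, n_total, n_loh / n_total  # ratio form is an assumption
```

A segment in which about half of the loci sit near MAF 0.5 gives a rate near 0.5, whereas a segment carrying alleles from only one parent would give a rate near 0.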
In some embodiments, the degree of loss of heterozygosity on a selected DNA segment can be reflected by the heterozygous site rate. Thus, in some embodiments, a uniparental diploid or haploid can be distinguished from a chromosomal trisomy or triploid by the extent to which the heterozygous site rate deviates from the threshold. For example, in some cases, a uniparental diploid or haploid may exhibit a heterozygous site rate of less than 0.2, preferably less than 0.15, 0.1, 0.07, 0.05, or 0.02, or lower.
In some embodiments of the progeny abnormal ploidy identification method according to the present invention, the method further comprises:
establishing the S_uneven threshold and optionally the S_loh threshold using a reference frame without parental tendency; preferably, the reference frame without parental tendency consists of 1-40 or more euploid biopsy samples.
Preferably, the parental-tendency S_uneven threshold is set using the reference-frame S_uneven mean ± 1-5 standard deviations (preferably 2-3, especially 3 standard deviations). Preferably, the S_loh threshold is set using the reference-frame S_loh mean ± 1-5 standard deviations (preferably 2-3, especially 3 standard deviations).
In some embodiments of the progeny abnormal ploidy identification method according to the invention, the upper S_uneven threshold is about 1.75 and the lower S_uneven threshold is about 0.53; and/or the upper S_loh threshold is about 0.76 and the lower S_loh threshold is about 0.22.
In some embodiments of the progeny abnormal ploidy identification method according to the present invention, the method comprises:
constructing the S_uneven statistic of the test sample at the chromosome level and comparing it with the S_uneven threshold predetermined based on a reference frame without parental tendency;
constructing the S_loh value of the test sample at the chromosome level and comparing it with the S_loh threshold predetermined based on a reference frame without parental tendency;
if the S_loh of the test sample is within the reference-frame S_loh threshold range and S_uneven is greater than the upper limit of the reference S_uneven threshold, a maternal trisomy or maternal triploid is indicated; conversely, if S_uneven is less than the lower limit of the reference S_uneven threshold, a paternal trisomy or paternal triploid is indicated;
if S_loh is less than the lower limit of the reference S_loh threshold and S_uneven is greater than the upper limit of the reference S_uneven threshold, for example greater than 10, 12, 15, 17, 19 or 20, a maternal uniparental diploid or haploid is indicated; conversely, if S_uneven is less than the lower limit of the reference S_uneven threshold, for example less than 0.2, 0.15, 0.1, 0.05, 0.02 or 0.01, a paternal uniparental diploid or haploid is indicated. Preferably, if S_loh is significantly less than the lower limit of the threshold, e.g., any value from 0 to 0.2, or less than 0.15, or less than 0.1, a uniparental diploid or haploid is indicated.
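The decision rules above can be written as one small function. This is a sketch under stated assumptions: the default limits are the approximate values quoted in the text (0.53 and 1.75 for S_uneven, 0.22 and 0.76 for S_loh), a high S_uneven is read as maternal and a low one as paternal, and the case of S_loh above its upper limit, which the text does not address, is returned as "undetermined". The name `call_chromosome` is hypothetical.

```python
from typing import Tuple


def call_chromosome(
    s_uneven: float,
    s_loh: float,
    uneven_limits: Tuple[float, float] = (0.53, 1.75),
    loh_limits: Tuple[float, float] = (0.22, 0.76),
) -> str:
    """Combine S_uneven and S_loh into a chromosome-level indication."""
    uneven_lo, uneven_hi = uneven_limits
    loh_lo, loh_hi = loh_limits

    if s_uneven > uneven_hi:
        origin = "maternal"
    elif s_uneven < uneven_lo:
        origin = "paternal"
    else:
        return "no parental tendency"

    if s_loh < loh_lo:
        # Few heterozygous-like sites: alleles of one parent dominate.
        return f"{origin} uniparental diploid or haploid"
    if s_loh <= loh_hi:
        # Heterozygous site rate within the reference range.
        return f"{origin} trisomy or triploid"
    return "undetermined"  # not covered by the text (assumption)
```

For example, `call_chromosome(2.5, 0.5)` gives a maternal trisomy or triploid indication, while `call_chromosome(15.0, 0.05)` gives a maternal uniparental diploid or haploid indication.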
In some embodiments of the progeny abnormal ploidy identification method according to the present invention, the parental-tendency value and the heterozygous site rate of the test sample are constructed for each of a plurality of autosomes, and the likelihood that the test sample is a maternal or paternal triploid is determined, preferably,
if the test sample shows, on more than 15 chromosomes, a parental tendency and heterozygous site rate indicating maternal trisomy, the test sample is judged likely to be a maternal triploid;
if the test sample shows, on more than 15 chromosomes, a parental tendency and heterozygous site rate indicating paternal trisomy, the test sample is judged likely to be a paternal triploid.
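The genome-wide rule can be sketched as a count over per-chromosome indications. The sketch assumes a hypothetical dictionary that maps each autosome name to a label such as "maternal trisomy", "paternal trisomy" or "none", produced by whatever chromosome-level call is in use. The function name `call_triploid` and the label strings are not from the patent.

```python
from collections import Counter
from typing import Mapping


def call_triploid(per_chromosome: Mapping[str, str], min_chromosomes: int = 15) -> str:
    """Flag a likely triploid when more than `min_chromosomes` autosomes
    carry the same trisomy indication."""
    counts = Counter(per_chromosome.values())
    for origin in ("maternal", "paternal"):
        if counts[f"{origin} trisomy"] > min_chromosomes:
            return f"likely {origin} triploid"
    return "no triploid call"
```

The comparison is strict, so exactly 15 chromosomes with the same indication do not trigger a call.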
In yet another aspect, the invention also relates to an apparatus, device or system for implementing the method of the invention. Preferably, the apparatus, device or system of the present invention is characterized in that,
-optionally, single cell whole genome amplification of the test sample and optionally of the reference sample can be performed;
-optionally, detection of genetic variation (preferably SNP) information of the progeny genome of the obtained single cell whole genome amplification product can be performed, e.g. wherein said information is determined by nucleic acid chip or NGS sequencing;
-an embodiment of any of the methods according to the invention can be performed to identify a parental predisposition of the test sample, or to detect a parental contamination of the test sample, or to identify progeny DNA ploidy abnormalities in the test sample, based on the genetic variation information of the test sample and optionally the genetic variation information of the reference frame.
In a further aspect, the invention also provides the use of the apparatus, device or system of the invention in a method of the invention, preferably,
-use for identifying a parental predisposition to a test sample, or detecting parental contamination of a test sample, or identifying progeny DNA ploidy abnormalities in a test sample, or
-use in the manufacture of a product for identifying a parental predisposition of a test sample, or detecting parental contamination of a test sample, or identifying progeny DNA ploidy abnormalities in a test sample.
The method for identifying parental contamination and the method for identifying progeny heteroploidy according to the present invention will be described further below. The aspects and advantages of the present invention will become more apparent from the description.
SEM contamination identification method
Analysis of cfDNA (cell-free DNA) in SEM collected during IVF blastocyst culture has been proposed as a more promising sampling method than invasive sampling. More recently, blastocyst fluid and SEM samples were compared with trophectoderm (TE) cells as samples for PGT (preimplantation genetic testing). TE samples showed a 100% amplification rate and high genotyping concordance (99.8%). Blastocyst fluid samples showed a high amplification failure rate (72.6%) and low genotyping concordance (13.3%). SEM samples performed better than blastocyst fluid samples but worse than TE samples, with a low amplification failure rate (10.3%) and moderate genotyping concordance (59.5%). One factor proposed to explain the low diagnostic accuracy of SEM is contamination of the SEM culture medium. Sources of contamination include, for example, maternal DNA contamination (e.g., cumulus cells that were not completely removed prior to ICSI), paternal DNA contamination, and contamination from exogenous DNA already present in the medium supplement. Because contamination is a major risk factor for genetic diagnostic errors, there is a need to optimize current SEM contamination identification protocols in order to determine whether collected samples reflect the true genetic status of embryos, to facilitate embryo-specific allele analysis, and to distinguish between embryonic and non-embryonic DNA.
SNP microarrays and NGS allow SNP genotype and chromosome copy number information to be acquired simultaneously. Thus, in principle, both techniques can provide information on aneuploidy, polyploidy, and uniparental diploidy. In conventional multicellular genomic DNA analysis, two parameters are used to reflect the copy number status: the Log R ratio (LRR, the log2-transformed value of the normalized SNP intensity) and the B allele frequency (BAF, i.e., the ratio of the B allele signal intensity to the total SNP signal intensity). BAF values of 0, 0.5 and 1 represent a normal copy number (n=2), whereas abnormal samples show an increase or decrease in total intensity and allele frequency, with BAF deviating from these values. For example, when a copy number deletion occurs, BAF takes only the values 0 and 1 (genotypes AA and BB), with no values around 0.5 (genotype AB), and the LRR value decreases; when the copy number increases, BAF takes values around 0, 0.33, 0.67 and 1.0 (genotypes AAA, AAB, ABB and BBB), and the LRR value increases. Based on this regular variation of BAF and LRR values, BAF values have been proposed for estimating abnormal copy number variation and the proportion of foreign contaminating DNA in conventional genomic DNA (gDNA) samples, as well as the proportion of fetal DNA in maternal plasma cfDNA. For example, CN104640997A discloses methods that use BAF values in NIPD (non-invasive prenatal diagnosis) tests on maternal plasma DNA during pregnancy to calculate the percentage of fetal DNA in plasma DNA and to diagnose the risk of fetal trisomy 21. That document points out that, for the method described, a small initial sample size causes a problem of statistical accuracy, and that, to avoid this problem, samples of the chromosome 21 target region to be detected should be obtained in a targeted manner using target region capture combined with targeted amplification.
SEM contamination identification is complicated by factors such as the low quality and very small amount of embryonic DNA in the SEM. Only trace amounts of cell-free embryonic DNA (about 10 to several tens of picograms) are present in the SEM; therefore, cfDNA in the SEM must be amplified using single-cell whole genome amplification (WGA) techniques such as MALBAC and MDA before SNP microarray/NGS data analysis. It is known in the art that WGA at the single-cell level introduces a significant amount of technical noise, resulting in significant BAF distortion. The same problem exists for whole genome amplification of SEM. Furthermore, the presence of contaminating DNA in the SEM may further exacerbate the problem. For example, it has been reported that MDA amplification of SEM, while achieving a 97% amplification rate, produced reliable PGT-A results for only 2% of the amplified samples. Moreover, it has also been reported that maternal DNA contamination may still occur in SEM even after cumulus removal and washing. As a result, contamination of SEM samples is difficult to identify from conventional BAF values.
To solve the above problems, the method of the present invention uses the paternal allele frequency (PAF) and/or maternal allele frequency (MAF) values of the SEM sample at loci where the parents are homozygous for different alleles, classifies and counts the paternal-tendency and maternal-tendency sites, and uses the ratio of the number of maternal-tendency sites to the number of paternal-tendency sites to construct a parental contamination tendency statistic S_MC that effectively eliminates the interference introduced by single-cell whole genome amplification. By comparison with a predetermined S_MC threshold (e.g., a threshold established from a reference frame), the S_MC value constructed according to the invention can specifically and sensitively identify the paternal or maternal contamination tendency of the sample.
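The BAF values quoted above follow from simple allele counting: for a locus with copy number n, a genotype carrying k copies of the B allele has an ideal BAF of k/n. The sketch below lists these ideal values for a given copy number. It is a noise-free idealization, and the function name `expected_baf_clusters` is hypothetical.

```python
from fractions import Fraction
from typing import List


def expected_baf_clusters(copy_number: int) -> List[Fraction]:
    """Ideal BAF values k/n for k = 0..n B alleles at copy number n."""
    if copy_number < 1:
        raise ValueError("copy number must be at least 1")
    return [Fraction(k, copy_number) for k in range(copy_number + 1)]
```

Copy number 2 yields 0, 1/2 and 1, and copy number 3 yields 0, 1/3, 2/3 and 1, matching the values of about 0.33 and 0.67 given in the text. Real single-cell amplification data scatter widely around these ideal positions.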
In one embodiment, the genetic variation loci used for S_MC calculation are biallelic loci at which the father and mother are homozygous for different alleles, i.e., loci where the father is of AA genotype and the mother is of BB genotype, and loci where the father is of BB genotype and the mother is of AA genotype. In a further preferred embodiment, at the parental different-homozygous loci of a selected DNA segment, the paternal allele frequency and/or maternal allele frequency of the SEM sample at each locus is determined.
In a preferred embodiment, BAF values of the SEM sample at the parental different-homozygous loci of the selected DNA segment are obtained based on, for example, SNP array analysis, and the paternal allele frequency and/or maternal allele frequency of the sample at each locus is determined based on the BAF values:
-at a locus where the father is AA genotype and the mother is BB genotype, the maternal allele frequency MAF of the SEM sample at said locus is the BAF value of said locus;
-at a locus where the father is of BB genotype and the mother is of AA genotype, the paternal allele frequency PAF of the SEM sample at said locus is the BAF value of said locus.
In a further preferred embodiment, at the parental different-homozygous loci of the selected DNA segment, allele depth (AD) values of the SEM sample for the paternal and maternal alleles of each locus, i.e., AD_father and AD_mother, are obtained based on, for example, NGS sequencing analysis, and the paternal allele frequency and/or maternal allele frequency of the sample at the locus is determined based on the AD values:
wherein the maternal allele frequency of the SEM sample at the locus is MAF = AD_mother / (AD_father + AD_mother);
wherein the paternal allele frequency of the SEM sample at the locus is PAF = AD_father / (AD_father + AD_mother).
The loci can be classified as described in the foregoing "methods of the invention" section, based on the MAF and/or PAF allele frequencies determined from BAF values and/or AD values. In a preferred embodiment, for an SEM sample, at the parental different-homozygous loci of the selected DNA segment, by comparison with classification thresholds a and b, loci with a paternal allele frequency (PAF) ≥ b (e.g., 0.6) and/or a maternal allele frequency (MAF) ≤ a (e.g., 0.4) are classified as paternal-tendency loci, and loci with a paternal allele frequency (PAF) ≤ a (e.g., 0.4) and/or a maternal allele frequency (MAF) ≥ b (e.g., 0.6) are classified as maternal-tendency loci. The classification thresholds a and b may be defined as described in the "methods of the invention" section. Preferably, the classification threshold a may be any value between 0.1 and 0.4, preferably 0.2-0.4, more preferably 0.3-0.4, and the classification threshold b may be any value between 0.6 and 0.9, preferably 0.6-0.8, more preferably 0.6-0.7, provided that a + b = 1.
In a preferred embodiment, the parental-tendency statistic, i.e., the parental contamination tendency value S_MC, is calculated as the ratio of the number of maternal-tendency sites to the number of paternal-tendency sites in the SEM sample.
In one embodiment, the S_MC threshold is established using an SEM reference frame without parental predisposition. In one embodiment, the reference frame consists of 1-40 or more SEM samples without parental predisposition, e.g., 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, or more. Preferably, the reference SEM samples are collected at the same or similar times and in the same or similar manner as the test SEM samples. Preferably, the reference SEM samples and the test SEM samples are obtained for analysis using the same single-cell whole genome amplification method.
In one embodiment, the S_MC values of the SEM reference frame are approximately normally distributed. In one embodiment, the parental different homozygous allele loci of the reference-frame SEM samples are established by combining the genomic allele information of the corresponding parents of the reference SEM samples, and the parental predisposition statistic S_MC of each reference-frame SEM sample is calculated. In yet another embodiment, the parental predisposition S_MC threshold is set as the reference-frame S_MC mean ± 1-5 standard deviations (preferably 2-3, especially 3 standard deviations). If the S_MC of an SEM sample is greater than the upper threshold, the SEM sample is indicated to have a maternal contamination tendency; if the S_MC of an SEM sample is less than the lower threshold, the SEM sample is indicated to have a paternal contamination tendency.
In a preferred embodiment, the S_MC upper threshold is 1.26 and the S_MC lower threshold is 0.80; if the S_MC of an SEM sample is > 1.26, the SEM sample is indicated to have a maternal contamination tendency; if the S_MC of an SEM sample is < 0.80, the SEM sample is indicated to have a paternal contamination tendency.
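The threshold rule described above can be illustrated with a minimal Python sketch. The function names, the use of the sample standard deviation, and the default k = 3 are assumptions made for illustration; only the mean ± k standard deviations rule and the preferred limits 1.26 and 0.80 come from the text.

```python
from statistics import mean, stdev


def reference_thresholds(ref_values, k=3.0):
    """Return (lower, upper) as mean -/+ k standard deviations.

    ref_values: S_MC values of reference samples without parental predisposition.
    The sample standard deviation is an assumption; the text does not say
    whether the sample or population form is used.
    """
    if len(ref_values) < 2:
        raise ValueError("at least two reference values are needed")
    m = mean(ref_values)
    s = stdev(ref_values)
    return m - k * s, m + k * s


def contamination_tendency(s_mc, lower=0.80, upper=1.26):
    """Classify a sample; defaults are the preferred limits quoted in the text."""
    if s_mc > upper:
        return "maternal"
    if s_mc < lower:
        return "paternal"
    return "none"
```

In use, the pair returned by `reference_thresholds` would be passed as `lower` and `upper` in place of the defaults.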
In some embodiments, the methods of the invention can detect 1-90% parental DNA contamination in an SEM sample. Preferably, the parental DNA contamination is less than 50% or 40%, more preferably less than 10%, or less than 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, or 1%, or 0.1% or less. In a preferred embodiment, the parental DNA contamination in the SEM sample is maternal DNA contamination, and preferably the contamination level is no more than 10% or no more than 5%. In another preferred embodiment, the parental DNA contamination in the SEM sample is paternal DNA contamination, and preferably the contamination level is no more than 10% or no more than 5%.
In some embodiments, the methods of the invention can detect parental DNA contamination in SEM samples containing trace amounts of embryo DNA. In some embodiments, the ratio of embryo DNA to contaminating parental DNA in the SEM sample is, for example, 1:9 to 9:1. In other embodiments, the ratio of embryo DNA to contaminating parental DNA in the SEM sample is less than 1:5, less than 1:6, less than 1:7, less than 1:8, or less than 1:9.
In one embodiment, whole genome amplification is performed on the trace embryo DNA present in the SEM culture medium sample prior to analysis of the SEM sample data. A single-cell amplification strategy is used, and the specific method is not limited. In a preferred embodiment, SEM amplification is performed using MALBAC. More preferably, use is made of [...] for whole genome amplification.
In one embodiment, genomic DNA (gDNA), e.g., 1ug, e.g., 200ng or more, of a parent or family of an SEM-corresponding embryo is extracted for determining sequence information (e.g., genetic variation sites such as SNP site information) of the parent genome. In one embodiment, gDNA is extracted from the peripheral blood of parents for genetic information analysis.
In one embodiment, the SEM sample is SEM collected on any culture day selected from D1 to D6. In one embodiment, the culture medium is not changed during in vitro culture of the IVF embryo prior to collection. In yet another embodiment, the culture medium is changed during in vitro culture of the IVF embryo prior to collection, for example, once on day D3 or day D5, or twice, on days D3 and D5. In one embodiment, the SEM is collected about 2-3 days, such as two days, or 1-24 hours, such as 5-10 hours or 6-7 hours, after the last medium change. In yet another embodiment, the SEM culture medium used for determining the parental predisposition statistic S_MC is a D1-D6, such as D1-D3, D3-D5, D4-D6, or D5-D6, embryo culture medium, such as a blastocyst culture medium, especially a D3-D5, D4-D6, D4-D5, or D5-D6 blastocyst culture medium. In some embodiments, the SEM sample of the invention is a mixture of IVF blastocyst culture medium and blastocoel fluid.
In some embodiments, the SEM sample of the invention is SEM culture medium of ICSI (intracytoplasmic sperm injection) embryos. The culture medium may be aspirated from cultures of embryos fertilized by intracytoplasmic sperm injection (ICSI), preferably on days 3-10 of culture, preferably on day 5, as the SEM sample of the invention.
In some embodiments, following zona pellucida removal, embryos are cultured in 0.1ul-1ml of culture medium using a single embryo culture system, and a small amount of culture medium (e.g., about 0.1ul-1ml, e.g., about 0.1ul, 10ul, 20ul, 30ul, 40ul, 50ul, 100ul, 200ul, 500ul, 800ul, 1 ml) is isolated from the culture as an SEM sample of the invention. Preferably, the reference SEM samples are collected in the same or similar manner.
In other embodiments, a small amount of culture fluid (e.g., about 0.1ul-1ml, e.g., about 0.1ul, 10ul, 20ul, 30ul, 40ul, 50ul, 100ul, 200ul, 500ul, 800ul, 1 ml) is isolated from a culture of embryos subjected to surface cleaning of ova or fertilized ova prior to culturing as an SEM sample of the present invention. Preferably, the reference SEM samples are collected in the same or similar manner.
Exemplary S_MC calculation based on genotyping detection
In the method of the present invention, any genotyping assay known in the art can be used to obtain embryo-related genetic variation information from single-cell whole-genome amplification products of a test sample (e.g., SEM culture medium or TE cells). As examples, the following describes nucleic acid chip-based detection of genetic variation with the corresponding S_MC calculation, and NGS sequencing-based detection of genetic variation with the corresponding S_MC calculation. While these exemplary methods are preferred, they should not be construed as limiting the invention. Those skilled in the art will appreciate that other genotyping assays may be used to obtain the genetic variation information for the S_MC calculation of the invention. Those skilled in the art will also appreciate that the genetic analysis detection methods and the methods of calculating the parental predisposition statistic described herein are, after suitable adaptation, also applicable to obtaining the parental predisposition statistics S_POR and S_uneven of the present invention.
Genetic variation detection and S_MC calculation based on nucleic acid chip
Genetic variation can be detected using any nucleic acid chip known in the art.
After genotyping of polymorphic sites on a nucleic acid chip platform, the BAF (B Allele Frequency) information of each polymorphic site may be obtained using polymorphic site analysis algorithms or software known in the art, e.g., the GenomeStudio software from Illumina. Preferably, BAF is a continuous value between 0 and 1, wherein BAF = 1 indicates that the site of the test sample is homozygous for the B allele (BB); BAF = 0 indicates homozygosity for the A allele (AA); BAF = 0.5 indicates a heterozygous site (AB); and a BAF value at or below classification threshold a (e.g., 0.4) or at or above classification threshold b (e.g., 0.6) indicates a parental tendency of the site.
For example, as shown in FIG. 3, sites at which the father has the AA genotype and the mother has the BB genotype may be selected, the BAF value of the sample at each such site determined, and the number of sites with BAF ≥ 0.6 (N_MB) and the number of sites with BAF ≤ 0.4 (N_PA) counted; similarly, sites at which the father has the BB genotype and the mother has the AA genotype are selected, the BAF value of the sample at each such site determined, and the number of sites with BAF ≤ 0.4 (N_MA) and the number of sites with BAF ≥ 0.6 (N_PB) counted. The parental predisposition statistic S_MC is constructed as follows:
S_MC = (N_MB + N_MA) / (N_PA + N_PB).
If S_MC is significantly greater than 1, a maternal contamination tendency is indicated; if S_MC is significantly less than 1, a paternal contamination tendency is indicated; if S_MC does not differ significantly from 1, no parental contamination tendency is indicated.
Preferably, the S_MC threshold is established using a reference frame without parental predisposition; the S_MC value of the sample is compared with the S_MC threshold to determine the maternal/paternal contamination tendency of the sample.
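As an illustration of the chip-based calculation, the following Python sketch counts N_MB, N_PA, N_MA, and N_PB from BAF values and forms their ratio. The input layout (tuples of father genotype, mother genotype, and sample BAF) and the function name are assumptions made for this example, not part of the described method.

```python
def s_mc_from_baf(sites, a=0.4, b=0.6):
    """Compute S_MC = (N_MB + N_MA) / (N_PA + N_PB) from chip BAF values.

    sites: iterable of (father_genotype, mother_genotype, sample_baf),
           with genotypes given as "AA" or "BB" (hypothetical layout).
    """
    n_mb = n_pa = n_ma = n_pb = 0
    for father, mother, baf in sites:
        if father == "AA" and mother == "BB":
            # The B allele is maternal here: high BAF leans maternal.
            if baf >= b:
                n_mb += 1
            elif baf <= a:
                n_pa += 1
        elif father == "BB" and mother == "AA":
            # The B allele is paternal here: high BAF leans paternal.
            if baf <= a:
                n_ma += 1
            elif baf >= b:
                n_pb += 1
        # All other parental genotype combinations are not informative.
    paternal = n_pa + n_pb
    if paternal == 0:
        raise ValueError("no paternal predisposition sites; ratio undefined")
    return (n_mb + n_ma) / paternal
```

Sites with BAF strictly between a and b are counted in neither group, which matches the classification by thresholds a and b.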
Genetic variation detection and S_MC calculation based on NGS sequencing
NGS sequencing can be performed according to methods known in the art, including using various commercially available NGS sequencing platforms.
After the NGS sequencing data are obtained, data analysis can be performed using any genetic variation detection method known in the art. In a preferred embodiment, genetic variation detection is performed following the optimized strategy of the Genome Analysis Toolkit (GATK). More preferably, the analysis comprises the steps of: (1) performing quality-control filtering on the raw fastq files using the fastp software; (2) aligning the sequences to the reference genome using the BWA-MEM algorithm (the hg19 reference genome was used in the following examples); (3) sorting and indexing the aligned files using the SortSam command of Picard and the Samtools software, finally obtaining a bam file; (4) removing duplicates using the MarkDuplicates command of Picard; (5) performing Base Quality Score Recalibration (BQSR) using the BaseRecalibrator and ApplyBQSR commands of GATK; (6) detecting genetic variation in each single sample using the HaplotypeCaller method of GATK; (7) performing multi-sample joint genetic variation detection using the CombineGVCFs and GenotypeGVCFs methods of GATK; (8) performing Variant Quality Score Recalibration (VQSR) using the VariantRecalibrator and ApplyVQSR methods.
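Steps (1)-(7) can be sketched as a list of shell command strings built in Python. This is only an outline under stated assumptions: the file names, the read-group string, the known-sites resource, and the use of the Picard tools bundled with GATK4 (`gatk SortSam`, `gatk MarkDuplicates`) are illustrative choices, and step (8) is left out because VQSR needs training resources that the text does not specify.

```python
def build_sample_commands(sample, r1, r2, ref="hg19.fa",
                          known_sites="known_sites.vcf.gz"):
    """Return shell commands for per-sample steps (1)-(6).

    All file names and the known-sites resource are hypothetical placeholders.
    """
    rg = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    c1, c2 = f"{sample}_R1.clean.fq.gz", f"{sample}_R2.clean.fq.gz"
    return [
        # (1) quality-control filtering
        f"fastp -i {r1} -I {r2} -o {c1} -O {c2}",
        # (2) alignment with BWA-MEM
        f"bwa mem -R '{rg}' {ref} {c1} {c2} > {sample}.sam",
        # (3) coordinate sorting and indexing
        f"gatk SortSam -I {sample}.sam -O {sample}.sorted.bam --SORT_ORDER coordinate",
        f"samtools index {sample}.sorted.bam",
        # (4) duplicate marking
        f"gatk MarkDuplicates -I {sample}.sorted.bam -O {sample}.dedup.bam -M {sample}.dup_metrics.txt",
        # (5) base quality score recalibration
        f"gatk BaseRecalibrator -R {ref} -I {sample}.dedup.bam --known-sites {known_sites} -O {sample}.recal.table",
        f"gatk ApplyBQSR -R {ref} -I {sample}.dedup.bam --bqsr-recal-file {sample}.recal.table -O {sample}.bqsr.bam",
        # (6) per-sample variant calling in GVCF mode
        f"gatk HaplotypeCaller -R {ref} -I {sample}.bqsr.bam -O {sample}.g.vcf.gz -ERC GVCF",
    ]


def build_joint_commands(gvcfs, ref="hg19.fa"):
    """Return shell commands for step (7), multi-sample joint genotyping."""
    variants = " ".join(f"-V {g}" for g in gvcfs)
    return [
        f"gatk CombineGVCFs -R {ref} {variants} -O cohort.g.vcf.gz",
        f"gatk GenotypeGVCFs -R {ref} -V cohort.g.vcf.gz -O cohort.vcf.gz",
    ]
```

Building the commands as strings keeps the sketch free of side effects; the GVCFs of the SEM sample and both parents would be passed together to `build_joint_commands`.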
In some embodiments, SNP genetic variation detection data are obtained from the single-cell whole-genome amplification products of SEM samples, and from gDNA samples of the parents of the corresponding embryos, using NGS-based genetic variation detection methods.
After the SNP detection data are obtained, biallelic sites at which the father and the mother are homozygous for different alleles can be selected, as shown in FIG. 4, and for the SEM sample of the corresponding embryo the allele depths (AD) of the two alleles at each site, AD_father and AD_mother, are determined. The maternal allele frequency (MAF) and the paternal allele frequency (PAF) are calculated for each selected site: MAF = AD_mother / (AD_father + AD_mother); PAF = AD_father / (AD_father + AD_mother). The number of sites at which the MAF of the test sample is greater than or equal to classification threshold b (e.g., 0.6) (N_MAF) and the number of sites at which the PAF is greater than or equal to classification threshold b (e.g., 0.6) (N_PAF) are counted, and the parental predisposition statistic S_MC is constructed using the following formula:
S_MC = N_MAF / N_PAF
Based on the calculated S_MC value, the maternal and paternal contamination tendencies of the SEM sample are determined. If S_MC is about 1, no parental contamination tendency is indicated; if S_MC is greater than 1, a maternal contamination tendency is indicated; if S_MC is less than 1, a paternal contamination tendency is indicated.
Preferably, the S_MC threshold is established using a reference frame without parental predisposition; the S_MC value of the sample is compared with the S_MC threshold to determine the maternal/paternal contamination tendency of the sample.
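The NGS-based calculation can be sketched in Python as follows. The input layout (pairs of AD_father and AD_mother at already selected informative sites) and the decision to skip sites with zero total depth are assumptions made for illustration.

```python
def s_mc_from_ad(depths, b=0.6):
    """Compute S_MC = N_MAF / N_PAF from allele depths.

    depths: iterable of (ad_father, ad_mother) at sites where the parents
            are homozygous for different alleles (hypothetical layout).
    """
    n_maf = n_paf = 0
    for ad_father, ad_mother in depths:
        total = ad_father + ad_mother
        if total == 0:
            continue  # assumption: uncovered sites carry no information
        maf = ad_mother / total
        paf = ad_father / total
        if maf >= b:
            n_maf += 1
        if paf >= b:
            n_paf += 1
    if n_paf == 0:
        raise ValueError("no paternal predisposition sites; ratio undefined")
    return n_maf / n_paf
```

Because MAF + PAF = 1 at every covered site and b > 0.5, a site can be counted toward N_MAF or N_PAF but never both.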
Offspring heteroploidy detection method
Current next-generation sequencing platforms cannot detect whole-genome ploidy abnormalities. For single-cell whole genome amplification products, such as MALBAC amplification products, chip platforms also cannot obtain this information directly, owing to factors such as preferential allele amplification or allele dropout caused by single-cell amplification.
Using the parental predisposition statistic analysis strategy of the present invention, parental predisposition detection can be performed at the chromosome level (or even at a lower level, e.g., chromosome segments of 10-100 Mb, such as 10 Mb, 40 Mb, or 100 Mb), indicating copy number variation (CNV) information; and, based on the information of the 22 autosomes, whole-genome ploidy abnormalities of the sample can be identified. Thus, the methods provided by the invention can be used to analyze abnormalities of progeny embryos, such as hydatidiform mole.
In one embodiment, the methods of the invention are used for progeny ploidy abnormality detection, wherein the progeny ploidy abnormality detection is detection of triploidy and/or of uniparental diploidy or haploidy. In yet another embodiment, the methods of the invention are used to detect progeny chromosomal ploidy variations selected from the group consisting of paternal or maternal trisomy (chromosome level), paternal or maternal triploidy (duplication of a parental chromosome set), paternal or maternal uniparental diploidy, paternal or maternal haploidy, and any combination thereof.
In the progeny ploidy analysis method of the present invention, similarly to the S_MC analysis strategy described above but preferably at the chromosome level, the paternal allele frequency (PAF) and/or maternal allele frequency (MAF) values of the sample at the parental different homozygous allele loci are used to classify and count the paternal and maternal tendencies of the loci; the parental predisposition statistic S_uneven is constructed as the ratio of the number of maternal predisposition loci to the number of paternal predisposition loci and is compared with a predetermined S_uneven threshold to determine the maternal or paternal tendency of the progeny. Preferably, the S_uneven threshold is determined using a reference frame without parental predisposition. In some embodiments, a significant deviation of the sample S_uneven from the threshold indicates that the progeny is likely to have a copy number variation (CNV) of maternal or paternal origin in the DNA segment. For example, a sample S_uneven value significantly above the threshold may be caused by an increase in maternal genetic material or a decrease in paternal genetic material; conversely, a sample S_uneven value significantly below the threshold may be caused by an increase in paternal genetic material or a decrease in maternal genetic material.
To further distinguish whether a detected CNV is caused by duplication (dup) or deletion (del), the present inventors further propose that the ploidy analysis method include the steps of: counting the total number of parental different homozygous allele loci (N_total) and the number of those loci having neither a paternal nor a maternal predisposition (N_LOH), and calculating the proportion of N_LOH in N_total to construct the heterozygous site rate S_loh. By comparing the sample S_uneven value with the predetermined S_uneven threshold and the sample S_loh value with the S_loh threshold, the method can identify ploidy abnormalities of the sample specifically and sensitively and determine the paternal/maternal origin of the progeny ploidy abnormality; it can also distinguish uniparental diploidy or haploidy involving loss of the maternal or paternal chromosomes.
In a preferred embodiment, the method of the invention comprises the steps of:
constructing the S_uneven statistic of the test sample at the chromosome level and comparing it with the S_uneven threshold predetermined from a reference frame without parental predisposition;
constructing the S_loh value of the test sample at the chromosome level and comparing it with the S_loh threshold predetermined from a reference frame without parental predisposition; if the S_loh of the test sample is within the reference S_loh threshold range and S_uneven is greater than the upper limit of the reference S_uneven threshold, a maternal trisomy or maternal triploid is indicated; conversely, if S_uneven is less than the lower limit of the reference S_uneven threshold, a paternal trisomy or paternal triploid is indicated;
if S_loh is less than the lower limit of the reference S_loh threshold and S_uneven is greater than the upper limit of the reference S_uneven threshold, e.g., greater than 10, 12, 15, 17, 19, or 20, a maternal uniparental diploid or haploid is indicated; conversely, if S_uneven is less than the lower limit of the reference S_uneven threshold, e.g., less than 0.2, 0.15, 0.1, 0.05, 0.02, or 0.01, a paternal uniparental diploid or haploid is indicated.
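The decision rules in the steps above can be written as a small Python function. This is a sketch under assumptions: the function name and return strings are invented for illustration, the default ranges are the preferred threshold values given elsewhere in the description (about 0.53-1.75 for S_uneven and about 0.22-0.76 for S_loh), and the case of S_loh above its upper limit, which the rules do not cover, is reported as undetermined.

```python
def interpret_ploidy(s_uneven, s_loh,
                     uneven_range=(0.53, 1.75), loh_range=(0.22, 0.76)):
    """Apply the S_uneven / S_loh decision rules to one chromosome.

    uneven_range, loh_range: (lower, upper) reference thresholds.
    """
    u_lo, u_hi = uneven_range
    l_lo, l_hi = loh_range

    # Direction of the imbalance: S_uneven is maternal over paternal sites.
    if s_uneven > u_hi:
        origin = "maternal"
    elif s_uneven < u_lo:
        origin = "paternal"
    else:
        return "no parental predisposition detected"

    if s_loh < l_lo:
        # Few heterozygous-like sites: one parental set is missing.
        return f"{origin} uniparental diploid or haploid"
    if s_loh <= l_hi:
        # Heterozygous-like sites preserved: extra copy from one parent.
        return f"{origin} trisomy or triploid"
    return "undetermined"  # assumption: not covered by the described rules
```

Applied per chromosome, a trisomy would show the pattern on one chromosome only, whereas a triploid would show it across the chromosomes analyzed.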
In a preferred embodiment, BAF values of the sample at the parental different homozygous allele loci of the selected DNA segment are obtained based on, for example, SNP array analysis, and the paternal allele frequency and/or maternal allele frequency of the sample at said loci is determined based on said BAF values:
-at a locus where the father has the AA genotype and the mother has the BB genotype, the maternal allele frequency (MAF) of the sample at said locus is the BAF value of said locus;
-at a locus where the father has the BB genotype and the mother has the AA genotype, the paternal allele frequency (PAF) of the sample at said locus is the BAF value of said locus.
In a further preferred embodiment, at the parental different homozygous allele loci of the selected DNA segment, the allele depth (AD) values of the paternal allele (P) and the maternal allele (M) of the sample at the genetic variation sites, i.e., AD_father and AD_mother, are obtained based on, for example, NGS sequencing analysis, and the paternal allele frequency and/or maternal allele frequency of the sample at the locus is determined based on the AD values:
wherein the maternal allele frequency of the sample at the locus is MAF = AD_mother / (AD_father + AD_mother);
wherein the paternal allele frequency of the sample at the locus is PAF = AD_father / (AD_father + AD_mother).
The loci can be classified as described in the foregoing "methods of the invention" section, based on the MAF and/or PAF determined from BAF values and/or AD values. In a preferred embodiment, in the progeny sample, at the selected parental different homozygous allele loci, loci with a paternal allele frequency (PAF) ≥ b (e.g., 0.6) and/or a maternal allele frequency (MAF) ≤ a (e.g., 0.4) are classified as paternal predisposition loci by comparison with classification thresholds a and b; loci with PAF ≤ a (e.g., 0.4) and/or MAF ≥ b (e.g., 0.6) are classified as maternal predisposition loci. The classification thresholds a and b may be defined as described in the "methods of the invention" section. Preferably, the classification threshold a may be any value between 0.1 and 0.4, preferably 0.2-0.4, more preferably 0.3-0.4, and the classification threshold b may be any value between 0.6 and 0.9, preferably 0.6-0.8, more preferably 0.6-0.7, provided that a + b = 1.
In a preferred embodiment, the parental predisposition statistic S_uneven is calculated as the ratio of the number of maternal predisposition loci to the number of paternal predisposition loci in the progeny sample.
In some preferred embodiments, the genomic polymorphic sites of the father, the mother, and the biopsy sample, and their BAF values, are obtained. In a further preferred embodiment, the method of the invention may comprise the following steps, as illustrated in FIG. 5:
-selecting SNP loci at which the father has the AA genotype and the mother has the BB genotype, determining the BAF value (between 0 and 1) of the corresponding biopsy sample at these loci, and counting the number of loci with BAF ≥ classification threshold b (e.g., 0.6) (N_MB) and the number of loci with BAF ≤ classification threshold a (e.g., 0.4) (N_PA);
-selecting SNP loci at which the father has the BB genotype and the mother has the AA genotype, determining the BAF value (between 0 and 1) of the biopsy sample at these loci, and counting the number of loci with BAF ≤ classification threshold a (e.g., 0.4) (N_MA) and the number of loci with BAF ≥ classification threshold b (e.g., 0.6) (N_PB);
-counting the total number of all selected SNP loci (N_total) and the number of those loci with a BAF between classification thresholds a and b (e.g., between 0.4 and 0.6) (N_LOH);
-calculating the parental predisposition statistic S_uneven of the sample using the formula S_uneven = (N_MB + N_MA) / (N_PA + N_PB); and optionally
-calculating the heterozygous site rate of the sample using the formula S_loh = N_LOH / N_total.
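The steps above can be sketched in Python as a single pass over the selected loci that returns both statistics. The input layout, the `PloidyStats` container, and the treatment of BAF values strictly between a and b as the N_LOH group are assumptions made for this example.

```python
from typing import NamedTuple


class PloidyStats(NamedTuple):
    s_uneven: float
    s_loh: float


def ploidy_stats(sites, a=0.4, b=0.6):
    """Compute S_uneven and S_loh for one chromosome or segment.

    sites: iterable of (father_genotype, mother_genotype, sample_baf);
           only AA/BB and BB/AA parental combinations are used.
    """
    maternal = paternal = n_loh = n_total = 0
    for father, mother, baf in sites:
        if (father, mother) == ("AA", "BB"):
            maternal_high = True   # B allele inherited from the mother
        elif (father, mother) == ("BB", "AA"):
            maternal_high = False  # B allele inherited from the father
        else:
            continue  # not a parental different homozygous locus
        n_total += 1
        if a < baf < b:
            n_loh += 1  # neither paternal nor maternal predisposition
        elif (baf >= b) == maternal_high:
            maternal += 1  # counts toward N_MB or N_MA
        else:
            paternal += 1  # counts toward N_PA or N_PB
    if paternal == 0 or n_total == 0:
        raise ValueError("statistics undefined for this set of loci")
    return PloidyStats(maternal / paternal, n_loh / n_total)
```

Running this once per chromosome gives the per-chromosome values that are then compared with the reference-frame thresholds.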
In one embodiment, the invention comprises establishing the S_uneven threshold, and optionally the S_loh threshold, using a reference frame without parental predisposition. In one embodiment, the reference frame consists of 1-40 or more euploid biopsy cell samples without parental predisposition, e.g., 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, or more. Preferably, the reference samples are collected at the same or similar time and in the same or similar manner as the test sample. Preferably, the amplification products of the reference samples are obtained by the same single-cell whole genome amplification method as the amplification product of the test sample.
In one embodiment, the S_uneven values of the reference frame are approximately normally distributed. In one embodiment, the parental different homozygous allele loci are selected by combining the genomic allele information of the corresponding parents of the reference biopsy cell samples and are used to calculate the reference-frame parental predisposition statistic S_uneven and, optionally, the heterozygous site rate S_loh. In yet another embodiment, the parental predisposition S_uneven threshold is set as the reference-frame S_uneven mean ± 1-5 standard deviations (preferably 2-3, especially 3 standard deviations). In yet another embodiment, the S_loh threshold is set as the reference-frame S_loh mean ± 1-5 standard deviations (preferably 2-3, especially 3 standard deviations). In a preferred embodiment, the S_uneven upper threshold is about 1.75 and the S_uneven lower threshold is about 0.53. In a preferred embodiment, the S_loh upper threshold is about 0.76 and the S_loh lower threshold is about 0.22.
Products, systems and apparatus for practicing the methods of the invention
The invention also provides products, systems and devices for performing parental predisposition detection on a test sample, for detecting a parental contamination tendency of a test sample, and/or for detecting progeny ploidy abnormality in a test sample.
Product(s)
In one aspect, the invention provides an apparatus for detecting the parental predisposition of a test sample, the apparatus comprising:
at least one processor and at least one memory having code stored thereon that, when executed by the at least one processor, causes the apparatus to perform the parental predisposition detection method of the present invention, the parental contamination identification method of the present invention, or the progeny ploidy abnormality detection method of the present invention.
Preferably, the code, when executed by the at least one processor, causes the apparatus to at least perform:
Receiving sequence information data, for example, genetic variation information (e.g., SNP information) of a test sample (and optionally a reference sample),
-analyzing the parental propensity of the test sample based on the received sequence information data.
In a preferred embodiment, the code, when executed by the at least one processor, causes the apparatus to further perform:
-determining whether the test sample has parental contamination based on the analyzed parental predisposition of the test sample.
In yet another preferred embodiment, the code, when executed by the at least one processor, causes the apparatus to further perform:
-determining whether there is a ploidy abnormality in the progeny DNA in the test sample based on the analyzed parent predisposition of the test sample.
In a preferred embodiment, the present invention also provides a non-transitory computer readable storage medium having instructions thereon for performing the parental predisposition detection method of the present invention, comprising:
one or more instructions for receiving an input comprising genetic variation information (e.g., SNP information) of a test sample (and optionally a reference sample),
-one or more instructions for analyzing the input genetic variation sequence information to determine a parental predisposition to the test sample;
-optionally, one or more instructions for outputting the parental predisposition of the test sample.
In a preferred embodiment, the medium further has thereon:
-one or more instructions for determining, based on the analyzed parental predisposition of the test sample, whether the test sample has parental contamination.
In a further preferred embodiment, the medium further has thereon:
-one or more instructions for determining, based on the analyzed parental predisposition of the test sample, whether the progeny DNA in the test sample has a ploidy abnormality.
Preferably, the apparatus of the present invention comprises the following modules, or the computer readable storage medium of the present invention carries instructions for executing the following modules:
(1) Sequence information data acquisition module: for obtaining genetic variation sequence information (e.g., SNP information) data of the test sample and/or reference sample;
(2) Allele frequency analysis module: for analyzing the sequence information data of module (1) to determine the MAF and/or PAF of the sample at the parental different homozygous allele loci of the selected DNA segment;
(3) Parental predisposition statistic determination module: for analyzing the MAF and/or PAF data obtained in module (2) to determine the parental predisposition statistic S_POR of the sample over the selected DNA segment;
(4) Optionally, a parental predisposition S_POR threshold determination module: for determining the parental predisposition S_POR threshold from the S_POR of the reference samples determined by module (3);
(5) Parental predisposition determination module: for comparing the sample S_POR determined by module (3) with the parental predisposition S_POR threshold to determine the parental predisposition of the test sample;
(6) Optionally, a report output module: for processing and integrating the data obtained by modules (1)-(5) to generate a report.
In a preferred embodiment, the device is used to report the parental contamination tendency, such as a maternal or paternal contamination tendency, of a test sample (e.g., an SEM sample).
In a further preferred embodiment, the device is for reporting ploidy abnormalities of progeny DNA in a test sample, the device further comprising the following modules:
(7) Heterozygous site rate determination module: for analyzing the sequence information data of module (1) to determine the total number of parental different homozygous allele loci of the sample in the selected DNA segment and the number of those loci with neither paternal nor maternal predisposition, thereby determining the heterozygous site rate S_loh of the sample in said segment for comparison with the heterozygous site rate S_loh threshold;
(8) Optionally, a heterozygous site rate S_loh threshold determination module: for determining the heterozygous site rate S_loh threshold from the S_loh of the reference samples determined by module (7);
(9) Heterozygous site rate comparison module: for comparing the sample heterozygous site rate S_loh determined by module (7) with the S_loh threshold;
(10) Progeny ploidy abnormality determination module: for determining progeny ploidy abnormality from the analysis data of modules (5) and (7),
wherein, if the S_loh of the test sample containing the progeny DNA is comparable to or slightly smaller than the reference-frame S_loh and S_uneven is significantly greater than the reference-frame S_uneven, the progeny is indicated to be a maternal trisomy or maternal triploid; conversely, if S_uneven is significantly less than the reference-frame S_uneven, the progeny is indicated to be a paternal trisomy or paternal triploid;
if the S_loh of the test sample containing the progeny DNA is significantly less than the reference-frame S_loh and S_uneven is much greater than the reference-frame S_uneven, the progeny is indicated to be a maternal uniparental diploid or haploid; conversely, if S_uneven is much less than the reference-frame S_uneven, the progeny is indicated to be a paternal uniparental diploid or haploid.
System
In one aspect, the invention provides a system for detecting the parental predisposition of a test sample, the system comprising an apparatus configured to carry out the parental predisposition detection method of the invention, the parental contamination identification method of the invention, or the progeny ploidy abnormality detection method of the invention. In one embodiment, the apparatus is configured to:
Receiving an input comprising genetic variation sequence information of a test sample and/or a reference sample,
-performing a parental propensity test of the test sample based on the entered sequence information.
In a preferred embodiment, the apparatus is further configured to:
-determining whether the test sample has parental contamination based on the analyzed parental predisposition of the test sample; or
-determining whether there is a ploidy abnormality in the progeny DNA in the test sample based on the analyzed parent predisposition of the test sample.
In a preferred embodiment, the apparatus is the apparatus described above for parental predisposition detection, parental contamination identification, and/or progeny ploidy abnormality detection.
In the system of the present invention, it may further include:
-amplification means for single cell whole genome amplification, preferably MALBAC amplification, of the test sample and/or the reference sample;
sequence information detection means for performing genetic variation sequence information detection on the amplified product, including, but not limited to, polymorphic site (e.g., SNP) detection, sequencing detection.
In a preferred embodiment, the system is used to determine the parental contamination tendency, such as a maternal or paternal contamination tendency, of a test sample (e.g., an SEM sample).
In a further preferred embodiment, the system is used to determine ploidy abnormalities of progeny DNA in a test sample.
In yet another aspect, the present system may comprise an apparatus comprising:
an amplification unit for single-cell whole genome amplification of the test sample and/or the reference sample;
the detection and analysis unit is used for detecting and analyzing the genetic variation information of the amplification product obtained by the amplification unit;
and a sample parent tendency determining unit for determining the parent tendency of the sample from the genetic variation information obtained by the detecting and analyzing unit.
Preferably, the system of the present invention will contain means for genomic sequence information query, and a programmed memory or medium for allowing a computer to analyze the resulting data. Sequence information query data (including, for example, sequencing datasets, SNP datasets, genetic variation site datasets, genotyping datasets) may be stored datasets, or in the form of "on the fly". As used herein, a "data set" encompasses both types of data sources.
The means for the genomic sequence information query is not particularly limited. In a preferred embodiment, high density SNP chips are used. In another preferred embodiment, higher depth sequencing data of related individuals of the offspring is obtained using a high throughput sequencing apparatus.
The present invention can be implemented by a computer. Accordingly, the present invention also provides a computer programmed to perform the above method. The computer typically includes: a CPU with a communication interface, a system memory (RAM), a non-transitory memory (ROM), and one or more other storage devices such as a hard disk, floppy disk, or CD-ROM drive. The computer may also include output devices such as a printer, CRT monitor, or LCD display, and input devices such as a keyboard, mouse, pen, touch screen, or voice-activated system. The input device may receive data, for example directly from the sequence information query means, through an interface.
Application of computer products, systems and devices of the present invention
In one aspect, the invention also provides the use of the device, system and apparatus according to the invention for performing a parent predisposition test of a test sample, for detecting a predisposition to parent contamination of a test sample, and/or for detecting progeny aneuploidy in a test sample.
In a further aspect, the invention also provides the use of the device, system and apparatus according to the invention in the manufacture of a product for performing a parent predisposition test of a test sample, for detecting a predisposition to parent contamination of a test sample, and/or for detecting progeny aneuploidy in a test sample.
Examples
Materials and methods
Materials:
Aborted fetal tissue and peripheral blood of the parents of the aborted fetus were obtained for extraction of fetal gDNA and parental gDNA; IVF blastula trophoblast biopsies and embryo culture medium (SEM) were obtained for single-cell whole genome amplification. All studies were approved by the institutional ethics committee, and written informed consent was obtained before the studies were performed.
Blastula biopsy
Blastulae at 5/6 days after in vitro insemination were selected for biopsy. First, a laser was used to make a 10-15 μm hole in the zona pellucida on the side opposite the inner cell mass to assist hatching, and culture was continued for 4-6 hours. Once trophoblast cells had hatched, some cells were biopsied by laser cutting combined with aspiration. If the blastula had not hatched, a biopsy needle was used to aspirate some trophoblast cells through the opening in the zona pellucida under negative pressure. Culture of the blastula was continued after the biopsy.
Collection of SEM culture medium
1) Surplus discarded embryos were obtained from intracytoplasmic sperm injection (ICSI) performed under a microscope.
2) Embryos whose surrounding granulosa cells had been stripped as cleanly as possible were cultured to Day 3, the culture medium was changed, and the embryos were placed in 25 μL microdroplets of blastula medium and cultured in an incubator at 37 °C, 5% CO2, 5% O2. If a small number of granulosa cells remain on the zona pellucida of the D3 embryo, they need to be removed: using a finely drawn pipette, the embryo is pipetted up and down repeatedly in the blastocyst culture droplet so that all granulosa cells are removed as far as possible.
3) Embryos free of granulosa cells are transferred into fresh blastula culture droplets for continued culture. On the afternoon of D4, the medium is changed and the embryos are washed 2-3 times:
Each embryo is washed in 3 droplets and transferred into a labeled blastocyst dish for further culture.
During washing, each embryo has its own dedicated glass pipette and 3 dedicated washing droplets, which must not be shared between embryos.
4) After 2-3 washes, embryos are placed in 25 μL blastocyst culture droplets. All embryos are cultured by the single-droplet method in an incubator at 37 °C, 5% CO2, 5% O2.
5) When the blastocyst develops to stage 4 (typically D5/D6), the culture medium of embryos graded 4BC or above and fully expanded is collected.
Tissue gDNA preparation
Genomic DNA was extracted from aborted fetal tissue (not less than 0.5 g) and from the corresponding paternal and/or maternal peripheral blood samples (not less than 2 mL) using a genomic DNA extraction kit (Tiangen biochemistry, universal genomic DNA extraction kit, DP304) according to the product instructions.
The extracted genomic DNA was quantified with the Qubit dsDNA HS Assay kit on a Qubit 3.0 (the DNA concentration should be 30 ng/μL or more). Thereafter, the gDNA was fragmented using the Yikang DNA fragmentation kit (KT100804248) according to the manufacturer's instructions.
Preparing a DNA fragmentation reaction system, fully and uniformly mixing, and standing on ice:
The following reaction conditions were set on the PCR instrument, and the sample was put into the instrument to start the reaction:
After completion of the reaction, the sample was taken out, 60 μL of AMPure XP DNA purification beads (Agencourt) were added, and the fragmented gDNA was purified according to the manufacturer's instructions. The purified product was quantified using the Qubit dsDNA HS Assay kit.
Preparation of simulated mixed samples with maternal contamination
The purified fragmented fetal gDNA and maternal gDNA, diluted or undiluted, were mixed in proportion according to the quantification results to prepare simulated mixed samples in which maternal DNA accounts for a specified proportion.
Single cell whole genome amplicon preparation
3-5 biopsied IVF blastula trophoblast cells or 20 μL of SEM culture medium were transferred to 5 μL of lysis buffer (30 mM Tris-Cl pH 7.8, 2 mM EDTA, 20 mM KCl, 0.2% Triton X-100), and single-cell whole genome amplification was performed by the two-step MALBAC method using the ChromInst amplification kit (cat. XK-005-96) from Yikang Gene company. The operation steps are as follows:
The lysis system was prepared as follows:
The following reaction conditions were set on the PCR instrument, and the sample was placed into the instrument to start the lysis reaction:
After the reaction was completed, pre-amplification reagents were added to the PCR tube as follows:
The following reaction conditions were set on the PCR instrument, and the sample was put into the instrument to start the reaction:
After the reaction was completed, second-round amplification reagents were added to the PCR tube as follows:
The following reaction conditions were set on the PCR instrument, and the sample was put into the instrument to start the reaction:
After the reaction was completed, 65 μL of AMPure XP beads were added to the PCR tube, and the amplification product was bead-purified according to the manufacturer's instructions. An aliquot of the purified amplification product was taken and quantified using the Qubit dsDNA HS Assay Kit (Invitrogen, Q32584).
Genotyping assays
Sample processing and signal detection of Illumina Infinium ASA chips were performed using 200ng of purified fragmented gDNA and 100ng of single cell whole genome amplification product according to standard protocols provided by the manufacturer (Standard Operation Protocol, SOP).
After genetic testing, SNP genotyping was performed using an Illumina Infinium ASA chip (Illumina, product number: 20016317). After the chip scan data were obtained, BAF information (a value from 0 to 1) for each polymorphic site was obtained by analysis with Illumina's command line program iaap-cli and the GenCall algorithm. The following genotype data quality control criterion was used: samples with a sample-level call rate greater than 30% were used for subsequent analysis.
Example 1 Comparison of BAF distributions in gDNA samples and in single-cell whole genome amplification products
For gDNA samples that do not require single-cell whole genome amplification, such as test samples from NIPT, contamination or parental predisposition is commonly identified from relative allele frequencies such as the BAF (B allele frequency) measured on a chip.
The general calculation of the proportion of maternal contamination in a gDNA sample from a fetus is shown below.
Based on this general calculation principle, 7 different cases can be distinguished according to the genotyping information of the fetus and the mother. The calculation principle rests on the assumption that the paternal allele counts and maternal allele counts of the fetus fit a binomial distribution. Thus, assuming that the maternal contamination ratio in the sample is r, the B allele frequency in the fetal sample is BAF = (1 - r) × (proportion of the B allele in the fetal genotype) + r × (proportion of the B allele in the maternal genotype). For example, if the genotypes of the fetus and the mother are both AA, BAF = 0 in the sample; if the genotype of the fetus is AA and the genotype of the mother is AB, BAF = 0.5r in the sample; and so on. The same calculation principle can likewise be used to estimate whether a fetus has a particular parental predisposition.
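The general calculation principle above can be illustrated with a short sketch. The function and variable names below are illustrative assumptions and are not part of the original disclosure; the code simply evaluates the stated BAF formula for every fetus/mother genotype pair compatible with Mendelian inheritance.

```python
from itertools import product

# Proportion of the B allele carried by each genotype
B_FRACTION = {"AA": 0.0, "AB": 0.5, "BB": 1.0}


def expected_baf(fetal_gt, maternal_gt, r):
    """Expected BAF of a fetal sample containing a maternal fraction r."""
    return (1 - r) * B_FRACTION[fetal_gt] + r * B_FRACTION[maternal_gt]


def expected_clusters(r):
    """Distinct expected BAF values over all compatible genotype pairs."""
    values = set()
    for fetal_gt, maternal_gt in product(B_FRACTION, repeat=2):
        # A fetus cannot be AA when the mother is BB, or BB when she is AA
        if {fetal_gt, maternal_gt} == {"AA", "BB"}:
            continue
        # Round to suppress floating-point noise before de-duplicating
        values.add(round(expected_baf(fetal_gt, maternal_gt, r), 6))
    return sorted(values)


print(expected_clusters(0.3))
```

For r = 0.3 the seven genotype combinations give the cluster centres 0, 0.15, 0.35, 0.5, 0.65, 0.85 and 1.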
BAF patterns of the following different samples were examined to investigate the applicability of the general calculation principle described above to the different samples:
Fetal gDNA sample: 200 ng of purified fragmented gDNA extracted from aborted fetal tissue;
Maternal contamination gDNA sample: 200 ng of purified fragmented gDNA from aborted fetal tissue with 30% maternal contaminant incorporated (140 ng of fetal gDNA and 60 ng of maternal gDNA);
Single-cell amplification product sample: 100 ng of single-cell amplification product from 3-5 IVF blastula trophoblast biopsy cells.
Samples were processed and signals detected on Illumina Infinium ASA chips according to the standard operating protocol (SOP) provided by Illumina to obtain chip scan data. Data analysis was then carried out using Illumina's iaap-cli and the GenCall algorithm to obtain LRR (log R ratio) values and BAF (B allele frequency) distribution patterns.
As shown in FIG. 1, the BAF distribution pattern of the contaminated gDNA sample formed by adding 30% maternal gDNA to fetal gDNA agrees substantially with that expected from the general calculation principle described above, and falls into 7 clusters with BAF center values of 0, 0.15, 0.35, 0.5, 0.65, 0.85 and 1, respectively. This suggests that maternal contamination of fetal gDNA can be judged by routine BAF distribution analysis.
However, when the fetal gDNA sample was compared with the amplification product sample, the two were found to exhibit distinctly different BAF distributions. FIG. 2A shows the BAF distributions of the fetal gDNA sample (panel A of FIG. 2A) and the amplification product sample (panel B of FIG. 2A) at selected loci where the parents are homozygous for different alleles, i.e., where the father is AA and the mother is BB, or where the father is BB and the mother is AA. According to Mendel's laws of inheritance, the genotype at these selected loci should in principle be AB (BAF = 0.5) in an uncontaminated embryo sample. As shown in panel A of FIG. 2A, the BAF distribution of the fetal gDNA sample substantially meets this expectation. However, as shown in panel B of FIG. 2A, for single-cell whole genome amplification products from trophoblast cells, AA (BAF = 0), BB (BAF = 1), and BAF values between 0 and 0.5 and between 0.5 and 1 appear at many loci. Similar events also occur in amplification products of SEM culture medium. FIG. 2B shows the BAF distributions of ChromInst amplification products of embryo blastocyst culture medium, wherein panel (A) shows the BAF distribution without maternal contamination and panel (B) shows the BAF distribution with a maternal contamination ratio of about 30%.
These results indicate that, for single-cell amplification products, the relative allele frequency does not reflect the native biological state, owing to problems such as allele amplification bias or even allele drop-out (ADO). In this case, conventional BAF-based approaches will clearly have difficulty identifying parental predisposition or parental contamination.
Thus, there is a need to develop a method that efficiently identifies parental predisposition or contamination in single-cell whole genome amplification products.
Example 2 Identification of SEM culture medium contamination
Determination of the parental predisposition statistic $S_{MC}$
SNP detection data were obtained, using a nucleic acid chip-based genetic variation detection method, for the single-cell whole genome amplification product of an SEM sample (amplified with the Yikang ChromInst amplification kit) and for the parental gDNA samples of the corresponding embryo.
After genetic testing, the BAF value of each polymorphic site was obtained for the father, the mother and the SEM sample using iaap-cli and the GenCall algorithm. BAF values range from 0 to 1: for a homozygous B allele locus (BB), BAF = 1; for a homozygous A allele locus (AA), BAF = 0; for a heterozygous locus (AB), BAF = 0.5.
According to FIG. 1, SNP loci at which the father has the AA genotype and the mother has the BB genotype are selected, the BAF value (between 0 and 1) of the SEM sample at these loci is determined, and the number of loci with BAF ≥ 0.6 ($N_{MB}$) and the number of loci with BAF ≤ 0.4 ($N_{PA}$) are counted. Similarly, SNP loci at which the father is BB and the mother is AA are selected, the BAF value (between 0 and 1) of the SEM sample at these loci is determined, and the number of loci with BAF ≤ 0.4 ($N_{MA}$) and the number of loci with BAF ≥ 0.6 ($N_{PB}$) are counted.
The parental tendency statistic $S_{MC}$ of the SEM sample is constructed using the formula:
$$S_{MC} = \frac{N_{MB} + N_{MA}}{N_{PA} + N_{PB}}$$
If $S_{MC}$ is significantly greater than 1, a maternal tendency is indicated; if $S_{MC}$ is significantly less than 1, a paternal tendency is indicated; if $S_{MC}$ is not significantly different from 1, no parental tendency is indicated (FIG. 1).
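The counting scheme and the statistic can be sketched in a few lines. This is a minimal illustration under stated assumptions: the input layout (one tuple of paternal genotype, maternal genotype and sample BAF per SNP) and the function names are hypothetical, and the 0.4/0.6 cut-offs are the ones quoted in the text.

```python
def parental_counts(loci, low=0.4, high=0.6):
    """Count loci leaning to the maternal or paternal allele.

    loci: iterable of (father_gt, mother_gt, sample_baf) tuples
    (hypothetical input layout). Returns (N_MB, N_MA, N_PA, N_PB).
    """
    n_mb = n_ma = n_pa = n_pb = 0
    for father_gt, mother_gt, baf in loci:
        if father_gt == "AA" and mother_gt == "BB":
            if baf >= high:
                n_mb += 1  # sample leans to the maternal B allele
            elif baf <= low:
                n_pa += 1  # sample leans to the paternal A allele
        elif father_gt == "BB" and mother_gt == "AA":
            if baf <= low:
                n_ma += 1  # sample leans to the maternal A allele
            elif baf >= high:
                n_pb += 1  # sample leans to the paternal B allele
    return n_mb, n_ma, n_pa, n_pb


def s_mc(loci, low=0.4, high=0.6):
    """S_MC = (N_MB + N_MA) / (N_PA + N_PB)."""
    n_mb, n_ma, n_pa, n_pb = parental_counts(loci, low, high)
    if n_pa + n_pb == 0:
        raise ValueError("no paternally leaning loci; S_MC is undefined")
    return (n_mb + n_ma) / (n_pa + n_pb)
```

Loci with BAF strictly between the two cut-offs, and loci where the parents are not homozygous for different alleles, contribute to neither count.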
Establishing a pollution-free tendency reference system
35 embryo culture medium samples that had been specially treated and confirmed to be free of maternal contamination were selected; peripheral blood samples of the parents of the corresponding embryos were obtained at the same time.
The embryo culture medium samples and the peripheral blood samples were collected in the manner and amounts described in Materials and methods.
After collection, the embryo culture medium was treated as follows and confirmed by STR identification to be free of maternal contamination:
Collection of culture medium samples confirmed to be contamination-free:
When the embryo has been cultured to Day 4 and has reached the morula stage, the zona pellucida can be removed by the following procedure: embryos are transferred from the culture medium into oil-covered hatching medium containing 0.5% pronase for 1-2 min at 37 °C, 5% CO2. Complete dissolution of the zona pellucida is confirmed under an optical microscope. After washing 2-3 times with blastula culture medium, embryos are placed in 25 μL of blastula culture medium and cultured in an incubator at 37 °C, 5% CO2 and 5% O2.
When the blastocyst develops to stage 4 (generally D5/D6), the embryo is graded 4BC or above, and the blastocyst is fully expanded, the culture medium is collected.
STR identification:
1) 5 μL of SEM sample is mixed with 0.5 μL of proteinase K solution (concentration 20 μg/μL), incubated at 55 °C for 15 min, and then treated at 95 °C for 1 min;
2) For the STR identification method, refer to patents CN107557481A and CN110157812A.
Single-cell whole genome amplification was performed on 10 μL of validated SEM using the Yikang ChromInst amplification kit in the manner described in Materials and methods. 100 ng of amplification product and 200 ng of parental gDNA were subjected to nucleic acid chip detection to obtain SNP data. The $S_{MC}$ statistic was calculated for each of the 35 SEM samples as described above, and a reference $S_{MC}$ density distribution was constructed.
As shown in FIG. 5, the reference $S_{MC}$ distribution approximately follows a normal distribution. The upper and lower limits of $S_{MC}$ were therefore set at the mean ± 3 standard deviations, giving 1.26 and 0.80, respectively. If $S_{MC}$ > 1.26, the genetic material is indicated to tend toward maternal contamination; if $S_{MC}$ < 0.80, the genetic material is indicated to tend toward paternal contamination.
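The threshold construction can be sketched as follows. The function names are illustrative assumptions, and the use of the sample standard deviation (rather than the population standard deviation) is also an assumption, since the text does not say which was used.

```python
from statistics import mean, stdev


def reference_thresholds(reference_values, k=3.0):
    """Return (lower, upper) limits as mean -/+ k standard deviations.

    Uses the sample standard deviation; this choice is an assumption.
    """
    centre = mean(reference_values)
    spread = stdev(reference_values)
    return centre - k * spread, centre + k * spread


def contamination_call(s_value, lower=0.80, upper=1.26):
    """Compare a sample statistic with the reference limits.

    The default limits are the values reported in the text for the
    35-sample SEM reference set.
    """
    if s_value > upper:
        return "maternal"
    if s_value < lower:
        return "paternal"
    return "none"
```

With the reported limits, a sample with a statistic of 1.32 would be flagged as tending toward maternal contamination, while a value of 1.0 would not be flagged.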
Sample blending experiment I:
Samples of aborted fetal tissue and of the corresponding paternal and maternal peripheral blood were obtained as described in Materials and methods, and genomic DNA was extracted to obtain purified fragmented gDNA. After quantification using the Qubit dsDNA HS Assay kit, each sample was diluted to 1 ng/μL.
The diluted samples were mixed in the following proportions and vortexed for 2-3 s to prepare simulated mixed samples in which maternal or paternal DNA accounts for a specified proportion:
All simulated mixed samples were diluted 10-fold to a final concentration of 100 pg/μL. Thus, for example, M9 and F9 contain 10 pg/μL of fetal DNA and 90 pg/μL of maternal or paternal DNA, respectively.
1 μL of each diluted simulated mixed sample was subjected to whole genome amplification using the Yikang ChromInst amplification kit, as described in Materials and methods. After magnetic bead purification of the amplification product, 2 μL of the purified product was taken and quantified by Qubit. 100 ng of the remaining amplification product and 200 ng of the corresponding parental gDNA were analyzed on the Illumina Infinium ASA chip, and data analysis was performed to calculate the $S_{MC}$ of each simulated mixed sample.
The results of the maternal sample blending experiment are shown in FIG. 6. As shown, the $S_{MC}$ statistics of all simulated mixed samples are greater than the maternal contamination threshold of 1.26 determined from the reference system, and they correlate with the blending proportion. With 10%, 30%, 50% and 70% maternal genetic material blended in, the $S_{MC}$ statistics are 1.32, 2.15, 5.64 and 18.16, respectively; with 90% maternal material blended in, $S_{MC}$ is 3648.
The blending experiment shows that the constructed parental tendency statistic and reference threshold effectively eliminate interference introduced by single-cell whole genome amplification (such as ADO) and allow the parental tendency of a sample to be determined; moreover, the $S_{MC}$ values obtained in this blending experiment allow the parental tendency proportion of a sample to be approximated, reflecting the degree of maternal contamination of the sample.
Sample blending experiment II:
Simulated mixed samples (1 ng/μL) containing maternal or paternal DNA were prepared according to the method described in sample blending experiment I. The simulated mixed samples were diluted 10-fold with blank (i.e., unused) embryo culture medium to a final concentration of 100 pg/μL.
1 μL of each diluted simulated mixed sample was subjected to whole genome amplification using the Yikang ChromInst amplification kit, as described in Materials and methods. After magnetic bead purification of the amplification product, 2 μL of the purified product was taken and quantified by Qubit. 100 ng of the remaining amplification product and 200 ng of the corresponding parental gDNA were analyzed on the Illumina Infinium ASA chip, and data analysis was performed to calculate the $S_{MC}$ of each simulated mixed sample.
Example 3 Identification of embryo ploidy using ICSI blastula trophoblast (TE) cells
Determination of the parental predisposition statistic $S_{uneven}$ and the heterozygous site rate $S_{loh}$
SNP detection data were obtained, using a nucleic acid chip-based genetic variation detection method, from the single-cell whole genome amplification products of blastocyst trophoblast biopsy samples and from the parental gDNA samples of the corresponding embryos.
After genetic testing, the BAF value (between 0 and 1) of each polymorphic site was obtained for the father, the mother and the biopsy sample using iaap-cli and the GenCall algorithm: for a homozygous B allele locus (BB), BAF = 1; for a homozygous A allele locus (AA), BAF = 0; for a heterozygous locus (AB), BAF = 0.5.
According to FIG. 5, SNP loci at which the father has the AA genotype and the mother has the BB genotype are selected, the BAF value (between 0 and 1) of the corresponding biopsy sample at these loci is determined, and the number of loci with BAF ≥ 0.6 ($N_{MB}$) and the number of loci with BAF ≤ 0.4 ($N_{PA}$) are counted. Similarly, SNP loci at which the father is BB and the mother is AA are selected, the BAF value (between 0 and 1) of the biopsy sample at these loci is determined, and the number of loci with BAF ≤ 0.4 ($N_{MA}$) and the number of loci with BAF ≥ 0.6 ($N_{PB}$) are counted. The total number of all selected SNP loci ($N_{total}$) and, among them, the number of loci with BAF between 0.4 and 0.6 ($N_{LOH}$) are also counted.
The parental tendency statistic $S_{uneven}$ of the sample is calculated using the formula:
$$S_{uneven} = \frac{N_{MB} + N_{MA}}{N_{PA} + N_{PB}}$$
The heterozygous site rate $S_{loh}$ of the sample is calculated using the formula:
$$S_{loh} = \frac{N_{LOH}}{N_{total}}$$
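Because the reference system in this example is built at the chromosome level, the two statistics can be computed per chromosome. The sketch below is illustrative only: the input layout (chromosome, paternal genotype, maternal genotype, sample BAF) and the function name are assumptions, and the open interval (0.4, 0.6) is one reading of "between 0.4 and 0.6".

```python
from collections import defaultdict


def per_chromosome_stats(loci, low=0.4, high=0.6):
    """Return {chromosome: (S_uneven, S_loh)}.

    loci: iterable of (chrom, father_gt, mother_gt, sample_baf) tuples
    (hypothetical input layout).
    """
    counts = defaultdict(
        lambda: {"maternal": 0, "paternal": 0, "het": 0, "total": 0}
    )
    for chrom, father_gt, mother_gt, baf in loci:
        if (father_gt, mother_gt) == ("AA", "BB"):
            maternal_is_b = True
        elif (father_gt, mother_gt) == ("BB", "AA"):
            maternal_is_b = False
        else:
            continue  # keep only loci with opposite homozygous parents
        c = counts[chrom]
        c["total"] += 1  # contributes to N_total
        if low < baf < high:
            c["het"] += 1  # contributes to N_LOH
        elif (baf >= high) == maternal_is_b:
            c["maternal"] += 1  # N_MB or N_MA
        else:
            c["paternal"] += 1  # N_PA or N_PB
    stats = {}
    for chrom, c in counts.items():
        if c["paternal"]:
            s_uneven = c["maternal"] / c["paternal"]
        else:
            # Ratio is unbounded (or undefined) without paternal loci
            s_uneven = float("inf") if c["maternal"] else float("nan")
        stats[chrom] = (s_uneven, c["het"] / c["total"])
    return stats
```

Returning infinity when no paternally leaning locus is present is a design choice of this sketch, not something the text prescribes.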
Establishing a reference frame without parental tendencies
31 ICSI blastula trophoblast biopsy samples known to be euploid (46,XX or 46,XY) were obtained. The TE biopsy samples and the corresponding parental peripheral blood samples were collected in the manner and amounts described in Materials and methods.
Whole genome amplification was performed on the euploid biopsy samples using the MALBAC amplification procedure described in Materials and methods. The sample amplification product (100 ng) and extracted parental peripheral blood gDNA (200 ng) were analyzed on the gene chip platform according to Materials and methods to obtain SNP genotype information. $S_{uneven}$ and $S_{loh}$ were calculated for each of the 31 biopsy samples as described above, and chromosome-level reference distributions of $S_{uneven}$ and $S_{loh}$ were constructed.
FIG. 8 shows the distribution of reference $S_{uneven}$ values over each chromosome (A) and the corresponding density distribution (B). Based on the $S_{uneven}$ density distribution in FIG. 8B, the upper $S_{uneven}$ threshold was set at the mean + 3 standard deviations, giving 1.75, and the lower $S_{uneven}$ threshold was set at the mean - 2 standard deviations, giving 0.53.
FIG. 9 shows the distribution of reference $S_{loh}$ values over each chromosome (A) and the corresponding density distribution (B). Based on the $S_{loh}$ density distribution in FIG. 9B, the upper and lower $S_{loh}$ thresholds were set at the mean ± 3 standard deviations, giving 0.76 and 0.22, respectively.
Ploidy analysis of biopsy cell samples
Ploidy analysis was performed on samples of embryo biopsied cells with known chromosomal level CNV abnormalities (duplications and deletions).
The biopsy cell samples used in this experiment included:
- 5 embryo chromosome monosomy biopsy samples;
- 4 embryo chromosome trisomy biopsy samples;
- 1 embryo paternal triploid biopsy sample.
ICSI blastula trophoblast biopsies were subjected to whole genome amplification using the MALBAC amplification procedure described in Materials and methods. The amplification product was divided into two portions. One portion (not less than 100 ng) was used for library construction, and copy number variation (CNV) was detected by the NGS-based CNV-Seq method, as described in patent CN105574361B. The other portion (100 ng), together with extracted parental peripheral blood gDNA (200 ng), was analyzed on the gene chip platform according to Materials and methods to obtain BAF information for the polymorphic sites.
Sequencing CNV detection results for each sample were as follows:
Sample    Karyotype
dup1      47,XN,+22(×3)
dup2      47,XN,+7(×3)
dup3      47,XN,+15(×3)
dup4      47,XN,+22(×3)
del1      45,XN,-21(×1)
del2      45,XN,-2(×1)
del3      45,XN,-18(×1)
del4      45,XN,-13(×1)
del5      45,XN,-22(×1)
The amplification product (100 ng) of each sample and the corresponding parental peripheral blood gDNA (200 ng) were subjected to gene chip analysis as described in Materials and methods to obtain SNP variation data. The chromosome-level parental predisposition statistic $S_{uneven}$ and heterozygous site rate $S_{loh}$ were calculated for each biopsy cell sample in the same manner as for the reference system, and were compared with the reference-based $S_{uneven}$ thresholds (0.53-1.75) and $S_{loh}$ thresholds (0.22-0.76).
If the sample $S_{loh}$ is comparable to or slightly smaller than the reference $S_{loh}$ and the sample $S_{uneven}$ is significantly greater than the reference $S_{uneven}$, a maternal trisomy (chromosome level) or maternal triploid (one extra copy of the maternal chromosome set) is indicated; conversely, if $S_{uneven}$ is significantly smaller than the reference $S_{uneven}$, a paternal trisomy (chromosome level) or paternal triploid (one extra copy of the paternal chromosome set) is indicated.
If the sample $S_{loh}$ is significantly smaller than the reference $S_{loh}$ and the sample $S_{uneven}$ is far greater than the reference $S_{uneven}$, a maternal uniparental diploid or maternal haploid is indicated; conversely, if $S_{uneven}$ is far smaller than the reference $S_{uneven}$, a paternal uniparental diploid or paternal haploid is indicated.
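The two decision rules above can be written as a small function. This is only a sketch under explicit assumptions: "significantly greater/smaller" is approximated by simple comparison with the reference thresholds reported above, "comparable or slightly smaller" is treated as falling within the reference $S_{loh}$ range, and the function name and return strings are hypothetical.

```python
def interpret_chromosome(
    s_uneven,
    s_loh,
    uneven_range=(0.53, 1.75),
    loh_range=(0.22, 0.76),
):
    """Map (S_uneven, S_loh) of one chromosome to a ploidy indication."""
    uneven_low, uneven_high = uneven_range
    loh_low, loh_high = loh_range

    # Direction of the parental tendency
    if s_uneven > uneven_high:
        origin = "maternal"
    elif s_uneven < uneven_low:
        origin = "paternal"
    else:
        return "no parental tendency"

    if s_loh < loh_low:
        # Few heterozygous loci: only one parent's alleles are seen
        return f"{origin} uniparental diploid or haploid"
    if s_loh <= loh_high:
        # Heterozygosity retained: one parent contributes an extra copy
        return f"{origin} trisomy or triploid"
    # S_loh above the reference range is not covered by the rules above
    return "not covered by the stated rules"
```

For example, a chromosome with a normal heterozygous site rate but a strongly elevated $S_{uneven}$ would be reported as a maternal trisomy or triploid under these assumptions.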
The ploidy results determined via the parental predisposition statistic $S_{uneven}$ and the heterozygous site rate $S_{loh}$ were substantially as expected, indicating that embryo ploidy abnormalities can be effectively determined in a small number of biopsied cells using the parental tendency statistics.
As shown in FIG. 10, in the known maternal trisomy embryo biopsy cells (dup1-4), $S_{loh}$ is comparable to the reference $S_{loh}$ and lies within the threshold range in every case, while $S_{uneven}$ is significantly greater than the reference $S_{uneven}$.
As shown in FIG. 10, in the known chromosome monosomy embryo biopsy cells (del1-5), $S_{loh}$ is in every case significantly smaller than the reference $S_{loh}$; the maternal haploid del2 has an $S_{uneven}$ far greater than the reference $S_{uneven}$, and the paternal haploids del1 and del3-5 have an $S_{uneven}$ far smaller than the reference $S_{uneven}$.
As shown in FIG. 11, in the known paternal triploid embryo biopsy cells, the $S_{loh}$ of all 22 autosomes is comparable to the reference $S_{loh}$, and $S_{uneven}$ is significantly smaller than the reference $S_{uneven}$.
Some embodiments of the invention
1. A method for determining the parental predisposition of a test sample, wherein said test sample comprises progeny genomic DNA in an amount of not more than 1ng (and preferably about 1-500pg, more preferably 1-100 pg), wherein said method comprises the steps of:
(1) Performing a genetic variation site (preferably SNP site) analysis on single-cell whole genome amplification products of the test sample;
(2) Obtaining the maternal allele frequency (MAF) and/or the paternal allele frequency (PAF) of the sample at loci where the parents are homozygous for different alleles (parental unequal homozygous loci) on the selected DNA segment (e.g., at the whole genome level, or at the chromosome segment level);
(3) Classifying each locus as a maternal-predisposed locus, a paternal-predisposed locus, or a locus with neither maternal nor paternal predisposition by comparing the MAF and/or PAF determined in step (2) with classification thresholds a and b, wherein classification threshold a is a value of 0.1 to 0.4, classification threshold b is a value of 0.6 to 0.9, and a + b = 1;
Wherein,
- classifying the locus as a paternal-predisposed locus if MAF ≤ a and/or PAF ≥ b;
- classifying the locus as a maternal-predisposed locus if PAF ≤ a and/or MAF ≥ b;
- classifying the locus as a locus with neither maternal nor paternal predisposition if MAF and/or PAF has a value > a and < b;
preferably, the classification threshold a is 0.4 and b is 0.6, and
- classifying the locus as a paternal-predisposed locus if MAF ≤ 0.4 and/or PAF ≥ 0.6;
- classifying the locus as a maternal-predisposed locus if PAF ≤ 0.4 and/or MAF ≥ 0.6;
- classifying the locus as a locus with neither maternal nor paternal predisposition if MAF and/or PAF has a value > 0.4 and < 0.6;
(4) Counting the number of maternal-predisposed loci ($N_{MAF}$) and the number of paternal-predisposed loci ($N_{PAF}$);
(5) Determining the parental tendency statistic ($S_{POR}$) of the sample at the level of the DNA segment based on the number of maternal-predisposed loci ($N_{MAF}$) and the number of paternal-predisposed loci ($N_{PAF}$);
(6) Comparing the sample $S_{POR}$ value determined in step (5) with a parental tendency threshold (i.e., the $S_{POR}$ threshold) to determine the parental propensity of the test sample at the level of the DNA segment.
2. The method of embodiment 1, wherein the $S_{POR}$ threshold is established using a reference system without parental propensity,
preferably, the reference system consists of 1-40 or more reference samples without parental predisposition;
preferably, the reference $S_{POR}$ thresholds are set at the mean of the reference parental tendency statistic $S_{POR}$ ± 1-5 standard deviations (preferably 2-3, especially 3 standard deviations);
if the $S_{POR}$ of the sample is greater than the upper reference $S_{POR}$ threshold, the test sample is indicated to have a maternal propensity on the DNA segment; if the $S_{POR}$ of the sample is less than the lower threshold, the test sample is indicated to have a paternal propensity on the DNA segment.
3. The method of embodiment 1, wherein the single-cell whole genome amplification is performed using a method selected from the group consisting of: primer extension preamplification PCR (PEP-PCR), degenerate oligonucleotide primer PCR (DOP-PCR), multiple displacement amplification (MDA), multiple annealing and looping based amplification cycles (MALBAC), blunt-end or cohesive-end ligation library construction, and the like (US20190106738A1);
Wherein, preferably, MALBAC is used for the single-cell whole genome amplification; more preferably, the whole genome amplification is performed using MALBAC.
4. The method of embodiment 1, wherein in step (2), BAF values of the test sample at the parental unequal homozygous loci are obtained based on, for example, nucleic acid chip analysis, and the paternal allele frequency and/or maternal allele frequency of the sample at the loci is determined based on the BAF values.
5. The method of embodiment 4, wherein the method comprises:
- determining, on the selected DNA segment, the loci at which the father has the AA genotype and the mother has the BB genotype, calculating the BAF value of the test sample at these loci, and counting the number of loci with BAF ≥ classification threshold b (preferably 0.6) ($N_{MB}$) and the number of loci with BAF ≤ classification threshold a (preferably 0.4) ($N_{PA}$); and determining the loci at which the father is BB and the mother is AA, calculating the BAF value of the test sample at these loci, and counting the number of loci with BAF ≤ classification threshold a (preferably 0.4) ($N_{MA}$) and the number of loci with BAF ≥ classification threshold b (preferably 0.6) ($N_{PB}$);
- calculating the number of maternal-predisposed loci ($N_{MAF}$) and the number of paternal-predisposed loci ($N_{PAF}$), wherein
$$N_{MAF} = N_{MB} + N_{MA}; \quad N_{PAF} = N_{PA} + N_{PB}$$
- constructing the parental predisposition statistic $S_{POR}$ of the test sample, wherein
$$S_{POR} = \frac{N_{MAF}}{N_{PAF}} = \frac{N_{MB} + N_{MA}}{N_{PA} + N_{PB}}$$
6. The method of embodiment 1, wherein, in step (2), allele depth values (AD) of the paternal allele and the maternal allele of the test sample, i.e., $AD_{father}$ and $AD_{mother}$, are obtained at the parental unequal homozygous loci on the selected DNA segment based on, for example, NGS sequencing analysis, and the paternal allele frequency and/or maternal allele frequency of the test sample at each locus is determined from the AD values:
wherein the maternal allele frequency of the sample at the locus is
$$MAF = \frac{AD_{mother}}{AD_{father} + AD_{mother}}$$
and wherein the paternal allele frequency of the sample at the locus is
$$PAF = \frac{AD_{father}}{AD_{father} + AD_{mother}}$$
7. The method of embodiment 6, wherein the number of loci on the selected DNA segment at which the MAF of the test sample ≥ classification threshold b (preferably 0.6) ($N_{MAF}$) and the number of loci at which the PAF ≥ classification threshold b (preferably 0.6) ($N_{PAF}$) are counted, and the $S_{POR}$ of the test sample is constructed, wherein
$$S_{POR} = \frac{N_{MAF}}{N_{PAF}}$$
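The sequencing-based variant of embodiments 6 and 7 can be sketched as follows. The input layout (one pair of paternal and maternal allele read depths per locus) and the function name are illustrative assumptions; skipping loci with zero coverage is likewise an assumption, since the text does not address them.

```python
def s_por_from_depths(depths, b=0.6):
    """Compute S_POR = N_MAF / N_PAF from allele depths.

    depths: iterable of (ad_father, ad_mother) read depths at loci where
    the parents are homozygous for different alleles (hypothetical layout).
    """
    n_maf = n_paf = 0
    for ad_father, ad_mother in depths:
        total = ad_father + ad_mother
        if total == 0:
            continue  # no coverage at this locus (assumption: skip it)
        maf = ad_mother / total  # maternal allele frequency
        paf = ad_father / total  # paternal allele frequency
        if maf >= b:
            n_maf += 1
        elif paf >= b:
            n_paf += 1
    if n_paf == 0:
        raise ValueError("no paternally leaning loci; S_POR is undefined")
    return n_maf / n_paf
```

Because MAF + PAF = 1 at every covered locus, a threshold b above 0.5 guarantees that a locus is counted at most once.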
8. The method of any of embodiments 1-7, wherein the method is used to identify parental DNA contamination in a progeny trace DNA sample, wherein the parental predisposition of the sample indicates the likelihood that the sample is contaminated with paternal or maternal DNA,
preferably, the sample is SEM embryo culture medium, and the method comprises: on the DNA segment (in particular, at the whole-genome level), obtaining the maternal allele frequency (MAF) and/or the paternal allele frequency (PAF) of the test sample at all parental unequal homozygous loci of the segment; and constructing the parental contamination predisposition value S_MC of the sample, wherein
S_MC = S_POR = N_MAF / N_PAF.
9. The method of embodiment 8, wherein the reference frame consists of SEM samples without a parental predisposition, e.g., 1-40 reference SEM samples,
preferably, a parental predisposition statistic S_MC is established for each SEM reference frame sample, and the parental predisposition S_MC threshold is set based on the mean ± 1-5 standard deviations (preferably 2-3, especially 3 standard deviations) of the reference frame S_MC values, wherein,
if the S_MC of the tested SEM sample is greater than the upper threshold limit, the tested SEM sample is indicated to have a maternal contamination tendency;
if the S_MC of the tested SEM sample is less than the lower threshold limit, the tested SEM sample is indicated to have a paternal contamination tendency;
preferably, the S_MC upper threshold is 1.26 and the S_MC lower threshold is 0.80; if the S_MC of the SEM sample is > 1.26, the SEM sample is indicated to have a maternal contamination tendency; if the S_MC of the SEM sample is < 0.80, the SEM sample is indicated to have a paternal contamination tendency.
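The threshold construction in embodiment 9 can be sketched as follows. The function names are hypothetical. The use of the sample standard deviation (statistics.stdev) is an assumption, since the text does not say whether a sample or population standard deviation is meant, and the reference values in the usage note are invented for illustration.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def reference_thresholds(ref_values: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """Return (lower, upper) = mean -/+ k standard deviations of the reference S_MC values."""
    if len(ref_values) < 2:
        # A single reference sample gives no standard deviation estimate.
        raise ValueError("need at least two reference samples")
    m, sd = mean(ref_values), stdev(ref_values)  # sample SD: an assumption
    return m - k * sd, m + k * sd


def contamination_call(s_mc: float, lower: float = 0.80, upper: float = 1.26) -> str:
    """Compare a sample S_MC with the thresholds (defaults: preferred values in the text)."""
    if s_mc > upper:
        return "maternal contamination tendency"
    if s_mc < lower:
        return "paternal contamination tendency"
    return "no parental contamination tendency"
```

For example, reference_thresholds([1.0, 1.1, 0.9, 1.0, 1.0]) gives limits of roughly 0.79 and 1.21, and contamination_call(1.5) reports a maternal contamination tendency under the preferred limits of 0.80 and 1.26.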
10. The method of any of embodiments 8-9, wherein the parental DNA contamination (e.g., paternal DNA contamination or maternal DNA contamination) in the tested SEM sample is less than 50% or 40%, more preferably less than 10%, or less than 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2% or 1%, or 0.1% or less.
11. The method of any of embodiments 8-10, wherein the ratio of embryo DNA to parental contaminating DNA in the tested SEM sample is 1:9 to 9:1.
12. The method of embodiment 1, wherein the method is used to identify chromosomal ploidy abnormalities of the progeny in a progeny trace DNA sample, wherein the parental predisposition of the sample indicates the likelihood that a chromosomal ploidy abnormality exists in the progeny, wherein preferably the sample is derived from biopsied cells, such as IVF blastocyst trophoblast cells,
preferably, the chromosomal ploidy abnormality indicated by the parental predisposition is selected from: paternal or maternal trisomy, paternal or maternal uniparental disomy or haploidy, and any combination thereof.
13. The method of embodiment 12, wherein, on the DNA segment (in particular, at the chromosome level), the maternal allele frequency (MAF) and/or the paternal allele frequency (PAF) of the test sample at the parental unequal homozygous loci of the segment is obtained; and the parental predisposition value S_uneven of the sample is constructed, wherein
S_uneven = S_POR = N_MAF / N_PAF.
14. The method of any of embodiments 12-13, wherein the method further comprises,
-counting the total number of parental unequal homozygous loci on the selected DNA segment (N_total) and the number of sites with neither a paternal nor a maternal predisposition (N_LOH);
-determining, based on N_LOH and N_total, the heterozygous site rate (S_loh) of the test sample at the DNA segment level; and
-comparing the heterozygous site rate of the test sample (i.e., the sample S_loh value) with the heterozygous site rate threshold (i.e., the S_loh threshold).
15. The method of any of embodiments 12-14, wherein the method further comprises:
establishing the S_uneven threshold and optionally the S_loh threshold using a reference frame without a parental predisposition, preferably, the reference frame without a parental predisposition consists of 1-40 or more euploid biopsy samples;
preferably, the parental predisposition S_uneven threshold is set using the mean ± 1-5 standard deviations (preferably 2-3, especially 3 standard deviations) of the reference frame S_uneven values, preferably, the S_uneven upper threshold is about 1.75 and the lower threshold is about 0.53;
preferably, the S_loh threshold is set using the mean ± 1-5 standard deviations (preferably 2-3, especially 3 standard deviations) of the reference frame S_loh values, preferably, the S_loh upper threshold is about 0.76 and the lower threshold is about 0.22.
16. The method of any of embodiments 12-15, wherein the method comprises:
constructing the S_uneven statistic of the test sample at the chromosome level and comparing it with the S_uneven threshold predetermined based on the reference frame without a parental predisposition;
constructing the S_loh value of the test sample at the chromosome level and comparing it with the S_loh threshold predetermined based on the reference frame without a parental predisposition;
if the S_loh of the test sample is within the reference frame S_loh threshold range and S_uneven is greater than the reference frame S_uneven upper threshold, maternal trisomy or maternal triploidy is indicated; conversely, if S_uneven is less than the reference frame S_uneven lower threshold, paternal trisomy or paternal triploidy is indicated;
if S_loh is less than the reference frame S_loh lower threshold and S_uneven is greater than the reference frame S_uneven upper threshold, e.g., greater than 10, 12, 15, 17, 19 or 20, maternal uniparental disomy or haploidy is indicated; conversely, if S_uneven is less than the reference frame S_uneven lower threshold, e.g., less than 0.2, 0.15, 0.1, 0.05, 0.02 or 0.01, paternal uniparental disomy or haploidy is indicated.
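The per-chromosome decision rule of embodiment 16 can be written as a small Python sketch. The function name and return strings are hypothetical; the default limits are the preferred values quoted in embodiment 15. The text does not say what to report when S_loh exceeds its upper limit, so that case is returned as "undetermined" here as an assumption.

```python
from typing import Tuple


def interpret_chromosome(
    s_uneven: float,
    s_loh: float,
    uneven_bounds: Tuple[float, float] = (0.53, 1.75),
    loh_bounds: Tuple[float, float] = (0.22, 0.76),
) -> str:
    """Combine S_uneven and S_loh for one chromosome into a ploidy indication."""
    uneven_lo, uneven_hi = uneven_bounds
    loh_lo, loh_hi = loh_bounds

    # Direction of the imbalance: S_uneven = N_MAF / N_PAF, so high means maternal.
    if s_uneven > uneven_hi:
        origin = "maternal"
    elif s_uneven < uneven_lo:
        origin = "paternal"
    else:
        return "no parental predisposition"

    # Heterozygous site rate separates trisomy/triploidy from UPD/haploidy.
    if loh_lo <= s_loh <= loh_hi:
        return f"{origin} trisomy or triploidy"
    if s_loh < loh_lo:
        return f"{origin} uniparental disomy or haploidy"
    return "undetermined"  # S_loh above its upper limit: not covered by the text
```

For instance, interpret_chromosome(2.5, 0.5) returns "maternal trisomy or triploidy", while interpret_chromosome(0.05, 0.05) returns "paternal uniparental disomy or haploidy".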
17. The method of any of embodiments 12-16, wherein the parental predisposition value and the heterozygous site rate of the test sample are constructed for each of a plurality of autosomes, and the likelihood that the test sample is a maternal or paternal triploid is determined, preferably,
if the parental predisposition values and heterozygous site rates of the test sample indicate maternal trisomy on 15 or more chromosomes (e.g., 15, 16, 17, 18, 19, 20, 21, or 22), the test sample is judged likely to be a maternal triploid;
if the parental predisposition values and heterozygous site rates of the test sample indicate paternal trisomy on 15 or more chromosomes (e.g., 15, 16, 17, 18, 19, 20, 21, or 22), the test sample is judged likely to be a paternal triploid.
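The chromosome-count rule of embodiment 17 can be sketched as below. The function name, the input format (a mapping from autosome name to "maternal", "paternal" or "none") and the return strings are hypothetical. The cut-off is taken as 15 or more autosomes, following the listed examples (15 to 22).

```python
from collections import Counter
from typing import Mapping


def triploidy_call(per_chromosome: Mapping[str, str], min_chromosomes: int = 15) -> str:
    """Judge likely triploidy from per-autosome trisomy indications.

    per_chromosome maps an autosome name to "maternal", "paternal" or "none",
    i.e. the origin of the trisomy indicated on that chromosome, if any.
    """
    counts = Counter(per_chromosome.values())
    for origin in ("maternal", "paternal"):
        if counts[origin] >= min_chromosomes:
            return f"likely {origin} triploid"
    return "no triploidy call"


# Hypothetical sample: 16 of 22 autosomes indicate a maternal trisomy.
calls = {f"chr{i}": "maternal" for i in range(1, 17)}
calls.update({f"chr{i}": "none" for i in range(17, 23)})
print(triploidy_call(calls))  # likely maternal triploid
```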
18. An apparatus, device or system, characterized in that,
-optionally, single cell whole genome amplification of the test sample and optionally of the reference sample can be performed;
-optionally, detection of genetic variation (preferably SNP) information of the progeny genome of the obtained single cell whole genome amplification product can be performed, e.g. wherein said information is determined by nucleic acid chip or NGS sequencing;
-the method according to any of embodiments 1-17 can be performed to identify a parental predisposition of the test sample, or to detect a parental contamination of the test sample, or to identify progeny DNA ploidy abnormalities in the test sample, based on the genetic variation information of the test sample and optionally the genetic variation information of the reference frame.
19. Use of the apparatus, device, or system of embodiment 18
for identifying a parental predisposition of a test sample, or for detecting parental contamination of a test sample, or for identifying progeny DNA ploidy abnormalities in a test sample; or for preparing a product for identifying a parental predisposition of a test sample, or for detecting parental contamination of a test sample, or for identifying progeny DNA ploidy abnormalities in a test sample.

Claims (45)

1. A method for determining the parental predisposition of a test sample, wherein said test sample comprises progeny genomic DNA in an amount of not more than 1ng, wherein said method comprises the steps of:
(1) Carrying out genetic variation site analysis on single-cell whole genome amplification products of the sample to be tested;
(2) Obtaining the frequency MAF of the maternal allele and the frequency PAF of the paternal allele of the test sample at the parental unequal homozygote locus on the selected DNA segment;
(3) Classifying each genetic locus as a paternal-predisposition site, a maternal-predisposition site, or a site with neither a paternal nor a maternal predisposition by comparing the MAF and PAF determined in step (2) with classification thresholds a and b, wherein classification threshold a is a value of 0.3-0.4, classification threshold b is a value of 0.6-0.7, and a + b = 1;
wherein,
-the locus is classified as a paternal-predisposition site if MAF ≤ a and/or PAF ≥ b;
-the locus is classified as a maternal-predisposition site if PAF ≤ a and/or MAF ≥ b;
-the locus is classified as a site with neither a paternal nor a maternal predisposition if MAF and PAF have values > a and < b;
(4) Counting the number N_MAF of maternal-predisposition sites and the number N_PAF of paternal-predisposition sites of the test sample on the selected DNA segment;
(5) Determining, from the number N_MAF of maternal-predisposition sites and the number N_PAF of paternal-predisposition sites, the parental predisposition statistic S_POR of the sample at the DNA segment level, wherein
S_POR = N_MAF / N_PAF;
(6) Comparing the sample S_POR value determined in step (5) with the parental predisposition threshold, i.e., the S_POR threshold, to determine the parental predisposition of the test sample at the DNA segment level,
wherein the S_POR threshold is established using the parental predisposition statistic S_POR of a reference frame without a parental predisposition,
if the S_POR of the sample is greater than the upper limit of the S_POR threshold, the test sample is indicated to have a maternal predisposition on the DNA segment; if the S_POR of the sample is less than the lower limit of the S_POR threshold, the test sample is indicated to have a paternal predisposition on the DNA segment.
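Steps (3) to (6) of claim 1 can be illustrated with a short Python sketch. All function names and labels are hypothetical, each locus is assumed to arrive as a precomputed (MAF, PAF) pair, and the threshold limits are passed in by the caller because the claim leaves them to the reference frame.

```python
from collections import Counter
from typing import Iterable, Tuple


def classify_site(maf: float, paf: float, a: float = 0.4, b: float = 0.6) -> str:
    """Step (3): classify one locus from its MAF and PAF."""
    if maf <= a or paf >= b:
        return "paternal"
    if paf <= a or maf >= b:
        return "maternal"
    return "neither"


def parental_predisposition(
    sites: Iterable[Tuple[float, float]],
    lower: float,
    upper: float,
    a: float = 0.4,
    b: float = 0.6,
) -> Tuple[float, str]:
    """Steps (4)-(6): count sites, form S_POR = N_MAF / N_PAF, compare with thresholds."""
    counts = Counter(classify_site(maf, paf, a, b) for maf, paf in sites)
    if counts["paternal"] == 0:
        raise ValueError("no paternal-predisposition sites; S_POR is undefined")
    s_por = counts["maternal"] / counts["paternal"]
    if s_por > upper:
        call = "maternal"
    elif s_por < lower:
        call = "paternal"
    else:
        call = "none"
    return s_por, call
```

With the hypothetical input [(0.1, 0.9), (0.5, 0.5), (0.9, 0.1), (0.7, 0.3), (0.45, 0.55)] and illustrative limits lower=0.80 and upper=1.26, the sketch counts two maternal sites and one paternal site, giving S_POR = 2.0 and a "maternal" call.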
2. The method of claim 1, wherein the test sample comprises progeny genomic DNA in an amount of 1-500 pg.
3. The method of claim 1, wherein the test sample comprises progeny genomic DNA in an amount of 1-100 pg.
4. The method of claim 1, wherein in step (1), the genetic variation site analysis is SNP site analysis.
5. The method of claim 1, wherein in step (2) the DNA segment selected is a whole genome, chromosome, or chromosome segment.
6. The method of claim 1, wherein in step (3), classification threshold a is 0.4 and classification threshold b is 0.6, and
-the locus is classified as a paternal-predisposition site if MAF ≤ 0.4 and/or PAF ≥ 0.6;
-the locus is classified as a maternal-predisposition site if PAF ≤ 0.4 and/or MAF ≥ 0.6;
-the locus is classified as a site with neither a paternal nor a maternal predisposition if MAF and PAF have values > 0.4 and < 0.6.
7. The method of claim 1, wherein the reference frame consists of 1-40 or more reference samples without a parent predisposition.
8. The method of claim 1, wherein the parental predisposition statistic S_POR of each reference frame sample is determined, and the reference frame S_POR threshold is set based on the mean ± 1-5 standard deviations of the reference frame S_POR values.
9. The method of claim 8, wherein the reference frame S_POR threshold is set based on the mean ± 2-3 standard deviations of the reference frame S_POR values.
10. The method of claim 9, wherein the reference frame S_POR threshold is set based on the mean ± 3 standard deviations of the reference frame S_POR values.
11. The method of claim 1, wherein the single cell whole genome amplification is performed using a method selected from the group consisting of: PEP-PCR, DOP-PCR, MDA, MALBAC, blunt end or cohesive end ligation pooling or ChromaInst.
12. The method of claim 1, wherein in step (2), BAF values of the test sample at the parental unequal homozygous loci are obtained based on nucleic acid chip analysis, and the maternal allele frequency and/or paternal allele frequency of the sample at the loci is determined based on the BAF values.
13. The method of claim 12, wherein the method comprises:
-determining the sites on the selected DNA segment at which the father has the AA genotype and the mother has the BB genotype, calculating the BAF value of the sample at these sites, and counting the number N_MB of sites with BAF ≥ classification threshold b and the number N_PA of sites with BAF ≤ classification threshold a; determining the sites at which the father has the BB genotype and the mother has the AA genotype, calculating the BAF value of the sample at these sites, and counting the number N_MA of sites with BAF ≤ classification threshold a and the number N_PB of sites with BAF ≥ classification threshold b;
-calculating the number N_MAF of maternal-predisposition sites and the number N_PAF of paternal-predisposition sites of the test sample on the DNA segment, wherein
N_MAF = N_MB + N_MA; N_PAF = N_PA + N_PB;
-constructing the parental predisposition statistic S_POR of the test sample, wherein
S_POR = N_MAF / N_PAF = (N_MB + N_MA) / (N_PA + N_PB).
14. The method of claim 13, wherein classification threshold a is 0.4 and classification threshold b is 0.6.
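The chip-based counting of claims 13 and 14 can be sketched in Python as follows. The function name, the tuple layout (father genotype, mother genotype, sample BAF) and the example values are hypothetical; the thresholds default to a = 0.4 and b = 0.6 as in claim 14.

```python
from typing import Iterable, Tuple

Site = Tuple[str, str, float]  # (father genotype, mother genotype, sample BAF)


def s_por_from_baf(sites: Iterable[Site], a: float = 0.4, b: float = 0.6) -> float:
    """Count N_MB, N_PA, N_MA, N_PB from BAF values and return S_POR."""
    n_mb = n_pa = n_ma = n_pb = 0
    for father, mother, baf in sites:
        if father == "AA" and mother == "BB":
            if baf >= b:    # excess of the maternal B allele
                n_mb += 1
            elif baf <= a:  # excess of the paternal A allele
                n_pa += 1
        elif father == "BB" and mother == "AA":
            if baf <= a:    # excess of the maternal A allele
                n_ma += 1
            elif baf >= b:  # excess of the paternal B allele
                n_pb += 1
        # Other genotype combinations are not informative and are skipped.
    n_maf, n_paf = n_mb + n_ma, n_pa + n_pb
    if n_paf == 0:
        raise ValueError("no paternal-predisposition sites; S_POR is undefined")
    return n_maf / n_paf


# Hypothetical BAF values at five informative sites.
example = [
    ("AA", "BB", 0.9),
    ("AA", "BB", 0.1),
    ("BB", "AA", 0.2),
    ("BB", "AA", 0.5),
    ("AA", "BB", 0.7),
]
print(s_por_from_baf(example))  # 3.0
```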
15. The method of claim 1, wherein, in step (2), allele depth (AD) values of the paternal allele and the maternal allele of the test sample at the parental unequal homozygous loci on the selected DNA segment, i.e., AD_father and AD_mother, are obtained, and the paternal allele frequency and/or the maternal allele frequency of the test sample at each locus is determined from the AD values:
wherein the maternal allele frequency of the test sample at the locus is MAF = AD_mother / (AD_father + AD_mother);
wherein the paternal allele frequency of the test sample at the locus is PAF = AD_father / (AD_father + AD_mother).
16. The method of claim 15, wherein in step (2), the allele depth value is obtained based on analysis of NGS sequencing.
17. The method of any one of claims 1-16, wherein the method is used to identify parental DNA contamination in a progeny trace DNA sample, wherein the parental predisposition of the test sample indicates the likelihood of paternal or maternal contamination of the sample.
18. The method of claim 17, wherein the test sample is SEM embryo culture fluid, the method comprising:
(i) Obtaining the frequency MAF of the maternal allele and the frequency PAF of the paternal allele of the test sample at the parental unequal homozygote locus on the selected DNA segment;
(ii) Counting, by comparison with classification thresholds a and b, the number N_MAF of maternal-predisposition sites and the number N_PAF of paternal-predisposition sites of the sample on the segment; and
(iii) Determining the parental predisposition statistic S_POR of the sample, i.e., the parental contamination predisposition value S_MC, wherein
S_MC = S_POR = N_MAF / N_PAF; and
(iv) Comparing the determined sample S_MC value with the parental predisposition S_POR threshold established using a reference frame without a parental predisposition, to determine the likelihood that the sample is contaminated with paternal or maternal DNA.
19. The method of claim 18, wherein the DNA segment is whole genome DNA.
20. The method of claim 18, wherein the reference frame consists of SEM samples without a parental propensity.
21. The method of claim 20, wherein the reference frame consists of 1-40 reference frame SEM samples.
22. The method of claim 20, wherein a parental predisposition statistic S_POR, i.e., the S_MC of each sample, is established for each SEM reference frame sample, and the parental predisposition S_POR threshold is set based on the mean ± 1-5 standard deviations of the S_MC values of the reference frame samples, wherein,
if the S_MC of the tested SEM sample is greater than the upper limit of the S_POR threshold, the tested SEM sample is indicated to have a maternal contamination tendency;
if the S_MC of the tested SEM sample is less than the lower limit of the S_POR threshold, the tested SEM sample is indicated to have a paternal contamination tendency.
23. The method of claim 22, wherein the parental predisposition S_POR threshold is set based on the mean ± 2-3 standard deviations of the S_MC values of the reference frame samples.
24. The method of claim 22, wherein the S_POR upper threshold is 1.26 and the S_POR lower threshold is 0.80; if the S_MC of the tested SEM sample is > 1.26, the tested SEM sample is indicated to have a maternal contamination tendency; if the S_MC of the tested SEM sample is < 0.80, the tested SEM sample is indicated to have a paternal contamination tendency.
25. The method of any one of claims 1-16, wherein the method is used for non-diagnostic purposes to identify chromosomal ploidy abnormalities in progeny in a minute DNA sample of the progeny, wherein the parental predisposition of the test sample indicates the likelihood that chromosomal ploidy abnormalities exist in the progeny.
26. The method of claim 25, wherein the test sample is IVF blastocyst trophoblast cells.
27. The method of claim 25, wherein the chromosomal ploidy abnormality indicated by the parental predisposition is selected from: paternal or maternal trisomy, paternal or maternal uniparental disomy or haploidy, and any combination thereof.
28. The method of claim 27, wherein the chromosomal ploidy abnormality indicated by the parental predisposition is selected from: paternal triploidy and maternal triploidy.
29. The method of claim 25, wherein the method comprises:
(i) Obtaining the maternal allele frequency MAF and the paternal allele frequency PAF of the sample at the parental unequal homozygous loci on the selected DNA segment;
(ii) Counting, by comparison with classification thresholds a and b, the number N_MAF of maternal-predisposition sites and the number N_PAF of paternal-predisposition sites of the test sample on the segment; and
(iii) Determining the parental predisposition value S_POR of the test sample, i.e., S_uneven, wherein
S_uneven = S_POR = N_MAF / N_PAF; and
(iv) Comparing the determined sample S_uneven value with the parental predisposition S_POR threshold established using a reference frame without a parental predisposition, to determine the likelihood that a chromosomal ploidy abnormality is present in the sample.
30. The method of claim 29, wherein the method further comprises,
-counting the total number N_total of parental unequal homozygous loci of the test sample on the selected DNA segment and the number N_LOH of sites with neither a paternal nor a maternal predisposition;
-determining, based on N_LOH and N_total, the heterozygous site rate S_loh of the test sample at the DNA segment level; and
-comparing the heterozygous site rate of the test sample, i.e., the sample S_loh value, with the heterozygous site rate threshold, i.e., the S_loh threshold,
wherein the S_loh threshold is established using the reference frame without a parental predisposition.
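The heterozygous site rate of claim 30 can be sketched as below. The claim only says that S_loh is determined "based on" N_LOH and N_total; taking it as the ratio N_LOH / N_total is an assumption of this sketch, as are the function name and the example values. Sites are classified from their MAF alone, with a = 0.4 and b = 0.6.

```python
from typing import Iterable


def heterozygous_site_rate(mafs: Iterable[float], a: float = 0.4, b: float = 0.6) -> float:
    """Return S_loh, assumed here to be N_LOH / N_total.

    N_total: number of informative (parental unequal homozygous) loci supplied.
    N_LOH:   number of those loci with neither parental predisposition (a < MAF < b).
    """
    values = list(mafs)
    n_total = len(values)
    if n_total == 0:
        raise ValueError("no informative loci on the segment")
    n_loh = sum(1 for maf in values if a < maf < b)
    return n_loh / n_total


# Hypothetical MAF values at four informative loci on one chromosome.
print(heterozygous_site_rate([0.1, 0.5, 0.45, 0.9]))  # 0.5
```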
31. The method of claim 29, wherein a parental predisposition statistic S_POR, i.e., the S_uneven value of each sample, is established for each reference frame sample, and the parental predisposition S_POR threshold is set based on the mean ± 1-5 standard deviations of the S_uneven values of the reference frame samples.
32. The method of claim 31, wherein the parental predisposition S_POR threshold is set based on the mean ± 2-3 standard deviations of the S_uneven values of the reference frame samples.
33. The method of claim 31, wherein the S_POR upper threshold is 1.75 and the S_POR lower threshold is 0.53.
34. The method of claim 30, wherein the S_loh value of each reference frame sample is established, and the S_loh threshold is set based on the mean ± 1-5 standard deviations of the S_loh values of the reference frame samples.
35. The method of claim 34, wherein the S_loh threshold is set based on the mean ± 2-3 standard deviations of the S_loh values of the reference frame samples.
36. The method of claim 34, wherein the upper limit of the S_loh threshold is 0.76 and the lower limit is 0.22.
37. The method of claim 29, wherein the DNA segment is a chromosome.
38. The method of claim 29, wherein the reference frame without a parental predisposition consists of 1-40 or more euploid biopsy samples.
39. The method of claim 30, wherein the method comprises:
constructing the S_uneven value of the test sample at the chromosome level and comparing it with the S_POR threshold established based on the reference frame without a parental predisposition;
constructing the S_loh value of the test sample at the chromosome level and comparing it with the S_loh threshold established based on the reference frame without a parental predisposition;
(i) if the S_loh value of the test sample is within the reference frame S_loh threshold range,
(a) when the S_uneven value of the test sample is greater than the upper limit of the reference frame S_POR threshold, the sample is indicated to be a maternal trisomy or a maternal triploid; or,
(b) when the S_uneven value of the test sample is less than the lower limit of the reference frame S_POR threshold, the sample is indicated to be a paternal trisomy or a paternal triploid;
(ii) if the S_loh value of the test sample is less than the lower limit of the reference frame S_loh threshold,
(a) when the S_uneven value of the test sample is greater than the upper limit of the reference frame S_POR threshold, the test sample is indicated to be a maternal uniparental disomy or haploid; or,
(b) when the S_uneven value of the test sample is less than the lower limit of the reference frame S_POR threshold, the test sample is indicated to be a paternal uniparental disomy or haploid.
40. A system, characterized in that,
-the method according to any of claims 1-39 can be performed to identify a parental predisposition of the test sample, or to detect a parental contamination of the test sample, or to identify progeny DNA ploidy abnormalities in the test sample, based on the genetic variation information of the test sample and the genetic variation information of the reference frame.
41. The system of claim 40, further capable of performing one or both of the following:
-single cell whole genome amplification of the test sample; and
-detecting genetic variation information of the progeny genome of the obtained single cell whole genome amplification product.
42. The system of claim 41, wherein the genetic variation information of the progeny genome is determined by nucleic acid chip or NGS sequencing.
43. The system of claim 42, wherein the genetic variation information is SNP information.
44. Non-diagnostic use of the system of any one of claims 40-43,
-for identifying a parental predisposition of a test sample, or detecting parental contamination of a test sample, or identifying progeny DNA ploidy abnormalities in a test sample.
45. Use of the system of any one of claims 40-43 in the manufacture of a product for identifying a parental predisposition of a test sample, or detecting parental contamination of a test sample, or identifying progeny DNA ploidy abnormalities in a test sample.
CN202111536093.6A 2020-12-23 2021-12-15 Method and device for identifying parent tendency of nucleic acid sample Active CN114214425B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011542368 2020-12-23
CN2020115423682 2020-12-23

Publications (2)

Publication Number Publication Date
CN114214425A CN114214425A (en) 2022-03-22
CN114214425B true CN114214425B (en) 2024-01-19

Family

ID=80702562

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI807861B (en) * 2022-06-15 2023-07-01 中國醫藥大學 Method for identifying affinity of taiwanese population and system thereof
CN115433777A (en) * 2022-10-26 2022-12-06 北京中仪康卫医疗器械有限公司 Integrated identification method for CNV, SV and SGD abnormalities and abnormal sources of embryos
CN116497106B (en) * 2023-06-30 2024-03-12 北京大学第三医院(北京大学第三临床医学院) Identification method for maternal pollution in prenatal diagnosis

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107557481A (en) * 2017-10-10 2018-01-09 苏州绘真医学检验所有限公司 The detection trace mixing reagent of people's source DNA 13 CODIS str locus seats of sample, kit and its apply
CN110157812A (en) * 2019-05-29 2019-08-23 苏州市公安局刑事科学技术研究所 Composite amplification reagent kit that is a kind of while detecting autosome and Y chromosome str locus seat

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130196862A1 (en) * 2009-07-17 2013-08-01 Natera, Inc. Informatics Enhanced Analysis of Fetal Samples Subject to Maternal Contamination
CA3160848A1 (en) * 2011-02-24 2013-03-28 The Chinese University Of Hong Kong Molecular testing of multiple pregnancies

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Salvage of fetal karyotype information from SNP array data obtained from products of conception with maternal cell contamination;Karin Sasaki等;PRENATAL DIAGNOSIS;第1-7页 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant