WO2022027212A1 - NGS-trio-based uniparental disomy detection method and application - Google Patents

NGS-trio-based uniparental disomy detection method and application

Info

Publication number
WO2022027212A1
Authority
WO
WIPO (PCT)
Prior art keywords
sites
mutation
inheritance
uniparental
conform
Prior art date
Application number
PCT/CN2020/106716
Other languages
English (en)
French (fr)
Inventor
刘晶星
于世辉
喻长顺
向丽娜
陈白雪
Original Assignee
广州金域医学检验中心有限公司
广州金域医学检验集团股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州金域医学检验中心有限公司 and 广州金域医学检验集团股份有限公司
Priority to US18/019,858 (US20230282307A1)
Priority to PCT/CN2020/106716 (WO2022027212A1)
Publication of WO2022027212A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/40 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for data related to laboratory analysis, e.g. patient specimen analysis
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Definitions

  • The invention relates to the technical field of bioinformatics analysis, and in particular to an NGS-trio-based uniparental disomy detection method and its application.
  • Genomic imprinting, also known as genetic imprinting, is a genetic process that marks a gene or genomic domain with information about its parental origin through biochemical pathways. Such genes are called imprinted genes. Whether they are expressed depends on the origin (paternal or maternal) of the chromosome on which they lie, and on whether the gene is silenced (mainly by methylation) on that chromosome. Some imprinted genes are expressed only from the maternal chromosome, while others are expressed only from the paternal chromosome.
  • UPD: uniparental disomy
  • In a normal diploid, a pair of homologous chromosomes is derived from the father and the mother respectively. Uniparental disomy (UPD) means that a pair of homologous chromosomes (or segments of chromosomes) is derived from the same parent. If these segments contain imprinted genes, gene expression is disrupted.
  • The current method for diagnosing UPD is to test whether the methylation levels of the same interval on a pair of homologous chromosomes are consistent.
  • In most cases, UPD arises because two homologous chromosomes fail to separate during meiosis, producing gametes with an abnormal chromosome copy number: a normal gamete carries one copy, whereas an abnormal gamete carries 2 or 0 copies. This in turn produces a zygote with an abnormal copy number (trisomy or monosomy).
  • The zygote then returns to euploidy either through trisomy rescue, as shown in Figure 1, in which one chromosome is randomly lost, or through monosomy rescue, as shown in Figure 2, in which the single chromosome is duplicated.
  • Trisomy rescue produces UPD with a probability of 1/3, whereas monosomy rescue always produces UPD.
  • The SNP array approach has the drawback of high cost, and because its target probes are polymorphic sites, it cannot simultaneously detect other small pathogenic variants (point mutations, small indels, etc.);
  • With this method, the chromosomal origin of the proband can be inferred directly, so that UPD is determined directly (rather than inferred indirectly through LOH), improving the diagnostic positive rate without adding any cost.
  • An NGS-trio-based method for detecting uniparental disomy, comprising the following steps:
  • Mutation site screening: in each sample, select the mutation sites that meet the predetermined conditions and define them as the qualified mutation sites of that sample; the mutation sites removed by screening are designated the unqualified mutation sites of that sample;
  • Merging site data: take the union of the unqualified mutation sites of all samples in the same trio, obtain the chromosome coordinates of each unqualified site in the union, and remove from the qualified sites of each sample any mutation site whose coordinates match an unqualified site; then, based on the remaining qualified mutation sites in the trio, fill in each position where a sample has no mutation with a homozygous genotype consistent with the reference sequence;
  • Inheritance pattern classification: classify the inheritance pattern of the trio combination at each mutation site, dividing the sites into: sites consistent with biparental inheritance, sites consistent only with uniparental inheritance, and sites that violate the laws of inheritance;
  • Judgment of uniparental segments: if the span covered by consecutive sites consistent only with uniparental paternal inheritance exceeds a predetermined value, the segment is judged to be of uniparental paternal origin; if the span covered by consecutive sites consistent only with uniparental maternal inheritance exceeds the predetermined value, the segment is judged to be of uniparental maternal origin;
  • Pathogenic UPD screening: check whether the above UPD segment covers an imprinted gene or the corresponding band. If it does not, it is judged to be benign UPD; if it does, a risk of pathogenic UPD is indicated.
  • With the above method, the chromosomal origin of the proband can be inferred directly, so that UPD is determined directly, improving the diagnostic positive rate without adding any cost.
  • The NGS sequencing data can be either whole-exome sequencing data or whole-genome sequencing data.
  • In the mutation site screening step, mutation sites are selected as follows:
  • The remaining sites are the mutation sites that meet the predetermined conditions.
  • For example, chr1:69849G>A, Het is typed as chr1:69849[A/G], and chr1:69849G>A, Hom is typed as chr1:69849[A/A].
  • If both chr1:69849G>A, Het and chr1:69849G>T, Het are present, the site is typed as chr1:69849[A/G/T], i.e. more than two alleles, and the site must be removed.
  • A mutation site that meets the predetermined conditions must satisfy all of the selection conditions and none of the removal conditions.
  • According to the Hardy-Weinberg law, the genotype frequencies and allele frequencies at a locus in a population remain the same from generation to generation, in a state of genetic equilibrium. False-positive sites can therefore be excluded with a chi-square test.
  • For example, the AA-AB-BB genotype frequencies at a locus follow a regular pattern.
  • Suppose the allele frequency of A is 0.4
  • and the allele frequency of B is 0.6.
  • Then, in a local population database of 10,000 people, the expected number of people with genotype AA is 1600, BB is 3600 and AB is 4800. A chi-square test comparing the observed and expected counts of these genotypes in the population database is used to exclude sites whose observed counts deviate too far from the expected counts (i.e. highly suspected false-positive sites).
  • The high-quality mutation sites are mutation sites that meet the following criteria: GATK-VQSR quality control PASS, total coverage > 20X, mutation frequency > 25%.
  • The trio samples in the same group include a paternal sample, a maternal sample and a proband sample;
  • Mutation site data with the same coordinates are arranged in the order proband-father-mother.
  • Testing according to the method of the present invention must include samples from the proband and both parents; none can be omitted.
  • Sites consistent with biparental inheritance are divided into:
  • Type 1: sites consistent only with biparental inheritance
  • Type 0: sites consistent with both biparental and uniparental inheritance
  • Type 3F: sites that can only arise from paternal monosomy rescue
  • Type 2F: sites that may arise from either paternal monosomy rescue or paternal trisomy rescue
  • Type 3M: sites that can only arise from maternal monosomy rescue
  • Type 2M: sites that may arise from either maternal monosomy rescue or maternal trisomy rescue
  • Sites consistent with biparental inheritance are sites at which the origin of both proband alleles can be found in the parents. They include sites consistent only with biparental inheritance (type 1, e.g. Aa-AA-aa) and sites consistent with both biparental and uniparental inheritance (type 0).
  • In the step of judging uniparental segments, if there are more than 8 consecutive type 2F or 3F sites covering more than 1 Mbp, the segment is judged to be of uniparental paternal origin; if there are more than 8 consecutive type 2M or 3M sites covering more than 1 Mbp, the segment is judged to be of uniparental maternal origin.
  • "Consecutive" here means that the run is not interrupted by type 1 sites: more than 8 consecutive type 2F or 3F sites with no type 1 site in between, or more than 8 consecutive type 2M or 3M sites with no type 1 site in between.
  • In the UPD judgment step, the data judged to be a uniparental segment are compared with the copy number analysis results of whole-exome sequencing; if the copy number analysis indicates that the segment is single-copy, it is judged to be a segment deletion; otherwise it is judged to be UPD.
  • The invention also discloses the application of the above NGS-trio-based uniparental disomy detection method in developing or preparing a device for pathogenic UPD screening.
  • The invention also discloses an NGS-trio-based uniparental disomy screening device, comprising: a data acquisition module, a data analysis module and a UPD judgment module;
  • The data acquisition module is used to acquire the NGS sequencing data of the same trio;
  • The data analysis module is used to analyze the above sequencing data and divide the mutation sites into: sites consistent with biparental inheritance, sites consistent only with uniparental inheritance, and sites that violate the laws of inheritance;
  • The UPD judgment module is used to perform UPD judgment on the above mutation sites according to preset rules and obtain a judgment result;
  • The data analysis module performs its analysis according to the following steps:
  • Mutation site screening: in each sample, select the mutation sites that meet the predetermined conditions and define them as the qualified mutation sites of that sample; the mutation sites removed by screening are designated the unqualified mutation sites of that sample;
  • Merging site data: take the union of the unqualified mutation sites of all samples in the same trio, obtain the chromosome coordinates of each unqualified site in the union, and remove from the qualified sites of each sample any mutation site whose coordinates match an unqualified site; then, based on the remaining qualified mutation sites in the trio, fill in each position where a sample has no mutation with a homozygous genotype consistent with the reference sequence;
  • Inheritance pattern classification: classify the inheritance pattern of the trio combination at each mutation site, dividing the sites into: sites consistent with biparental inheritance, sites consistent only with uniparental inheritance, and sites that violate the laws of inheritance;
  • The UPD judgment module performs its analysis according to the following steps:
  • Judgment of uniparental segments: if the span covered by consecutive sites consistent only with uniparental paternal inheritance exceeds a predetermined value, the segment is judged to be of uniparental paternal origin; if the span covered by consecutive sites consistent only with uniparental maternal inheritance exceeds the predetermined value, the segment is judged to be of uniparental maternal origin;
  • Pathogenic UPD screening: check whether the above UPD segment covers an imprinted gene or the corresponding band. If it does not, it is judged to be benign UPD; if it does, a risk of pathogenic UPD is indicated.
  • In the mutation site screening step, mutation sites are selected as follows:
  • The remaining sites are the mutation sites that meet the predetermined conditions.
  • The high-quality mutation sites are mutation sites that meet the following criteria: GATK-VQSR quality control PASS, total coverage > 20X, mutation frequency > 25%.
  • The trio samples in the same group include a paternal sample, a maternal sample and a proband sample;
  • Mutation site data with the same coordinates are arranged in the order proband-father-mother.
  • Sites consistent with biparental inheritance are divided into:
  • Type 1: sites consistent only with biparental inheritance
  • Type 0: sites consistent with both biparental and uniparental inheritance
  • Type 3F: sites that can only arise from paternal monosomy rescue
  • Type 2F: sites that may arise from either paternal monosomy rescue or paternal trisomy rescue
  • Type 3M: sites that can only arise from maternal monosomy rescue
  • Type 2M: sites that may arise from either maternal monosomy rescue or maternal trisomy rescue
  • Sites consistent with biparental inheritance are sites at which the origin of both proband alleles can be found in the parents. They include sites consistent only with biparental inheritance (type 1, e.g. Aa-AA-aa) and sites consistent with both biparental and uniparental inheritance (type 0).
  • In the step of judging uniparental segments, if there are more than 8 consecutive type 2F or 3F sites covering more than 1 Mbp, the segment is judged to be of uniparental paternal origin; if there are more than 8 consecutive type 2M or 3M sites covering more than 1 Mbp, the segment is judged to be of uniparental maternal origin.
  • In the UPD judgment step, the data judged to be a uniparental segment are compared with the copy number analysis results of whole-exome sequencing; if the copy number analysis indicates that the segment is single-copy, it is judged to be a segment deletion; otherwise it is judged to be UPD.
  • The present invention also discloses a storage medium that includes a stored program, wherein the program implements the functions of the above modules.
  • The invention also discloses a processor for running a program, wherein the program implements the functions of the above modules.
  • Compared with the prior art, the present invention has the following beneficial effects:
  • The NGS-trio-based method for detecting uniparental disomy of the present invention can determine whether UPD has occurred and whether it lies in a high-risk imprinted region, without additional experiments or labor costs.
  • The method can also be used to assist in identifying heterozygous deletions of large segments, and, depending on the density of mutation sites, its resolution can reach 1 Mbp, giving excellent detection performance.
  • Fig. 1 is a schematic diagram of trisomy rescue in the background art;
  • Fig. 2 is a schematic diagram of monosomy rescue in the background art;
  • Fig. 3 is a flow chart of the NGS-trio-based uniparental disomy detection method in Example 1;
  • Fig. 4 is a schematic diagram of the screening device modules in Example 2;
  • Fig. 5 is a schematic diagram of a normal sample in Example 3;
  • Fig. 6 is a schematic diagram of the analysis of trio sample group NP21S0557-NP21S0558-NP21S0549 in Example 4;
  • Fig. 7 is an enlarged schematic diagram of the boxed part in Fig. 4;
  • Fig. 8 is a schematic diagram of the analysis of trio sample group NP19E0911-NP19E0910-NP19E0912 in Example 4;
  • Fig. 9 is an enlarged schematic diagram of the boxed part in Fig. 6;
  • Fig. 10 is a schematic diagram of the analysis of trio sample group NP20E957-NP20E956-NP20E958 in Example 4;
  • Fig. 11 is an enlarged schematic diagram of the boxed part in Fig. 8;
  • Fig. 12 is a schematic diagram of the analysis of trio sample group NP21F6166--NP21F6167-NP21F6168 in Example 5;
  • Fig. 13 is an enlarged schematic diagram of the boxed part in Fig. 10;
  • Fig. 14 is a schematic diagram of the analysis of trio sample group NP19F0315--NP19F0313-NP19F0314 in Example 5;
  • Fig. 15 is an enlarged schematic diagram of the boxed part in Fig. 12;
  • Fig. 16 is a schematic diagram of the analysis of trio sample group NP21F3536--NP21F3567-NP21F3537 in Example 5;
  • Fig. 17 is an enlarged schematic diagram of the boxed part in Fig. 14;
  • Fig. 19 is an enlarged schematic diagram of the boxed part in Fig. 16;
  • Fig. 20 is a schematic diagram of the analysis of trio sample group NP19E0056--NP9E0057-NP9E0055 in Example 6;
  • Fig. 21 is an enlarged schematic diagram of the boxed part in Fig. 18;
  • In the figures, the abscissa is the chromosome number; the lower half of each figure shows the proportion of the whole chromosome length taken up by continuous homozygous segments,
  • and the upper half shows the distribution of mutation sites on each chromosome;
  • The legend for the different site types on each chromosome reads, from left to right: the cross unInherit_2 marks type -2 sites, and the dot unInherit_1 marks type -1 sites.
  • The diamond Norm marks normal sites
  • The solid line exome_bed marks the whole-exome sequencing coverage
  • imprint location marks the imprinted segments
  • imprint gene marks the range of imprinted genes
  • The inverted triangle Mather marks sites of uniparental maternal inheritance (3M and 2M)
  • The upright triangle Father marks sites of uniparental paternal inheritance (3F and 2F).
  • An NGS-trio-based method for detecting uniparental disomy includes the following steps:
  • Acquire the NGS sequencing data of the same trio. It will be understood that the NGS sequencing data may be whole-exome sequencing data or whole-genome sequencing data.
  • The proband sample, the paternal sample and the maternal sample must all be present.
  • In each sample, the mutation sites that meet the predetermined conditions are selected and defined as the qualified mutation sites of that sample, and the mutation sites removed by screening are designated the unqualified mutation sites of that sample. Screening is performed as follows:
  • For heterozygous sites, remove sites with a mutation frequency higher than 70%; for homozygous sites, remove sites with a frequency lower than 85%;
  • Each position where a sample has no mutation is filled in with a homozygous genotype consistent with the reference sequence. For example, the proband is chr1:69849[A/G], the father is chr1:69849[A/A], and the mother has no mutation at this position. Since the reference base at this position is G, the mother is typed as chr1:69849[G/G] at this position.
  • Trio combinations of mutation sites can generally be obtained from whole-exome sequencing data. They are ordered as proband-father-mother; for example, Aa-AA-aa means the proband is Aa, the father is AA, and the mother is aa.
  • The inheritance pattern of the trio combination at each mutation site is classified, and the mutation sites are divided into: sites consistent with biparental inheritance, sites consistent only with uniparental inheritance, and sites that violate the laws of inheritance. Specifically:
  • Sites consistent with biparental inheritance: the origin of both proband alleles can be found in the parents. The Aa-AA-aa type must be inherited from both parents, and such sites are marked as type 1 (sites consistent only with biparental inheritance). Other types, such as Aa-Aa-Aa and AA-AA-Aa, are consistent with biparental inheritance but also with uniparental inheritance; such sites cannot serve as the basis for any judgment and are marked as type 0 (sites consistent with both biparental and uniparental inheritance).
  • Sites consistent only with uniparental inheritance: both proband alleles can only have been inherited from one parent.
  • If they are inherited from the father, there are two cases, AA-AA-aa and AA-Aa-aa. AA-Aa-aa can only arise from the aforementioned monosomy rescue and is marked as type 3F, whereas AA-AA-aa may arise from either monosomy rescue or trisomy rescue and is marked as type 2F. Similarly, if they are inherited from the mother, the corresponding types are marked as 3M and 2M.
  • Sites that violate the laws of inheritance: if such sites are sporadic, they may be caused by gene mutation during inheritance or by sequencing errors. If they occur over large regions, the possibility that the parents are not the biological parents is considered. There are two cases: type AA-aa-aa, in which both parents are non-biological, marked as type -2; and type Aa-aa-aa, in which one parent is non-biological, marked as type -1.
  • The predetermined value (threshold) for the number of sites that violate the laws of inheritance is set to 800.
  • The uniparental paternal or maternal segments are determined as follows: if there are more than 8 consecutive type 2F or 3F sites (not interrupted by type 1 sites) covering more than 1 Mbp, the segment is judged to be of uniparental paternal origin; similarly, if there are more than 8 consecutive type 2M or 3M sites (not interrupted by type 1 sites) covering more than 1 Mbp, the segment is judged to be of uniparental maternal origin.
  • CNV: copy number variation
  • An NGS-trio-based uniparental disomy screening device includes: a data acquisition module, a data analysis module and a UPD judgment module.
  • The data acquisition module is used to acquire the NGS sequencing data of the same trio.
  • The data analysis module is used to analyze the above sequencing data and divide the mutation sites into: sites consistent with biparental inheritance, sites consistent only with uniparental inheritance, and sites that violate the laws of inheritance; the data analysis module performs its analysis according to steps 2 to 4 of Example 1.
  • The UPD judgment module is used to perform UPD judgment on the mutation sites according to preset rules and obtain a judgment result; the UPD judgment module makes its judgment according to steps 5 to 8 of Example 1.
  • NGS-trio-based uniparental disomy screening was performed on one group of clinical samples (NP19E1936-NP19E1937-NP19F0086) using the screening device of Example 2.
  • NGS-trio-based uniparental disomy screening is illustrated with 3 groups of clinical samples, using the screening device of Example 2.
  • Trio sample group NP21S0557-NP21S0558-NP21S0549.
  • Trio sample group NP19E0911-NP19E0910-NP19E0912.
  • Trio sample group NP20E957-NP20E956-NP20E958.
  • NGS-trio-based uniparental disomy screening is illustrated with 3 groups of clinical samples, using the screening device of Example 2.
  • Trio sample group NP21F6166--NP21F6167-NP21F6168.
  • Trio sample group NP19F0315--NP19F0313-NP19F0314.
  • Trio sample group NP21F3536--NP21F3567-NP21F3537.
  • NGS-trio-based uniparental disomy screening is illustrated with 2 groups of clinical samples, using the screening device of Example 2.
  • Trio sample group NP19E1380--NP19E1381-NP19E1382.
  • Trio sample group NP19E0056--NP9E0057-NP9E0055.
  • The above samples all carry high-risk pathogenic heterozygous deletions, whose clinical impact is similar to that of UPD of the origin opposite to the deletion (for example, the clinical impact of a paternal heterozygous deletion is similar to that of maternal UPD).
  • Screening summary: 792 groups screened in total; non-biological relationship in 5 groups; uniparental origin detected in 46 (14+32) groups.
  • PWS-AS refers to the pathogenic conditions caused by chr15-UPD, in which maternal UPD can lead to PWS and paternal UPD can lead to AS.
  • chr15-UPD is a relatively common pathogenic condition, for which methylation assays are currently available on the market. Maternal UPD can lead to PWS, and paternal UPD can lead to AS. The 7 chr15-UPD cases screened in this example were verified by methylation testing, and the results all matched, indicating that the method of the present invention gives highly accurate detection results.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Data Mining & Analysis (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biotechnology (AREA)
  • Epidemiology (AREA)
  • Biophysics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Analytical Chemistry (AREA)
  • Primary Health Care (AREA)
  • Organic Chemistry (AREA)
  • Pathology (AREA)
  • Molecular Biology (AREA)
  • Genetics & Genomics (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioethics (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

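An NGS-trio-based uniparental disomy detection method and its application, belonging to the technical field of bioinformatics analysis. By acquiring NGS-trio sequencing data and analyzing and judging them, the method can directly infer the chromosomal origin of the proband and thus directly determine whether UPD is present (rather than inferring UPD indirectly through LOH), improving the diagnostic positive rate without adding any cost. The method can also be used to assist in identifying heterozygous deletions of large segments, and, depending on the density of mutation sites, its resolution can reach 1 Mbp, giving excellent detection performance.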

Description

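NGS-trio-based uniparental disomy detection method and application
Technical field
The present invention relates to the technical field of bioinformatics analysis, and in particular to an NGS-trio-based uniparental disomy detection method and its application.
Background art
Genomic imprinting, also known as genetic imprinting, is a genetic process that marks a gene or genomic domain with information about its parental origin through biochemical pathways. Such genes are called imprinted genes. Whether they are expressed depends on the origin (paternal or maternal) of the chromosome on which they lie, and on whether the gene is silenced on that chromosome (the silencing mechanism is mainly methylation). Some imprinted genes are expressed only from the maternal chromosome, while others are expressed only from the paternal chromosome.
In a normal diploid, a pair of homologous chromosomes is derived from the father and the mother respectively. Uniparental disomy (UPD) means that a pair of homologous chromosomes (or segments of chromosomes) is derived from the same parent; if these segments contain imprinted genes, gene expression is disrupted. The current method for diagnosing UPD is to test whether the methylation levels of the same interval on a pair of homologous chromosomes are consistent.
In most cases, UPD arises because two homologous chromosomes fail to separate during meiosis, producing gametes with an abnormal chromosome copy number: a normal gamete carries one copy, whereas an abnormal gamete carries 2 or 0 copies. This in turn produces a zygote with an abnormal copy number (trisomy or monosomy). The zygote finally returns to euploidy either through trisomy rescue, as shown in Fig. 1, in which one chromosome is randomly lost, or through monosomy rescue, as shown in Fig. 2, in which the single chromosome is duplicated. Trisomy rescue produces UPD with a probability of 1/3, whereas monosomy rescue always produces UPD.
UPD produced by monosomy rescue makes the entire chromosome homozygous, so it can be inferred indirectly by detecting LOH (loss of heterozygosity). For UPD produced by trisomy rescue, recombination during meiosis occasionally also produces local LOH, but local LOH has many possible causes (such as consanguineous marriage), so it cannot establish UPD with 100% certainty.
Moreover, the methylation assays conventionally used to detect UPD can only handle small local segments of a chromosome, and different experiments must be designed for different regions; they are inefficient and slow, and are not suitable for genome-wide screening;
The SNP array approach, in turn, has the drawback of high cost, and because its target probes are polymorphic sites, it cannot simultaneously detect other small pathogenic variants (point mutations, small indels, etc.);
Whole-exome sequencing is currently the most common method for detecting genetic-defect diseases. It can detect pathogenic point mutations, small indels, copy number variations and so on, and is the first-choice test for most such patients. However, based on sequencing data from a single sample, as disclosed in CN110211630A, UPD can only be inferred indirectly through LOH.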
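Summary of the invention
In view of the above problems, it is necessary to provide an NGS-trio-based uniparental disomy detection method. With this method, the chromosomal origin of the proband can be inferred directly, so that UPD is determined directly (rather than inferred indirectly through LOH), improving the diagnostic positive rate without adding any cost.
An NGS-trio-based method for detecting uniparental disomy, comprising the following steps:
Data acquisition: acquire the NGS sequencing data of the same trio;
Mutation site screening: in each sample, select the mutation sites that meet the predetermined conditions and define them as the qualified mutation sites of that sample; the mutation sites removed by screening are designated the unqualified mutation sites of that sample;
Merging site data: take the union of the unqualified mutation sites of all samples in the same trio, obtain the chromosome coordinates of each unqualified site in the union, and remove from the qualified sites of each sample any mutation site whose coordinates match an unqualified site; then, based on the remaining qualified mutation sites in the trio, fill in each position where a sample has no mutation with a homozygous genotype consistent with the reference sequence;
Inheritance pattern classification: classify the inheritance pattern of the trio combination at each mutation site, dividing the sites into: sites consistent with biparental inheritance, sites consistent only with uniparental inheritance, and sites that violate the laws of inheritance;
Kinship judgment: if the number of sites that violate the laws of inheritance is less than a predetermined value, proceed with the subsequent analysis; if it is greater than or equal to the predetermined value, the sample is judged to be unqualified;
Judgment of uniparental segments: if the span covered by consecutive sites consistent only with uniparental paternal inheritance exceeds a predetermined value, the segment is judged to be of uniparental paternal origin; if the span covered by consecutive sites consistent only with uniparental maternal inheritance exceeds the predetermined value, the segment is judged to be of uniparental maternal origin;
UPD judgment: analyze the sequencing coverage depth of the segments judged above to be uniparental; if it indicates that the segment is single-copy, the segment is judged to be a deletion; otherwise it is judged to be a UPD segment;
Pathogenic UPD screening: check whether the above UPD segment covers an imprinted gene or the corresponding band. If it does not, it is judged to be benign UPD; if it does, a risk of pathogenic UPD is indicated.
As sequencing costs fall, more and more whole-exome sequencing protocols test samples from the proband and both parents together. Based on such trio pedigree data, the above method can directly infer the chromosomal origin of the proband and thus directly determine whether UPD is present, improving the diagnostic positive rate without adding any cost.
It will be understood that the above NGS sequencing data may be either whole-exome sequencing data or whole-genome sequencing data.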
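As an illustration of the kinship judgment and the pathogenic UPD screening steps, the following is a minimal sketch rather than the patented implementation. The function names, the type labels as strings, and the `imprinted_regions` mapping are assumptions made for this example; the default threshold of 800 is the value given for Example 1, and any coordinates passed in are supplied by the caller.

```python
def kinship_ok(site_types, max_violations=800):
    """Return True if the trio passes the kinship judgment.

    site_types: iterable of type labels ("1", "0", "2F", "3F", "2M", "3M", "-1", "-2").
    The sample is unqualified when violations reach the threshold (>=).
    """
    violations = sum(1 for t in site_types if t in ("-1", "-2"))
    return violations < max_violations


def screen_upd_segment(segment, imprinted_regions):
    """Classify a UPD segment as benign or at risk of being pathogenic.

    segment: (chrom, start, end)
    imprinted_regions: {name: (chrom, start, end)}, a hypothetical lookup table
    of imprinted genes or bands provided by the caller.
    """
    chrom, start, end = segment
    hits = sorted(
        name
        for name, (r_chrom, r_start, r_end) in imprinted_regions.items()
        if r_chrom == chrom and r_start <= end and start <= r_end  # interval overlap
    )
    verdict = "pathogenic UPD risk" if hits else "benign UPD"
    return verdict, hits
```

Any overlap, even partial, is treated here as "covering" the imprinted region; a stricter rule could be substituted if full containment is intended.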
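In one embodiment, in the mutation site screening step, mutation sites are selected as follows:
1) select high-quality mutation sites from the NGS sequencing data;
2) remove mutation sites located on the Y chromosome;
3) keep only point-mutation sites;
4) exclude suspected false-positive sites according to Hardy-Weinberg equilibrium;
5) for heterozygous sites, remove sites whose mutation frequency is above 70%; for homozygous sites, remove sites whose mutation frequency is below 85%;
6) genotype the mutations at each position and remove sites whose genotype contains more than two alleles;
7) the remaining sites are the mutation sites that meet the predetermined conditions.
In mutation analysis, because humans are diploid, a position carries at most two alleles; more than two usually indicates a sequencing error. For example, chr1:69849G>A, Het is genotyped as chr1:69849[A/G], and chr1:69849G>A, Hom is genotyped as chr1:69849[A/A]. If both chr1:69849G>A, Het and chr1:69849G>T, Het are present, the genotype is chr1:69849[A/G/T], which contains more than two alleles, so the site must be removed.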
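The genotyping rule in item 6 can be sketched as follows. This is an illustrative sketch, not the applicant's implementation: the tuple layout of a call and the function name `genotype_sites` are assumptions made for the example.

```python
from collections import defaultdict

def genotype_sites(calls):
    """calls: iterable of (chrom, pos, ref, alt, zygosity), zygosity "Het" or "Hom".

    Returns (genotypes, rejected). genotypes maps (chrom, pos) to a sorted
    2-tuple of alleles; rejected is the set of positions with more than two
    alleles, which the text treats as probable sequencing errors.
    """
    alleles = defaultdict(set)
    for chrom, pos, ref, alt, zygosity in calls:
        alleles[(chrom, pos)].add(alt)
        if zygosity == "Het":
            # A heterozygous call also carries the reference allele.
            alleles[(chrom, pos)].add(ref)

    genotypes, rejected = {}, set()
    for site, seen in alleles.items():
        if len(seen) > 2:
            rejected.add(site)
        elif len(seen) == 1:
            allele = next(iter(seen))
            genotypes[site] = (allele, allele)
        else:
            genotypes[site] = tuple(sorted(seen))
    return genotypes, rejected

# The three cases from the text, at chr1:69849 with reference base G.
het, _ = genotype_sites([("chr1", 69849, "G", "A", "Het")])
hom, _ = genotype_sites([("chr1", 69849, "G", "A", "Hom")])
_, bad = genotype_sites([("chr1", 69849, "G", "A", "Het"),
                         ("chr1", 69849, "G", "T", "Het")])
```

Here `het` holds the genotype A/G, `hom` holds A/A, and `bad` contains the position, because A, G and T together make three alleles.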
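It will be understood that a mutation site meeting the predetermined conditions must satisfy every selection criterion and must not meet any removal criterion.
It will be understood that, according to the Hardy-Weinberg law, in an infinitely large population with random mating and no mutation, selection or genetic drift, the genotype frequencies and allele frequencies at a locus remain constant from generation to generation, i.e. the population is in genetic equilibrium. False-positive sites can therefore be excluded with a chi-square test. For example, the frequencies of genotypes AA, AB and BB at a locus follow a fixed pattern: if a local population database contains 10,000 individuals and the allele frequency of A is 0.4 and that of B is 0.6, the expected number of individuals is 1,600 for genotype AA, 3,600 for BB and 4,800 for AB. A chi-square test compares the observed genotype counts in the population database with these expected counts, and sites whose observed counts deviate too far from the expected counts (i.e. highly suspected false-positive sites) are excluded.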
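The worked numbers above can be reproduced with a short calculation. The sketch below assumes a biallelic locus and estimates the allele frequency from the observed counts; the text does not state a cut-off, so the critical value 3.841 (chi-square, 1 degree of freedom, significance level 0.05) is used purely as an illustrative threshold.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Return (expected_counts, chi2) for observed genotype counts AA, AB, BB."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return expected, chi2

# Counts matching the example in the text: 10,000 people, A = 0.4, B = 0.6.
expected, chi2 = hwe_chi_square(1600, 4800, 3600)

# Hypothetical counts with the same allele frequencies but too few heterozygotes.
_, chi2_bad = hwe_chi_square(2600, 2800, 4600)
ILLUSTRATIVE_CUTOFF = 3.841  # assumption, not taken from the original text
is_suspect = chi2_bad > ILLUSTRATIVE_CUTOFF
```

With the counts from the text the statistic is essentially zero, while the hypothetical second locus deviates strongly and would be excluded as a suspected false positive.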
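Conventional NGS results contain a large number of low-quality sites, which strongly interfere with the subsequent UPD determination; detection using all sites performs poorly. The present invention therefore selects mutation sites by the above method to improve the accuracy of the analysis.
In one embodiment, in the mutation site screening step:
the high-quality mutation sites are mutation sites that meet the following criteria: GATK-VQSR quality control PASS, total coverage > 20X, and mutation frequency > 25%.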
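The per-call thresholds named above (PASS, coverage > 20X, mutation frequency > 25%, the Y-chromosome and point-mutation rules, and the 70%/85% zygosity limits) can be combined into one predicate. This is a sketch under assumptions: the `Call` record and its field names are hypothetical, and the Hardy-Weinberg and allele-count filters are left out because they need population data and per-position grouping.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Call:
    chrom: str
    pos: int
    ref: str
    alt: str
    zygosity: str    # "Het" or "Hom"
    filter_tag: str  # quality-control status, e.g. "PASS"
    depth: int       # total coverage at the site
    vaf: float       # mutation frequency as a fraction between 0 and 1

def is_qualified(c: Call) -> bool:
    """Apply the per-call screening criteria listed in the text."""
    if c.filter_tag != "PASS" or c.depth <= 20 or c.vaf <= 0.25:
        return False  # not a high-quality site
    if c.chrom in ("chrY", "Y"):
        return False  # Y-chromosome sites are removed
    if len(c.ref) != 1 or len(c.alt) != 1:
        return False  # keep point mutations only
    if c.zygosity == "Het" and c.vaf > 0.70:
        return False  # heterozygous call with too high a frequency
    if c.zygosity == "Hom" and c.vaf < 0.85:
        return False  # homozygous call with too low a frequency
    return True

good_het = Call("chr1", 69849, "G", "A", "Het", "PASS", 60, 0.48)
odd_het = Call("chr1", 69849, "G", "A", "Het", "PASS", 60, 0.80)
```

Calls that fail the predicate would be recorded as unqualified sites so that their coordinates can later be removed from the other two samples.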
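In one embodiment, in the data acquisition step, the trio comprises a paternal sample, a maternal sample and a proband sample;
in the site data merging step, the mutation site data with matching coordinates are arranged in the order proband-father-mother.
Detection by the method of the present invention requires the proband sample and both parental samples; none may be missing.
In one embodiment, in the inheritance pattern classification step, the sites consistent with biparental inheritance are divided into:
Type 1: sites consistent only with biparental inheritance;
Type 0: sites consistent with both biparental and uniparental inheritance;
the sites consistent only with uniparental inheritance are divided into:
Type 3F: sites that can only result from paternal monosomy rescue;
Type 2F: sites that may result from either paternal monosomy rescue or paternal trisomy rescue;
Type 3M: sites that can only result from maternal monosomy rescue;
Type 2M: sites that may result from either maternal monosomy rescue or maternal trisomy rescue;
the sites inconsistent with the rules of inheritance are divided into:
Type -1: one of the parents is inconsistent with the rules of inheritance;
Type -2: both parents are inconsistent with the rules of inheritance.
It will be understood that the sites consistent with biparental inheritance are sites at which both alleles of the proband can be traced to the parents. They include sites consistent only with biparental inheritance (type 1, e.g. Aa-AA-aa) and sites consistent with both biparental and uniparental inheritance (type 0).
In one embodiment, in the uniparental segment determination step, if 8 or more consecutive sites of type 2F or 3F span more than 1 Mbp, the segment is judged to be of uniparental paternal origin; if 8 or more consecutive sites of type 2M or 3M span more than 1 Mbp, the segment is judged to be of uniparental maternal origin.
It will be understood that "consecutive sites" means that the run is not interrupted by any type 1 site: 8 or more consecutive type 2F or 3F sites with no type 1 site between them, or 8 or more consecutive type 2M or 3M sites with no type 1 site between them.
In one embodiment, in the UPD determination step, the data judged to be a uniparental segment are compared with the copy number analysis results from whole-exome sequencing; if the copy number analysis indicates that the segment is present in a single copy, it is judged to be a segmental deletion; otherwise it is judged to be UPD.
The present invention also discloses use of the above NGS-trio-based method for detecting uniparental disomy in developing or manufacturing a device for screening pathogenic UPD.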
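The present invention also discloses a device for screening uniparental disomy based on NGS-trio data, comprising a data acquisition module, a data analysis module and a UPD determination module;
the data acquisition module is configured to acquire the NGS sequencing data of the samples of one trio;
the data analysis module is configured to analyze the sequencing data and divide the mutation sites into sites consistent with biparental inheritance, sites consistent only with uniparental inheritance, and sites inconsistent with the rules of inheritance;
the UPD determination module is configured to perform UPD determination on the mutation sites according to preset rules and to output the result;
the data analysis module performs the analysis in the following steps:
Mutation site screening: for each sample separately, select the mutation sites that meet predetermined conditions and define them as the qualified mutation sites of that sample; the mutation sites removed by the screening are defined as the unqualified mutation sites of that sample;
Site data merging: take the union of the unqualified mutation sites of all samples in the trio, obtain the chromosomal coordinates of each unqualified mutation site in the union, and remove from the qualified sites of every sample any mutation site whose coordinates match an unqualified site; then, based on the qualified mutation sites remaining in the trio, fill in each position where a sample has no mutation with a homozygous genotype matching the reference sequence;
Inheritance pattern classification: classify the inheritance pattern of the trio combination at each mutation site, dividing the mutation sites into sites consistent with biparental inheritance, sites consistent only with uniparental inheritance, and sites inconsistent with the rules of inheritance;
the UPD determination module performs the analysis in the following steps:
Kinship check: if the number of sites inconsistent with the rules of inheritance is below a predetermined value, proceed with the subsequent analysis; if it is greater than or equal to the predetermined value, the samples are judged to be unqualified;
Uniparental segment determination: if consecutive sites consistent only with uniparental paternal inheritance span more than a predetermined value, the segment is judged to be of uniparental paternal origin; if consecutive sites consistent only with uniparental maternal inheritance span more than a predetermined value, the segment is judged to be of uniparental maternal origin;
UPD determination: analyze the sequencing depth of the segment judged to be uniparental; if the analysis indicates that the segment is present in a single copy, it is judged to be a segmental deletion; otherwise it is judged to be a UPD segment;
Pathogenic UPD screening: check whether the UPD segment covers an imprinted gene or the corresponding band; if not, it is judged to be benign UPD; if so, a risk of pathogenic UPD is reported.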
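In one embodiment, in the mutation site screening step, mutation sites are selected as follows:
1) select high-quality mutation sites from the NGS sequencing data;
2) remove mutation sites located on the Y chromosome;
3) keep only point-mutation sites;
4) exclude suspected false-positive sites according to Hardy-Weinberg equilibrium;
5) for heterozygous sites, remove sites whose mutation frequency is above 70%; for homozygous sites, remove sites whose mutation frequency is below 85%;
6) genotype the mutations at each position and remove sites whose genotype contains more than two alleles;
7) the remaining sites are the mutation sites that meet the predetermined conditions.
In one embodiment, in the mutation site screening step:
the high-quality mutation sites are mutation sites that meet the following criteria: GATK-VQSR quality control PASS, total coverage > 20X, and mutation frequency > 25%.
In one embodiment, in the data acquisition module, the trio comprises a paternal sample, a maternal sample and a proband sample;
in the site data merging step, the mutation site data with matching coordinates are arranged in the order proband-father-mother.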
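In one embodiment, in the inheritance pattern classification step, the sites consistent with biparental inheritance are divided into:
Type 1: sites consistent only with biparental inheritance;
Type 0: sites consistent with both biparental and uniparental inheritance;
the sites consistent only with uniparental inheritance are divided into:
Type 3F: sites that can only result from paternal monosomy rescue;
Type 2F: sites that may result from either paternal monosomy rescue or paternal trisomy rescue;
Type 3M: sites that can only result from maternal monosomy rescue;
Type 2M: sites that may result from either maternal monosomy rescue or maternal trisomy rescue;
the sites inconsistent with the rules of inheritance are divided into:
Type -1: one of the parents is inconsistent with the rules of inheritance;
Type -2: both parents are inconsistent with the rules of inheritance.
It will be understood that the sites consistent with biparental inheritance are sites at which both alleles of the proband can be traced to the parents. They include sites consistent only with biparental inheritance (type 1, e.g. Aa-AA-aa) and sites consistent with both biparental and uniparental inheritance (type 0).
In one embodiment, in the uniparental segment determination step, if 8 or more consecutive sites of type 2F or 3F span more than 1 Mbp, the segment is judged to be of uniparental paternal origin; if 8 or more consecutive sites of type 2M or 3M span more than 1 Mbp, the segment is judged to be of uniparental maternal origin.
In one embodiment, in the UPD determination step, the data judged to be a uniparental segment are compared with the copy number analysis results from whole-exome sequencing; if the copy number analysis indicates that the segment is present in a single copy, it is judged to be a segmental deletion; otherwise it is judged to be UPD.
The present invention also discloses a storage medium comprising a stored program, the program implementing the functions of the above modules.
The present invention also discloses a processor configured to run a program, the program implementing the functions of the above modules.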
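Compared with the prior art, the present invention has the following beneficial effects:
The NGS-trio-based method for detecting uniparental disomy of the present invention uses whole-exome or whole-genome trio sequencing data. While routine pathogenic mutations are being examined, it can determine whether UPD has occurred and whether the UPD lies in a high-risk imprinted region, with no additional experimental or labor cost.
In addition, the method can assist in identifying large heterozygous deletions, and, depending on the density of mutation sites, its resolution can reach 1 Mbp, giving excellent detection performance.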
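Brief Description of the Drawings
Figure 1 is a schematic diagram of the trisomy rescue described in the Background;
Figure 2 is a schematic diagram of the monosomy rescue described in the Background;
Figure 3 is a flowchart of the NGS-trio-based method for detecting uniparental disomy in Example 1;
Figure 4 is a schematic diagram of the modules of the screening device in Example 2;
Figure 5 is a schematic diagram of the normal sample in Example 3;
Figure 6 is a schematic diagram of the analysis of trio NP21S0557-NP21S0558-NP21S0549 in Example 4;
Figure 7 is an enlarged view of the boxed region in Figure 6;
Figure 8 is a schematic diagram of the analysis of trio NP19E0911-NP19E0910-NP19E0912 in Example 4;
Figure 9 is an enlarged view of the boxed region in Figure 8;
Figure 10 is a schematic diagram of the analysis of trio NP20E957-NP20E956-NP20E958 in Example 4;
Figure 11 is an enlarged view of the boxed region in Figure 10;
Figure 12 is a schematic diagram of the analysis of trio NP21F6166-NP21F6167-NP21F6168 in Example 5;
Figure 13 is an enlarged view of the boxed region in Figure 12;
Figure 14 is a schematic diagram of the analysis of trio NP19F0315-NP19F0313-NP19F0314 in Example 5;
Figure 15 is an enlarged view of the boxed region in Figure 14;
Figure 16 is a schematic diagram of the analysis of trio NP21F3536-NP21F3567-NP21F3537 in Example 5;
Figure 17 is an enlarged view of the boxed region in Figure 16;
Figure 18 is a schematic diagram of the analysis of trio NP19E1380-NP19E1381-NP19E1382 in Example 6;
Figure 19 is an enlarged view of the boxed region in Figure 18;
Figure 20 is a schematic diagram of the analysis of trio NP19E0056-NP9E0057-NP9E0055 in Example 6;
Figure 21 is an enlarged view of the boxed region in Figure 20;
In Figures 5, 6, 8, 10, 12, 14, 16, 18 and 20, the horizontal axis shows the chromosome number, the lower half of each figure shows the proportion of the chromosome length occupied by continuous homozygous segments, and the upper half shows the distribution of mutation sites on each chromosome;
In the enlarged views of Figures 7, 9, 11, 13, 15, 17, 19 and 21, the tracks for the different site types on each chromosome are, from left to right: crosses (unInherit_2) for type -2 sites; dots (unInherit_1) for type -1 sites; diamonds (Norm) for normal sites; a solid line (exome_bed) for the whole-exome sequencing coverage; imprint location for imprinted regions; imprint gene for the ranges of imprinted genes; inverted triangles (Mather) for uniparental maternal sites (3M and 2M); and upright triangles (Father) for uniparental paternal sites (3F and 2F).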
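Detailed Description
To facilitate understanding of the present invention, it is described more fully below with reference to the accompanying drawings, which show preferred embodiments. The invention may, however, be implemented in many different forms and is not limited to the embodiments described herein. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and completely.
Unless otherwise defined, all technical and scientific terms used herein have the meanings commonly understood by those skilled in the art to which the invention belongs. The terms used in this specification are for the purpose of describing particular embodiments only and are not intended to limit the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.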
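Example 1
A method for detecting uniparental disomy based on NGS-trio data, whose workflow is shown in Figure 3, comprises the following steps:
I. Data acquisition.
Acquire the NGS sequencing data of the samples of one trio. It will be understood that the NGS sequencing data may be whole-exome sequencing data or whole-genome sequencing data.
The proband sample, the paternal sample and the maternal sample must all be available.
II. Mutation site screening.
For one trio, select for each sample separately the mutation sites that meet predetermined conditions and define them as the qualified mutation sites of that sample; the mutation sites removed by the screening are defined as the unqualified mutation sites of that sample. The screening is performed as follows:
1. Select high-quality mutation sites from the whole-exome sequencing data (GATK-VQSR quality control PASS, total coverage > 20X, mutation frequency > 25%);
2. Remove mutation sites located on the Y chromosome;
3. Keep only point-mutation sites;
4. Exclude possible false-positive sites according to Hardy-Weinberg equilibrium in the local population frequency database;
5. For heterozygous sites, remove sites whose mutation frequency is above 70%; for homozygous sites, remove sites whose mutation frequency is below 85%;
6. Genotype the mutations at each position and remove sites whose genotype contains more than two alleles (humans are diploid, so a position carries at most two alleles; more than two usually indicates a sequencing error). For example, chr1:69849G>A, Het is genotyped as chr1:69849[A/G], and chr1:69849G>A, Hom is genotyped as chr1:69849[A/A]. If both chr1:69849G>A, Het and chr1:69849G>T, Het are present, the genotype is chr1:69849[A/G/T], which contains more than two alleles, so the site must be removed.
7. Record the qualified sites and the unqualified sites separately.
A qualified site must both satisfy every selection criterion above and meet none of the removal criteria above.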
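III. Site data merging.
1. Take the union of the unqualified mutation sites of the three samples in the trio (proband, father and mother), obtain the chromosomal coordinates of each unqualified site in the union, and remove from the qualified sites of every sample any mutation site whose coordinates match an unqualified site. In other words, a site that fails quality control in any one sample is also removed from the other two samples.
2. Then, based on the qualified mutation sites remaining in the trio, fill in each position where a sample has no mutation with a homozygous genotype matching the reference sequence. For example, if the proband is chr1:69849[A/G], the father is chr1:69849[A/A] and the mother has no mutation at this position, then, because the reference base at this position is G, the mother is genotyped as chr1:69849[G/G].
After this processing, whole-exome sequencing data generally yield about 50,000 qualifying trio combinations of mutation sites. Each trio combination is ordered proband-father-mother; for example, Aa-AA-aa means that the proband is Aa, the father is AA and the mother is aa.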
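The merging step can be sketched with ordinary dictionaries and sets. The data layout below (one dictionary of genotypes per sample, keyed by chromosome and position) and the function name `merge_trio` are assumptions for illustration. As in the text, a position with no call is treated as homozygous reference, which presumes the position was adequately covered.

```python
def merge_trio(qualified, unqualified, ref_base):
    """Merge per-sample sites into proband-father-mother trio combinations.

    qualified:   {sample: {site: (allele, allele)}}
    unqualified: {sample: set of sites}
    ref_base:    {site: reference base}
    """
    order = ("proband", "father", "mother")
    # A site that fails in any one sample is dropped from all three.
    bad = set().union(*(unqualified[s] for s in order))
    kept = {s: {k: v for k, v in qualified[s].items() if k not in bad}
            for s in order}
    sites = set().union(*(kept[s].keys() for s in order))

    merged = {}
    for site in sorted(sites):
        ref = ref_base[site]
        # Samples with no mutation here are filled in as homozygous reference.
        merged[site] = tuple(kept[s].get(site, (ref, ref)) for s in order)
    return merged

site_a, site_b = ("chr1", 69849), ("chr1", 70000)
trio = merge_trio(
    qualified={"proband": {site_a: ("A", "G"), site_b: ("C", "T")},
               "father": {site_a: ("A", "A")},
               "mother": {}},
    unqualified={"proband": set(), "father": {site_b}, "mother": set()},
    ref_base={site_a: "G", site_b: "C"},
)
```

In this example the mother is filled in as G/G at chr1:69849, matching the worked example, while the second site disappears from the trio because it failed screening in the father.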
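IV. Inheritance pattern classification.
Classify the inheritance pattern of the trio combination at each mutation site, dividing the mutation sites into sites consistent with biparental inheritance, sites consistent only with uniparental inheritance, and sites inconsistent with the rules of inheritance. Specifically:
1. Sites consistent with biparental inheritance: both alleles of the proband can be traced to the parents. A site of the form Aa-AA-aa must be biparental, and such sites are labeled type 1 (consistent only with biparental inheritance). Other forms such as Aa-Aa-Aa and AA-AA-Aa are consistent with biparental inheritance but also with uniparental inheritance; such sites cannot serve as evidence for any determination and are labeled type 0 (consistent with both biparental and uniparental inheritance).
2. Sites consistent only with uniparental inheritance: both alleles of the proband can only have been inherited from one parent. Taking inheritance from the father as an example, there are two cases, AA-AA-aa and AA-Aa-aa. AA-Aa-aa can only result from the monosomy rescue described above and is labeled type 3F, whereas AA-AA-aa may result from either monosomy rescue or trisomy rescue and is labeled type 2F. Likewise, the corresponding forms inherited from the mother are labeled type 3M and type 2M.
3. Remaining sites, inconsistent with the rules of inheritance: if only a few scattered sites are involved, possible causes include gene mutation during transmission and sequencing errors; if they are widespread, the possibility that a parent is not the biological parent should be considered. There are two cases: AA-aa-aa, in which neither parent can be the biological parent, labeled type -2; and Aa-aa-aa, in which one parent cannot be the biological parent, labeled type -1.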
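The classification rules above can be written as a small function. This is one possible reading of the rules, restricted to biallelic sites: a site is biparental if one proband allele can come from each parent, uniparental if both proband alleles are present in a single parent, and type -1 or -2 according to how many proband alleles are found in neither parent. The function name and the tuple representation are assumptions.

```python
def classify_site(proband, father, mother):
    """Classify one trio combination; each genotype is a 2-tuple of alleles.

    Returns "1", "0", "2F", "3F", "2M", "3M", "-1" or "-2".
    """
    p1, p2 = proband
    biparental = (p1 in father and p2 in mother) or (p2 in father and p1 in mother)
    from_father = p1 in father and p2 in father
    from_mother = p1 in mother and p2 in mother

    if biparental:
        # Type 0 sites fit both explanations and carry no information.
        return "0" if (from_father or from_mother) else "1"
    if from_father or from_mother:
        parent, tag = (father, "F") if from_father else (mother, "M")
        # Proband homozygous while the parent is heterozygous: the same
        # parental allele is present twice, the monosomy-rescue-only pattern.
        duplicated = p1 == p2 and parent[0] != parent[1]
        return ("3" if duplicated else "2") + tag
    missing = sum(1 for allele in proband if allele not in father and allele not in mother)
    return "-2" if missing == 2 else "-1"

AA, Aa, aa = ("A", "A"), ("A", "a"), ("a", "a")
labels = {
    "Aa-AA-aa": classify_site(Aa, AA, aa),
    "AA-AA-aa": classify_site(AA, AA, aa),
    "AA-Aa-aa": classify_site(AA, Aa, aa),
    "AA-aa-aa": classify_site(AA, aa, aa),
    "Aa-aa-aa": classify_site(Aa, aa, aa),
}
```

Applied to the patterns named in the text, the sketch labels Aa-AA-aa as type 1, AA-AA-aa as 2F, AA-Aa-aa as 3F, AA-aa-aa as -2 and Aa-aa-aa as -1.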
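V. Kinship check.
If the number of sites inconsistent with the rules of inheritance is below a predetermined value, proceed with the subsequent analysis; if it is greater than or equal to the predetermined value, the samples are judged to be unqualified.
Under normal circumstances, gene mutation and sequencing errors may produce a small number of scattered type -1 and type -2 sites, generally no more than 100, whereas if even one parent is not the biological parent there are several thousand type -1 sites.
Accordingly, more than 800 type -1 and type -2 sites are taken to indicate a non-biological relationship; that is, in this example the predetermined value (threshold) for sites inconsistent with the rules of inheritance is set to 800.
If the kinship check indicates a non-biological relationship, the subsequent analysis cannot be performed. If the kinship check shows that the samples meet the requirements, the subsequent procedure is carried out.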
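The kinship check reduces to counting type -1 and type -2 sites and comparing the total with the threshold. A minimal sketch follows, using the "greater than or equal to the predetermined value" rule from the step definition and the threshold of 800 from this example; the function name and the list-of-labels input are assumptions.

```python
from collections import Counter

def kinship_ok(site_types, threshold=800):
    """Return (passes, n_inconsistent) for a list of site type labels."""
    counts = Counter(site_types)
    n_inconsistent = counts["-1"] + counts["-2"]
    return n_inconsistent < threshold, n_inconsistent

# Hypothetical label lists: a typical trio and one with a non-biological parent.
typical = ["1"] * 20000 + ["0"] * 29950 + ["-1"] * 40 + ["-2"] * 10
mismatched = ["1"] * 20000 + ["0"] * 24000 + ["-1"] * 5000 + ["-2"] * 878

typical_result = kinship_ok(typical)
mismatched_result = kinship_ok(mismatched)
```

The first trio passes with 50 inconsistent sites, while the second is rejected with 5,878 and would not proceed to segment detection.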
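VI. Uniparental segment determination.
If consecutive sites consistent only with uniparental paternal inheritance span more than a predetermined value, the segment is judged to be of uniparental paternal origin; if consecutive sites consistent only with uniparental maternal inheritance span more than a predetermined value, the segment is judged to be of uniparental maternal origin.
Specifically, in this example, uniparental paternal/maternal segments are determined as follows: 8 or more consecutive sites of type 2F or 3F (not interrupted by any type 1 site) spanning more than 1 Mbp are judged to be a segment of uniparental paternal origin; likewise, 8 or more consecutive sites of type 2M or 3M (not interrupted by any type 1 site) spanning more than 1 Mbp are judged to be a segment of uniparental maternal origin.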
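The run rule can be sketched as a single pass over sites sorted by chromosome and position. This sketch follows the text literally: only a type 1 site (or a change of chromosome) ends a run, and sites of any other type are skipped without breaking it. The function name, the input layout and the returned tuple are assumptions.

```python
def find_segments(sites, origin, min_sites=8, min_span=1_000_000):
    """Find uniparental segments for origin "F" (paternal) or "M" (maternal).

    sites: list of (chrom, pos, site_type), sorted by chromosome and position.
    Returns a list of (chrom, start, end, n_sites).
    """
    wanted = {"2" + origin, "3" + origin}
    segments, run = [], []  # run holds positions of the current candidate run
    current_chrom = None

    def flush():
        # Keep the run only if it has enough sites and spans more than min_span.
        if len(run) >= min_sites and run[-1] - run[0] > min_span:
            segments.append((current_chrom, run[0], run[-1], len(run)))
        run.clear()

    for chrom, pos, site_type in sites:
        if chrom != current_chrom:
            flush()
            current_chrom = chrom
        if site_type == "1":
            flush()  # a biparental-only site interrupts the run
        elif site_type in wanted:
            run.append(pos)
    flush()
    return segments

# Eight 2F/3F sites spaced 200 kbp apart on chr15, with an uninformative type 0 site.
paternal_run = [("chr15", 1_000_000 + i * 200_000, "2F" if i % 2 else "3F")
                for i in range(8)]
paternal_run.insert(3, ("chr15", 1_500_000, "0"))
found = find_segments(paternal_run, "F")
```

Here `found` contains one paternal segment on chr15 from 1.0 Mbp to 2.4 Mbp with 8 sites; inserting a type 1 site into the middle of the run would split it into two runs that are each too short to report.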
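VII. UPD determination.
Analyze the sequencing depth of the segment judged to be uniparental; if the analysis indicates that the segment is present in a single copy, it is judged to be a segmental deletion; otherwise it is judged to be a UPD segment. Specifically:
The determination is combined with the copy number variation (CNV) analysis of the whole-exome sequencing data, i.e. the sequencing depth of the uniparental paternal/maternal segment is compared with that of other samples in the same batch. If the CNV analysis indicates that the segment is present in a single copy, it is judged to be a segmental deletion; otherwise it is judged to be UPD. In particular, very large deletions are generally lethal: if the segment covers more than half of a chromosome, or even the whole chromosome, and the sample is not from an embryo, a segmental deletion can essentially be ruled out.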
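The comparison with other samples in the batch can be illustrated with a simple depth ratio. The text does not describe the CNV algorithm, so the sketch below is only a stand-in: it assumes depths already normalized for library size, uses the batch median as the two-copy baseline, and treats a ratio of 0.75 or less as a single copy. Both the function and the 0.75 cut-off are hypothetical.

```python
from statistics import median

def call_segment(sample_depth, batch_depths, single_copy_max_ratio=0.75):
    """Return ("deletion" or "UPD", ratio) for one uniparental segment.

    sample_depth: mean normalized depth of the proband over the segment.
    batch_depths: mean normalized depths of other samples over the same segment.
    """
    ratio = sample_depth / median(batch_depths)
    # A ratio near 0.5 suggests one copy; a ratio near 1.0 suggests two copies.
    label = "deletion" if ratio <= single_copy_max_ratio else "UPD"
    return label, ratio

batch = [98.0, 100.0, 102.0, 101.0]
half_depth_call, half_ratio = call_segment(50.0, batch)
normal_depth_call, normal_ratio = call_segment(100.0, batch)
```

A segment at about half the batch depth is reported as a deletion, and a segment at normal depth is reported as UPD, mirroring the decision described above.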
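VIII. Pathogenic UPD screening.
Check whether the UPD segment covers an imprinted gene or the corresponding band; if not, it is judged to be benign UPD; if so, a risk of pathogenic UPD is reported.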
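Checking whether a UPD segment covers an imprinted region is an interval-overlap test. In the sketch below the region names and coordinates are placeholders invented for the example, not real imprinted loci; in practice they would come from a curated list of imprinted genes and bands.

```python
def overlapping_regions(segment, imprinted_regions):
    """Return the names of imprinted regions that overlap the segment.

    segment: (chrom, start, end)
    imprinted_regions: {name: (chrom, start, end)}
    """
    chrom, start, end = segment
    return sorted(name for name, (c, s, e) in imprinted_regions.items()
                  if c == chrom and s <= end and start <= e)

def screen_upd(segment, imprinted_regions):
    hits = overlapping_regions(segment, imprinted_regions)
    return ("pathogenic UPD risk" if hits else "benign UPD"), hits

# Placeholder regions with made-up coordinates (illustration only).
regions = {
    "region_X": ("chr15", 20_000_000, 25_000_000),
    "region_Y": ("chr6", 140_000_000, 141_000_000),
}
risk_call = screen_upd(("chr15", 18_000_000, 90_000_000), regions)
benign_call = screen_upd(("chr3", 1_000_000, 50_000_000), regions)
```

The first segment overlaps the placeholder `region_X` and is flagged as a pathogenic UPD risk; the second overlaps nothing and is reported as benign.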
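Example 2
A device for screening uniparental disomy based on NGS-trio data, shown in Figure 4, comprises a data acquisition module, a data analysis module and a UPD determination module.
The data acquisition module is configured to acquire the NGS sequencing data of the samples of one trio.
The data analysis module is configured to analyze the sequencing data and divide the mutation sites into sites consistent with biparental inheritance, sites consistent only with uniparental inheritance, and sites inconsistent with the rules of inheritance; the data analysis module performs the analysis according to steps II to IV of Example 1.
The UPD determination module is configured to perform UPD determination on the mutation sites according to preset rules and to output the result; the UPD determination module performs the determination according to steps V to VIII of Example 1.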
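Example 3
NGS-trio-based uniparental disomy screening was performed on one set of clinical samples (NP19E1936-NP19E1937-NP19F0086) using the screening device of Example 2.
The results are shown in Figure 5. This trio contains almost exclusively Norm (normal) sites; sites of other types are sporadic and may be due to sequencing errors or de novo mutations arising during transmission. The result indicates a normal sample.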
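Example 4
NGS-trio-based uniparental disomy screening is illustrated with three sets of clinical samples, using the screening device of Example 2.
1. Trio: NP21S0557-NP21S0558-NP21S0549.
The results are shown in Figures 6-7. This trio contains sites consistent with biparental inheritance, sites consistent only with uniparental inheritance and sites inconsistent with the rules of inheritance, all evenly distributed. There are 11,443 type -1 and type -2 sites, exceeding 800, so the result is judged unqualified: neither parent is the biological parent or the samples are wrong, and no further determination can be made.
2. Trio: NP19E0911-NP19E0910-NP19E0912.
The results are shown in Figures 8-9. This trio contains sites consistent with biparental inheritance, sites consistent only with uniparental maternal inheritance and sites inconsistent with the rules of inheritance, all evenly distributed, but lacks sites of uniparental paternal type (almost no 2F or 3F sites). There are 5,878 type -1 and type -2 sites, exceeding 800, so the result is judged unqualified: the father is not the biological father or the samples are wrong, and no further determination can be made.
3. Trio: NP20E957-NP20E956-NP20E958.
The results are shown in Figures 10-11. This trio contains sites consistent with biparental inheritance, sites consistent only with uniparental paternal inheritance and sites inconsistent with the rules of inheritance, all evenly distributed, but lacks sites of uniparental maternal type (almost no 2M or 3M sites). There are 6,044 type -1 and type -2 sites, exceeding 800, so the result is judged unqualified: the mother is not the biological mother or the samples are wrong, and no further determination can be made.
After analysis, none of the above trios met the trio sample requirements, because a biologically related paternal and/or maternal sample was missing, so the subsequent analysis could not proceed.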
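Example 5
NGS-trio-based uniparental disomy screening is illustrated with three sets of clinical samples, using the screening device of Example 2.
1. Trio: NP21F6166-NP21F6167-NP21F6168.
The results are shown in Figures 12-13. In this trio, chr15 carries only sites consistent with uniparental maternal inheritance, while the other autosomes carry almost exclusively sites consistent with biparental inheritance, evenly distributed; sites of uniparental paternal inheritance and sites inconsistent with the rules of inheritance are lacking (almost no 2F, 3F, -1 or -2 sites). Because chr15 has 180 consecutive 2M or 3M sites spanning about 72 Mbp and the CNV result shows no abnormality, the finding is judged to be maternal UPD of chr15. Because the UPD segment covers multiple imprinted regions, it is reported as high-risk pathogenic UPD.
2. Trio: NP19F0315-NP19F0313-NP19F0314.
The results are shown in Figures 14-15. In this trio, chr6 carries only sites consistent with uniparental paternal inheritance, while the other autosomes carry almost exclusively sites consistent with biparental inheritance, evenly distributed; sites of uniparental maternal inheritance and sites inconsistent with the rules of inheritance are lacking (almost no 2M, 3M, -1 or -2 sites). Because chr6 has 813 consecutive 2F or 3F sites spanning about 169 Mbp and the CNV result shows no abnormality, the finding is judged to be paternal UPD of chr6. Because the UPD segment covers multiple imprinted regions, it is reported as high-risk pathogenic UPD.
3. Trio: NP21F3536-NP21F3567-NP21F3537.
The results are shown in Figures 16-17. In this trio, chr20 carries only sites consistent with uniparental maternal inheritance, while the other autosomes carry almost exclusively sites consistent with biparental inheritance, evenly distributed; sites of uniparental paternal inheritance and sites inconsistent with the rules of inheritance are lacking (almost no 2F, 3F, -1 or -2 sites). Because chr20 has 197 consecutive 2M or 3M sites spanning about 63 Mbp and the CNV result shows no abnormality, the finding is judged to be maternal UPD of chr20. Because the UPD segment covers multiple imprinted regions, it is reported as high-risk pathogenic UPD.
After analysis, all of the above trios carry a risk of pathogenic UPD.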
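Example 6
NGS-trio-based uniparental disomy screening is illustrated with two sets of clinical samples, using the screening device of Example 2.
1. Trio: NP19E1380-NP19E1381-NP19E1382.
The results are shown in Figures 18-19. In this trio, a short local region of chr15 carries only sites consistent with uniparental paternal inheritance, while the rest of chr15 and the other autosomes carry almost exclusively sites consistent with biparental inheritance, evenly distributed; sites of uniparental maternal inheritance and sites inconsistent with the rules of inheritance are lacking (almost no 2M, 3M, -1 or -2 sites). chr15 has 16 consecutive 2F or 3F sites spanning about 4 Mbp, and the CNV result indicates a heterozygous deletion of about 4 Mbp in the same region of chr15. The finding is therefore judged to be a local deletion of the maternal copy on chr15, i.e. only a single copy, of paternal origin, is present locally (the clinical effect is similar to that of paternal UPD). Because the segment covers multiple imprinted regions, it is reported as a high-risk pathogenic heterozygous deletion of maternal origin.
2. Trio: NP19E0056-NP9E0057-NP9E0055.
The results are shown in Figures 20-21. In this trio, a short local region of chr8 carries only sites consistent with uniparental maternal inheritance (one uniparental paternal site within it may be due to a sequencing error or another cause and does not affect the overall analysis), while the rest of chr8 and the other autosomes carry almost exclusively sites consistent with biparental inheritance, evenly distributed; sites of uniparental paternal inheritance and sites inconsistent with the rules of inheritance are lacking (almost no 2F, 3F, -1 or -2 sites). chr8 has 69 consecutive 2M or 3M sites spanning about 11 Mbp, and the CNV result indicates a heterozygous deletion of about 11 Mbp in the same region of chr8. The finding is therefore judged to be a local deletion of the paternal copy on chr8, i.e. only a single copy, of maternal origin, is present locally (the clinical effect is similar to that of maternal UPD). Because the segment covers multiple imprinted regions, it is reported as a high-risk pathogenic heterozygous deletion of paternal origin.
After analysis, both of the above trios carry high-risk pathogenic heterozygous deletions, whose clinical effect is similar to that of UPD from the parent opposite to the origin of the deletion (for example, a heterozygous deletion of paternal origin has a clinical effect similar to maternal UPD).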
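Example 7
Using the screening device of Example 2, UPD was screened in 792 whole-exome trio sequencing cases at our testing center. The results are shown in the table below.
Table 1. UPD screening results for 792 whole-exome trio sequencing cases
Total screened: 792 trios
Non-biological relationship: 5 trios
Uniparental origin detected: 46 (14+32) trios
  of which heterozygous deletion (matching the CNV result): 32 trios
  of which UPD: 14 trios
PWS-AS region verified by methylation testing: 7/7 trios
Note: "uniparental origin detected" above means that UPD (14 trios) or a heterozygous deletion (32 trios) was detected;
"PWS-AS" above refers to the pathogenic conditions caused by chr15 UPD: maternal UPD causes PWS (Prader-Willi syndrome) and paternal UPD causes AS (Angelman syndrome).
chr15 UPD is a relatively common pathogenic condition for which methylation assays are already commercially available. The 7 cases of chr15 UPD found by screening in this example were verified by methylation testing, and all results matched, showing that the method of the present invention gives highly accurate results.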
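The technical features of the embodiments described above may be combined in any manner. For brevity, not every possible combination of these technical features has been described; however, any combination of these features that involves no contradiction should be considered within the scope of this specification.
The embodiments described above represent only several implementations of the present invention. Although they are described specifically and in detail, they should not be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art may make a number of variations and improvements without departing from the concept of the invention, all of which fall within its scope of protection. The scope of protection of this patent is therefore defined by the appended claims.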

Claims (17)

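  1. A method for detecting uniparental disomy based on NGS-trio data, characterized by comprising the following steps:
    data acquisition: acquiring the NGS sequencing data of the samples of one trio;
    mutation site screening: for each sample separately, selecting the mutation sites that meet predetermined conditions and defining them as the qualified mutation sites of that sample, the mutation sites removed by the screening being defined as the unqualified mutation sites of that sample;
    site data merging: taking the union of the unqualified mutation sites of all samples in the trio, obtaining the chromosomal coordinates of each unqualified mutation site in the union, and removing from the qualified sites of every sample any mutation site whose coordinates match an unqualified site; then, based on the qualified mutation sites remaining in the trio, filling in each position where a sample has no mutation with a homozygous genotype matching the reference sequence;
    inheritance pattern classification: classifying the inheritance pattern of the trio combination at each mutation site, dividing the mutation sites into sites consistent with biparental inheritance, sites consistent only with uniparental inheritance, and sites inconsistent with the rules of inheritance;
    kinship check: if the number of sites inconsistent with the rules of inheritance is below a predetermined value, proceeding with the subsequent analysis; if it is greater than or equal to the predetermined value, judging the samples to be unqualified;
    uniparental segment determination: if consecutive sites consistent only with uniparental paternal inheritance span more than a predetermined value, judging the segment to be of uniparental paternal origin; if consecutive sites consistent only with uniparental maternal inheritance span more than a predetermined value, judging the segment to be of uniparental maternal origin;
    UPD determination: analyzing the sequencing depth of the segment judged to be uniparental; if the analysis indicates that the segment is present in a single copy, judging it to be a segmental deletion; otherwise judging it to be a UPD segment;
    pathogenic UPD screening: checking whether the UPD segment covers an imprinted gene or the corresponding band; if not, judging it to be benign UPD; if so, reporting a risk of pathogenic UPD.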
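  2. The method for detecting uniparental disomy based on NGS-trio data according to claim 1, characterized in that, in the mutation site screening step, mutation sites are selected as follows:
    1) selecting high-quality mutation sites from the NGS sequencing data;
    2) removing mutation sites located on the Y chromosome;
    3) keeping only point-mutation sites;
    4) excluding suspected false-positive sites according to Hardy-Weinberg equilibrium;
    5) for heterozygous sites, removing sites whose mutation frequency is above 70%; for homozygous sites, removing sites whose mutation frequency is below 85%;
    6) genotyping the mutations at each position and removing sites whose genotype contains more than two alleles;
    7) the remaining sites being the mutation sites that meet the predetermined conditions.
  3. The method for detecting uniparental disomy based on NGS-trio data according to claim 1, characterized in that, in the mutation site screening step, the high-quality mutation sites are mutation sites that meet the following criteria: GATK-VQSR quality control PASS, total coverage > 20X, and mutation frequency > 25%.
  4. The method for detecting uniparental disomy based on NGS-trio data according to any one of claims 1-3, characterized in that, in the data acquisition step, the trio comprises a paternal sample, a maternal sample and a proband sample;
    in the site data merging step, the mutation site data with matching coordinates are arranged in the order proband-father-mother.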
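  5. The method for detecting uniparental disomy based on NGS-trio data according to claim 4, characterized in that, in the inheritance pattern classification step, the sites consistent with biparental inheritance are divided into:
    Type 1: sites consistent only with biparental inheritance;
    Type 0: sites consistent with both biparental and uniparental inheritance;
    the sites consistent only with uniparental inheritance are divided into:
    Type 3F: sites that can only result from paternal monosomy rescue;
    Type 2F: sites that may result from either paternal monosomy rescue or paternal trisomy rescue;
    Type 3M: sites that can only result from maternal monosomy rescue;
    Type 2M: sites that may result from either maternal monosomy rescue or maternal trisomy rescue;
    the sites inconsistent with the rules of inheritance are divided into:
    Type -1: one of the parents is inconsistent with the rules of inheritance;
    Type -2: both parents are inconsistent with the rules of inheritance.
  6. The method for detecting uniparental disomy based on NGS-trio data according to claim 5, characterized in that, in the uniparental segment determination step, if 8 or more consecutive sites of type 2F or 3F span more than 1 Mbp, the segment is judged to be of uniparental paternal origin; if 8 or more consecutive sites of type 2M or 3M span more than 1 Mbp, the segment is judged to be of uniparental maternal origin.
  7. The method for detecting uniparental disomy based on NGS-trio data according to claim 1, characterized in that, in the UPD determination step, the data judged to be a uniparental segment are compared with the copy number analysis results from whole-exome sequencing; if the copy number analysis indicates that the segment is present in a single copy, it is judged to be a segmental deletion; otherwise it is judged to be UPD.
  8. Use of the method for detecting uniparental disomy based on NGS-trio data according to any one of claims 1-7 in developing or manufacturing a device for screening pathogenic UPD.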
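  9. A device for screening uniparental disomy based on NGS-trio data, characterized by comprising a data acquisition module, a data analysis module and a UPD determination module;
    the data acquisition module is configured to acquire the NGS sequencing data of the samples of one trio;
    the data analysis module is configured to analyze the sequencing data and divide the mutation sites into sites consistent with biparental inheritance, sites consistent only with uniparental inheritance, and sites inconsistent with the rules of inheritance;
    the UPD determination module is configured to perform UPD determination on the mutation sites according to preset rules and to output the result;
    the data analysis module performs the analysis in the following steps:
    mutation site screening: for each sample separately, selecting the mutation sites that meet predetermined conditions and defining them as the qualified mutation sites of that sample, the mutation sites removed by the screening being defined as the unqualified mutation sites of that sample;
    site data merging: taking the union of the unqualified mutation sites of all samples in the trio, obtaining the chromosomal coordinates of each unqualified mutation site in the union, and removing from the qualified sites of every sample any mutation site whose coordinates match an unqualified site; then, based on the qualified mutation sites remaining in the trio, filling in each position where a sample has no mutation with a homozygous genotype matching the reference sequence;
    inheritance pattern classification: classifying the inheritance pattern of the trio combination at each mutation site, dividing the mutation sites into sites consistent with biparental inheritance, sites consistent only with uniparental inheritance, and sites inconsistent with the rules of inheritance;
    the UPD determination module performs the analysis in the following steps:
    kinship check: if the number of sites inconsistent with the rules of inheritance is below a predetermined value, proceeding with the subsequent analysis; if it is greater than or equal to the predetermined value, judging the samples to be unqualified;
    uniparental segment determination: if consecutive sites consistent only with uniparental paternal inheritance span more than a predetermined value, judging the segment to be of uniparental paternal origin; if consecutive sites consistent only with uniparental maternal inheritance span more than a predetermined value, judging the segment to be of uniparental maternal origin;
    UPD determination: analyzing the sequencing depth of the segment judged to be uniparental; if the analysis indicates that the segment is present in a single copy, judging it to be a segmental deletion; otherwise judging it to be a UPD segment;
    pathogenic UPD screening: checking whether the UPD segment covers an imprinted gene or the corresponding band; if not, judging it to be benign UPD; if so, reporting a risk of pathogenic UPD.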
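  10. The device for screening uniparental disomy based on NGS-trio data according to claim 9, characterized in that, in the mutation site screening step, mutation sites are selected as follows:
    1) selecting high-quality mutation sites from the NGS sequencing data;
    2) removing mutation sites located on the Y chromosome;
    3) keeping only point-mutation sites;
    4) excluding suspected false-positive sites according to Hardy-Weinberg equilibrium;
    5) for heterozygous sites, removing sites whose mutation frequency is above 70%; for homozygous sites, removing sites whose mutation frequency is below 85%;
    6) genotyping the mutations at each position and removing sites whose genotype contains more than two alleles;
    7) the remaining sites being the mutation sites that meet the predetermined conditions.
  11. The device for screening uniparental disomy based on NGS-trio data according to claim 9, characterized in that, in the mutation site screening step, the high-quality mutation sites are mutation sites that meet the following criteria: GATK-VQSR quality control PASS, total coverage > 20X, and mutation frequency > 25%.
  12. The device for screening uniparental disomy based on NGS-trio data according to claim 9, characterized in that, in the data acquisition module, the trio comprises a paternal sample, a maternal sample and a proband sample;
    in the site data merging step, the mutation site data with matching coordinates are arranged in the order proband-father-mother.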
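  13. The device for screening uniparental disomy based on NGS-trio data according to claim 12, characterized in that, in the inheritance pattern classification step, the sites consistent with biparental inheritance are divided into:
    Type 1: sites consistent only with biparental inheritance;
    Type 0: sites consistent with both biparental and uniparental inheritance;
    the sites consistent only with uniparental inheritance are divided into:
    Type 3F: sites that can only result from paternal monosomy rescue;
    Type 2F: sites that may result from either paternal monosomy rescue or paternal trisomy rescue;
    Type 3M: sites that can only result from maternal monosomy rescue;
    Type 2M: sites that may result from either maternal monosomy rescue or maternal trisomy rescue;
    the sites inconsistent with the rules of inheritance are divided into:
    Type -1: one of the parents is inconsistent with the rules of inheritance;
    Type -2: both parents are inconsistent with the rules of inheritance.
  14. The device for screening uniparental disomy based on NGS-trio data according to claim 13, characterized in that, in the uniparental segment determination step, if 8 or more consecutive sites of type 2F or 3F span more than 1 Mbp, the segment is judged to be of uniparental paternal origin; if 8 or more consecutive sites of type 2M or 3M span more than 1 Mbp, the segment is judged to be of uniparental maternal origin.
  15. The device for screening uniparental disomy based on NGS-trio data according to claim 9, characterized in that, in the UPD determination step, the data judged to be a uniparental segment are compared with the copy number analysis results from whole-exome sequencing; if the copy number analysis indicates that the segment is present in a single copy, it is judged to be a segmental deletion; otherwise it is judged to be UPD.
  16. A storage medium, characterized in that the storage medium comprises a stored program, the program implementing the functions of the modules according to any one of claims 9-15.
  17. A processor, characterized in that the processor is configured to run a program, the program implementing the functions of the modules according to any one of claims 9-15.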
PCT/CN2020/106716 2020-08-04 2020-08-04 Method for detecting uniparental disomy based on NGS-trio, and use thereof WO2022027212A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US18/019,858 US20230282307A1 (en) 2020-08-04 2020-08-04 Method for detecting uniparental disomy based upon ngs-trio, and use thereof
PCT/CN2020/106716 WO2022027212A1 (zh) 2020-08-04 2020-08-04 Method for detecting uniparental disomy based on NGS-trio, and use thereof

Publications (1)

Publication Number Publication Date
WO2022027212A1 true WO2022027212A1 (zh) 2022-02-10

Family

ID=80119270

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/106716 WO2022027212A1 (zh) 2020-08-04 2020-08-04 基于NGS-trio的单亲二倍体检测方法及应用

Country Status (2)

Country Link
US (1) US20230282307A1 (zh)
WO (1) WO2022027212A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116805510A (zh) * 2022-09-01 2023-09-26 杭州链康医学检验实验室有限公司 Site combination for determining sample pairing or contamination, and use thereof

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8532930B2 (en) * 2005-11-26 2013-09-10 Natera, Inc. Method for determining the number of copies of a chromosome in the genome of a target individual using genetic data from genetically related individuals
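CN104862380A (zh) * 2014-02-25 2015-08-26 林巍 Method for confirming family-specific genetic-disease-associated allele haplotype variation tags
CN107633160A (zh) * 2017-08-14 2018-01-26 广州市圣鑫生物科技有限公司 Trio paternity testing method, system, computer device and readable storage medium
CN110029157A (zh) * 2018-01-11 2019-07-19 北京大学 Method for detecting haploid copy number variation in tumor single-cell genomes
CN110211630A (zh) * 2019-06-06 2019-09-06 广州金域医学检验中心有限公司 Screening device for pathogenic uniparental disomy, storage medium and processor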

Also Published As

Publication number Publication date
US20230282307A1 (en) 2023-09-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20948426

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20948426

Country of ref document: EP

Kind code of ref document: A1