CN111863125B

CN111863125B - Method for detecting single parent diploid based on NGS-trio and application

Info

Publication number: CN111863125B
Application number: CN202010774623.XA
Authority: CN
Inventors: 刘晶星; 于世辉; 喻长顺; 向丽娜; 陈白雪
Original assignee: Guangzhou Kingmed Diagnostics Group Co ltd; Guangzhou Kingmed Diagnostics Central Co Ltd
Current assignee: Guangzhou Kingmed Diagnostics Group Co ltd; Guangzhou Kingmed Diagnostics Central Co Ltd
Priority date: 2020-08-04
Filing date: 2020-08-04
Publication date: 2024-04-12
Anticipated expiration: 2040-08-04
Also published as: CN111863125A

Abstract

The invention relates to a method for detecting uniparental disomy (UPD; "single parent diploid") based on NGS-trio data and an application thereof, belonging to the technical field of bioinformatics analysis. By obtaining and analyzing NGS-trio sequencing data, the method directly infers the parental origin of the proband's chromosomes and thus determines UPD directly (instead of indirectly inferring UPD through LOH), improving the diagnostic positive rate without any additional cost. The method can also assist in identifying large heterozygous deletions; depending on the density of mutation sites, its resolution can reach 1 Mbp, so it has excellent detection performance.

Description

Method for detecting single parent diploid based on NGS-trio and application

Technical Field

The invention relates to the technical field of bioinformatics analysis, in particular to a method for detecting uniparental disomy based on NGS-trio data and an application thereof.

Background

Genomic imprinting, also known as genetic imprinting, is a genetic process in which a gene or genomic domain is biochemically marked with information about its parental origin. Such genes are called imprinted genes; whether they are expressed depends on the parental origin of the chromosome on which they lie (paternal or maternal) and on whether the gene is silenced on that chromosome (the silencing mechanism is mainly methylation). Some imprinted genes are expressed only from the maternal chromosome, while others are expressed only from the paternal chromosome.

In a normal diploid, the two homologous chromosomes of a pair are derived from the father and the mother, respectively. Uniparental disomy (UPD) refers to a pair of homologous chromosomes (or a chromosomal segment) that are both derived from the same parent; if the segment contains an imprinted gene, gene expression can be disturbed. The current method of diagnosing UPD is to detect whether the methylation levels of the same segment on the two homologous chromosomes are consistent.

In most cases UPD results from non-disjunction of two homologous chromosomes during meiosis, which produces a gamete with an abnormal chromosome copy number (2 or 0 copies instead of the normal 1) and hence a zygote with an abnormal copy number (trisomy or monosomy). The zygote then returns to euploidy either by trisomy rescue, as shown in FIG. 1, in which one chromosome is randomly lost, or by monosomy rescue, as shown in FIG. 2, in which the single chromosome is duplicated. Trisomy rescue has a 1/3 probability of producing UPD, whereas monosomy rescue always produces UPD.
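The 1/3 figure can be checked by simple enumeration. The sketch below is only an illustration of the counting argument; the chromosome labels P1, P2 and Q are hypothetical names introduced here, and equal loss probability for each chromosome is assumed.

```python
from fractions import Fraction
from itertools import combinations

# Trisomic zygote after non-disjunction in one parent: two chromosomes
# from that parent (P1, P2) and one from the other parent (Q).
trisomy = ["P1", "P2", "Q"]

# Trisomy rescue: one chromosome is lost at random, so each pair of
# surviving chromosomes is assumed to be equally likely.
survivors = list(combinations(trisomy, 2))

# UPD results when both surviving chromosomes come from the same parent.
upd_cases = [pair for pair in survivors if all(c.startswith("P") for c in pair)]
p_upd_trisomy = Fraction(len(upd_cases), len(survivors))

# Monosomy rescue duplicates the only chromosome present, so both copies
# always come from the same parent.
p_upd_monosomy = Fraction(1, 1)

print(p_upd_trisomy, p_upd_monosomy)  # prints: 1/3 1
```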

UPD generated by monosomy rescue can be inferred indirectly by detecting LOH (loss of heterozygosity), because the entire chromosome becomes homozygous. UPD generated by trisomy rescue, by contrast, only occasionally produces local LOH through recombination during meiosis, and local LOH has many other causes (e.g., consanguineous marriage), so UPD cannot be determined with 100% certainty in this way.

Moreover, the methylation-based methods conventionally used to detect UPD can only handle small local chromosomal fragments, and a separate experiment must be designed for each region; they are inefficient and slow and are not suitable for genome-wide screening.

The SNP array method is costly, and because its probes target polymorphic sites, it cannot simultaneously detect other small pathogenic mutations (point mutations, small indels and the like).

Whole exome sequencing is currently the most common method for detecting genetic disorders; it can detect pathogenic point mutations, small indels, copy number variations and the like, and is the first choice for most such patients. However, based on the sequencing data of a single sample, UPD can only be inferred indirectly through LOH, as disclosed in CN110211630A.

Disclosure of Invention

In view of the above, it is necessary to provide a method for detecting uniparental disomy based on NGS-trio data, by which the parental origin of the proband's chromosomes can be directly inferred, so that UPD is determined directly (rather than inferred indirectly through LOH) and the diagnostic positive rate is improved without any additional cost.

An NGS-trio-based method for detecting uniparental disomy comprises the following steps:

data acquisition: obtaining NGS sequencing data of the samples of the same trio;

mutation site selection: in each sample, selecting the mutation sites that meet preset conditions and defining them as the qualified mutation sites of that sample, and defining the mutation sites removed by the screening as the unqualified mutation sites of that sample;

site data merging: pooling the unqualified mutation sites of all samples of the same trio, collecting the chromosome coordinates of every unqualified mutation site, and removing from the qualified sites of each sample any mutation site whose coordinates match an unqualified site; then, based on the remaining qualified mutation sites of the trio, filling in the genotype of any sample that has no mutation at a given position as homozygous for the reference sequence;

genetic pattern classification: classifying the genetic pattern of the trio genotype combination at each mutation site, the mutation sites being classified into: sites consistent with biparental inheritance, sites consistent only with uniparental inheritance, and sites inconsistent with the rules of inheritance;

kinship determination: if the number of sites inconsistent with the rules of inheritance is smaller than a preset value, carrying out the subsequent analysis; if it is greater than or equal to the preset value, judging the sample to be unqualified;

uniparental segment determination: if the range covered by consecutive sites consistent only with paternal uniparental inheritance exceeds a preset value, judging the segment to be of paternal uniparental origin; if the range covered by consecutive sites consistent only with maternal uniparental inheritance exceeds a preset value, judging the segment to be of maternal uniparental origin;

UPD determination: analyzing the sequencing coverage depth of the segment judged to be uniparental; if the segment is single-copy, judging it to be a deletion; otherwise, judging it to be a UPD segment;

pathogenic UPD screening: checking whether the UPD segment covers an imprinted gene or the corresponding band; if not, judging it to be benign UPD; if so, reporting a risk of pathogenic UPD.

As sequencing costs fall, more and more whole exome sequencing tests sequence the proband together with both parents. Based on such trio data, the method can directly infer the parental origin of the proband's chromosomes and thus directly determine UPD, improving the diagnostic positive rate without any additional cost.

It will be appreciated that the NGS sequencing data described above may be either whole exome sequencing data or whole genome sequencing data.

In one embodiment, in the mutation site selection step, the mutation site is selected as follows:

1) Screening high-quality mutation sites in NGS sequencing data;

2) Removing the mutation site located on the Y chromosome;

3) Screening the point mutation sites;

4) Excluding suspected false positive sites according to Hardy-Weinberg equilibrium;

5) Removing heterozygous sites with a mutation frequency higher than 70% and homozygous sites with a mutation frequency lower than 85%;

6) Genotyping the mutation at each position and removing sites with more than 2 allele types;

7) The remaining sites are mutation sites meeting preset conditions.

In mutation analysis, since humans are diploid, one position carries at most 2 alleles, and more than two are typically sequencing errors. For example, chr1:69849G>A, het is typed as chr1:69849[A/G], and chr1:69849G>A, hom is typed as chr1:69849[A/A]. If both chr1:69849G>A, het and chr1:69849G>T, het are present, the typing is chr1:69849[A/G/T], i.e., more than 2 types, and the site must be removed.
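The typing rule can be sketched as follows. This is a minimal illustration, not the patented implementation; the function name type_sites and the tuple layout of a call are assumptions made for the example.

```python
from collections import defaultdict

def type_sites(calls):
    """calls: iterable of (chrom, pos, ref, alt, zygosity), zygosity 'het' or 'hom'.
    Returns {(chrom, pos): (allele1, allele2)}, dropping positions that
    would need more than 2 allele types."""
    alleles = defaultdict(set)
    for chrom, pos, ref, alt, zygosity in calls:
        alleles[(chrom, pos)].add(alt)
        if zygosity == "het":
            # A heterozygous call keeps the reference allele as the second type.
            alleles[(chrom, pos)].add(ref)
    typed = {}
    for site, seen in alleles.items():
        if len(seen) > 2:
            continue  # e.g. [A/G/T]: treated as a sequencing error and removed
        ordered = sorted(seen)
        typed[site] = (ordered[0], ordered[-1])
    return typed

print(type_sites([("chr1", 69849, "G", "A", "het")]))
print(type_sites([("chr1", 69849, "G", "A", "het"),
                  ("chr1", 69849, "G", "T", "het")]))
```

The first call yields the [A/G] typing from the example above; the second yields an empty result because the position would need three allele types.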

It will be appreciated that a mutation site meeting the preset conditions must satisfy all of the screening conditions and none of the removal conditions.

It will be appreciated that, according to the Hardy-Weinberg equilibrium law, in an infinitely large, randomly mating population with no mutation, no selection and no genetic drift, the genotype frequencies and allele frequencies at a locus remain unchanged from generation to generation, i.e., the population is in genetic equilibrium. False positive sites can therefore be excluded by a chi-square test. For example, the AA-AB-BB genotype frequencies at a locus follow a fixed rule: if the local population database contains 10,000 persons and the allele frequency of A is 0.4 and that of B is 0.6, the expected count is 1600 for genotype AA, 3600 for BB and 4800 for AB. The observed and expected counts of these genotypes in the database are compared by a chi-square test, and sites at which the observed counts deviate too far from the expected counts (i.e., highly suspected false positive sites) are excluded.
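The expected counts and the chi-square statistic in this example can be reproduced with a few lines. The sketch below assumes the allele frequency is estimated from the observed genotype counts; the cut-off of 3.84 (the 5% critical value for 1 degree of freedom) is an assumption for illustration, since the text does not state which cut-off is used.

```python
CHI2_CUTOFF = 3.84  # assumed cut-off: 5% critical value, 1 degree of freedom

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Return (expected counts, chi-square statistic) for observed
    genotype counts AA, AB, BB under Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p                         # frequency of allele B
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return expected, chi2

# Counts matching the example: 10,000 persons, allele A at 0.4.
expected, chi2 = hwe_chi_square(1600, 4800, 3600)
print([round(e) for e in expected], chi2 < CHI2_CUTOFF)

# Same allele frequency but far too few heterozygotes: flagged as suspect.
_, chi2_bad = hwe_chi_square(2600, 2800, 4600)
print(chi2_bad > CHI2_CUTOFF)
```

A site whose statistic exceeds the chosen cut-off would be treated as a suspected false positive and removed.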

Conventional NGS sequencing results contain a large number of poor-quality sites, which would greatly interfere with the subsequent UPD determination of the method; if all sites were used, detection performance would be poor. Selecting mutation sites as described above therefore improves the accuracy of the analysis.

In one embodiment, in the mutation site selection step:

the high quality mutation site is a mutation site meeting the following criteria: GATK-VQSR quality control PASS, total coverage >20X, mutation frequency >25%.
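The per-call thresholds listed above can be combined into a single predicate. This is a hedged sketch: the Call record and its field names are hypothetical, and the Hardy-Weinberg check and the removal of sites with more than 2 allele types are population-level or position-level steps that are not covered here.

```python
from dataclasses import dataclass

@dataclass
class Call:
    chrom: str
    ref: str
    alt: str
    vqsr_filter: str  # GATK-VQSR filter status, e.g. "PASS"
    depth: int        # total coverage at the site
    vaf: float        # mutation (variant allele) frequency, 0..1
    zygosity: str     # "het" or "hom"

def is_qualified(c: Call) -> bool:
    """Apply the per-call screening and removal conditions."""
    high_quality = c.vqsr_filter == "PASS" and c.depth > 20 and c.vaf > 0.25
    not_chr_y = c.chrom not in ("chrY", "Y")
    is_point_mutation = len(c.ref) == 1 and len(c.alt) == 1
    if c.zygosity == "het":
        vaf_ok = c.vaf <= 0.70   # heterozygous sites above 70% are removed
    else:
        vaf_ok = c.vaf >= 0.85   # homozygous sites below 85% are removed
    return high_quality and not_chr_y and is_point_mutation and vaf_ok

print(is_qualified(Call("chr1", "G", "A", "PASS", 35, 0.48, "het")))  # True
```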

In one embodiment, in the data acquisition step, the same trio includes a father sample, a mother sample and a proband sample;

in the site data merging step, the mutation site data with matching coordinates are arranged in the order proband, father, mother.

A test according to the method of the invention must include the proband sample and both parental samples; none of them can be omitted.

In one embodiment, in the genetic pattern classification step, sites consistent with biparental inheritance are classified into:

type 1: sites consistent only with biparental inheritance;

type 0: sites consistent with both biparental inheritance and uniparental inheritance;

sites consistent only with uniparental inheritance are classified into:

type 3F: sites that can only result from paternal monosomy rescue;

type 2F: sites that may result from paternal monosomy rescue or from paternal trisomy rescue;

type 3M: sites that can only result from maternal monosomy rescue;

type 2M: sites that may result from maternal monosomy rescue or from maternal trisomy rescue;

sites inconsistent with the rules of inheritance are classified into:

type -1: sites inconsistent with the rules of inheritance for one of the parents;

type -2: sites inconsistent with the rules of inheritance for both parents.

It will be appreciated that the sites consistent with biparental inheritance are sites at which both alleles of the proband can be traced to the parents; they include sites consistent only with biparental inheritance (type 1, such as Aa-AA-aa in proband-father-mother order) and sites consistent with both biparental and uniparental inheritance (type 0).

In one embodiment, in the uniparental segment determination step, if 8 or more consecutive sites of type 2F or 3F are found and the range they cover exceeds 1 Mbp, the segment is judged to be of paternal uniparental origin; if 8 or more consecutive sites of type 2M or 3M are found and the range they cover exceeds 1 Mbp, the segment is judged to be of maternal uniparental origin.

It will be appreciated that "consecutive" means that the run is not interrupted by a type 1 site: the 8 or more consecutive sites of type 2F or 3F, or of type 2M or 3M, have no type 1 site between them.

In one embodiment, in the UPD determination step, the segment judged to be uniparental is compared with the copy number analysis result of the whole exome sequencing; if the copy number analysis indicates that the segment is single-copy, it is judged to be a deletion; otherwise, it is judged to be UPD.

The invention also discloses use of the above NGS-trio-based method for detecting uniparental disomy in the development or manufacture of a pathogenic UPD screening device.

The invention also discloses an NGS-trio-based screening device for uniparental disomy, which comprises a data acquisition module, a data analysis module and a UPD judgment module;

the data acquisition module is used for acquiring NGS sequencing data of the samples of the same trio;

the data analysis module is used for analyzing the sequencing data and classifying the mutation sites into: sites consistent with biparental inheritance, sites consistent only with uniparental inheritance, and sites inconsistent with the rules of inheritance;

the UPD judgment module is used for performing UPD determination on the mutation sites according to preset rules to obtain a determination result;

the data analysis module performs the mutation site selection, site data merging and genetic pattern classification steps of the method described above;

the UPD judgment module performs the kinship determination, uniparental segment determination, UPD determination and pathogenic UPD screening steps of the method described above.

In one embodiment, the mutation sites are selected, the high-quality mutation sites are defined, and the sites are classified into types 1, 0, 3F, 2F, 3M, 2M, -1 and -2 in the same manner as in the method described above.

In one embodiment, in the data acquisition module, the same trio includes a father sample, a mother sample and a proband sample.

The invention also discloses a storage medium comprising a stored program, which implements the functions of the above modules.

The invention also discloses a processor, which is used for running a program, and the program realizes the functions of the modules.

Compared with the prior art, the invention has the following beneficial effects:

According to the NGS-trio-based method for detecting uniparental disomy of the invention, based on trio data from whole exome or whole genome sequencing, it can be determined, while routine pathogenic mutations are being examined, whether UPD has occurred and whether it lies in a high-risk imprinted region, without any additional experiment or labor cost.

In addition, the method can assist in identifying large heterozygous deletions; depending on the density of mutation sites, its resolution can reach 1 Mbp, so it has excellent detection performance.

Drawings

FIG. 1 is a schematic diagram of trisomy rescue as described in the background;

FIG. 2 is a schematic diagram of monosomy rescue as described in the background;

FIG. 3 is a flow chart of a method for detecting a single parent diploid based on NGS-trio in example 1;

FIG. 4 is a schematic diagram of a screening apparatus module in example 2;

FIG. 5 is a schematic diagram of a normal sample in example 3;

FIG. 6 is a schematic diagram of analysis of trio sample group NP21S0557-NP21S0558-NP21S0549 in example 4;

FIG. 7 is a partially enlarged view of FIG. 6;

FIG. 8 is a schematic diagram of the analysis of trio sample group NP19E0911-NP19E0910-NP19E0912 in example 4;

FIG. 9 is a partially enlarged view of FIG. 8;

FIG. 10 is a schematic diagram of the analysis of trio sample group NP20E957-NP20E956-NP20E958 in example 4;

FIG. 11 is a partially enlarged view of FIG. 10;

FIG. 12 is a schematic diagram of the analysis of trio sample group NP21F6166-NP21F6167-NP21F6168 in example 5;

FIG. 13 is a partially enlarged view of FIG. 12;

FIG. 14 is a schematic diagram of the analysis of trio sample group NP19F0315-NP19F0313-NP19F0314 in example 5;

FIG. 15 is a partially enlarged view of FIG. 14;

FIG. 16 is a schematic diagram of the analysis of trio sample group NP21F3536-NP21F3567-NP21F3537 in example 5;

FIG. 17 is a partially enlarged view of FIG. 16;

FIG. 18 is a schematic diagram of the analysis of trio sample group NP19E1380-NP19E1381-NP19E1382 in example 6;

FIG. 19 is a partially enlarged view of FIG. 18;

FIG. 20 is a schematic diagram of the analysis of trio sample group NP19E0056-NP9E0057-NP9E0055 in example 6;

FIG. 21 is a partially enlarged view of FIG. 20;

wherein: in FIGS. 5, 6, 8, 10, 12, 14, 16, 18 and 20, the abscissa indicates the chromosome number, the lower half of the diagram shows the proportion of each chromosome's length covered by continuous homozygous segments, and the upper half shows the distribution of the mutation sites on each chromosome;

FIGS. 7, 9, 11, 13, 15, 17, 19 and 21 are enlarged views of the different types of sites on each chromosome, in order from left to right: the cross uninhereit_2 denotes type -2 sites, the round dot uninhereit_1 denotes type -1 sites, the diamond Norm denotes normal sites, the solid line exome_bed denotes the whole exome sequencing coverage, imprint location denotes an imprinted segment, imprint gene denotes the range of an imprinted gene, the inverted triangle Mather denotes maternal uniparental sites (3M and 2M), and the upright triangle Father denotes paternal uniparental sites (3F and 2F).

Detailed Description

In order that the invention may be readily understood, a more complete description of the invention will be rendered by reference to the appended drawings. Preferred embodiments of the present invention are shown in the drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.

Example 1

The NGS-trio-based method for detecting uniparental disomy comprises the following steps:

1. Data acquisition.

NGS sequencing data of the samples of the same trio are obtained. It will be appreciated that the NGS sequencing data may be whole exome sequencing data or whole genome sequencing data.

The proband sample, the father sample and the mother sample are all required.

2. Mutation site selection.

For a trio, the mutation sites meeting the preset conditions are selected in each sample and defined as the qualified mutation sites of that sample, and the mutation sites removed by the screening are defined as the unqualified mutation sites of that sample. The screening is carried out as follows:

1. Screening high-quality mutation sites in the whole exome sequencing data (GATK-VQSR filter PASS, total coverage > 20X, mutation frequency > 25%);

2. Removing mutation sites located on the Y chromosome;

3. Screening point mutation sites;

4. Excluding likely false positive sites according to Hardy-Weinberg equilibrium, using the local population frequency database;

5. Removing heterozygous sites with a mutation frequency higher than 70% and homozygous sites with a mutation frequency lower than 85%;

6. Genotyping the mutation at each position and removing sites with more than 2 allele types (humans are diploid, so one position carries at most 2 alleles, and more than two are generally sequencing errors). For example, chr1:69849G>A, het is typed as chr1:69849[A/G], and chr1:69849G>A, hom is typed as chr1:69849[A/A]; if both chr1:69849G>A, het and chr1:69849G>T, het are present, the typing is chr1:69849[A/G/T], i.e., more than 2 types, and the site must be removed.

7. Recording the sites that pass the screening and the sites that fail it in separate lists.

A qualified site must satisfy all of the above screening conditions and none of the above removal conditions.

3. Site data merging.

1. The unqualified mutation sites of the three samples (proband, father and mother) of the same trio are pooled, the chromosome coordinates of every unqualified mutation site are collected, and any mutation site whose coordinates match an unqualified site is removed from the qualified sites of each sample; that is, a site that fails quality screening in one sample is also discarded in the other two samples.

2. Then, based on the remaining qualified mutation sites of the trio, the genotype of any sample with no mutation at a given position is filled in as homozygous for the reference sequence. For example, if the proband is chr1:69849[A/G], the father is chr1:69849[A/A] and the mother has no mutation at this position, then, since the reference base at this position is G, the mother's genotype is chr1:69849[G/G].

After the above processing, whole exome sequencing data generally yield about 50,000 qualifying trio combinations of mutation sites. The genotypes in each trio combination are arranged in the order proband-father-mother; for example, Aa-AA-aa means that the proband is Aa, the father is AA and the mother is aa.
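The merging step can be sketched as below. The data layout is an assumption made for illustration: each sample's qualified sites are a dictionary from (chromosome, position) to a genotype tuple, unqualified sites are sets of coordinates, and ref_base is a hypothetical lookup of the reference base at each position.

```python
ORDER = ("proband", "father", "mother")

def merge_trio(qualified, unqualified, ref_base):
    """Return {site: (proband_gt, father_gt, mother_gt)} for one trio."""
    # Pool the unqualified coordinates of all three samples.
    bad = set().union(*(unqualified[s] for s in ORDER))
    # Remove pooled unqualified coordinates from every sample.
    kept = {s: {site: gt for site, gt in qualified[s].items() if site not in bad}
            for s in ORDER}
    sites = set().union(*(kept[s].keys() for s in ORDER))
    trio = {}
    for site in sorted(sites):
        # A sample without a mutation here is homozygous for the reference base.
        hom_ref = (ref_base[site], ref_base[site])
        trio[site] = tuple(kept[s].get(site, hom_ref) for s in ORDER)
    return trio

qualified = {
    "proband": {("chr1", 69849): ("A", "G")},
    "father": {("chr1", 69849): ("A", "A")},
    "mother": {},
}
unqualified = {"proband": set(), "father": set(), "mother": set()}
print(merge_trio(qualified, unqualified, {("chr1", 69849): "G"}))
```

With the chr1:69849 example from the text, the mother's missing genotype is filled in as ("G", "G").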

4. Genetic pattern classification.

The genetic pattern of the trio genotype combination at each mutation site is classified, and the mutation sites are divided into: sites consistent with biparental inheritance, sites consistent only with uniparental inheritance, and sites inconsistent with the rules of inheritance. Specifically:

1. Sites consistent with biparental inheritance: both alleles of the proband can be traced to the parents. Of these, the Aa-AA-aa type can only have been inherited from both parents, and such sites are labeled type 1 (consistent only with biparental inheritance). Other types, such as Aa-Aa-Aa and aa-Aa-Aa, are consistent with biparental inheritance but also with uniparental inheritance; such sites cannot serve as a basis for any judgment and are labeled type 0 (consistent with both biparental and uniparental inheritance).

2. Sites consistent only with uniparental inheritance: both alleles of the proband can only have been inherited from one parent. Taking the father as an example, there are two cases, the AA-Aa-aa type and the AA-AA-aa type. AA-Aa-aa can only result from the monosomy rescue described above and is labeled type 3F; AA-AA-aa may result from either monosomy rescue or trisomy rescue and is labeled type 2F. Similarly, if the alleles are inherited from the mother, the corresponding types are labeled 3M and 2M.

3. The remaining sites are inconsistent with the rules of inheritance. A few scattered sites of this kind may be due to new mutations arising during inheritance, sequencing errors and the like; if there are many, the possibility that a parent is not the biological parent must be considered. There are two cases: the AA-aa-aa type, in which neither parent can be the biological parent, is labeled type -2; the Aa-AA-AA type, in which one parent cannot be the biological parent, is labeled type -1.
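The classification can be sketched as a single function over a proband-father-mother genotype combination. This is an illustrative reading of the rules above, not the patented implementation, and it rests on assumptions: sites are biallelic, a homozygous proband with a heterozygous contributing parent is taken to mean duplication of one homologue (labeled 3), and type -1 versus type -2 is decided by counting the proband alleles that cannot be found in either parent.

```python
def classify(proband, father, mother):
    """Each genotype is a 2-tuple of alleles. Returns one of
    '1', '0', '3F', '2F', '3M', '2M', '-1', '-2'."""
    child = tuple(sorted(proband))
    # Biparental: one allele from the father and one from the mother.
    biparental = any(tuple(sorted((f, m))) == child for f in father for m in mother)

    def uniparental(parent):
        both_homologues = tuple(sorted(parent)) == child
        one_duplicated = child[0] == child[1] and child[0] in parent
        return both_homologues or one_duplicated

    from_father, from_mother = uniparental(father), uniparental(mother)
    if biparental:
        return "0" if (from_father or from_mother) else "1"
    if from_father or from_mother:
        parent, tag = (father, "F") if from_father else (mother, "M")
        # Assumption: homozygous child from a heterozygous parent can only be
        # a duplicated homologue, which the text attributes to monosomy rescue.
        only_duplication = child[0] == child[1] and parent[0] != parent[1]
        return ("3" if only_duplication else "2") + tag
    # Assumption: count proband alleles found in neither parent.
    missing = sum(1 for a in child if a not in father and a not in mother)
    return "-2" if missing == 2 else "-1"

print(classify(("A", "a"), ("A", "A"), ("a", "a")))  # 1
print(classify(("A", "A"), ("A", "a"), ("a", "a")))  # 3F
print(classify(("A", "A"), ("a", "a"), ("a", "a")))  # -2
```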

5. Kinship determination.

If the number of sites inconsistent with the rules of inheritance is smaller than a preset value, the subsequent analysis is carried out; if it is greater than or equal to the preset value, the sample is judged to be unqualified.

Normally, only a few sporadic type -1 and type -2 sites arise from mutation and sequencing errors, typically no more than 100, whereas thousands of type -1 sites arise if even one parent is not the biological parent.

Accordingly, a trio with more than 800 sites of type -1 and type -2 in total is judged to be non-biological; that is, in this example the preset value (threshold) for sites inconsistent with the rules of inheritance is set to 800.

If the kinship determination indicates a non-biological parent, no subsequent analysis can be performed. If the trio passes the kinship determination, the subsequent steps are carried out.
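The kinship check reduces to counting labels. In the minimal sketch below, the function name is hypothetical and the comparison follows the "more than 800" wording of this example; the general description of the step phrases the cut-off as "greater than or equal to" the preset value, so the exact boundary is an assumption.

```python
from collections import Counter

NON_MENDELIAN_THRESHOLD = 800  # preset value used in this example

def kinship_ok(site_types, threshold=NON_MENDELIAN_THRESHOLD):
    """site_types: iterable of type labels for one trio.
    Returns (passes, number of type -1 and type -2 sites)."""
    counts = Counter(site_types)
    n_bad = counts["-1"] + counts["-2"]
    # Assumption: the trio fails only when the count exceeds the threshold.
    return n_bad <= threshold, n_bad

print(kinship_ok(["1", "0", "0", "-1"]))       # (True, 1)
print(kinship_ok(["-1"] * 5000 + ["-2"] * 878))  # (False, 5878)
```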

6. Uniparental segment determination.

If the range covered by consecutive sites consistent only with paternal uniparental inheritance exceeds a preset value, the segment is judged to be of paternal uniparental origin; if the range covered by consecutive sites consistent only with maternal uniparental inheritance exceeds a preset value, the segment is judged to be of maternal uniparental origin.

Specifically, in this embodiment, paternal and maternal uniparental segments are determined as follows: if 8 or more consecutive sites of type 2F or 3F are found (not interrupted by a type 1 site) and the range they cover exceeds 1 Mbp, the segment is judged to be of paternal uniparental origin; similarly, if 8 or more consecutive sites of type 2M or 3M are found (not interrupted by a type 1 site) and the range they cover exceeds 1 Mbp, the segment is judged to be of maternal uniparental origin.
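The run detection can be sketched as a single pass over position-sorted sites. This is an illustration under stated assumptions: only type 1 sites interrupt a run (as the text specifies), sites of type 0, -1 and -2 are skipped without interrupting it, and a site of the opposite parental origin or a change of chromosome starts a new run; the last two choices are not stated in the text.

```python
ORIGIN = {"2F": "F", "3F": "F", "2M": "M", "3M": "M"}

def uniparental_segments(sites, min_sites=8, min_span=1_000_000):
    """sites: list of (chrom, pos, type_label) sorted by chrom and pos.
    Returns a list of (chrom, start, end, origin, n_sites)."""
    segments = []
    run = []  # current run of (chrom, pos, origin)

    def flush():
        # Keep the run only if it has enough sites and covers more than min_span.
        if len(run) >= min_sites and run[-1][1] - run[0][1] > min_span:
            segments.append((run[0][0], run[0][1], run[-1][1], run[0][2], len(run)))
        run.clear()

    for chrom, pos, label in sites:
        origin = ORIGIN.get(label)
        if run and (chrom != run[0][0] or (origin is not None and origin != run[0][2])):
            flush()  # new chromosome or opposite parental origin (assumption)
        if origin is not None:
            run.append((chrom, pos, origin))
        elif label == "1":
            flush()  # a type 1 site interrupts the run
    flush()
    return segments

demo = [("chr15", 1_000_000 + i * 200_000, "2F" if i % 2 else "3F") for i in range(8)]
print(uniparental_segments(demo))
```

In the demo, eight 2F/3F sites spaced 200 kbp apart span 1.4 Mbp and are reported as one paternal segment.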

7. UPD determination.

The sequencing coverage depth of the segment judged to be uniparental is analyzed; if the segment is single-copy, it is judged to be a deletion; otherwise, it is judged to be a UPD segment. Specifically:

The result of the whole exome sequencing copy number variation (CNV) analysis is used, i.e., the sequencing coverage depth of the paternal or maternal uniparental segment is compared with that of the other samples in the same batch. If the CNV analysis indicates that the segment is single-copy, it is judged to be a deletion; otherwise, it is judged to be UPD. In particular, deletions of large segments are generally lethal, so if the segment spans more than half of a chromosome, or even the entire chromosome, and the sample is not of embryonic origin, a deletion can essentially be excluded.

8. Pathogenic UPD screening.

Whether the UPD segment covers an imprinted gene or the corresponding band is checked; if not, it is judged to be benign UPD; if so, a risk of pathogenic UPD is reported.
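Steps 7 and 8 can be combined into a small decision function. The sketch below is hedged: the function name, the integer copy-number input and the imprinted-region list are all assumptions for illustration, and the coordinates in the usage lines are placeholders, not real annotations.

```python
def judge_segment(segment, copy_number, imprinted_regions):
    """segment: (chrom, start, end, origin) of a uniparental segment.
    copy_number: copy number reported by CNV analysis for that range.
    imprinted_regions: list of (chrom, start, end, name)."""
    chrom, start, end, origin = segment
    # Imprinted genes or bands overlapping the segment.
    hits = [name for c, s, e, name in imprinted_regions
            if c == chrom and s <= end and e >= start]
    if copy_number == 1:
        call = "deletion"              # single copy: a deletion, not UPD
    elif hits:
        call = "UPD, pathogenic risk"  # UPD covering an imprinted region
    else:
        call = "benign UPD"
    return {"call": call, "origin": origin, "imprinted": hits}

# Placeholder region; the name and coordinates are hypothetical.
regions = [("chr15", 23_000_000, 28_000_000, "hypothetical_imprinted_region")]
print(judge_segment(("chr15", 20_000_000, 92_000_000, "M"), 2, regions))
print(judge_segment(("chr15", 22_000_000, 26_000_000, "F"), 1, regions))
```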

Example 2

An NGS-trio-based screening device for uniparental disomy, as shown in FIG. 4, comprises a data acquisition module, a data analysis module and a UPD judgment module.

The data acquisition module is used for acquiring NGS sequencing data of the samples of the same trio.

The data analysis module is used for analyzing the sequencing data and classifying the mutation sites into: sites consistent with biparental inheritance, sites consistent only with uniparental inheritance, and sites inconsistent with the rules of inheritance; the data analysis module performs the analysis according to steps 2 to 4 of example 1.

The UPD judgment module is used for performing UPD determination on the mutation sites according to preset rules to obtain a determination result; the UPD judgment module performs the determination according to steps 5 to 8 of example 1.

Example 3

NGS-trio-based uniparental disomy screening was performed on a set of clinical samples (NP19E1936-NP19E1937-NP19F0086) using the screening device of example 2.

The results are shown in FIG. 5. The sample carries almost exclusively Norm sites; other types of sites occur only sporadically, possibly as a result of sequencing errors or new mutations arising during inheritance, and the sample is judged to be normal.

Example 4

NGS-trio-based uniparental disomy screening was performed on 3 sets of clinical samples using the screening device of example 2.

1. trio sample group: NP21S0557-NP21S0558-NP21S0549.

As shown in FIGS. 6-7, sites consistent with biparental inheritance, sites consistent only with uniparental inheritance and sites inconsistent with the rules of inheritance are all present in the sample and are evenly distributed. The number of type -1 and type -2 sites is 11443, which exceeds 800, so the result is judged to be unqualified: the parents are not the biological parents or the samples are wrong, and no subsequent determination can be made.

2. trio sample group: NP19E0911-NP19E0910-NP19E0912.

As shown in FIGS. 8-9, sites consistent with biparental inheritance, sites consistent only with uniparental inheritance and sites inconsistent with the rules of inheritance are all present in the sample and are evenly distributed, but paternal uniparental sites are lacking (almost no 2F or 3F sites). The number of type -1 and type -2 sites is 5878, which exceeds 800, so the result is judged to be unqualified: the father is not the biological father or the sample is wrong, and no subsequent determination can be made.

3. trio sample group: NP20E957-NP20E956-NP20E958.

As shown in FIGS. 10-11, sites consistent with biparental inheritance, sites consistent only with uniparental inheritance and sites inconsistent with the rules of inheritance are all present in the sample and are evenly distributed, but maternal uniparental sites are lacking (almost no 2M or 3M sites). The number of type -1 and type -2 sites is 6044, which exceeds 800, so the result is judged to be unqualified: the mother is not the biological mother or the sample is wrong, and no subsequent determination can be made.

The analysis shows that the above sample groups effectively lack a valid father sample and/or mother sample; they do not meet the trio requirement, and the subsequent analysis cannot be continued.

Example 5

1. trio sample group: NP21F6166-NP21F6167-NP21F6168.

As shown in FIGS. 12-13, chr15 in the sample carries only sites consistent with uniparental inheritance, whereas the remaining autosomes carry almost exclusively sites consistent with biparental inheritance, evenly distributed, with almost no sites consistent only with uniparental inheritance or inconsistent with the rules of inheritance (almost no 2F, 3F, -1 or -2 sites). Because chr15 carries 180 consecutive sites of type 2M or 3M covering about 72 Mbp, and the CNV result shows no abnormality, the result is judged to be maternal UPD of chr15; since the UPD segment covers several imprinted regions, it is reported as high-risk pathogenic UPD.

2. trio sample group: NP19F0315-NP19F0313-NP19F0314.

As shown in FIGS. 12-13, chr6 in this sample contains only sites consistent with paternal uniparental inheritance, while the remaining autosomes consist almost entirely of evenly distributed sites consistent with biparental inheritance and lack maternal uniparental sites and sites inconsistent with the rules of inheritance (almost no 2M, 3M, -1 or -2 sites). Because chr6 carries 813 consecutive 2F or 3F sites covering about 169 Mbp, and the CNV result shows no abnormality, the result is judged as paternal UPD of chr6. Because the UPD segment covers multiple imprinted gene regions, it is indicated as a high-risk pathogenic UPD.

3. trio sample group: NP21F3536-NP21F3567-NP21F3537.

As shown in FIGS. 14-15, chr20 in this sample contains only sites consistent with maternal uniparental inheritance, while the remaining autosomes consist almost entirely of evenly distributed sites consistent with biparental inheritance and lack paternal uniparental sites and sites inconsistent with the rules of inheritance (almost no 2F, 3F, -1 or -2 sites). Because chr20 carries 197 consecutive 2M or 3M sites covering about 63 Mbp, and the CNV result shows no abnormality, the result is judged as maternal UPD of chr20. Because the UPD segment covers multiple imprinted gene regions, it is indicated as a high-risk pathogenic UPD.

All of the above samples were judged to carry a high-risk pathogenic UPD.
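The segment calls in this example rest on finding runs of consecutive uniparental sites on one chromosome. The following is a minimal sketch of such a scan under stated assumptions: the input layout (a position-sorted list of `(position, label)` pairs), the names `find_uniparental_segments` and `ORIGIN`, and the choice that any other label ends a run are all hypothetical. The thresholds (more than 8 consecutive sites, coverage over 1 Mbp) are the ones stated in the claims.

```python
MIN_SITES = 8            # a run must contain more than 8 sites
MIN_SPAN_BP = 1_000_000  # and cover more than 1 Mbp

# Map each uniparental site type to the parent of origin.
ORIGIN = {"2F": "paternal", "3F": "paternal",
          "2M": "maternal", "3M": "maternal"}


def find_uniparental_segments(sites):
    """Scan one chromosome for runs of uniparental sites.

    `sites` is a list of (position, label) sorted by position.
    Returns a list of (origin, start, end, n_sites) tuples.
    Assumption: any label outside ORIGIN, or a switch of parent,
    ends the current run.
    """
    segments = []
    run = None  # [origin, start, end, n_sites]
    # A sentinel entry forces the final run to be evaluated.
    for pos, label in list(sites) + [(None, None)]:
        origin = ORIGIN.get(label)
        if run is not None and origin == run[0]:
            run[2] = pos
            run[3] += 1
            continue
        if (run is not None and run[3] > MIN_SITES
                and run[2] - run[1] > MIN_SPAN_BP):
            segments.append(tuple(run))
        run = [origin, pos, pos, 1] if origin is not None else None
    return segments
```

In a whole-chromosome UPD such as the chr15 case above, the scan would return a single segment spanning nearly all informative sites on that chromosome.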

Example 6

NGS-trio-based UPD screening is illustrated with 2 groups of clinical samples, using the screening apparatus of Example 2.

1. trio sample group: NP19E1380-NP19E1381-NP19E1382.

As shown in FIGS. 16-17, a small segment of chr15 in this sample contains only sites consistent with paternal uniparental inheritance, while the rest of chr15 and the remaining autosomes consist almost entirely of evenly distributed sites consistent with biparental inheritance and lack maternal uniparental sites and sites inconsistent with the rules of inheritance (almost no 2M, 3M, -1 or -2 sites). chr15 carries 16 consecutive 2F or 3F sites covering about 4 Mbp, and the CNV result indicates a heterozygous deletion of about 4 Mbp in the same range of chr15. The result is therefore judged as a local maternal deletion of chr15 that leaves only one copy of the paternal fragment (the resulting clinical effect is similar to that of paternal UPD). Because the segment covers multiple imprinted gene regions, it is indicated as a high-risk pathogenic heterozygous deletion.

2. trio sample group: NP19E0056-NP9E0057-NP9E0055.

As shown in FIGS. 18-19, a small local region of chr8 in this sample contains only sites consistent with maternal uniparental inheritance (a single discordant site within this region may be due to a sequencing error or another cause and does not affect the overall analysis), while the rest of chr8 and the remaining autosomes consist almost entirely of evenly distributed sites consistent with biparental inheritance and lack uniparental sites and sites inconsistent with the rules of inheritance (almost no 2M, 3M, -1 or -2 sites). chr8 carries 69 consecutive 2M or 3M sites covering about 11 Mbp, and the CNV result indicates a heterozygous deletion of about 11 Mbp in the same range of chr8. The result is therefore judged as a local paternal deletion of chr8 that leaves only one copy of the maternal fragment (the resulting clinical effect is similar to that of maternal UPD). Because the region covers multiple imprinted gene regions, it is indicated as a high-risk pathogenic heterozygous deletion.

All of the above samples were judged to carry a high-risk pathogenic heterozygous deletion, whose clinical effect is similar to that of UPD of the opposite parental origin (e.g., a paternal heterozygous deletion is similar to maternal UPD).
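The distinction drawn in Examples 5 and 6 between UPD and a heterozygous deletion can be sketched as a comparison of a uniparental segment against copy number calls. This is an illustrative sketch only: the tuple layouts, the function name `interpret_segment`, and the use of simple interval overlap to decide that a CNV call lies "in the same range" are assumptions, not details given in the text.

```python
def interpret_segment(segment, cnv_calls):
    """Label a uniparental segment as UPD or heterozygous deletion.

    `segment` is (chrom, start, end, origin), where origin is
    "paternal" or "maternal" (the parent whose alleles remain).
    `cnv_calls` is a list of (chrom, start, end, copy_number).
    Assumption: any overlap with a single-copy CNV call counts as
    the deletion lying in the same range as the segment.
    """
    chrom, start, end, origin = segment
    for c_chrom, c_start, c_end, copies in cnv_calls:
        overlaps = c_chrom == chrom and c_start < end and start < c_end
        if overlaps and copies == 1:
            # Only one parent's copy remains, so the other parent's
            # copy is the one that was deleted.
            deleted = "maternal" if origin == "paternal" else "paternal"
            return f"{deleted} heterozygous deletion"
    return f"{origin} UPD"
```

With hypothetical coordinates, a paternal-only segment on chr15 that overlaps a single-copy call would be reported as a maternal heterozygous deletion, while the same segment with no overlapping single-copy call would be reported as paternal UPD.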

Example 7

Using the screening apparatus of Example 2, UPD was screened in 792 whole-exome trio sequencing cases at this testing center; the results are shown in the following table.

Table 1. UPD screening results in 792 whole-exome trio sequencing cases

Note: "uniparental origin detected" above refers to the detection of UPD (14 groups) or heterozygous deletion (32 groups);

"PWS-AS" above refers to the pathogenic conditions caused by chr15-UPD, in which maternal UPD causes PWS and paternal UPD causes AS.

chr15-UPD is a common pathogenic condition, and corresponding methylation detection methods are currently available on the market. The 7 cases of chr15-UPD screened in this example were verified by methylation detection and the results were concordant, which shows that the method of the invention gives highly accurate detection results.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination involves no contradiction, it should be considered within the scope of this description.

The above examples represent only a few embodiments of the invention. Although they are described in specific detail, they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the invention, all of which fall within the scope of protection of the invention. Accordingly, the scope of protection of the present invention is determined by the appended claims.

Claims

1. An NGS-trio-based method for detecting uniparental disomy, characterized by comprising the following steps:

data acquisition: obtaining NGS sequencing data of one group of trio samples; the group of trio samples comprises a father sample, a mother sample and a proband sample;

mutation site selection: selecting, in each sample, the mutation sites that meet preset conditions and defining them as the qualified mutation sites of that sample, and defining the mutation sites removed by screening as the unqualified mutation sites of that sample; the mutation sites are selected as follows: 1) screening high-quality mutation sites in the NGS sequencing data, a high-quality mutation site being a mutation site that meets the following criteria: GATK-VQSR quality control PASS, total coverage >20X, mutation frequency >25%; 2) removing mutation sites located on the Y chromosome; 3) retaining point mutation sites; 4) excluding suspected false-positive sites according to Hardy-Weinberg equilibrium; 5) for heterozygous sites, removing sites with a mutation frequency higher than 70%, and for homozygous sites, removing sites with a mutation frequency lower than 85%; 6) genotyping the mutation at each position and removing sites with more than 2 types; 7) the remaining sites are the mutation sites that meet the preset conditions;

site data merging: merging the unqualified mutation sites of all samples in the same trio group, collecting the chromosome coordinates of every unqualified mutation site, and removing from the qualified sites of each sample any mutation site whose coordinates match those of an unqualified site; then, based on the remaining qualified mutation sites in the group, filling in for each sample the genotype at positions where it carries no mutation as a homozygous site consistent with the reference sequence; the mutation site data with matching coordinates are arranged in the order proband, father, mother;

classification of genetic patterns: classifying the genetic pattern of each trio combination of mutation sites, the mutation sites being divided into: sites consistent with biparental inheritance, sites consistent only with uniparental inheritance, and sites inconsistent with the rules of inheritance; sites consistent with biparental inheritance are classified as: type 1: sites consistent only with biparental inheritance; type 0: sites consistent with both biparental inheritance and uniparental inheritance; sites consistent only with uniparental inheritance are classified as: type 3F: sites that can only be produced by paternal monosomy rescue; type 2F: sites that can be produced by either paternal monosomy rescue or paternal trisomy rescue; type 3M: sites that can only be produced by maternal monosomy rescue; type 2M: sites that can be produced by either maternal monosomy rescue or maternal trisomy rescue; sites inconsistent with the rules of inheritance are classified as: type -1: inconsistent with the rules of inheritance with respect to one of the parents; type -2: inconsistent with the rules of inheritance with respect to both parents;

2. The NGS-trio-based method for detecting uniparental disomy of claim 1, wherein in the step of determining uniparental fragments, if there are more than 8 consecutive sites of type 2F or 3F and the coverage exceeds 1 Mbp, the fragment is determined to be of paternal uniparental origin; if there are more than 8 consecutive sites of type 2M or 3M and the coverage exceeds 1 Mbp, the fragment is determined to be of maternal uniparental origin.

3. The NGS-trio-based method for detecting uniparental disomy of claim 1, wherein in the step of determining UPD, the data determined to be a uniparental fragment are compared with the results of whole-exome sequencing copy number analysis; if the copy number analysis indicates that the segment is single copy, the segment is determined to be a fragment deletion; otherwise, it is determined to be UPD.

4. Use of the NGS-trio-based method for detecting uniparental disomy according to any one of claims 1-3 in the development or preparation of a device for pathogenic UPD screening.

5. An NGS-trio-based screening device for uniparental disomy, comprising: a data acquisition module, a data analysis module and a UPD judgment module;

the data acquisition module is used for acquiring NGS sequencing data of one group of trio samples; the group of trio samples comprises a father sample, a mother sample and a proband sample;

the data analysis module analyzes according to the following steps:

the UPD judging module analyzes according to the following steps:

6. The NGS-trio-based screening device for uniparental disomy of claim 5, wherein in the step of determining uniparental fragments, if there are more than 8 consecutive sites of type 2F or 3F and the coverage exceeds 1 Mbp, the fragment is determined to be of paternal uniparental origin; if there are more than 8 consecutive sites of type 2M or 3M and the coverage exceeds 1 Mbp, the fragment is determined to be of maternal uniparental origin.

7. The NGS-trio-based screening device for uniparental disomy of claim 5, wherein in the step of determining UPD, the data determined to be a uniparental fragment are compared with the results of whole-exome sequencing copy number analysis; if the copy number analysis indicates that the segment is single copy, the segment is determined to be a fragment deletion; otherwise, it is determined to be UPD.

8. A storage medium comprising a stored program that implements the functionality of the screening apparatus of any one of claims 5-7.

9. A processor for running a program implementing the functionality of the screening device of any one of claims 5-7.