CN117467743A

CN117467743A - Gene capturing method and application thereof in whole exon sequencing

Info

Publication number: CN117467743A
Application number: CN202311271976.8A
Authority: CN
Inventors: 乔志宏; 阳紫莹; 宋立洁; 彭智宇
Original assignee: Tianjin Medical Laboratory BGI
Current assignee: Tianjin Medical Laboratory BGI
Priority date: 2023-09-27
Filing date: 2023-09-27
Publication date: 2024-01-30

Abstract

The application provides a gene capture method and its application in whole exome sequencing. The method comprises: hybridizing the gene to be captured with a first probe and a second probe, wherein at least part of the first probe is suitable for complementary pairing with an exon region of the gene to be captured, and at least part of the second probe is suitable for complementary pairing with a non-coding region of the gene to be captured and with the mitochondrial genome; and wherein the volume ratio of the first probe to the second probe is 1:1 to 4:1. By using the first probe together with the second probe, the method captures the non-coding region of the target gene and the mitochondrial genome in addition to the exon region, which can effectively enhance the capture efficiency and enlarge the detection range.

Description

Gene capturing method and application thereof in whole exon sequencing

Technical Field

The present application relates to the biotechnology field, in particular to the field of gene sequencing, and more particularly to a gene capturing method, a gene sequencing method and a sequencing data analysis method.

Background

Monogenic genetic diseases are genetic diseases controlled by a single pair of alleles. More than 9,000 such diseases are known, and the number grows by 10 to 50 per year. According to World Health Organization (WHO) statistics, the overall incidence of monogenic disease in the population reaches 10 per thousand. Monogenic genetic diseases now pose a serious threat to human health.

Whole exome sequencing is a high-throughput sequencing method that sequences about 180,000 exons of about 20,000 genes in the human genome. The exome includes all protein-coding regions of the human genome and accounts for 1-2% of the entire genome. It has been reported that about 85% of the DNA sequence variations known to be associated with disease occur in exons or their regulatory regions (Botstein D, Risch N. Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease. Nat Genet. 2003;33 Suppl:228-237.). In recent years, with the rapid development of high-throughput sequencing technology, whole exome sequencing has become an effective method for detecting monogenic genetic disease genes owing to its high throughput, high sensitivity and low cost. However, existing exome sequencing methods still have certain limitations.

Thus, there is still a need for improvement in the current methods based on whole exon sequencing.

Disclosure of Invention

The present application was completed based on the findings of the inventors:

Whole exome sequencing has been validated as suitable for genetic testing of genetic disorders. The whole exome sequencing data of each sample typically yields hundreds of thousands of genetic variations. A general analysis strategy annotates these variations in detail and then screens them by their frequency in normal control population frequency databases and by their function. At the same time, it is considered whether a variation has been reported in the literature and whether a disease database holds a relevant record. The variations are then interpreted one by one to analyze whether they are correlated with the clinical symptoms of the subject. However, this technique has certain limitations, including: 1) only SNP (single nucleotide polymorphism)/InDel (insertion-deletion) variations in exon regions are detected, whereas pathogenic sites in non-coding regions and mitochondrial variations cannot be detected; 2) clinically common variations such as some dynamic mutations and loss of heterozygosity (LOH) cannot be analyzed; 3) sequencing produces a large number of genetic variations, and interpreting them is labor intensive.

The present application aims to solve at least one of the technical problems existing in the prior art. For this reason, the present application aims to propose a means capable of effectively detecting a variety of variations.

In a first aspect of the present application, a gene capture method is presented. According to an embodiment of the present application, the method comprises: hybridizing the gene to be captured with a first probe and a second probe, wherein at least part of the first probe is suitable for complementary pairing with an exon region of the gene to be captured, and at least part of the second probe is suitable for complementary pairing with a non-coding region of the gene to be captured and with the mitochondrial genome; wherein the volume ratio of the first probe to the second probe is 1:1 to 4:1.

According to the embodiment of the application, the method can effectively enhance the capturing efficiency and expand the detection range by capturing the non-coding region of the target gene and the mitochondrial genome by using the first probe and the second probe simultaneously.

According to an embodiment of the present application, the above-described gene capturing method further includes at least one of the following technical features:

According to an embodiment of the present application, the first probe is the commercially available KAPA HyperExome Probes. In some examples of the present application, the first probe is selected from exome capture probes.

In some examples of the present application, the probe is obtained by:

https://sequencing.roche/global/en/products/groups/kapa-hyperexome.html#documents; Roche, KAPA HyperExome, 96 rxn, Cat. No. 09062572001.

In some examples of the present application, the second probe is synthesized by Roche. In some examples of the present application, the second probe is selected from capture probes for pathogenic non-coding region variations and the mitochondrial genome. These probes capture more than about 30,000 regions containing pathogenic non-coding sites and the full-length mitochondrial genome of about 16.6 kb, which can significantly expand the detection range of whole exome sequencing and increase the diagnostic rate.

According to an embodiment of the present application, the gene to be captured is subjected to library construction in advance. Library construction can significantly increase the abundance of the gene to be captured. In a sequencing scenario, constructing a library from the sample to be tested and then capturing the gene to be captured from the sequencing library with the first probe and the second probe can markedly increase the capture efficiency and the capture range.

According to embodiments of the present application, the source of the gene to be captured is no more than 10 samples, preferably no more than 9 samples. In sequencing applications, an excessive number of samples of the gene source to be captured can cause a decrease in the average sequencing depth, thereby affecting the quality and accuracy of the results. In addition, the sensitivity of detection of rare or low frequency variations can also be affected. The inventor verifies through sequencing that the source of the gene to be captured should not be higher than 10 samples, otherwise the accuracy of the sequencing result is affected.
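
The effect of pool size on depth can be illustrated with a back-of-the-envelope calculation. The sketch below assumes that the on-target bases of one run are shared evenly among the pooled samples. The function name and both numeric constants are hypothetical values chosen for illustration and do not come from the application.

```python
def mean_depth_per_sample(total_on_target_bases: float,
                          target_size_bp: float,
                          n_samples: int) -> float:
    """Mean depth per sample when n_samples share one capture evenly."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    return total_on_target_bases / (target_size_bp * n_samples)


# Hypothetical figures, used only to show the trend (not from the application).
TOTAL_ON_TARGET_BASES = 60e9  # assumed usable on-target bases per pooled capture
TARGET_SIZE_BP = 46e6         # assumed size of the combined capture target

depths = {
    n: mean_depth_per_sample(TOTAL_ON_TARGET_BASES, TARGET_SIZE_BP, n)
    for n in (1, 5, 9, 10, 12)
}
for n, depth in depths.items():
    print(f"{n:>2} samples -> about {depth:.0f}x mean depth per sample")
```

Under this simple model the mean depth falls in inverse proportion to the number of pooled samples, which is consistent with the upper limit on pool size described above.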

According to an embodiment of the present application, the volume ratio of the first probe to the second probe is 2:1. During experimental testing, the inventors took the input volume ratio of the whole-exome probe to the custom probe as the variable, tested volume ratios of 1:1, 2:1, 3:1 and 4:1, and determined that a volume ratio of 2:1 is the optimal experimental condition. When the volume ratio of the first probe to the second probe is 2:1, the gene capture efficiency is higher and the capture range is wider.
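
As an illustration of the ratio constraint, the following sketch splits a total probe volume between the first and second probe and rejects ratios outside the 1:1 to 4:1 range. The function name and the 6 μL total volume are assumptions made for the example; the application does not state a total probe volume here.

```python
from fractions import Fraction

MIN_RATIO = Fraction(1, 1)  # lower bound of first:second volume ratio
MAX_RATIO = Fraction(4, 1)  # upper bound of first:second volume ratio


def split_probe_volume(total_ul: float, first: int = 2, second: int = 1):
    """Return (first_probe_ul, second_probe_ul) for a first:second ratio."""
    if first <= 0 or second <= 0:
        raise ValueError("ratio parts must be positive integers")
    ratio = Fraction(first, second)
    if not MIN_RATIO <= ratio <= MAX_RATIO:
        raise ValueError(f"ratio {first}:{second} is outside 1:1 to 4:1")
    first_ul = total_ul * first / (first + second)
    return first_ul, total_ul - first_ul


# Hypothetical total probe volume of 6 uL at the preferred 2:1 ratio.
first_ul, second_ul = split_probe_volume(6.0)
print(f"first probe: {first_ul} uL, second probe: {second_ul} uL")
```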

In a second aspect of the present application, a gene sequencing method is presented. According to an embodiment of the present application, the method comprises: capturing the gene to be tested using the method described in the first aspect of the present application; and sequencing the captured product. According to an embodiment of the present application, the gene sequencing method can detect exon region variation, non-coding region variation, mitochondrial variation, loss of heterozygosity, dynamic mutation and the like, and has the advantages of high detection sensitivity and a wide detection range.

According to an embodiment of the present application, the above-described gene sequencing method further includes at least one of the following technical features:

According to an embodiment of the present application, the gene to be tested is obtained by subjecting the nucleic acids of a predetermined plurality of samples to be tested to library construction. Library construction on the predetermined plurality of samples to be tested yields a nucleic acid library to be sequenced, and this nucleic acid library comprises the gene to be tested.

According to an embodiment of the present application, the nucleic acids of the predetermined plurality of samples to be tested are obtained by subjecting the predetermined plurality of samples to nucleic acid extraction processing. The nucleic acid extraction method is not particularly limited in the present application, and any nucleic acid extraction method commonly used in the field of gene sequencing may be applied.

According to an embodiment of the present application, the starting amount of nucleic acid of each predetermined sample to be tested used for library construction is not less than 50 ng, preferably not less than 100 ng, more preferably not less than 200 ng, and most preferably not less than 300 ng. The starting amount of nucleic acid determines how much replicated DNA or RNA is available for sequencing. If the starting amount is low, the sequencing depth may be insufficient, leaving certain regions with insufficient coverage and thereby impairing the ability to detect low-frequency mutations, lowly expressed genes, or rare variations. Higher starting amounts generally give better sequencing depth. The inventors found through experimental verification that, to ensure the accuracy of the sequencing result, the starting amount of nucleic acid of each predetermined sample to be tested should be not less than 50 ng, and that, to improve sequencing depth and detection sensitivity and to ensure sequencing uniformity and quality, it should be not less than 300 ng.
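
The tiered thresholds above can be expressed as a simple lookup. This is a minimal sketch; the function name and the returned labels are illustrative wording, and only the four thresholds (50, 100, 200 and 300 ng) are taken from the text.

```python
# (threshold in ng, label), checked from the highest tier downwards
INPUT_TIERS = [
    (300.0, "most preferred"),
    (200.0, "more preferred"),
    (100.0, "preferred"),
    (50.0, "minimum met"),
]


def classify_starting_amount(nucleic_acid_ng: float) -> str:
    """Map a starting amount of nucleic acid (ng) to the tier it satisfies."""
    for threshold, label in INPUT_TIERS:
        if nucleic_acid_ng >= threshold:
            return label
    return "below minimum"


for amount in (30, 50, 150, 250, 300):
    print(f"{amount} ng -> {classify_starting_amount(amount)}")
```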

According to an embodiment of the present application, after the library construction and before the hybridization, the library products of the nucleic acids of the predetermined samples to be tested are further subjected to pooling.

In some examples of the present application, the term "pooling" refers to mixing the library products (i.e., DNA).

According to an embodiment of the present application, the Pooling process comprises mixing up to 10 samples, preferably no more than 9 samples. The sequencing cost can be reduced and the sequencing efficiency can be improved by carrying out the Pooling treatment. However, the inventor has found through experimental verification that the Pooling treatment cannot exceed 10 samples at most, otherwise, the accuracy and the sensitivity of the sequencing result are affected.

According to the embodiment of the application, after the Pooling treatment, the total amount of the genes to be captured in the hybridization treatment system is not less than 500ng. The inventors have found that maintaining a sufficiently high total nucleic acid amount in the hybridization processing system can help reduce noise and variation in the data. Higher amounts of starting nucleic acid can provide a more stable and reliable signal, thereby improving the quality of the sequencing data.

According to an embodiment of the present application, the concentration of the gene to be captured in the hybridization system is not lower than 20 ng/μL. Increasing the concentration of the gene to be captured in the hybridization system can increase the detection sensitivity.
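
The three hybridization input constraints stated above (at most 10 pooled samples, at least 500 ng in total, at least 20 ng/μL) can be checked together. The sketch below is illustrative only: the class, the function and the example masses and volume are hypothetical, and the thresholds are the ones given in the text.

```python
from dataclasses import dataclass, field

MAX_POOLED_SAMPLES = 10    # upper limit on pooled samples
MIN_TOTAL_NG = 500.0       # minimum total nucleic acid in the hybridization system
MIN_CONC_NG_PER_UL = 20.0  # minimum concentration in the hybridization system


@dataclass
class HybInputReport:
    total_ng: float
    conc_ng_per_ul: float
    problems: list = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.problems


def check_hyb_input(library_ng, volume_ul: float) -> HybInputReport:
    """Check a pool of per-sample library masses (ng) in a given volume (uL)."""
    if volume_ul <= 0:
        raise ValueError("volume_ul must be positive")
    total = sum(library_ng)
    report = HybInputReport(total, total / volume_ul)
    if len(library_ng) > MAX_POOLED_SAMPLES:
        report.problems.append("more than 10 samples pooled")
    if total < MIN_TOTAL_NG:
        report.problems.append("total amount below 500 ng")
    if report.conc_ng_per_ul < MIN_CONC_NG_PER_UL:
        report.problems.append("concentration below 20 ng/uL")
    return report


# Hypothetical pool: nine libraries of 60 ng each in 25 uL.
report = check_hyb_input([60.0] * 9, volume_ul=25.0)
print(report.ok, report.total_ng, report.conc_ng_per_ul)
```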

According to an embodiment of the present application, the sample to be tested is selected from at least one of peripheral blood, fresh tissue and saliva.

According to an embodiment of the present application, the library construction is performed by: fragmenting the nucleic acid of the sample to be tested; performing double selection on the fragmented product; performing end repair and A-tailing on the double-selected product; ligating adapters to the end-repaired and A-tailed product; and performing PCR amplification on the adapter-ligated product, so as to obtain a library product of the nucleic acid of the sample to be tested.

In some examples of the present application, the term "double selection process" (or "double selection capture") refers to a molecular biological technique for selectively enriching or capturing a particular nucleic acid sequence, typically a gene, genomic region, or other DNA or RNA fragment of interest, in a sample. The core goal of this process is to selectively extract specific sequences from complex nucleic acid mixtures for subsequent analysis.

In some examples of the present application, the library construction method includes the following steps:

(1) Fragmenting the DNA of the sample to be tested; using purification magnetic beads at two different ratios to remove the large fragments and the small fragments, respectively, thereby sorting out DNA fragments of the target size (about 250-350 bp) and obtaining a more concentrated fragment distribution (double selection); and mixing and reacting the DNA fragments with polymerase, polynucleotide kinase, dNTP mixture and polynucleotide kinase buffer to carry out end repair. In the presence of dNTPs, the polymerase fills in or trims the ends of the fragmented DNA, e.g., filling in 5' protruding ends by the 5'-3' polymerization activity of the polymerase and/or trimming 3' protruding ends by the 3'-5' exonuclease activity of the polymerase, while the polynucleotide kinase converts 5' hydroxyl groups to 5' phosphate groups and 3' phosphate groups to 3' hydroxyl groups;

(2) Mixing the above end repair product with dATP, a polymerase lacking 5'-3' and 3'-5' exonuclease activity, and a polymerase buffer to carry out A-tailing. Under the action of the polymerase, this system adds an A to the ends of the end-repaired DNA fragments;

(3) Mixing the A-tailed product with a tagged adapter, ligase and ligation buffer, and ligating to obtain a nucleic acid sample with the tagged adapter attached;

(4) Based on the nucleic acid sample with the tagged adapter attached, adding single-stranded nucleic acids complementary to the two ends of the adapter sequence as primers and carrying out PCR amplification, so as to obtain a library product of the nucleic acid of the sample to be tested.
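
The two-step bead selection in step (1) can be pictured as two successive size cutoffs. The sketch below is a simplified model that treats each bead ratio as a sharp threshold, which is an assumption (real bead-based selection has gradual cutoffs); the function name and the example fragment lengths are hypothetical, and only the 250-350 bp window comes from the text.

```python
def double_size_select(fragment_lengths_bp, lower_bp=250, upper_bp=350):
    """Keep fragments inside [lower_bp, upper_bp] using two cutoff steps."""
    # Step 1: the first bead ratio removes fragments larger than the window.
    without_large = [n for n in fragment_lengths_bp if n <= upper_bp]
    # Step 2: the second bead ratio removes fragments smaller than the window.
    return [n for n in without_large if n >= lower_bp]


# Hypothetical fragment lengths after enzymatic fragmentation.
fragments = [120, 180, 250, 275, 310, 350, 420, 600]
selected = double_size_select(fragments)
print(selected)
```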

According to an embodiment of the present application, after the adapter ligation and before the PCR amplification, the method further comprises subjecting the adapter-ligated product to a first purification. The first purification removes from the mixture any unreacted or residual components and other unmodified or unligated impurities and molecules that may be present, so as to purify and enrich the nucleic acid fragments of the sample to be tested and provide a concentrated template for the subsequent PCR amplification step.

According to an embodiment of the present application, after the PCR amplification, the method further comprises subjecting the PCR amplification product to a second purification. The purpose of the second purification is to concentrate the PCR amplification product and remove unwanted components and impurities from it, so as to obtain a high-quality library product.

In a third aspect of the present application, a sequencing data analysis method is presented. According to an embodiment of the present application, the method comprises: sequencing the gene to be tested using the method described in the second aspect of the present application, so as to obtain a sequencing result; and analyzing the sequencing result so as to determine the gene variation information of the nucleic acid sample to be tested. According to an embodiment of the present application, the sequencing data analysis method can efficiently complete sequencing and interpretation of variant sites, shortening the interpretation period and improving the interpretation efficiency. In the diagnosis of some clinical diseases, it enables one-stop detection of exon region variation, non-coding region variation, mitochondrial variation, loss of heterozygosity and dynamic mutation in monogenic genetic diseases, so that the cause of disease is rapidly clarified, clinical prognosis is guided, and mortality is effectively reduced.

According to an embodiment of the present application, the above-described sequencing data analysis method further includes at least one of the following technical features:

According to embodiments of the present application, the analysis is performed by comparing the sequencing result with a reference genome.

In some examples, the reference genome is a defined sequence. It may be a DNA and/or RNA sequence assembled in advance by the user, or a DNA and/or RNA sequence published by others, and it may be any reference template of the species to which the sample source individual/target individual belongs, e.g., all or at least part of a published genome assembly of the same species. If the sample source or target individual is a human, the genomic reference sequence (also referred to as a reference genome or reference chromosome) may be selected from the human reference genomes provided by the UCSC, NCBI or ENSEMBL databases, such as hg19, hg38, GRCh36, GRCh37, GRCh38, etc. Those skilled in the art can learn the correspondence between the reference genome versions from the descriptions of these databases and select the version to be used. Furthermore, a resource library containing more reference sequences can be configured in advance. For example, before alignment, a sequence that is closer to the target individual or has a certain characteristic can be selected or determined and assembled according to factors such as the sex, ethnicity and region of the target individual to serve as the reference sequence, so that more accurate sequence analysis results can be obtained subsequently.
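
As a small illustration of the version correspondence mentioned above, the sketch below maps the UCSC names hg19 and hg38 to the corresponding GRC assembly names GRCh37 and GRCh38. The helper name is hypothetical, and the mapping concerns names only; it does not convert coordinates or account for differences in contig naming between providers.

```python
# UCSC assembly name -> Genome Reference Consortium assembly name
UCSC_TO_GRC = {"hg19": "GRCh37", "hg38": "GRCh38"}


def canonical_build(name: str) -> str:
    """Return the GRC-style name for a UCSC- or GRC-style build name."""
    lowered = name.strip().lower()
    if lowered in UCSC_TO_GRC:
        return UCSC_TO_GRC[lowered]
    for grc_name in UCSC_TO_GRC.values():
        if grc_name.lower() == lowered:
            return grc_name
    raise ValueError(f"unrecognized reference build: {name!r}")


for label in ("HG19", "hg38", "GRCh37"):
    print(label, "->", canonical_build(label))
```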

The reference sequence can be constructed when the target sample is tested, or it can be constructed and stored in advance and retrieved when the sample is tested. In certain embodiments, the sample to be tested is from a human, and the reference sequence is a human reference genome or the human autosome set.

According to an embodiment of the present application, the gene variation includes at least one selected from the group consisting of single nucleotide variation (SNV), loss of heterozygosity (LOH), dynamic mutation (STR), mitochondrial variation, copy number variation (CNV), single nucleotide polymorphism (SNP), and insertion-deletion (InDel).

According to embodiments of the present application, the sequencing is performed using the DNBSEQ-T7 or MGISEQ-2000 (PE100) sequencing platform.

Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Drawings

The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a schematic diagram of a flow chart of a sequencing analysis of a sample to be tested according to one embodiment of the present application.

Detailed Description

Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.

Definitions

As used herein, the singular forms "a," "an," and "the" include plural referents (one or more). "Set" or "plurality" refers to two or more.

In this document, "comprising" or "including" is an open-ended expression, including what is indicated or exemplified hereafter, and also includes what is applicable or consistent with the stated situation, but not specifically recited.

As used herein, "probe" refers to a short single-stranded DNA or RNA molecule that is used to specifically recognize and bind to a target nucleic acid sequence. The primary function is to detect, label, or enrich the nucleic acid sequence of interest in an experiment for analysis, detection, or isolation.

Herein, "sequencing" refers to determining the primary structure of a nucleic acid molecule, i.e., the order of its bases. Second-generation sequencing can be achieved using the sequencing by synthesis (SBS) and/or sequencing by ligation (SBL) principles, and third-generation sequencing can be achieved using the single-molecule real-time (SMRT) principle. Sequencing may include DNA sequencing and/or RNA sequencing. It may include long fragment sequencing and/or short fragment sequencing, where long and short are relative: for example, nucleic acid molecules longer than 1 kb, 2 kb, 5 kb, or 10 kb may be referred to as long fragments, and those shorter than 1 kb or 800 bp may be referred to as short fragments. It may include single-end sequencing, paired-end sequencing, and the like, where paired-end sequencing may refer to reading out any two segments or portions of the same nucleic acid molecule that do not overlap completely.

Sequencing may be performed on a sequencing platform. According to embodiments of the present application, optional sequencing platforms include, but are not limited to, BGI (MGI) sequencing platforms, the PacBio Revio platform, and the Oxford Nanopore Technologies PromethION platform; the sequencing mode may be single-end sequencing, paired-end sequencing, or another mode supported by the sequencing platform.

The sequence read out by sequencing is called a sequencing read, or simply a read, and the length of a read is called the read length. "Sequencing read" is used interchangeably with "read" and refers to a nucleic acid sequence obtained by sequencing. Third-generation sequencing reads typically have a read length of thousands to tens of thousands of bp.

As used herein, "sequencing library" refers to a collection of samples of DNA, RNA or DNA fragments to be sequenced that are prepared into a particular form when performing a sequencing study such as a genome, transcriptome or proteome. The sequencing library generally comprises the following sequences: linker sequences, sequencing primer recognition sequences, cell tag sequences, sample tag sequences and UMI sequences, cDNA insert sequences, and the like.

Analysis method based on whole exon sequencing

Whole exome sequencing, which examines the exonic regions of about 20,000 genes in the human genome, has become an effective way to detect monogenic genetic disease genes, but it has some limitations: the detection range of conventional whole exome sequencing is restricted, patients with strong phenotypic heterogeneity often require sequential testing to determine the cause of disease, the detection period is long, and the detection cost is high. In current whole exome data analysis, the main rate-limiting step is the comparative analysis between the diseases associated with the detected variations and the clinical phenotype of the subject.

To increase the rate of analysis, in one example of the present application, the inventors started from the commercial KAPA HyperExome Probes whole-exome capture chip (the first probe, or exome capture probe) and designed a second probe (custom capture probe) by screening clinically significant pathogenic variations in non-coding regions and the mitochondrial region. The two probes are used to capture the DNA sequences of the exons and clinically significant non-coding regions of the genome as well as the mitochondrial genome, which are then sequenced on the MGISEQ-2000 or DNBSEQ-T7 sequencing platform. Sequence alignment, variant detection and annotation are performed on the raw sequencing data following BGI's in-house standard operating procedure for whole exome analysis. The detected variations are then interpreted in combination with the clinical information of the sample to determine the genetic cause for the subject.

For ease of understanding, the analysis method based on whole exome sequencing in this application is described in detail with reference to FIG. 1.

(1) DNA extraction

Extracting genomic DNA from a sample to be tested (including but not limited to peripheral blood, fresh tissue, saliva) of a subject, and performing concentration detection (evaluation of whether DNA yield satisfies the next library construction) and agarose gel electrophoresis (evaluation of whether DNA degradation exists) on the extracted DNA;

(2) Library construction

In library preparation, the genomic DNA that passed quality inspection after extraction is prepared into a sequencing library compatible with the MGI sequencing platform. The process comprises: DNA normalization, enzymatic fragmentation, double selection of fragments, end repair and A-tailing, adapter ligation, post-ligation purification, pre-PCR, post-PCR purification and library quality inspection;

(3) Hybrid capture

Hybrid capture uses an extended whole-exome chip to capture DNA molecules from the exon regions, the clinically significant non-coding regions, and the mitochondrial genome. The process includes: library concentration, hybridization capture, washing with elution reagents, post-PCR purification and hybridization library quality inspection. The capture chip used for hybridization capture is an extended whole-exome capture chip consisting of KAPA HyperExome Probes and a KAPA custom probe pool (~3 Mb);

(4) Sequencing on machine

The hybridization library that passes quality inspection undergoes circularization, digestion, DNA nanoball preparation and other steps to generate a library that can be loaded on an MGI sequencing platform (MGISEQ-2000 or DNBSEQ-T7); the sequencing read length is PE100/PE150;

(5) Data analysis

The FASTQ sequencing data undergoes alignment, variant detection, variant annotation, variant filtering and other steps to obtain results that can be used for genetic interpretation, and the pipeline then pushes the analysis results to the BGI sunward interpretation system. The analysis results include: sample quality control, loss of heterozygosity (LOH), dynamic mutation (STR), mitochondrial variation, copy number variation (CNV), and SNV/InDel variation. The annotation information added in the variant annotation step comprises: gene-disease-phenotype, genomic elements, population frequency, computational prediction databases (deleteriousness prediction, conservation prediction, splicing prediction), and public variant databases (e.g., ClinVar, HGMD, etc.).
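
The variant filtering step can be sketched as a predicate over annotated variants. The example below is a minimal illustration, not the pipeline used in the application: the record fields, the 0.01 population frequency cutoff, the gene names and the variants are all hypothetical, and the only idea taken from the text is that population frequency and public database records inform the filtering.

```python
from dataclasses import dataclass
from typing import Optional

MAX_POPULATION_AF = 0.01  # hypothetical allele frequency cutoff
KNOWN_PATHOGENIC = {"Pathogenic", "Likely pathogenic"}


@dataclass
class Variant:
    gene: str
    hgvs: str
    population_af: Optional[float]  # None if absent from the frequency database
    clinvar: Optional[str]          # None if no public database record


def passes_filter(variant: Variant) -> bool:
    """Keep rare variants and variants recorded as (likely) pathogenic."""
    if variant.clinvar in KNOWN_PATHOGENIC:
        return True
    return variant.population_af is None or variant.population_af < MAX_POPULATION_AF


# Made-up variants for illustration only.
variants = [
    Variant("GENE_A", "c.100A>G", 0.25, None),
    Variant("GENE_B", "c.200C>T", 0.0001, None),
    Variant("GENE_C", "c.300G>A", 0.02, "Pathogenic"),
    Variant("GENE_D", "c.400del", None, None),
]
kept = [v.gene for v in variants if passes_filter(v)]
print(kept)
```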

(6) Genetic interpretation

Genetic interpretation analysis is performed with the sunward interpretation system independently developed by BGI. First, standardized human phenotype terms are extracted from the clinical phenotype submitted with the test request for the subject, and the detected variations are ranked by disease relevance using a phenotype ranking system independently developed by BGI; preferably, the top 50 variations are interpreted manually. Taking the clinical phenotype of the subject as the core and integrating the inheritance pattern and phenotype correlation, the genetic variations that can explain the symptoms of the subject are screened out, and the variant sites are then interpreted according to the sequence variant interpretation guidelines of the American College of Medical Genetics and Genomics (ACMG) to obtain a genetic diagnosis result.

According to an embodiment of the present application, in the detection of some monogenic genetic diseases, the method can detect exon region variation, non-coding region variation, mitochondrial variation, loss of heterozygosity and dynamic mutation in a one-stop manner, quickly clarify the cause of disease, help guide clinical prognosis and reduce mortality.
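
The ranking and shortlisting step can be sketched as a top-N selection. The phenotype ranking system itself is not described in the text, so the `phenotype_score` field, the function name and the synthetic scores below are assumptions; only the shortlist size of 50 comes from the description.

```python
import heapq


def select_for_manual_review(variants, top_n=50):
    """Return the top_n variants by phenotype relevance, highest score first."""
    return heapq.nlargest(top_n, variants, key=lambda v: v["phenotype_score"])


# Synthetic variants with made-up relevance scores.
variants = [{"id": f"var{i}", "phenotype_score": i / 100} for i in range(120)]
shortlist = select_for_manual_review(variants)
print(len(shortlist), shortlist[0]["id"], shortlist[-1]["id"])
```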

It should be noted that the features and technical effects described in the different aspects herein may be mutually referred to, and are not described herein again.

The scheme of the present invention will be explained below with reference to examples. It will be appreciated by those skilled in the art that the following examples are illustrative of the present invention and should not be construed as limiting the scope of the invention. Where specific techniques or conditions are not indicated in the examples, they are carried out according to the techniques or conditions described in the literature in this field or according to the product specifications, for example using the reagents in commercially available kits or techniques and conditions well established in the art. Reagents or apparatus for which the manufacturer is not indicated are conventional, commercially available products.

Example 1

This example demonstrates the feasibility and effectiveness of the sequencing optimization protocol of the present application by sequencing 4 patients with complex clinical phenotypes and 5 standards (2 purchased from Coriell, 3 self-established standard DNA samples).

Patient 1: the infant has multiple clinical phenotypes. The physical condition after birth was very poor, with muscle weakness, weak sucking and refusal to feed. Epilepsy occurred on day 4, could not be controlled with medication, and became more severe. The infant has quadriplegia, feeding difficulties, psychomotor delay, and often a confused expression. Over 8 months the infant underwent a large number of routine examinations and partial gene testing without a confirmed diagnosis. After whole exome sequencing, the data were analyzed using a developmental delay gene set, and two suspected pathogenic variations were detected in the ADSL gene, which is associated with adenylosuccinate lyase deficiency: ADSL gene c.953C>T (heterozygous)/c.1277G>A (heterozygous). The final diagnosis was adenylosuccinate lyase deficiency.

Patient 2: the patient has had involuntary movements of the whole body for more than 10 years, with ataxia, and a family history of genetic disease is indicated in the submitted test information. After extended whole exome testing, 1 pathogenic dynamic mutation was found in the HTT gene, which is associated with Huntington's disease.

Patient 3: the infant shows low reaction force, weak sucking, difficult feeding and low muscular tension after birth, and clinically focuses on neuromuscular problems and small fat Willi syndrome without family genetic history. After detection of the expanded whole exome, the large fragment homozygote was found to exist on chromosome 15 (chr 15: 23600000-45400000), and analysis of the whole exotic data showed a segmental maternal uniparental diploid.

Patient 4: the patient has low muscle strength, amyotrophy, and can not walk after 12 days, and the whole body muscle is atrophic, and the muscle strength of the upper limb is weakened; no family genetic history

The detection scheme is as follows:

Genomic DNA was extracted from the peripheral blood of the above 4 patients with an MGI nucleic acid extraction kit (nucleic acid extraction reagent of Ehan apparatus 20150250). The extracted concentrations were as follows:

Table 1: Genomic DNA concentration of 4 patients and 5 standards

Sample	Extraction concentration (ng/μL)	Quality control	Remarks
Patient 1	45.2	Qualified (not less than 300 ng)	-
Patient 2	56	Qualified (not less than 300 ng)	-
Patient 3	38.3	Qualified (not less than 300 ng)	-
Patient 4	31.8	Qualified (not less than 300 ng)	-
GM13716	97.4	Qualified (not less than 300 ng)	Coriell
GM13717	102	Qualified (not less than 300 ng)	Coriell
NA09912	112	Qualified (not less than 300 ng)	Coriell
WD003	42.5	Qualified (not less than 300 ng)	Self-established standard DNA
WD004	44	Qualified (not less than 300 ng)	Self-established standard DNA

1. Library construction

1.1 Enzymatic fragmentation of genomic DNA

a) Take 300 ng of each DNA sample to be tested into a 0.2 mL PCR tube, make up to 14 μL with nuclease-free water, mix thoroughly, and centrifuge briefly.

b) Prepare the amount of fragmentation reaction mixture required for the test according to the proportions in Table 2 (prepare on ice).

c) Add 6 μL of the fragmentation reaction mixture to the PCR tube from step a) (operate on ice), mix thoroughly, and centrifuge briefly. Place the tube on a PCR instrument at 32 ℃ for 14 minutes and 65 ℃ for 30 minutes, then cool to 4 ℃ and hold.

Table 2: Fragmentation reaction mixture

Reagent name	Amount per reaction
Fragmentation buffer	2 μL
Fragmentation enzyme	4 μL
Total volume	6 μL
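The 300 ng input in step a) can be translated into pipetting volumes from the concentrations in Table 1. The following is a minimal sketch of that arithmetic; the function name `input_volumes` and the rounding to two decimals are illustrative assumptions, not part of the described method.

```python
def input_volumes(conc_ng_per_ul, target_ng=300.0, final_ul=14.0):
    """Return (DNA volume, water volume) in uL for one sample.

    target_ng and final_ul follow step 1.1 a); rounding is an assumption.
    """
    dna_ul = target_ng / conc_ng_per_ul
    if dna_ul > final_ul:
        # The sample is too dilute to deliver target_ng within final_ul.
        raise ValueError("concentration too low for the requested input")
    return round(dna_ul, 2), round(final_ul - dna_ul, 2)


# Concentrations (ng/uL) taken from Table 1.
concentrations = {
    "Patient 1": 45.2, "Patient 2": 56.0, "Patient 3": 38.3,
    "Patient 4": 31.8, "GM13716": 97.4, "GM13717": 102.0,
    "NA09912": 112.0, "WD003": 42.5, "WD004": 44.0,
}

for name, conc in concentrations.items():
    dna, water = input_volumes(conc)
    print(f"{name}: {dna} uL DNA + {water} uL nuclease-free water")
```

Under these assumptions, any sample below about 21.4 ng/μL (300 ng / 14 μL) would raise an error, since 300 ng could not fit in the 14 μL volume.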

1.2 Double-sided fragment size selection

a) Prepare 2N 1.5 mL EP tubes (N = number of samples to be tested; N = 9 in this example). Dispense 30 μL of magnetic beads into each of N tubes and label them "sample number-1"; dispense 12 μL of magnetic beads into each of the other N tubes and label them "sample number-2". The magnetic beads must be equilibrated at room temperature for at least 30 min before use;

b) Add 30 μL of TE buffer to the fragmented sample to bring the total volume to 50 μL, and mix by pipetting;

c) Transfer the 50 μL sample into the 1.5 mL EP tube containing 30 μL of magnetic beads, check the sample number, pipette 10 times to mix thoroughly, let stand for 5 min, and place on a magnetic rack to collect the supernatant;

d) Transfer 80 μL of supernatant into the 1.5 mL EP tube containing 12 μL of magnetic beads, check the sample number, pipette 10 times to mix thoroughly, let stand for 5 min, and place on a magnetic rack until the liquid is clear;

e) Discard the supernatant, add 500 μL of 75% ethanol to each tube (add slowly to avoid dislodging the magnetic beads as far as possible), and, with the tube held on the magnetic rack, rotate the tube one full turn (360°);

f) Repeat step e) once;

g) Remove the supernatant, centrifuge briefly, return the tube to the magnetic rack, aspirate the residual liquid with a 10 μL tip, and air-dry at room temperature until no liquid remains on the bead surface;

h) Add 42 μL of TE buffer to elute, pipette 10 times to mix thoroughly, let stand for 5 min, centrifuge briefly, and place on a magnetic rack until the liquid is clear. Transfer 40 μL of supernatant to a new 0.2 mL PCR tube for the subsequent end repair reaction, mark the sample number on the tube cap, and check the sample number after transfer.
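The bead volumes in steps c) and d) can be expressed as bead-to-sample ratios, which is how double-sided size selection is usually described. The sketch below assumes the common convention that both ratios are stated relative to the original 50 μL sample volume; the function name `bead_ratios` is hypothetical.

```python
def bead_ratios(sample_ul, first_beads_ul, second_beads_ul):
    """Return (first-cut ratio, cumulative ratio) relative to sample volume.

    Assumption: ratios are expressed against the starting sample volume,
    so the second value counts beads carried over in the supernatant.
    """
    first = first_beads_ul / sample_ul
    cumulative = (first_beads_ul + second_beads_ul) / sample_ul
    return first, cumulative


first, cumulative = bead_ratios(sample_ul=50, first_beads_ul=30, second_beads_ul=12)
print(f"first cut: {first:.2f}x, cumulative: {cumulative:.2f}x")
```

In this reading, the first bead addition binds the largest fragments, which are discarded with the beads in step c), and the second addition binds the fragments that are kept in step d).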

1.3 End repair and "A" addition

a) Prepare the end repair mixture in a new centrifuge tube according to the proportions in Table 3, allowing for pipetting loss; prepare on ice;

b) Add 10 μL of the end repair mixture to the PCR tube from step 1.2 h), mix thoroughly, and centrifuge briefly. Place the tube on a PCR instrument, incubate at 37 ℃ for 30 minutes and at 65 ℃ for 15 minutes, then cool to 4 ℃ and hold. The PCR instrument requires a heated lid (105 ℃ recommended);

Table 3: End repair mixture

Reagent	Amount per reaction
End repair buffer	7.1 μL
End repair enzyme	2.9 μL
Total volume	10 μL

1.4 Adapter ligation

a) Prepare the amount of ligation reaction mixture required for the test in a new centrifuge tube according to the proportions in Table 4, allowing for pipetting loss; prepare on ice;

Table 4: Ligation reaction mixture

b) Prepare N 0.2 mL PCR tubes (N = number of samples to be tested; N = 9 in this example), mark the sample number on each tube cap, and add 5 μL of barcode adapter X to each tube (X is one of barcode adapters 1-20); each sample must correspond to a unique barcode adapter;

Note: each original sample corresponds to a unique barcode adapter, which is a necessary condition for detecting multiple samples from different sources simultaneously in one sequencing reaction. Therefore, take special care to record in detail the original sample name or number corresponding to each barcode adapter used.

c) Then add 25 μL of the ligation reaction mixture to each tube, working on an ice box and changing tips to avoid contamination;

d) Take out the end repair reaction product, centrifuge briefly, and transfer it into the 0.2 mL PCR tube containing the adapter and the ligation reaction mixture. Check the sample number on transfer, pipette slowly 10 times to mix, and centrifuge;

e) Incubate on a PCR instrument at 23 ℃ for 1 hour, then cool to 4 ℃ and hold; the PCR instrument requires a heated lid (50 ℃ recommended). After the reaction, take out the PCR tube and centrifuge briefly.
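The note in step b) requires a one-to-one record of samples and barcode adapters. A minimal sketch of such a record follows, assuming adapters are simply assigned in order from 1 to 20; the sequential assignment and the function name `assign_adapters` are illustrative assumptions.

```python
def assign_adapters(sample_ids, n_adapters=20):
    """Map each sample to a unique barcode adapter number (1..n_adapters).

    Sequential assignment is an assumption; any one-to-one mapping
    would satisfy the uniqueness requirement in step 1.4 b).
    """
    if len(set(sample_ids)) != len(sample_ids):
        raise ValueError("duplicate sample identifiers")
    if len(sample_ids) > n_adapters:
        raise ValueError("more samples than available barcode adapters")
    return {sample: number for number, sample in enumerate(sample_ids, start=1)}


samples = ["Patient 1", "Patient 2", "Patient 3", "Patient 4",
           "GM13716", "GM13717", "NA09912", "WD003", "WD004"]
adapter_map = assign_adapters(samples)
for sample, number in adapter_map.items():
    print(f"{sample} -> barcode adapter {number}")
```

The same mapping can later be reused to select the matching Index Block for each library during hybridization.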

1.5 Ligation product purification

a) Prepare N 1.5 mL EP tubes (N = number of samples to be tested; N = 9 in this example), dispense 50 μL of magnetic beads into each tube, and mark the sample number on each tube cap. The magnetic beads must be equilibrated at room temperature for at least 30 min before use;

b) Take out the ligation reaction product (80 μL), add 20 μL of TE buffer, and mix by pipetting;

c) Transfer the 100 μL sample into the 1.5 mL EP tube containing 50 μL of magnetic beads, check the sample number, pipette 10 times to mix thoroughly, let stand for 10 min, centrifuge briefly, and place on a magnetic rack until the liquid is clear;

d) Discard the supernatant, add 500 μL of 75% ethanol to each tube (add slowly to avoid dislodging the magnetic beads as far as possible), and, with the tube held on the magnetic rack, rotate the tube one full turn (360°);

e) Repeat step d) once;

f) Discard the supernatant, centrifuge briefly, return the tube to the magnetic rack, aspirate the residual liquid with a 10 μL tip, and dry at 40 ℃ until the bead pellet cracks (watch the cracking closely; continued heating risks the beads popping out of the tube opening, causing sample loss and cross-contamination between samples);

g) Add 46 μL of TE buffer to elute, pipette 10 times to mix thoroughly, let stand for 5 min, centrifuge briefly, and place on a magnetic rack until the liquid is clear. Transfer 44 μL of supernatant to a new 0.2 mL PCR tube for the Pre-PCR reaction.

1.6 Pre-PCR amplification

a) Prepare the amount of PCR reaction mixture required for the test in a new centrifuge tube according to the proportions in Table 5, allowing for pipetting loss; prepare on ice;

Table 5: PCR reaction mixture

The PCR primers in Table 5 are the MGI library universal amplification primers, with the following 5'-3' sequences:

MGIAd_PCR_1: /5Phos/GAACGACATGGCTACGA

MGIAd_PCR_2: TGTGAGCCAAGGAGTTG.

b) Add 56 μL of the PCR reaction mixture (operate on ice) to the PCR tube from step 1.5 g), mix well, centrifuge briefly, and run the PCR program of Table 6 with the heated lid on (105 ℃ recommended).

Table 6: PCR reaction program

1.7 Pre-PCR product purification

a) Prepare N 1.5 mL EP tubes (N = number of samples to be tested; N = 9 in this example), dispense 100 μL of magnetic beads into each tube, and mark the sample number on each tube cap. The magnetic beads must be equilibrated at room temperature for at least 30 min before use;

b) Take out the PCR product (100 μL), centrifuge briefly, and transfer it into the 1.5 mL EP tube containing 100 μL of magnetic beads. Check the sample number on transfer, pipette 10 times to mix thoroughly, let stand for 10 min, centrifuge briefly, and place on a magnetic rack until the liquid is clear;

c) Discard the supernatant, add 500 μL of 75% ethanol to each tube (add slowly to avoid dislodging the magnetic beads as far as possible), and, with the tube held on the magnetic rack, rotate the tube one full turn (360°);

d) Repeat step c) once;

e) Discard the supernatant, centrifuge briefly, return the tube to the magnetic rack, aspirate the residual liquid with a 10 μL tip, and dry at 40 ℃ until the bead pellet cracks (watch the cracking closely; continued heating risks the beads popping out of the tube opening, causing sample loss and cross-contamination between samples);

f) Add 27 μL of nuclease-free water to elute, pipette 10 times to mix thoroughly, let stand for 5 min, centrifuge briefly, and place on a magnetic rack until the liquid is clear. Transfer 25 μL of supernatant to a new 1.5 mL EP tube for later use, mark the sample number on the tube cap, and check the sample number after transfer. If the subsequent experiment is not carried out immediately, store the purified sample at -20 ℃.

1.8 Pre-PCR library concentration detection

The concentration of the purified products was measured with Qubit; each library must have a concentration greater than 20 ng/μL and a total yield greater than 500 ng.
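The two acceptance criteria in 1.8 can be checked together for each library. The sketch below assumes the 25 μL eluate volume from step 1.7 f) as the basis for total yield; the function name `library_passes_qc` and the example concentrations are hypothetical.

```python
def library_passes_qc(conc_ng_per_ul, volume_ul=25.0,
                      min_conc=20.0, min_total_ng=500.0):
    """Apply the Pre-PCR library criteria: > 20 ng/uL and > 500 ng total.

    volume_ul defaults to the 25 uL recovered in step 1.7 f) (assumption
    that the full eluate is available for the yield calculation).
    """
    total_ng = conc_ng_per_ul * volume_ul
    return conc_ng_per_ul > min_conc and total_ng > min_total_ng


# Hypothetical Qubit readings, for illustration only.
for conc in (18.0, 20.0, 35.5):
    print(conc, "ng/uL ->", "pass" if library_passes_qc(conc) else "fail")
```

With a 25 μL eluate the two thresholds coincide at 20 ng/μL, so a library exactly at the limit fails both strict inequalities.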

2. Hybrid capture

2.1 Preparation before hybridization

a) Pooling tube preparation: according to the number of hybridization chips N, prepare 2×N 1.5 mL centrifuge tubes. Label the caps of the library pooling tubes "N-C-P + chip type + date", and label the caps of the Index N Block pooling tubes "Block + chip type".

b) Sample pooling: calculate from the Pre-PCR library concentrations and take 400 ng of each sample for pooling. The 9 Pre-PCR libraries were pooled into 1 tube, namely the library pooling tube from 2.1 a).

c) Index N Block Mix: arrange the Index N Blocks (200 μM) in order and add 1.6 μL of each sequentially to the EP tube, for a total volume of 1.6 × N μL (N is the number of hybridized samples).

d) Note: the Index Block number must match the library adapter number; for example, if barcode adapter 1 was used in the library, Index Block 1 is used.

e) PCR Block: according to the number of hybridization samples, add 0.32 × N μL of PCR Block to the EP tube (N is the number of hybridization samples).
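The pooling and blocker volumes in steps b), c) and e) scale with the number of samples. The following sketch works through that arithmetic for N = 9; the library concentrations shown are hypothetical placeholders, and the function name `pooling_plan` is an assumption for illustration.

```python
def pooling_plan(library_conc, ng_per_sample=400.0,
                 index_block_ul=1.6, pcr_block_ul=0.32):
    """Compute per-library pooling volumes and total blocker volumes.

    library_conc maps sample name -> Pre-PCR library concentration (ng/uL).
    """
    n = len(library_conc)
    return {
        "library_ul": {name: round(ng_per_sample / conc, 2)
                       for name, conc in library_conc.items()},
        "index_block_total_ul": round(index_block_ul * n, 2),
        "pcr_block_total_ul": round(pcr_block_ul * n, 2),
        "total_ng": ng_per_sample * n,
    }


# Hypothetical concentrations (ng/uL); real values come from the Qubit assay.
example = {f"library_{i}": 40.0 for i in range(1, 10)}
plan = pooling_plan(example)
print(plan["index_block_total_ul"], plan["pcr_block_total_ul"], plan["total_ng"])
```

For 9 samples this gives 14.4 μL of Index N Block Mix, 2.88 μL of PCR Block, and 3600 ng of pooled library.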

2.2 hybridization reactions

a) Prepare the hybridization blocking mix according to Table 7 and add it to the Pre-PCR product pooling tube from 2.1 b) to form the hybridization mixture:

Table 7: Preparation of the hybridization blocking mix

Reagent	Amount per probe reaction
Universal blocking solution	20 μL
Index N Block Mix	1.6 × N μL
PCR Block	0.32 × N μL

b) Concentrating the hybridization mixture: place the hybridization mixture tube in the concentrator, set the mode to V-AQ at 60 ℃, and press Start. Run until the hybridization mixture in the tube is concentrated to a dry powder, which takes about 30 min. The concentration efficiency of the instrument varies slightly under different environmental conditions.

Note: the vacuum concentrator can be preheated in advance.

c) Remove the concentrated hybridization sample tube gently to prevent the dry powder from scattering, and close the EP tube cap.

d) Prepare the hybridization reaction solution:

Table 8: Preparation of the hybridization reaction solution

Reagent name	Volume per hybridization reaction (μL)
Hybridization buffer 1	28
Hybridization buffer 2	9
PCR-grade water	17.4

Add 54.4 μL of the prepared hybridization reaction solution to the sample pooling tube that was concentrated to dry powder in 2.2 c), mix thoroughly by vortexing, centrifuge, and leave at room temperature for 10 minutes.

e) Probe preparation: thaw the required probes on an ice box and centrifuge briefly; take N small PCR tubes and number the tube caps. Add 4 μL of the exome capture probe and 2 μL of the custom capture probe to each PCR tube, then transfer the hybridization reaction mixture from 2.2 d) into the correspondingly numbered PCR tube containing the probes. Pipette 10 times to mix, centrifuge briefly, and leave at room temperature for 5 min.

f) Hybridization: place the hybridization mixture from 2.2 e) on a PCR instrument and set the program as follows: 95 ℃ for 5 min, 65 ℃ hold (∞), heated lid at 105 ℃, reaction volume 70 μL. The hybridization time is 16 hours; run the program to start hybridization.
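The probe volumes in step e) (4 μL exome capture probe, 2 μL custom capture probe) correspond to a 2:1 volume ratio, which lies inside the 1:1 to 4:1 range stated in the abstract. A minimal sketch of that check, with the hypothetical function name `probe_ratio_ok`:

```python
def probe_ratio_ok(first_probe_ul, second_probe_ul, low=1.0, high=4.0):
    """Return (ratio, within_range) for the first:second probe volume ratio.

    The 1:1 to 4:1 bounds are taken from the abstract; treating them as
    inclusive is an assumption.
    """
    ratio = first_probe_ul / second_probe_ul
    return ratio, low <= ratio <= high


ratio, ok = probe_ratio_ok(first_probe_ul=4.0, second_probe_ul=2.0)
print(f"ratio {ratio:.1f}:1, within range: {ok}")
```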

2.3 elution after hybridization

a) Preparing the elution reagents: thaw the hybridization elution reagents at room temperature and prepare them according to Table 9. Prepare the required amount of reagent according to the number of probe chips actually being eluted.

Table 9: Elution reagent formulation table

b) Take N new 1.5 mL centrifuge tubes (N is the number of probe chips to be eluted), add 100 μL of streptavidin magnetic beads to each tube, and place the tubes on a magnetic rack for 5 minutes. After the liquid is clear, carefully aspirate and discard the supernatant.

Note: the streptavidin magnetic beads must be equilibrated at room temperature for 30 minutes and mixed thoroughly before use.

c) Wash the streptavidin magnetic beads 3 times with the bead binding solution, as follows:

[1] μL of 1× bead binding solution was added to each tube.

[2] Shake on a ThermoMixer at 20 ℃/1400 rpm for 20 s until well mixed, then centrifuge briefly.

[3] Place the centrifuge tube on a magnetic rack for 5 minutes until the liquid is clear, then carefully aspirate and discard the supernatant.

[4] Repeat steps [1] to [3] twice (3 washes in total). The magnetic beads are used to bind the DNA captured by the probes.

d) Binding the hybridization products to the magnetic beads: after hybridization for 16-24 hours, briefly centrifuge the whole exome hybridization product held at 65 ℃ and transfer it to the washed magnetic beads from step 2.3 c); mix by pipetting or gentle shaking.

e) On-instrument reaction: after the hybridization sample and the streptavidin magnetic beads are mixed, quickly place the tube on a PCR instrument and incubate at 65 ℃ for 15 min.

f) After incubation, transfer the product to the correspondingly labeled 1.5 mL EP tube, place it on a magnetic rack until the liquid is clear, and discard the supernatant.

g) Add 100 μL of 1× washing buffer 1 pre-heated to 65 ℃ to each tube, mix by shaking at 20 ℃/1400 rpm for 10 s, place on a magnetic rack until the liquid is clear, and remove the supernatant.

h) Add 200 μL of 1× stringent wash buffer pre-heated to 65 ℃, mix at 20 ℃/1400 rpm for 20 s, incubate at 65 ℃ for 5 min, place on a magnetic rack until the liquid is clear, and discard the supernatant. Repeat this step once.

i) Add 200 μL of room-temperature 1× washing buffer 1, mix on a ThermoMixer at 20 ℃/1400 rpm for 2 min, place on a magnetic rack until the liquid is clear, and discard the supernatant.

j) Add 200 μL of 1× washing buffer 2, mix on a ThermoMixer at 20 ℃/1400 rpm for 1 min, place on a magnetic rack until the liquid is clear, and discard the supernatant.

k) Add 200 μL of 1× washing buffer 3, mix at 20 ℃/1400 rpm for 30 s, place on a magnetic rack until the liquid is clear, and discard the supernatant.

l) Resuspending the beads: μL of nuclease-free water was added and the mixture was mixed by pipetting.

2.4 Post-PCR amplification

a) Prepare the Post-PCR reaction mixture in a new centrifuge tube according to the proportions in Table 10, allowing for pipetting loss; prepare on ice.

Table 10: Post-PCR reaction mixture

Reagent	Amount per reaction
PCR reaction solution 2	25 μL
PCR primer	5 μL
Total volume	30 μL

b) Add 30 μL of the Post-PCR reaction mixture (operate on ice) to the PCR tube from step 2.3 l), mix thoroughly, and run the Post-PCR reaction program of Table 11 with the heated lid on (105 ℃ recommended).

Table 11: Post-PCR reaction program

c) After the reaction, place the PCR tube on a magnetic rack for 5 minutes until the liquid is clear, and transfer all of the supernatant (about 50 μL) into a new 1.5 mL centrifuge tube. Add 75 μL of purification magnetic beads 2 equilibrated to room temperature, mix thoroughly, and leave at room temperature for 5 minutes. Centrifuge briefly, then place the centrifuge tube on the magnetic rack for 5 minutes until the liquid is clear.

d) Carefully aspirate and discard the supernatant, add 500 μL of 75% ethanol, mix well, and let stand for 1 minute until the liquid is clear. Repeat this step once.

e) Carefully aspirate and discard the supernatant, keep the centrifuge tube on the magnetic rack, open the tube lid, and air-dry at room temperature for 5 minutes (until the bead surface is no longer glossy). Note: excessive air drying or incomplete removal of ethanol may reduce yield.

f) Remove the centrifuge tube from the rack, add 22 μL of TE buffer to each tube, mix the magnetic beads thoroughly, and let stand at room temperature for 5 minutes. Centrifuge briefly, place the centrifuge tube on a magnetic rack for 5 minutes until the liquid is clear, and transfer all of the supernatant into a new centrifuge tube.

g) Quantify the DNA with Qubit; the concentration must be at least 10 ng/μL and the yield greater than 160 ng.

3. Sequencing

After hybridization capture, the library undergoes pre-sequencing processing such as circularization, digestion and DNA nanoball preparation, and is then sequenced on the MGISEQ-2000 PE100 platform, with a data volume of more than 15 Gb per sample.

4. Bioinformatics analysis and genetic interpretation

4.1 Bioinformatics analysis

a. obtaining the raw short reads from the sequencer;

b. removing adapters and low-quality data from the sequencing data;

c. aligning the short reads to their corresponding positions in the human genome;

d. compiling sequencing statistics, including the number of short reads, target region coverage, average sequencing depth and the like;

e. filtering out single nucleotide sites with low quality values and low coverage;

f. annotating variants to determine the gene, coordinates, amino acid change and the like of each mutation site;

g. running the dynamic mutation and loss of heterozygosity analysis procedures and annotating the results;

h. generating a sample analysis result data set and pushing it to a sunward interpretation system.
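Step d summarizes sequencing quality through average depth and target region coverage. The sketch below shows one simple way such statistics could be computed from per-base depths over the target region; it is an illustrative assumption rather than the pipeline actually used, and the function name `depth_stats` and the toy depth values are hypothetical.

```python
def depth_stats(depths, min_depth=20):
    """Return (average depth, fraction of bases with depth >= min_depth).

    depths is a sequence of per-base read depths over the target region.
    """
    if not depths:
        raise ValueError("no target bases supplied")
    average = sum(depths) / len(depths)
    covered = sum(1 for d in depths if d >= min_depth) / len(depths)
    return average, covered


# Toy per-base depths, for illustration only.
average, covered = depth_stats([10, 20, 30, 40])
print(f"average depth {average:.2f}x, coverage at >=20x {covered:.2%}")
```

The `min_depth=20` default mirrors the 20× threshold used for the coverage column in Table 12.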

The sequencing data statistics are shown in Table 12.

Table 12: sequencing results statistics

Sample	Total data volume (bp)	On-target rate	Average depth (×)	Coverage (≥20×)
Patient 1	17,844,452,100	72.01%	247.01	99.43%
Patient 2	15,251,983,100	71.53%	209.27	99.62%
Patient 3	18,839,286,500	71.79%	259.24	99.49%
Patient 4	18,466,598,400	68.10%	228.96	99.72%
GM13716	18,849,548,000	65.40%	245.67	99.90%
GM13717	20,376,412,600	74.20%	232.96	99.95%
NA09912	19,066,911,500	72.20%	203.88	99.93%
WD003	16,765,161,000	73.98%	239.29	99.93%
WD004	19,578,981,400	70.20%	241.13	99.93%
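The total data volumes in Table 12 can be compared with the requirement of more than 15 Gb per sample stated for sequencing. The sketch below assumes that the table values are base counts and that 1 Gb equals 10^9 bases; the function name `samples_below_threshold` is hypothetical.

```python
# Total data volume per sample, copied from Table 12 (assumed to be bases).
TOTAL_BASES = {
    "Patient 1": 17_844_452_100, "Patient 2": 15_251_983_100,
    "Patient 3": 18_839_286_500, "Patient 4": 18_466_598_400,
    "GM13716": 18_849_548_000, "GM13717": 20_376_412_600,
    "NA09912": 19_066_911_500, "WD003": 16_765_161_000,
    "WD004": 19_578_981_400,
}

MIN_BASES = 15 * 10**9  # assumption: 1 Gb = 1e9 bases


def samples_below_threshold(total_bases, min_bases=MIN_BASES):
    """Return the names of samples whose data volume is not above min_bases."""
    return [name for name, n in total_bases.items() if n <= min_bases]


print(samples_below_threshold(TOTAL_BASES))
```

Under these assumptions every sample exceeds the threshold, with Patient 2 closest to it.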

The results of genetic interpretation for 4 patients are shown in Table 13.

Table 13: genetic interpretation

The results for the 5 standard samples are shown in Table 14. As can be seen from Table 14, the detection results for the standards are consistent with the known variants.

Table 14: Standard detection results

GM13716 and GM13717 in Table 14 are ATN1 dynamic mutation reference samples, and their known mutation results are taken from the Coriell official website (https://www.coriell.org/). Dentatorubral-pallidoluysian atrophy is associated with CAG trinucleotide repeats in the ATN1 gene: a CAG repeat number of 7-34 is within the normal range, and the pathogenic repeat number is 49-88.

In summary, the whole exome sequencing and analysis method according to the technical scheme of the application enables one-stop detection of pathogenic variants of genetic diseases across different variant types.
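The repeat ranges given above for ATN1 can be applied as a simple classification rule. The sketch below uses only the two ranges stated in the text (7-34 normal, 49-88 pathogenic); how counts outside those ranges are labeled is an assumption, and the function name `classify_atn1_cag` is hypothetical.

```python
def classify_atn1_cag(repeats):
    """Classify an ATN1 CAG repeat count using the ranges stated in the text.

    Counts outside both ranges are reported as unclassified (assumption),
    since the text does not say how they should be interpreted.
    """
    if 7 <= repeats <= 34:
        return "normal"
    if 49 <= repeats <= 88:
        return "pathogenic"
    return "outside stated ranges"


# Illustrative repeat counts, not measured values.
for count in (15, 40, 62):
    print(count, "->", classify_atn1_cag(count))
```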

In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.

Claims

1. A method of gene capturing comprising: hybridizing the gene to be captured by using a first probe and a second probe, wherein at least part of the first probe is suitable for complementarily pairing with an exon region of the gene to be captured, and at least part of the second probe is suitable for complementarily pairing with a non-coding region of the gene to be captured and a mitochondrial genome;

wherein the volume ratio of the first probe to the second probe is 1:1-4:1.

2. The method of claim 1, wherein the first probe is KAPA HyperExome Probes;

optionally, the genes to be captured are subjected to library construction treatment in advance;

optionally, the source of the gene to be captured is no more than 10 samples, preferably no more than 9 samples;

optionally, the volume ratio of the first probe to the second probe is 2:1.

3. A method of gene sequencing comprising:

capturing the gene to be tested by the method of any one of claims 1 to 2; and

sequencing the captured processing product.

4. The method according to claim 3, wherein the gene to be tested is obtained by subjecting nucleic acids of a predetermined plurality of samples to be tested to a pooling treatment;

optionally, the nucleic acids of the predetermined plurality of samples to be tested are obtained by subjecting the predetermined plurality of samples to nucleic acid extraction treatment.

5. The method according to claim 4, wherein the initial amount of nucleic acid for each predetermined test sample of the pooling process is not less than 50ng, preferably not less than 100ng, more preferably not less than 200ng, most preferably not less than 300ng;

optionally, after the Pooling treatment, before the hybridization treatment, the Pooling treatment is further performed on the Pooling product of the nucleic acid of each predetermined sample to be tested;

preferably, the Pooling process comprises mixing up to 10 samples, preferably no more than 9 samples.

6. The method according to claim 5, wherein the total amount of the gene to be captured in the hybridization treatment system after the Pooling treatment is not less than 500ng;

preferably, the concentration of the gene to be captured in the hybridization treatment system is not less than 20 ng/μL.

7. The method of claim 4, wherein the sample to be tested is selected from at least one of peripheral blood, fresh tissue, and saliva;

optionally, the library-building process is performed by:

fragmenting the nucleic acid of the sample to be detected;

performing double-selection treatment on the fragmented treatment product;

performing end repair and "A" addition treatment on the double-selection treatment product;

performing adapter addition treatment on the end repair and "A" addition treatment product; and

performing PCR amplification treatment on the adapter-added treatment product so as to obtain a library-building product of the nucleic acid of the sample to be detected;

optionally, after the "a" addition treatment, before the PCR amplification treatment, further comprising subjecting the adaptor-added treatment product to a first purification treatment;

optionally, the PCR amplification process is followed by a second purification process of the PCR amplification process.

8. A method of sequencing data analysis, comprising:

sequencing a gene to be tested by the method of any one of claims 3 to 7 so as to obtain a sequencing result; and

analyzing the sequencing result so as to determine the gene mutation information of the nucleic acid sample to be tested.

9. The method of claim 8, wherein the analysis is performed by comparing the sequencing result to a reference gene;

optionally, the genetic mutation comprises at least one selected from the group consisting of a single nucleotide site variation, a heterozygous deletion, a dynamic mutation, a mitochondrial variation, a copy number variation, a single nucleotide mutation, and an indel mutation.

10. The method of claim 8, wherein the sequencing is performed using a DNBSEQ-T7 or MGISEQ-2000PE100 sequencing platform.