WO2014101655A1 - High-throughput nucleic acid analysis method and application thereof - Google Patents

High-throughput nucleic acid analysis method and application thereof

Info

Publication number
WO2014101655A1
Authority
WO
WIPO (PCT)
Prior art keywords
probe
nucleic acid
sequence
sequencing
interest
Prior art date
Application number
PCT/CN2013/089131
Other languages
English (en)
Chinese (zh)
Inventor
姜正文
杨锋
Original Assignee
上海天昊生物科技有限公司
天昊生物医药科技(苏州)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海天昊生物科技有限公司, 天昊生物医药科技(苏州)有限公司
Publication of WO2014101655A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813: Hybridisation assays
    • C12Q1/6827: Hybridisation assays for detection of mutation or polymorphism
    • C12Q1/6869: Methods for sequencing

Definitions

  • The present invention is in the field of biotechnology and molecular diagnostics; in particular, it relates to a high-throughput nucleic acid analysis method and its use.
  • Background art
  • A gene is the material basis of heredity: a specific nucleotide sequence on a DNA or RNA molecule that carries genetic information.
  • The genetic material of almost all organisms is DNA; only some viruses use RNA as their genetic material.
  • Different species have their own specific gene sequences, so the species present in a sample can be determined by detecting the gene sequences in it.
  • Genes are transcribed from DNA into mRNA, and the mRNA is then used as a template to translate biologically active protein molecules, thereby expressing the genetic information stored in the DNA sequence.
  • DNA methylation is one of the important ways to regulate gene expression.
  • DNA methylation can change chromatin structure, DNA conformation, DNA stability, and the way DNA interacts with proteins, thereby controlling gene expression. In most cases, methylation occurs at the carbon-5 position of the cytosine ring in CpG sequences.
  • Genes also acquire errors during replication, producing "mutations", including point mutations, large-fragment deletions or duplications (called copy number variations, CNV), gene inversions, and gene translocations.
  • Some mutations seriously affect the function of key genes and lead to disease; because of selection, the frequency of such mutations in the population is very low. A considerable number of mutations, however, do not seriously affect gene function or the survival of the individual.
  • SNP: single nucleotide polymorphism
  • CNP: copy number polymorphism
  • Gene identification, gene expression analysis, DNA methylation analysis, mutation screening, SNP typing, CNP typing, and CNV detection are important molecular genetic research methods and are also widely used in clinical molecular diagnosis. Because of their importance, a variety of assays has been developed for each type of analysis.
  • SNP typing methods include high-temperature ligase detection technology (LDR, SNPscan).
  • Low- and medium-throughput CNV detection methods mainly include real-time quantitative PCR, FISH, multiplex ligation-dependent probe amplification (MLPA), and multiplex fluorescent competitive PCR (AccuCopy).
  • Microarrays are high-density probe arrays "printed" with a large number of DNA probes of known sequence; they use the principle of molecular hybridization to analyze variously processed, fluorescently labeled samples.
  • The labeled sample is hybridized to the microarray probes and washed to remove non-specific hybridization signals; fluorescence is then read with a scanner, and the amount of signal for each target gene is determined from the intensity of the fluorescent signal and its position on the array.
  • A chip can analyze thousands or even millions of gene fragments or polymorphic loci simultaneously, and is widely used in species identification, expression profiling, high-throughput SNP analysis, genome-wide methylation analysis, genomic copy number analysis, and more.
  • The biggest advantage of microarray chips is high throughput, which allows gene changes to be analyzed at the whole-genome level. Their drawbacks are poor quantitative accuracy caused by widespread non-specific hybridization, the need for expensive hybridization and scanning instruments, the high cost of custom chips, and the inability to detect unknown genes.
  • Second-generation sequencing technology can sequence millions or even hundreds of millions of single-molecule amplification products simultaneously. It is widely used in genome resequencing to rapidly identify disease-causing genes, transcriptome analysis, methylation profiling, microRNA identification, genome-wide protein-DNA interaction studies, and genome sequencing of new species.
  • A new generation of single-molecule direct sequencing technology is also developing rapidly, represented mainly by Pacific Biosciences and Helicos.
  • The biggest advantages of high-throughput sequencing are its large throughput and its ability to identify and quantify known and unknown genes at the same time, with high specificity and efficiency.
  • Its shortcomings are mainly that next-generation sequencing is less accurate than conventional sequencing and that mutations introduced during single-molecule amplification can affect the final analysis.
  • In addition, the platform is best suited to whole-genome or transcriptome analysis; to analyze a target region or a group of genes, the sample must first be enriched for the target gene segments.
  • Current enrichment methods include multiplex PCR and microfluidic digital PCR for a limited number of gene regions; for large-scale gene regions, high-density probe sequences covering the target region are hybridized with the sample in solid phase or liquid phase to enrich the target region.
  • These enrichment techniques are used mainly for mutation detection in candidate genes. Because the enrichment process distorts, to some extent, the proportional relationship between the product and the original template amount, the enriched candidate gene fragments cannot be accurately quantified, for example for expression level or copy number analysis.
  • the main object of the present invention is to provide a high-throughput genetic analysis method and its use.
  • a high throughput nucleic acid analysis method comprising the steps of:
  • the sequence of the universal sequence region corresponds to the sequence of the sequencing primer, where n is a positive integer ≥ 40;
  • the 3' and 5' ends of each probe ligation product are both universal sequence regions whose sequences correspond to the sequencing primer sequences;
  • the sequencing primer is a sequencing primer for a high-throughput single-molecule or single-molecule amplification cluster sequencing platform.
  • n is a positive integer ≥ 100, preferably a positive integer from 1000 to 10000.
  • "the sequence of the universal sequence region corresponds to the sequencing primer sequence" means that the sequence of the universal sequence region is identical to the sequencing primer sequence or identical over at least 8 bp, or is fully complementary to the sequencing primer sequence or complementary over at least 8 bp.
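The correspondence rule above can be checked mechanically. The following is a minimal sketch, not the applicants' implementation: it assumes that "at least 8 bp identical" means a contiguous shared run of 8 bases, and that "complementary" means matching the reverse complement of the primer. The function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def shares_run(a: str, b: str, min_len: int) -> bool:
    """True if a and b are equal or share a contiguous run of >= min_len bases."""
    if a == b:
        return True
    if min(len(a), len(b)) < min_len:
        return False
    kmers = {a[i:i + min_len] for i in range(len(a) - min_len + 1)}
    return any(b[i:i + min_len] in kmers for i in range(len(b) - min_len + 1))


def corresponds(universal: str, primer: str, min_len: int = 8) -> bool:
    """Assumed reading of the rule: identity or complementarity over >= min_len bp."""
    universal, primer = universal.upper(), primer.upper()
    return (shares_run(universal, primer, min_len)
            or shares_run(universal, reverse_complement(primer), min_len))
```

For example, `corresponds("GATTACAGGC", "GCCTGTAATC")` is true because the second sequence is the reverse complement of the first.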
  • the specific probe further has one or more characteristics selected from the group consisting of:
  • the specific probe has a length of ≤ 100 bp, preferably 30 to 70 bp, and more preferably 40 to 50 bp.
  • the specific binding region of the specific probe has a length of ≤ 50 bp, preferably 15-35 bp, more preferably 20-25 bp.
  • the universal probe region of the specific probe has a length of 5 ⁇ 8 bp, preferably 15-35 bp, more preferably 20-25 bp.
  • sequence of the universal sequence region of the specific probe also corresponds to the amplification primer sequence
  • the specific probe includes a tag sequence.
  • the tag sequence is a sequence of one (preferably three to one, more preferably six to nine) specific bases for distinguishing probe ligation products from different sample sources.
  • the two probes corresponding to each nucleic acid fragment of interest are a 5'-end probe and a 3'-end probe: the 5'-end probe is complementary to the binding region at the 3' end of the nucleic acid fragment of interest to be analyzed, and the 3'-end probe is complementary to the binding region at the 5' end of the nucleic acid fragment of interest to be analyzed.
  • the structure of the 5' end probe or the 3' end probe is as shown in Formula I:
  • A represents a universal sequence region
  • L represents the nucleic acid linking sequence of A and B;
  • the A and B positions can be interchanged.
  • the L is 0 bases.
  • the relationship between the 5' end probe and the 3' end probe is selected from one or more of the following groups:
  • (a) the 5'-end probe and the 3'-end probe are adjacent probes: after the two probes hybridize to the nucleic acid fragment of interest to be analyzed, the distance between them is 0 bases, and they are joined by a ligase to obtain a probe ligation product;
  • (b) the 5'-end probe and the 3'-end probe are separated by 1-500 bases: after the two probes hybridize to the nucleic acid fragment of interest to be analyzed, the gap is filled in by a DNA polymerase and the probes are joined by a ligase to obtain a probe ligation product;
  • (c) in addition to the 5'-end probe and the 3'-end probe, the hybridization system includes a probe 3: after hybridization, probe 3 lies immediately adjacent to both the 5'-end probe and the 3'-end probe, and the three probes, hybridized to the nucleic acid fragment of interest to be analyzed, are joined by a ligase to obtain a probe ligation product.
  • the probe 3 has a length of from 1 to 500 bp, preferably from 15 to 35 bp, more preferably from 20 to 25 bp.
  • the 5' end of the 3' end probe described in (a) is phosphorylated.
  • the 3' end of the 3'-end probe described in (a) and the 5' end of the 5'-end probe carry an anti-exonuclease modification.
  • the anti-exonuclease modification is a thio modification.
  • the 5'-end probe and the 3'-end probe are preferably separated by 1 to 10 bases.
  • the DNA polymerase has no 5'→3' exonuclease activity.
  • between step (2) and step (3), the method further comprises the step of amplifying the probe ligation products obtained in step (2).
  • in step (3), the mixture of probe ligation products obtained in step (2) is sequenced directly on a high-throughput single-molecule or single-molecule amplification cluster sequencing platform; alternatively, the amplification products of the mixture of probe ligation products are sequenced on such a platform.
  • a mixture of probe ligation products or an amplification product thereof is sequenced and analyzed using a third generation sequencing technique or a second generation sequencing technique.
  • the information for obtaining the nucleic acid of interest refers to one or more information selected from the group consisting of: SNP typing information, DNA methylation information, mutation screening information , CNP classification information, CNV information, pathogenic microbial gene information, genetic information of transgenic animal and plant products, gene expression levels.
  • A high-throughput SNP typing method comprising the steps of: sequencing a mixture of probe ligation products derived from a sample to be tested using the method described in the first aspect, and performing SNP analysis to obtain SNP typing information for the nucleic acid of interest.
  • the high-throughput SNP typing method includes the steps of:
  • for each of n nucleic acid fragments of interest to be analyzed, providing three specific probes that bind to different binding regions of that fragment: two 5'-end probes and one 3'-end probe, where the 5'-end probes are allele-specific probes whose last base corresponds to the respective allele base, the 3'-end probe is a shared probe, and n is a positive integer ≥ 40;
  • the 3' and 5' ends of each probe ligation product are both universal sequence regions whose sequences correspond to the sequencing primer sequences;
  • step (3): using the sequencing primers, sequencing and analyzing the mixture of probe ligation products from step (2) to obtain SNP typing information for the nucleic acid of interest.
  • A method for detecting CNV comprising the steps of: sequencing a mixture of probe ligation products derived from a sample to be tested using the method of the first aspect, and performing CNV analysis to obtain CNV information for the nucleic acid of interest.
  • the method for detecting a CNV includes the steps of:
  • A high-throughput methylation analysis method comprising the steps of: sequencing a mixture of probe ligation products derived from a sample to be tested using the method described in the first aspect, and performing methylation analysis to obtain methylation information for the nucleic acid of interest.
  • the high-throughput methylation analysis method comprises the steps of: treating genomic DNA with a methylation-sensitive restriction endonuclease, designing a probe spanning the cleavage site, and using the method of claim 1 to detect the amount of genomic DNA that has not been cleaved.
  • alternatively, the high-throughput methylation analysis method comprises the steps of: treating genomic DNA with bisulfite, designing a methylation-specific probe and a non-methylation-specific probe for the target gene fragment, and
  • obtaining the methylation level of the gene fragment from the amounts of ligation product detected for the two probes.
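The text states only that the methylation level is obtained from the amounts of the two probes' ligation products; it does not give a formula. A minimal sketch, assuming the common estimator "methylated reads divided by total reads" (this estimator and the function name are assumptions, not taken from the source):

```python
def methylation_level(methylated_reads: int, unmethylated_reads: int) -> float:
    """Fraction of reads from the methylation-specific probe (assumed estimator)."""
    total = methylated_reads + unmethylated_reads
    if total == 0:
        raise ValueError("no ligation-product reads for this fragment")
    return methylated_reads / total
```

With 30 reads from the methylation-specific probe and 10 from the non-methylation-specific probe, this sketch reports a level of 0.75.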
  • a gene expression detecting method comprising the steps of: performing the detection using the method described in the first aspect.
  • Figure 1 shows technical idea 1 for high-throughput measurement in a specific embodiment of the invention.
  • Figure 2 shows technical idea 2 for high-throughput measurement in a specific embodiment of the invention.
  • Figure 3 shows the workflow for high-throughput SNP typing, in which ligation products are detected by single-molecule sequencing, either directly or after amplification.
  • Figure 4 shows the workflow for high-throughput CNV detection, in which ligation products are detected by single-molecule sequencing, either directly or after amplification.
  • Figure 5 shows the workflow for high-throughput target mutation screening, in which ligation products are detected by single-molecule sequencing, either directly or after amplification.
  • Figure 6 shows the workflow for high-throughput candidate gene expression analysis, in which ligation products are detected by single-molecule sequencing, either directly or after amplification.
  • Figure 7 shows the workflow for high-throughput gene methylation level analysis, in which ligation products are detected by single-molecule sequencing, either directly or after amplification.
  • Figure 8 shows the results of detecting exon deletions of the DMD gene in Example 2.
  • After extensive and intensive research, the inventors have for the first time combined the high specificity of multiplex ligation probe amplification, and its good preservation of the quantity information of target fragments, with next-generation high-throughput sequencing platforms to sequence and identify the amplified products of ligated probes, thereby achieving high-throughput quantitative analysis of target gene fragments.
  • The present invention has been completed on this basis.
  • The method comprises the steps of: (1) for each of n nucleic acid fragments of interest to be analyzed, providing at least two specific probes that bind to different binding regions of that fragment, each specific probe having a specific binding region and a universal sequence region, the sequence of the specific binding region being complementary to the sequence of the binding region of the nucleic acid fragment of interest and the sequence of the universal sequence region corresponding to the sequencing primer sequence, where n is a positive integer ≥ 40; (2) hybridizing a nucleic acid sample containing the nucleic acid fragments of interest with the probes and ligating the probes to obtain a mixture of probe ligation products, in which the 3' and 5' ends of each probe ligation product are both universal sequence regions whose sequences correspond to the sequencing primer sequences; and (3) using the sequencing primers to sequence and analyze the mixture of probe ligation products, thereby achieving high-throughput quantitative analysis of target gene fragments.
  • Multiplex ligation-dependent probe amplification (MLPA)
  • Multiplex ligation-dependent probe amplification is a technique for accurately measuring the number of molecules of a target gene.
  • The basic procedure comprises hybridization of the probes to the target nucleic acid sequence, ligation, PCR amplification, capillary electrophoresis of the products, and data collection; conclusions are drawn by analyzing the collected data.
  • Each MLPA probe consists of two oligonucleotide fragments, each comprising a primer sequence and a specific sequence.
  • Both fragments hybridize to the target sequence, after which the two parts of the probe are joined by a ligase.
  • The ligation reaction is highly specific: only when both probe parts hybridize completely to the target sequence, that is, when the target sequence is fully complementary to the probe-specific sequences, can the ligase join them into a single nucleic acid strand. If the target sequence is not fully complementary to the probe sequence, even by a single base, hybridization is incomplete and the ligation reaction either fails or proceeds with greatly reduced efficiency.
  • The ligated probes are amplified with a pair of universal primers. The amplified product of each probe has a unique length, ranging from 100 to 480 base pairs, and the products are separated by capillary electrophoresis.
  • The amplification peak of each probe is then collected. If the target sequence carries a point mutation, a deletion, or a duplication, the peak of the corresponding probe is absent, reduced, or increased; the change in the peak therefore indicates whether the target sequence has a copy number abnormality or a point mutation.
  • The advantage of multiplex ligation probe amplification is that probe ligation is highly specific, so multiple target gene fragments can be analyzed simultaneously in one system, and the amount of ligation product is directly related to the amount of original template.
  • Because the ligation products of different gene fragments are amplified with universal primers, the amount of amplified product preserves the information on the original template amount well, and the amount of the original target gene template can be determined by end-point analysis of the PCR product.
  • MLPA has been used in a variety of applications, including detection of chromosomal aneuploidy, SNPs, point mutations, subtelomeric gene rearrangements, and common childhood genetic diseases.
  • The shortcomings of this method are mainly as follows:
  • 1. The ligation products differ in length and are amplified with a pair of fluorescently labeled universal PCR primers; the amount amplified at each site is determined by electrophoresis according to the length of the fluorescently labeled PCR product. This greatly limits the number of sites that can be detected in one reaction system: only 40 to 50 nucleotide sequences can be detected, so throughput is low.
  • The ligation probe sequences are usually very long (>100 bp) and cannot be synthesized directly.
  • Because the ligation probe sequences are very long, and the ligation probes and ligation products at different sites can be up to several hundred bases long, ligation efficiency and amplification efficiency vary and fluctuate considerably between sites, which affects the accuracy of detection.
  • the present invention provides a high throughput genetic analysis method.
  • the technical idea of this method is as follows:
  • The first method designs two adjacent probes (probe 1 and probe 2) for each target fragment: one is the 5'-end probe (probe 1) and the other is the 3'-end probe (probe 2).
  • The first half of the 5'-end probe (a of probe 1) is a universal sequence consistent with the subsequent PCR amplification primer, and the second half (M of probe 1) is a specific sequence that hybridizes to the nucleic acid fragment of interest.
  • The 5' end of the 3'-end probe is phosphorylated; its first half (M of probe 2) is the specific sequence that hybridizes to the nucleic acid fragment of interest, and its second half (a of probe 2) is a universal sequence consistent with the subsequent PCR amplification primer. After the two probes hybridize to the template DNA, they are joined by a ligase.
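The layout of the two adjacent probes can be illustrated with a short sketch. It assumes that `target` is written 5'→3' as the strand the probes should reproduce (so the probes hybridize to its complementary strand), and that ligation simply concatenates the two probes. The names `ProbePair`, `design_adjacent_probes`, and `ligation_product` are hypothetical and do not come from the source.

```python
from dataclasses import dataclass


@dataclass
class ProbePair:
    five_prime: str   # universal sequence followed by the specific sequence
    three_prime: str  # specific sequence followed by the universal sequence
                      # (its 5' end would be phosphorylated in practice)


def design_adjacent_probes(target: str, junction: int, specific_len: int,
                           universal_5: str, universal_3: str) -> ProbePair:
    """Build two probes whose specific regions meet at `junction` with no gap."""
    if junction - specific_len < 0 or junction + specific_len > len(target):
        raise ValueError("target too short for the requested specific regions")
    left = target[junction - specific_len:junction]
    right = target[junction:junction + specific_len]
    return ProbePair(universal_5 + left, right + universal_3)


def ligation_product(pair: ProbePair) -> str:
    """Sequence obtained when the ligase joins the two hybridized probes."""
    return pair.five_prime + pair.three_prime
```

The ligation product then carries a universal sequence at each end, which is what allows amplification or sequencing with a single primer pair.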
  • The second method also designs two probes (probe 1 and probe 2).
  • The probe structure is the same as in method one, but the two probes are separated by several to several tens of bases (1-500 bp, preferably 1-10 bp). After the probes hybridize to the template DNA, a polymerase without 5'→3' exonuclease activity extends across the gap between the two probes, and the probes are then joined by a ligase.
  • The third method designs three probes (probe 1, probe 2 and probe 3). The 5'-end and 3'-end probes (probe 1 and probe 2) have the same structure as in method one, but the two probes are separated by tens to hundreds of bases (preferably …).
  • The 5' end of the intermediate probe (probe 3) is phosphorylated and the probe exactly fills the gap between the 5'-end and 3'-end probes; the three probes are hybridized to the template DNA and then joined by a ligase.
  • Multiple denaturation-renaturation-ligation cycles can be carried out using a thermostable ligase such as Taq DNA ligase.
  • The PCR primers carry a tag sequence (index) of several to tens of bases, and the ligation products of different samples can be amplified with PCR primers carrying different tag sequences, so that the amplification products of different samples can be mixed together and the sequencing reads assigned to their samples according to the tag sequence in the subsequent sequencing data.
  • The first method designs two adjacent probes (probe 1 and probe 2): one is the 5'-end probe (probe 1) and the other is the 3'-end probe (probe 2).
  • The first half of the 5'-end probe is a universal sequence consistent with the amplification primers or sequencing primers of the next-generation sequencing platform, and its second half is the specific sequence that hybridizes to the nucleic acid fragment of interest.
  • The 5' end of the 3'-end probe is phosphorylated; its first half is the specific sequence that hybridizes to the nucleic acid fragment of interest, and its second half is a universal sequence consistent with the amplification primers or sequencing primers of the next-generation sequencing platform.
  • Several bases at the 5' end of the 5'-end probe carry thio modifications or other protecting-group modifications to protect against exonuclease degradation, and several bases at the 3' end of the 3'-end probe carry thio modifications or other protecting-group modifications for the same purpose.
  • The probes are hybridized to the template DNA and joined by a ligase.
  • The second method also designs two probes. The probe structure is the same as in method one, but the two probes are separated by several to several tens of bases (the distance can be selected from 1-500 bp, preferably 1-10 bp). After the probes hybridize to the template DNA, a polymerase without 5'→3' exonuclease activity extends across the gap between the two probes, and the probes are then joined by a ligase.
  • The third method designs three probes. The 5'-end and 3'-end probes have the same structure as in method one, but the two probes are separated by tens to hundreds of bases (preferably 20-25 bp); the 5' end of the intermediate probe is phosphorylated, and the probe matches the gap between the 5'-end and 3'-end probes.
  • A tag sequence of several to tens of bases is added to the 5'-end or 3'-end probe, so that the ligation products of different samples carry different tag sequences and can be mixed together.
  • The sequencing reads can then be assigned to their samples according to the tag sequence.
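Assigning pooled reads to samples by tag sequence can be sketched as follows. This is an illustration only: it assumes the tag sits at the start of each read, has a fixed length, and must match exactly; real pipelines usually tolerate sequencing errors in the tag. The function name `demultiplex` and the `"unassigned"` bucket are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, tag_to_sample, tag_len):
    """Group reads by their leading tag; reads with unknown tags go to 'unassigned'."""
    by_sample = defaultdict(list)
    for read in reads:
        tag, insert = read[:tag_len], read[tag_len:]
        sample = tag_to_sample.get(tag, "unassigned")
        by_sample[sample].append(insert)  # store the read with the tag removed
    return dict(by_sample)
```

Each sample's list can then be counted per target fragment for the downstream analyses.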
  • The three probes are hybridized to the template DNA and joined by a ligase.
  • Multiple denaturation-renaturation-ligation cycles can be carried out using a thermostable ligase such as Taq DNA ligase.
  • The ligation reaction product is co-digested with several exonucleases, such as exonuclease III and lambda exonuclease, to digest everything other than the ligated products.
  • Non-amplified ligation products are directly subjected to single-molecule amplification sequencing or direct single-molecule sequencing using next-generation high-throughput chip sequencing platforms.
  • The term "primer" is a generic term for an oligonucleotide that is complementary to a template and from which a DNA strand complementary to the template is synthesized under the action of DNA polymerase.
  • The primer may be natural RNA, DNA, or any form of natural nucleotide; it may even consist of non-natural nucleotides such as LNA or ZNA.
  • The primer is "substantially" complementary to a particular sequence on one strand of the template.
  • The primer must be sufficiently complementary to a strand of the template for extension to begin, but its sequence does not have to be fully complementary to the template. For example, if a sequence that is not complementary to the template is added to the 5' end of a primer whose 3' end is complementary to the template, the primer is still substantially complementary to the template. As long as a sufficiently long stretch of the primer binds well to the template, a primer that is not fully complementary can still form a primer-template complex and support amplification.
  • Primers include, but are not limited to, degenerate primers, sequencing primers, linker primers, and the like.
  • Those of ordinary skill in the art can design and optimize primers using conventional methods.
  • High-throughput sequencing
  • "Re-sequencing" of the genome makes it possible to detect abnormal changes in disease-associated genes as early as possible and contributes to in-depth research on the diagnosis and treatment of individual diseases.
  • Solexa high-throughput sequencing comprises two steps, DNA cluster formation and on-machine sequencing: a mixture of PCR amplification products is hybridized with sequencing probes immobilized on a solid-phase carrier and amplified by solid-phase bridge PCR to form sequencing clusters; the clusters are then sequenced by sequencing-by-synthesis to obtain the nucleotide sequences of the disease-associated nucleic acid molecules in the sample.
  • DNA clusters are formed on a flow cell that has single-stranded primers attached to its surface.
  • Single-stranded DNA fragments are fixed to the chip by complementary pairing between their adapter sequences and the primers on the chip surface.
  • The fixed single-stranded DNA becomes double-stranded through an amplification reaction, and the double strand is then denatured into single strands; one end of each strand is anchored on the sequencing chip, and the other end pairs at random with another primer nearby and is anchored there as well.
  • The DNA clusters are then sequenced on the Solexa sequencer by sequencing-by-synthesis.
  • The four bases are labeled with different fluorophores and each carries a blocking group, so only one base can be added per reaction. After the color of the reaction has been read, the protecting group is removed and the next reaction can proceed; in this way the exact base sequence is obtained.
  • An index (tag or barcode) is used to distinguish samples; after conventional sequencing is complete, an additional 7 cycles of sequencing can be performed for the index portion. Up to 1000 different samples can be distinguished in one sequencing lane.
  • The invention also provides uses of the high-throughput genetic analysis method.
  • SNPs are detected using the method of the invention; each reaction can detect hundreds or even thousands of SNP sites.
  • The steps are as follows (Figure 3):
  • Three probes are preferably designed for each SNP site: two 5' allele-specific probes and one shared 3'-end probe. The last base of each allele-specific probe corresponds to the respective allele base; to increase ligation specificity, the base at one of the last 2-4 positions of the probe can be changed to introduce an additional mismatch;
  • the ligation product is either PCR-amplified or, without amplification, treated directly by nuclease digestion; the products of different samples are then mixed for next-generation high-throughput chip sequencing;
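  • The three-probe layout for one SNP site can be illustrated with a short sketch. Everything here is hypothetical: the function name `design_snp_probes`, the toy flanking sequences, the arm length, and the fixed substitution table used to create the extra mismatch. Only the target-specific arms are built; the universal sequences and the 5' phosphate of the shared probe are omitted.

```python
def design_snp_probes(left_flank, alleles, right_flank, arm_len=20, mismatch_offset=3):
    """Build target-specific arms for one SNP site.

    Returns (allele_probes, shared_probe):
    - allele_probes maps each allele base to a 5' probe arm whose last
      (3'-terminal) base is that allele;
    - shared_probe is the 3'-end probe arm starting right after the SNP.
    mismatch_offset counts from the 3' end (1 = the allele base itself).
    """
    if not 2 <= mismatch_offset <= 4:
        raise ValueError("mismatch_offset must be 2, 3 or 4")
    if arm_len <= mismatch_offset or len(left_flank) < arm_len - 1:
        raise ValueError("flank too short or arm_len too small")
    # Arbitrary base substitution used to introduce the mismatch (assumption).
    swap = {"A": "C", "C": "A", "G": "T", "T": "G"}
    allele_probes = {}
    for allele in alleles:
        arm = list(left_flank[-(arm_len - 1):] + allele)
        pos = len(arm) - mismatch_offset
        arm[pos] = swap[arm[pos]]  # deliberate extra mismatch
        allele_probes[allele] = "".join(arm)
    shared_probe = right_flank[:arm_len]
    return allele_probes, shared_probe


# Toy A/G SNP with short arms so the result is easy to inspect.
allele_probes, shared_probe = design_snp_probes("TTACGTG", "AG", "CCATGGTT", arm_len=6)
```

  • In this toy case the two allele-specific arms differ only in their final base, and both carry the same deliberate mismatch three bases from the 3' end.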
  • CNV is detected using the method of the present invention; each reaction can detect hundreds or even thousands of gene fragments of interest.
  • The steps are as follows (Figure 4):
  • Each reaction system contains at least one reference gene fragment, that is, a gene fragment considered to have no copy number polymorphism in the population of the species tested, which is used to correct for sampling differences between samples;
  • two probes are designed for each target gene fragment or reference gene fragment, one 5'-end probe and one 3'-end probe;
  • the ligation product is either PCR-amplified or, without amplification, treated directly by nuclease digestion; the products of different samples are then mixed and subjected to next-generation high-throughput chip sequencing;
  • Sequencing data analysis: the number of detections of the ligation product for each target gene is divided by the number of detections of the reference gene fragment ligation product to obtain a correction value R (N T1 /N R1 in the figure); this R value is then divided by the R value of the reference sample to obtain the RR value. If there is more than one reference gene, an RR value is calculated for each reference gene fragment and the median is taken as the relative copy value of the target gene; multiplying this value by the copy number of the reference sample gives the copy number of the target gene in the test sample (CN T1 in the figure).
  • Target gene mutation screening
  • The steps are as follows: because the ligation probes must match the DNA template, ligation efficiency is severely reduced if a mutation is present. High-density tiling probes are designed for the target region, and the copy number of each probe region is obtained using the detection steps and data analysis method of CNV detection. A probe region whose copy number deviates from the normal value is a candidate region containing a mutation site, which can be verified by conventional sequencing.
  • Analysis of multiple candidate gene expression levels
  • Each reaction can detect the expression levels of hundreds or even thousands of genes of interest.
  • The steps are as follows: multiple probes can be designed for each gene so that the expression ratio of different splice variants can be distinguished; either cDNA obtained by reverse transcription or the RNA itself can be used directly as the template for probe ligation.
  • The ligation product is amplified and subjected to next-generation high-throughput chip sequencing, and the sequencing results are analyzed: the number of ligation products for each gene region is corrected against multiple reference genes, and the median is taken as the relative expression level of the gene, which is used to analyze differences in its expression level between samples.
  • Methylation is analyzed using the method of the invention; each reaction can detect the methylation levels of hundreds or even thousands of CpG islands.
  • The method is as follows (Figure 7):
  • One method is to treat genomic DNA with a methylation-sensitive restriction endonuclease and design probes across the cleavage site to measure the amount of uncut genomic DNA; the other is to treat genomic DNA with bisulfite, then design a methylation-specific probe and a non-methylation-specific probe for the target gene fragment and estimate the methylation level of the fragment from the amounts of ligation product of the two probes.
  • The probe ligation products are subjected to next-generation high-throughput chip sequencing to obtain the amount of each probe ligation product.
  • For the first method, regions of the genome that are fully methylated or hemimethylated must be selected as reference DNA fragments, and a sample that has not been treated with the restriction endonuclease is selected as the reference sample.
  • For the second method, a reference DNA sample is required whose methylation ratio in all target gene regions is known; such a sample can be prepared by mixing whole-genome amplification product with methylation-modified whole-genome amplification product in a defined ratio, usually 1:1, which gives a reference sample with a 50% methylation ratio.
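  • For the bisulfite-based method, one way the two probe counts and the 50% reference sample might be combined is sketched below. The text does not give a formula, so the odds-based calibration, the function name `methylation_level` and the counts are all assumptions chosen for illustration.

```python
def methylation_level(meth_count, unmeth_count, ref_meth_count, ref_unmeth_count, ref_level=0.5):
    """Estimate the methylated fraction of one target region.

    meth_count / unmeth_count: ligation-product read counts of the
    methylation-specific and non-methylation-specific probes in the test sample.
    ref_meth_count / ref_unmeth_count: the same counts in a reference sample
    whose true methylation fraction is ref_level (0.5 for a 1:1 mixture).
    """
    # Expected methylated:unmethylated odds in the reference sample.
    ref_true_odds = ref_level / (1.0 - ref_level)
    # Probe bias: observed reference odds relative to the expected odds.
    bias = (ref_meth_count / ref_unmeth_count) / ref_true_odds
    # Remove the bias from the odds observed in the test sample.
    odds = (meth_count / unmeth_count) / bias
    return odds / (1.0 + odds)


# Toy counts: the methylation-specific probe ligates twice as efficiently.
level = methylation_level(300, 100, 200, 100)
same_as_reference = methylation_level(200, 100, 200, 100)
```

  • The sketch raises `ZeroDivisionError` when an unmethylated count is zero; how such regions should be handled is not stated in the text.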
  • Pathogenic microorganisms or transgenic plants and animals are identified using the method of the invention; each reaction can detect hundreds or even thousands of species-specific gene fragments.
  • Probes are designed for each microorganism or transgene, and probes are also designed for the spiked-in reference gene fragments.
  • The probe ligation products are subjected to next-generation high-throughput chip sequencing. The amount of each probe ligation product is corrected against the spiked-in reference gene fragment to determine which pathogenic microorganism species and which types of transgenic crop are present in the test sample.
  • The present invention identifies ligation products by sequencing and quantifies them by digital counting, so it is free of non-specific hybridization and detection background, which greatly improves accuracy;
  • all ligation products of the present invention are of relatively uniform length, so differences in amplification efficiency between fragments are relatively small when universal primers are used for amplification; compared with capillary electrophoresis, which distinguishes ligation products by length, the proportion of each ligation product in the amplified product is more consistent with its proportion before amplification;
  • Ligation probes were designed for 48 SNP sites, three per site: two 5' allele-specific probes and one shared 3' probe. The first half of each 5'-end probe is a universal PCR sequence compatible with the Illumina second-generation sequencing platform, and the second half of the 3'-end probe is another universal PCR sequence compatible with the Illumina second-generation sequencing platform.
  • The probes are ligated by Taq DNA ligase when they are well paired with the template.
  • The ligation products are amplified with universal PCR primers compatible with the Illumina second-generation sequencing platform; different samples are amplified with universal primers carrying different tag sequences.
  • Software assigns the sequencing reads to their sample of origin according to the tag sequence, determines which ligation product each read derives from, and counts the reads for each ligation product. Genotyping is performed based on the proportion of reads from the two allele-specific ligation products.
  • The samples were obtained from routine blood tests of normal individuals at Shanghai Ruijin Hospital.
  • DNA was extracted from the whole blood samples by the phenol-chloroform method.
  • The DNA was dissolved in 1× TE.
  • The PCR primer mixes Pmix1, Pmix2 and Pmix3 consisted of NGMPCRF and NGMPCRR001, NGMPCRF and NGMPCRR002, and NGMPCRF and NGMPCRR003, respectively, each primer at 2 µM; the ligation product was used as the template for the PCR reaction. The 20 µl reaction system contained 2 µl 10× PCR buffer, 2 µl 2.5 mM dNTP mix, 2 µl Pmix1 for S1 (or Pmix2 for S2, or Pmix3 for S3), 1 µl ligation product, 0.2 µl 5 U/µl Taq DNA polymerase and 12.8 µl Milli-Q water. The PCR program was: 95 °C 5 min; 8× (94 °C 20 s, 54 °C 40 s, 72 °C 1 min); 26× (94 °C 20 s, 68 °C 1.5 min); hold at 4 °C;
  • the amplification efficiency was checked by electrophoresis, the three PCR products were then mixed in equal amounts according to product concentration, and the mixture was separated by electrophoresis and recovered by gel excision.
  • the purified product was quantified and its number of molecules estimated; it was then mixed with samples from other projects and subjected to on-chip bridge amplification according to the instructions of the TruSeq SR Cluster Kit v2;
  • the amplified product was sequenced 1x72+7 on an Illumina GAIIx using the TruSeq SBS Kit v5. Instrument control and data acquisition were performed using Genome Analyzer Data Collection Software SCS2.8, and the selected recipe was GA2-PEM-MP-7+7Cycle-v ⁇ # >;
  • the sequencing reads are assigned to samples according to the tag sequence and then aligned against the library of expected ligation products; each read is identified as a particular allele ligation product, and the number of reads for each allele ligation product is counted;
  • the genotype of each locus is determined from the ratio of the read counts of its two ligation products and the distribution of that ratio across samples: if ligation specificity is strong and the read count of one allele ligation product is 10-fold or more that of the other, the locus is usually called directly as homozygous for the dominant allele; if it cannot be called directly, the ratios from multiple samples are compared to see whether they cluster (for example, into 3 groups corresponding to the 3 genotypes).
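  • The 10-fold rule for direct homozygous calls can be sketched as follows. The function name `call_genotype`, the return labels and the counts are hypothetical; loci that do not meet the threshold are only flagged here, since the text resolves them by comparing clusters across samples rather than by a fixed cutoff.

```python
def call_genotype(count_a, count_b, allele_a, allele_b, fold=10.0):
    """Call a locus from the read counts of its two allele ligation products.

    Returns a homozygous genotype when one allele has at least `fold` times
    the reads of the other, "no-call" when there are no reads at all, and
    "unresolved" otherwise (to be settled by cross-sample clustering).
    """
    if count_a == 0 and count_b == 0:
        return "no-call"
    if count_a >= fold * count_b:
        return allele_a + allele_a
    if count_b >= fold * count_a:
        return allele_b + allele_b
    return "unresolved"


# Toy read counts for an A/G locus in four samples.
calls = [
    call_genotype(500, 20, "A", "G"),
    call_genotype(30, 400, "A", "G"),
    call_genotype(250, 230, "A", "G"),
    call_genotype(0, 0, "A", "G"),
]
```

  • Returning "unresolved" instead of a heterozygous call is a deliberate choice: a ratio below 10 may reflect either a true heterozygote or weak ligation specificity.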
  • The universal primer sequences used in this example are as follows:
  • NGMPCRF (SEQ ID NO: 1)
  • NGMPCRR001 (SEQ ID NO: 2)
  • NGMPCRR002 (SEQ ID NO: 3)
  • NGMPCRR003 (SEQ ID NO: 4)
  • 141 probes are used for each sample: 129 distributed over the 79 exons of the DMD gene, 6 reference gene probes, and 6 sex-chromosome probes for sex identification (3 on the X chromosome and 3 on the Y chromosome). Two probes are designed for each locus, one 5'-end probe and one 3'-end probe; the first half of the 5'-end probe is a universal sequence consistent with the subsequent PCR amplification primers, and its second half is the specific sequence that hybridizes to the nucleic acid fragment of interest.
  • The 5' end of the 3'-end probe is phosphorylated; its first half is the specific sequence that hybridizes to the nucleic acid fragment of interest, and its second half is a universal sequence consistent with the subsequent PCR amplification primers.
  • The probes were ligated by Taq DNA ligase when well paired with the template, and the ligation products were amplified using universal PCR primers compatible with the Illumina second-generation sequencing platform. Different samples were amplified with universal primers carrying different tag sequences, then mixed in equal amounts, purified, and subjected to 1x72+7 sequencing on an Illumina GAIIx sequencer. The sequencing data were then analyzed.
  • Sample preparation: 2 ml of whole blood was taken from each of 2 patients with pseudohypertrophic muscular dystrophy (P1, P2), 1 female carrier (P3) and 1 normal individual (P4), and whole-blood DNA was extracted by the traditional phenol-chloroform method for use in subsequent experiments.
  • NGMPCRR003 is the same as in Example 1, and the NGMPCRR004 sequence is as follows (SEQ ID NO: 5):
  • AATTAG is a tag sequence used for sequencing with the Illumina second-generation sequencer to distinguish sequencing data from different samples.
  • The PCR primer mixes Pmix1, Pmix2, Pmix3 and Pmix4 consist of NGMPCRF and NGMPCRR001, NGMPCRF and NGMPCRR002, NGMPCRF and NGMPCRR003, and NGMPCRF and NGMPCRR004, respectively, each primer at 2 µM;
  • the ligation product is used as the template for the PCR reaction; the 20 µl reaction system contains 2 µl 10× PCR buffer, 2 µl 2.5 mM dNTP mix, 2 µl Pmix1 for P1 (or Pmix2 for P2, Pmix3 for P3, Pmix4 for P4), 1 µl ligation product, 0.2 µl 5 U/µl Taq DNA polymerase and 12.8 µl water;
  • the PCR program was: 95 °C 5 min; 8× (94 °C 20 s, 54 °C 40 s, 72 °C 1 min); 26× (94 °C 20 s, 68 °C 1.5 min); hold at 4 °C;
  • The first correction value (R) is obtained by dividing the detection count of the ligation product for each target gene by the detection count of a reference gene fragment ligation product; R is then divided by the R value of the reference sample to obtain the second correction value (RR). An RR value is calculated for each reference gene fragment, 6 RR values in total, and the median is taken. Because the reference sample is a normal male individual, the copy number of the DMD gene and of the gene fragments on the X and Y chromosomes is 1, so the median is the copy number of the corresponding gene fragment in the test sample.
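  • The R and RR calculation described above can be written as a short sketch. The function name `copy_number`, its parameter names and the read counts are hypothetical; the arithmetic follows the text (R = target count / reference count, RR = R of the test sample / R of the reference sample, median over the reference fragments, multiplied by the reference sample's copy number).

```python
from statistics import median


def copy_number(target_count, ref_counts, ctrl_target_count, ctrl_ref_counts, ctrl_copy_number=1):
    """Estimate the copy number of one target fragment in a test sample.

    target_count, ref_counts: read counts in the test sample for the target
    and for each reference gene fragment.
    ctrl_target_count, ctrl_ref_counts: the same counts in the reference
    (control) sample, whose target copy number is ctrl_copy_number.
    """
    rr_values = []
    for ref, ctrl_ref in zip(ref_counts, ctrl_ref_counts):
        r_test = target_count / ref          # first correction value R
        r_ctrl = ctrl_target_count / ctrl_ref
        rr_values.append(r_test / r_ctrl)    # second correction value RR
    return median(rr_values) * ctrl_copy_number


# Toy counts with 6 reference fragments; the control is a normal male (1 copy).
ctrl_refs = [100, 200, 400, 100, 200, 400]
test_refs = [100, 200, 400, 100, 200, 400]
zero_copies = copy_number(0, test_refs, 500, ctrl_refs)
two_copies = copy_number(1000, test_refs, 500, ctrl_refs)
```

  • Taking the median rather than the mean keeps a single aberrant reference fragment from shifting the estimate; a reference count of zero would raise `ZeroDivisionError` in this sketch.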

Abstract

The invention relates to a high-throughput nucleic acid analysis method and its application. For each target nucleic acid fragment among n nucleic acid fragments to be analyzed, at least two specific probes that bind to different binding regions of the target nucleic acid fragment are used. Each specific probe comprises a specific binding region and a universal sequence region; the sequence of the specific binding region is complementary to the sequence of the binding region of the target nucleic acid fragment, and the sequence of the universal sequence region corresponds to the sequence of a sequencing primer.
PCT/CN2013/089131 2012-12-27 2013-12-11 High-throughput nucleic acid analysis method and application thereof WO2014101655A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201210581830.9 2012-12-27
CN201210581830.9A CN103898199B (zh) 2012-12-27 2012-12-27 High-throughput nucleic acid analysis method and application thereof

Publications (1)

Publication Number Publication Date
WO2014101655A1 WO2014101655A1 (fr) 2014-07-03

Family

ID=50989765

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/089131 WO2014101655A1 (fr) 2012-12-27 2013-12-11 High-throughput nucleic acid analysis method and application thereof

Country Status (2)

Country Link
CN (1) CN103898199B (fr)
WO (1) WO2014101655A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112501255A (zh) * 2020-12-03 2021-03-16 昂凯生命科技(苏州)有限公司 Multi-site miRNA combined detection method
EP3674413A4 (fr) * 2018-01-26 2021-06-09 Amoy Diagnostics Co., Ltd Probe and method for high-throughput sequencing targeted capture of a target region, used for detecting gene mutations as well as known and unknown gene fusion types
WO2022007863A1 (fr) * 2020-07-07 2022-01-13 天昊基因科技(苏州)有限公司 Method for rapid enrichment of a target gene region

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
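CN105803055A (zh) * 2014-12-31 2016-07-27 天昊生物医药科技(苏州)有限公司 Novel target gene region enrichment method based on multiplex cyclic extension-ligation
CN106148482B (zh) * 2015-03-24 2019-12-03 深圳华大智造科技有限公司 Sequencing method suitable for small sequencers
CN106326689A (zh) * 2015-06-25 2017-01-11 深圳华大基因科技服务有限公司 Method and device for determining sites under selection in a population
CN106591425A (zh) * 2015-10-15 2017-04-26 北京寻因生物科技有限公司 Ligation reaction-based method for multiplex targeted detection of nucleic acid markers
CN105695577B (zh) * 2016-03-02 2019-03-19 上海易毕恩基因科技有限公司 High-throughput sequencing method for methylated CpG islands in trace DNA
CN105734130B (zh) * 2016-03-11 2019-10-25 中国农业科学院生物技术研究所 Method for genome-wide identification of plant gene function
CN106834452A (zh) * 2017-01-13 2017-06-13 天昊生物医药科技(苏州)有限公司 Probe ligation amplification detection method for trace genomic DNA
CN108573127B (zh) * 2017-03-14 2021-04-27 深圳华大基因科技服务有限公司 Method for processing raw data from third-generation nucleic acid sequencing and application thereof
CN108048528A (zh) * 2017-12-20 2018-05-18 栾图 Simple, efficient, real-time method for acquiring genetic information and application thereof
CN107858411B (zh) * 2017-12-22 2021-07-13 美因健康科技(北京)有限公司 Three-segment probe amplification method based on high-throughput sequencing
CN110241177A (zh) * 2019-04-19 2019-09-17 上海三誉华夏基因科技有限公司 Method for preparing nucleic acid capture libraries based on hybridization and extension-ligation reactions
CN110241191A (zh) * 2019-06-28 2019-09-17 中国人民解放军第四军医大学 NGS-based method for simultaneous detection of mtDNA copy number and mutations
CN110541025B (zh) * 2019-07-31 2021-05-14 中信湘雅生殖与遗传专科医院有限公司 Detection method, primer composition and kit for Duchenne muscular dystrophy gene defects
CN110904220B (zh) * 2019-12-24 2023-07-25 圣湘生物科技股份有限公司 Composition, kit and method for detecting CYP2D6 gene polymorphism and copy number
CN112522381A (zh) * 2020-12-07 2021-03-19 苏州赛美科基因科技有限公司 High-throughput method for simultaneous detection of gene mutations and copy number changes
CN113106090B (zh) * 2021-06-16 2021-09-17 苏州拉索生物芯片科技有限公司 Biochip probe with composite tag sequence, and biochip
CN113913521A (zh) * 2021-11-05 2022-01-11 复旦大学 Genotyping detection kit and application thereof
CN114277095A (zh) * 2021-12-27 2022-04-05 上海市肺科医院 Nucleotide composition for detecting gene variation and high-throughput sequencing library constructed therewith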

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
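CN101395280A (zh) * 2006-03-01 2009-03-25 凯津公司 Sequencing-based high-throughput SNP ligation detection technology
CN101921874A (zh) * 2010-06-30 2010-12-22 深圳华大基因科技有限公司 Method for detecting human papillomavirus based on Solexa sequencing
CN102409101A (zh) * 2011-11-30 2012-04-11 上海翼和应用生物技术有限公司 PCR-LDR gene polymorphism sequencing method using universal fluorescently labeled probes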

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070092883A1 (en) * 2005-10-26 2007-04-26 De Luwe Hoek Octrooien B.V. Methylation specific multiplex ligation-dependent probe amplification (MS-MLPA)
DK2414544T3 (da) * 2009-04-01 2014-06-16 Dxterity Diagnostics Inc Chemical ligation dependent probe amplification

Also Published As

Publication number Publication date
CN103898199B (zh) 2016-12-28
CN103898199A (zh) 2014-07-02

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13869022

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13869022

Country of ref document: EP

Kind code of ref document: A1