CN106282161B - Method for specifically capturing and repeatedly copying low-frequency DNA base variation and application - Google Patents

Info

Publication number
CN106282161B
Authority
CN
China
Prior art keywords
dna
primer
sequencing
sequence
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610662853.0A
Other languages
Chinese (zh)
Other versions
CN106282161A (en)
Inventor
徐凯
罗德伦
唐放
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Nuon Gene Technology Co ltd
Original Assignee
Chengdu Nuon Gene Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Nuon Gene Technology Co ltd filed Critical Chengdu Nuon Gene Technology Co ltd
Priority to CN201610662853.0A priority Critical patent/CN106282161B/en
Priority to PCT/CN2016/095818 priority patent/WO2018028001A1/en
Publication of CN106282161A publication Critical patent/CN106282161A/en
Application granted granted Critical
Publication of CN106282161B publication Critical patent/CN106282161B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/10Processes for the isolation, preparation or purification of DNA or RNA
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813Hybridisation assays
    • C12Q1/6827Hybridisation assays for detection of mutation or polymorphism
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869Methods for sequencing
    • CCHEMISTRY; METALLURGY
    • C40COMBINATORIAL TECHNOLOGY
    • C40BCOMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B50/00Methods of creating libraries, e.g. combinatorial synthesis
    • C40B50/06Biochemical methods, e.g. using enzymes or whole viable microorganisms

Abstract

The invention discloses a method for specifically capturing and repeatedly copying low-frequency DNA base variation, which belongs to the field of molecular diagnosis and comprises the following steps: (1) thermally denaturing the DNA, hybridizing the target DNA with a mixture of primers having a thermodynamically dynamic structure, and then repeatedly copying it; (2) specifically extending and tailing the repeatedly copied secondary DNA fragments using an oligonucleotide matched to the 3' end of the assay target, thereby introducing a common sequence at the 3' end of the secondary DNA fragments; (3) constructing a sequencing library; (4) generating a plurality of sequencing reads; (5) identifying sequence differences between the sequencing reads and the reference sequence; and (6) determining whether a sequence variant is present. The invention can achieve a detection sensitivity of 0.01% for low-frequency DNA base variation. It is of great help in identifying and characterizing low-frequency nucleic acid variation in samples that may contain a small number of variant sequences against a background of normal sequences, and in distinguishing low-frequency variation from the background of sequencing errors.

Description

Method for specifically capturing and repeatedly copying low-frequency DNA base variation and application
Technical Field
The invention relates to the field of molecular biology and clinical diagnosis, and in particular to a method, and its application, in which a structural primer is used to capture target DNA directly from a fragmented DNA sample and to repeatedly copy and amplify it for analysis on a second-generation high-throughput parallel sequencer. The method identifies ultralow-frequency DNA sequence mutations in a DNA fragment, such as base substitutions, deletions, insertions, or combinations of these.
Background
Oncogene variation in tumor tissue is a major factor driving the malignant proliferation of tumor cells. Besides one main driver variation, the oncogenes of most tumors carry several other driver variations at lower frequency. These lower-frequency driver variations strongly influence the outcome of tumor treatment; they are the main targets of molecular pathological testing of tumors and the basis of targeted tumor therapy.
In normal human peripheral blood, a small amount of cell-free DNA (abbreviated as cfDNA) is present. When physiological or pathological changes occur, cfDNA specific to the cells of the affected tissue can be detected in a plasma sample, so cfDNA can serve as the analyte of a 'liquid biopsy'. Liquid biopsy, which uses cell-free DNA in blood as the analyte, is non-invasive, allows repeated sampling, and is well accepted by patients. It has become a research hotspot in noninvasive prenatal genetic diagnosis, tumor prognosis, and related fields, and is beginning to enter clinical use. The identification of cfDNA also has broad prospects in the early diagnosis, efficacy evaluation, targeted therapy, and prognosis evaluation of tumors.
Detection of lung cancer-related gene mutations in fresh tissue, frozen tissue, paraffin-embedded tissue, pleural effusion, plasma, and exfoliated tumor cells has been used to guide personalized targeted drug therapy and has been approved by regulators such as the CFDA, the FDA, and the European Union drug administration. As of April 2016, the lung cancer targeted drug diagnosis genes recognized by regulatory agencies such as the FDA were: mutation hot spots of the EGFR, Kras, Braf, and Nras genes, the PIK3CA gene, the EML4-ALK fusion gene, ROS1, ALK/ROS1, the BCR-ABL fusion gene, PDGFRA, JAK2, the C-KIT mutation gene, etc. In oncology, liquid biopsies of these mutation hotspots may be used to monitor tumor burden in the blood. For example, detection of Epidermal Growth Factor Receptor (EGFR) driver mutations in blood samples from lung cancer patients who lack a tissue biopsy diagnosis has been approved by drug regulators as a companion diagnostic for EGFR tyrosine kinase inhibitor drugs. Before liquid biopsy based on the analysis of DNA in blood can be used as a routine cancer diagnostic, problems of effectiveness, operability, and reliability must still be solved. Nevertheless, the method has already shown particular promise in the early diagnosis, drug efficacy evaluation, and prognosis evaluation of tumors [Diehl F, Schmidt K, Choti MA, et al. Circulating mutant DNA to assess tumor dynamics. Nat Med 2008; 14(9): 985-90].
Technically, liquid biopsy has certain limitations in clinical application because the content of free nucleic acids in blood is low, free nucleic acids are easily diluted by wild-type nucleic acids, and conventional detection techniques have limited sensitivity. In terms of the sensitivity of mutation frequency determination, qPCR can reach 0.1%, digital PCR 0.01%, and second-generation high-throughput parallel sequencing (second-generation sequencing for short) 0.001%. If cfDNA in plasma is used as the test sample, the mutation detection technique must be very sensitive, reaching 0.01%, and at present only second-generation sequencing can meet this requirement for high-throughput determination of cfDNA. However, because of the obstacles of sequencing depth and library background signal, liquid biopsy based on second-generation sequencing is currently suitable only for scientific research and remains some distance from clinical diagnostic use.
Besides clinical applications in tumor diagnosis, detection of low-frequency variants has enormous application in other biological aspects, such as detection of somatic mutations, sample contamination, and the like. The second-generation sequencing technology has a huge application prospect in detecting low-frequency variation in a mixed sample. However, the limitation of this method is that the error rate of the method itself is high, reaching 0.1-1%, while the low frequency variation in the sample may be as low as 0.01%. The detection of low frequency variants also places high demands on the amount of sample to be loaded, and typical library construction requires at least 50ng of DNA sample. However, the small amount of DNA samples, low mutation frequency, and DNA fragmentation are common features of clinical samples, such that current second generation sequencing techniques used for low frequency variant detection can generate unacceptable false positive rates.
Taking Illumina's second-generation sequencing platform as an example, the base substitution error rate of the MiSeq sequencing platform exceeds 0.1% [Loman 2012, performance comparison of NGS platforms (MiSeq, Ion Torrent), NBT]. The reasons are: 1. base reading errors that occur when the DNA sequence is read by synthesis; 2. polymerase-induced synthesis errors during cluster amplification of DNA strands; 3. base pairing errors introduced by PCR amplification during library preparation; 4. contamination between samples during library sequencing. The errors from the first two causes are related to the instrument system and were mainly due to the error rate of the polymerases used in early experiments. They have been greatly reduced by the use of high-fidelity polymerases, such as KAPA HiFi polymerase, and by optimization of the sequencing platform.
Construction of a second-generation sequencing library requires tailing and amplification of DNA fragments by PCR, in which the DNA fragments generated in one cycle serve as templates in the following cycle, so the PCR product increases exponentially with the number of cycles. The DNA polymerase used for tailing and amplification generates a certain number of synthesis errors. For example, the commonly used Taq DNA polymerase has an error rate of 2.84 × 10^-4; under non-optimized conditions, one error occurs per 9,000 bases synthesized [Tindall KR, Kunkel TA. Fidelity of DNA synthesis by the Thermus aquaticus DNA polymerase. Biochemistry 1988; 27(16): 6008-13]. When single fragments of about 100 bases are synthesized, some of them therefore carry an erroneous substitution. Because deoxyribonucleic acid consists of four bases, a base can be replaced by any of three others, so the probability of one specific substitution at a specific position is three times lower, about 1 in 27,000. The synthesis error rate of the high-fidelity KAPA HiFi polymerase, by comparison, is 100 times lower than that of Taq DNA polymerase.
In addition, because the DNA fragments generated in one PCR cycle serve as templates in the following cycles, a synthesis error made in one cycle is inherited by the subsequent cycles and amplified exponentially. Errors arise at random positions and at random times during the PCR cycles. They strongly affect the background noise of the NGS sequencing library and are the main source of background noise in high-throughput sequencing. Besides using a high-fidelity DNA polymerase, there are two strategies for reducing the rate of base variation during library preparation: reducing the total number of PCR amplification cycles, and increasing the total number of low-frequency variant molecules in the sample. Both must be considered together to reduce the background noise of the method. However, because clinical samples are scarce and plasma cfDNA is present only in trace amounts, the sample amount cannot be increased indefinitely, and an amplification step before sequencing is indispensable. Many methods for amplifying sequencing libraries are currently available, but the variation frequency increases during amplification, leading to a higher false positive rate. The Firefly technology of AccuraGen uses rolling circle amplification to selectively amplify target fragments while introducing fewer variations, but the conventional library construction and amplification steps must still be completed afterwards, and the technology can only improve the diagnostic sensitivity for variants to 0.02% [Lin, 2015].
Another factor that affects the sensitivity of NGS assays is the coverage of the target fragments being tested, i.e., the sequencing depth. The sequencing depth directly determines the sequencing sensitivity: a depth of 100x can provide a detection sensitivity of at most 1%. In theory, the depth of second-generation sequencing is unlimited and can easily exceed 10,000x; for example, the sequencing depth at a single site using omega plex can exceed 900,000x or more (see Example 4 of the present invention). However, because the genome is very large (about 3.2 billion base pairs) and sequencing throughput, assay cost, and data processing capacity are limited, sequencing the whole genome at such depth is not feasible, and in practice the library must be selectively enriched. Existing enrichment methods can target specific fragments but cannot reduce library noise. On the contrary, they may add further background, producing false positives in the sequencing result.
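The relationship between sequencing depth and the lowest detectable variant frequency can be illustrated with a short calculation. The sketch below assumes the simplest possible model, in which a variant only needs to be observed in `min_reads` reads; the function names and the `min_reads` parameter are illustrative assumptions and are not part of the invention.

```python
import math


def max_sensitivity(depth, min_reads=1):
    """Lowest variant frequency observable at a given depth (simplified model)."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return min_reads / depth


def depth_required(frequency, min_reads=1):
    """Minimum depth at which `min_reads` variant reads are expected."""
    if not 0 < frequency <= 1:
        raise ValueError("frequency must be in (0, 1]")
    # Rounding first guards against floating-point noise before taking the ceiling.
    return math.ceil(round(min_reads / frequency, 9))


if __name__ == "__main__":
    print(f"100x depth -> best sensitivity {max_sensitivity(100):.2%}")
    print(f"0.01% variant -> depth of at least {depth_required(0.0001):,}x")
```

Under this model a 0.01% variant needs a depth of at least 10,000x even before background noise is considered, which is why selective enrichment of the targets is needed in practice.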
Disclosure of Invention
It is an object of the present invention to provide a method for repeatedly replicating and specifically capturing low-frequency DNA base variations, which solves the above-mentioned problems.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows: a method for repeated replication and specific capture of low frequency DNA base variations, comprising the steps of:
(1) carrying out thermal denaturation on DNA, hybridizing the target DNA by using a primer mixture with a thermodynamic dynamic structure, carrying out extension replication by using DNA polymerase with the target DNA as a template, and repeating the process to finish repeated replication of the template; the thermal cycler is preferably used for denaturing the DNA;
(2) specifically extending and tailing the repeatedly replicated secondary DNA fragments by using oligonucleotide matched with the 3 'end of a determination target, and introducing a common sequence at the 3' end of a secondary product;
(3) performing PCR amplification by using a primer containing a sequencing barcode sequence to complete the construction of a sequencing library;
(4) performing high-throughput parallel sequencing on the sequencing library to generate a plurality of sequencing reads;
(5) identifying sequence differences between the sequencing reads and the reference sequence;
(6) sequence variants are determined as sequence differences that occur at a frequency of 0.01% or more in a plurality of reads obtained from the nucleic acid sample.
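Step (6) can be sketched as a simple frequency filter. This is a minimal illustration that assumes the sequence differences have already been counted per target; the function name, the dictionary layout, and the example labels are hypothetical, and only the 0.01% threshold comes from the text.

```python
def call_variants(difference_counts, total_reads, threshold=1e-4):
    """Return {difference: frequency} for differences seen at >= threshold.

    difference_counts: mapping of a difference label to the number of reads
                       showing it (labels are arbitrary, hypothetical keys).
    total_reads:       number of reads covering the target.
    threshold:         minimum frequency, 0.01% by default as in step (6).
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    called = {}
    for difference, count in difference_counts.items():
        frequency = count / total_reads
        if frequency >= threshold:
            called[difference] = frequency
    return called


if __name__ == "__main__":
    counts = {"site_A": 150, "site_B": 5}  # made-up counts for illustration
    print(call_variants(counts, total_reads=1_000_000))
```

With the made-up counts above, only `site_A` (150 of 1,000,000 reads, 0.015%) passes the threshold; `site_B` (0.0005%) is treated as background.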
As a preferred technical scheme: the primer with the thermodynamic dynamic structure in the step (1) is an omega primer with a probe length of 12nt-16nt, or a stem-loop primer with a probe length of 12nt-16nt, or a combination of the omega primer with a probe length of 12nt-16nt and the stem-loop primer with a probe length of 12nt-16 nt.
As a preferred technical scheme: the 5' end of the primer with the thermodynamic dynamic structure in the step (1) contains a specific sequence combination, and is at least one of an anchor sequence, a sample barcode sequence or a sequencing primer target sequence required by high-throughput parallel sequencing.
As a preferred technical scheme: the DNA polymerase in the step (1) is high-fidelity DNA polymerase or the combination of the high-fidelity DNA polymerase and high-efficiency polymerase.
As a preferred technical scheme: the hybridization temperature in the step (1) is in the range of 4 ℃ to 35 ℃, and is alternately carried out with the temperature of more than 50 ℃.
As a preferred technical scheme: the repeated replication of the step (1) is a process of completing the hybridization and extension of the primer and the DNA at a low temperature and then performing thermal denaturation at a high temperature; or high temperature heat denaturation after multiple cycles at two or more low temperatures, wherein the repeated replication is one or more repetitions.
As a preferred technical scheme: the primer with the thermodynamic dynamic structure in the step (1) covers two or more specific target fragments, and the coverage is completed in a serial mode.
As a preferred technical scheme: the primer with thermodynamic dynamic structure in step (1) covers a specific double-stranded target fragment for one strand, or for its complementary strand, or for both.
Construction of a second-generation sequencing library consists of attaching specific anchor sequences, sample barcode sequences, and sequencing primer site sequences to both ends of the DNA fragment to be tested. In an Illumina next-generation sequencer, the anchor sequence hybridizes to primer sequences immobilized on the substrate surface, anchoring the DNA fragment to be tested. Bridge PCR amplification then forms clusters of the fragment, DNA polymerase synthesizes the strand while the base at each position is read, and the sequence of each fragment is recorded. The invention relates to a principle and a method for constructing a sequencing library in which an anchor sequence and a sequencing primer target sequence are introduced using primers having a thermodynamically dynamic structure (also called structural primers), such as the omega primer (patent application number: PCT/CN2013/070525) and the stem-loop primer (Applied Biosystems, Inc, PCT/CN2013/070525). It is suitable for constructing libraries for various second-generation sequencing platforms. This method, also called the omega plex repeated replication detection method, offers several advantages for improving detection sensitivity, enriching specific targets, and increasing sequencing depth, as shown in figure 1. Multiplex PCR enrichment of specific targets is commonly used to amplify specific target fragments so that analysis is concentrated on the set of fragments of interest. It achieves very high sequencing depth but cannot change the frequency of background noise, and therefore does not help in detecting low-frequency variants.
The invention uses structural omega or stem-loop primers for hybridization capture and copying of the target fragment DNA. The same fragment can be copied repeatedly up to 200 times, which is equivalent to increasing the amount of sample DNA by hundreds of times, as shown in figure 2. Because every copy is made from the same DNA fragment as template, the amplification is linear, synthesis errors made by the DNA polymerase are not propagated, and errors are not amplified. The base substitution error rate at any site in Taq DNA polymerase synthesis is 1/9000, so the error rate for one specific variation in each newly synthesized DNA sequence is 1/(9000 × 3), or about 0.0037%. In the repeated copying method, each copy is made from the same template, so the error rate of each copy stays constant and does not accumulate.
Table 1: Background mutations (unit: per 10,000) erroneously generated by base substitution of DNA polymerase during linear amplification by repeated replication.
[Table 1 appears as an image in the original publication: Figure BDA0001077239810000041]
Taking a low-frequency variation sample of 10,000 template molecules containing one mutant as an example, Table 1 derives by calculation the relationship between the background signal generated by different polymerases and the synthesis error rate of the polymerase itself. The random substitution rate of a polymerase at a particular site is independent of the length of the DNA fragment being replicated and depends only on the synthesis error rate of the enzyme. With Taq polymerase, for example, DNA fragments carrying random mutations are generated, and the probability that a mutation occurs at a specific site is constant at 0.0037%. Replicating 10,000 DNA targets 100 times generates 37 secondary fragments carrying the same change as the target variant, but the total number of molecules has been amplified to 1,000,000, so the frequency of the new variant is still 0.0037%. Thus, with amplification by repeated replication, the frequency of newly introduced specific variants is constant and depends only on the error rate of the polymerase. By using the high-fidelity KAPA HiFi polymerase, combining different polymerases, or optimizing the reaction conditions of the enzymes, the error rate of the polymerase can be reduced, and the sample amount can be greatly amplified at the cost of very little background noise. This reduces the number of PCR cycles needed for subsequent library amplification, lowering the overall background noise and improving detection sensitivity.
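The arithmetic behind these figures can be reproduced in a few lines. This is only a sketch of the expected-value calculation described above; the constant and function names are illustrative, and the 1/9000 error rate is the Taq value quoted in the text.

```python
TAQ_ERROR_PER_BASE = 1 / 9000  # substitution errors per base synthesized (from the text)
ALTERNATIVE_BASES = 3          # a base can be replaced by any of three others


def specific_error_rate(per_base_error=TAQ_ERROR_PER_BASE):
    """Probability of one specific substitution at one specific site per copy."""
    return per_base_error / ALTERNATIVE_BASES


def linear_replication(templates, rounds, per_base_error=TAQ_ERROR_PER_BASE):
    """Expected outcome of copying the same templates `rounds` times.

    Returns (copies, expected_false_variants, false_variant_fraction).
    """
    rate = specific_error_rate(per_base_error)
    copies = templates * rounds
    false_variants = copies * rate
    return copies, false_variants, false_variants / copies


if __name__ == "__main__":
    copies, false_variants, fraction = linear_replication(10_000, 100)
    print(f"{copies:,} copies, about {false_variants:.0f} false variants, "
          f"frequency {fraction:.4%}")
```

Because every copy is made from an original template, the fraction returned is the same for any number of rounds.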
In contrast, when Taq DNA polymerase amplifies 10,000 DNA fragments 100-fold exponentially by PCR, 6-7 cycles are required, and the probability that a new fragment carries the mutation varies from 0.0037% to 0.50%, depending on the PCR cycle in which the mutation occurs. Because the template of each PCR cycle is the product of the previous cycle, a mutation that occurs early is inherited by the replication in every subsequent cycle; the earlier the mutation occurs, the higher its percentage in the final product, as shown in FIG. 2. Mutations introduced by PCR are the main reason for the high false positive rate of next-generation sequencing. Repeated replication of the same target amplifies the number of templates to increase signal intensity, reduces the number of PCR amplification cycles required for library preparation, and introduces fewer mutations, lowering the background signal and thereby increasing detection sensitivity, as shown in FIG. 2.
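The difference between inherited errors in PCR and non-accumulating errors in repeated replication can be illustrated with a simplified expected-value model. The sketch assumes ideal doubling in every PCR cycle and a per-copy specific error of 1/27,000; it does not attempt to reproduce the 0.0037% to 0.50% range quoted above, which depends on when an individual mutation arises. All names are illustrative.

```python
SPECIFIC_ERROR = 1 / 27000  # per-copy chance of one specific substitution (Taq, from the text)


def pcr_error_fraction(cycles, specific_error=SPECIFIC_ERROR):
    """Expected fraction of molecules carrying the error after each ideal PCR cycle."""
    fraction = 0.0
    history = []
    for _ in range(cycles):
        # A new copy inherits its template's error, or acquires a fresh one.
        new_copy_fraction = fraction + (1 - fraction) * specific_error
        # After doubling, half of the pool is old molecules and half is new copies.
        fraction = (fraction + new_copy_fraction) / 2
        history.append(fraction)
    return history


def linear_error_fraction(rounds, specific_error=SPECIFIC_ERROR):
    """Expected error fraction among the copies made in each round of repeated replication."""
    # Every round copies the original template, so nothing is inherited.
    return [specific_error for _ in range(rounds)]


if __name__ == "__main__":
    pcr = pcr_error_fraction(7)
    linear = linear_error_fraction(100)
    print(f"PCR, 7 cycles:      {pcr[-1]:.4%}")
    print(f"Linear, 100 rounds: {linear[-1]:.4%}")
```

In this model the PCR error fraction grows with every cycle (roughly 0.013% after 7 cycles), while the fraction for repeated replication stays at about 0.0037% regardless of the number of rounds.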
Multiplex PCR enrichment of specific targets uses linear primers to capture and amplify the targets and can hardly avoid dimer formation between primers, so it is not suitable for linear amplification by repeated replication. The omega primer is a structural oligonucleotide primer containing complementary sequences that can form a stem loop with a stem of 4-12 base pairs, a 12-base probe at the 3' end, and a probe spacer region. Omega primers avoid priming at internal target sites within a template DNA strand and avoid dimerization between primers. They achieve higher amplification efficiency and better specificity with a small amount of primer, and can hybridize and extend at lower temperature, improving the specificity and sensitivity of synthesis. When the omega probe is lengthened to 12nt-16nt, it can hybridize to a complementary site within the strand of a short DNA fragment and initiate primer extension, so it can be used to capture and copy fragmented DNA. The omega primer has a thermodynamically dynamic structure: at lower temperature it forms a stem loop that keeps the probe region and the 5' end sequence separate and independent, so the 5' end sequence can be changed freely without affecting the priming efficiency of the probe, as long as the stem-loop structure is not disrupted.
In a specific example of the invention, a second-generation sequencing P5 anchor sequence and a Read1 sequencing primer combined target sequence are introduced at the 5' end, so that the library building efficiency of a target fragment is improved, and the method is shown in figure 1. Similarly, the stem-loop structure formed at low temperature prevents the formation of dimers between primers, which can be used to replicate templates, and low temperature also helps to improve the sensitivity of hybridization and the balance of sequencing depth between different targets.
One implementation of the present invention is illustrated with the omega primer. The first step in library construction is to mix the omega primers, the (fragmented) sample DNA to be tested, Taq DNA polymerase, and the components required for synthesis, and to denature the DNA into single strands at 95 ℃. When the temperature is then lowered to 60 ℃, the stem-loop structure of the primers forms, preventing hybridization and extension between primers. The temperature is lowered further to the temperature required for hybridization, the probe hybridizes with the target DNA, and the temperature is then raised to activate the polymerase, completing synthesis and extension of the DNA strand. Denaturation at 95 ℃ and hybridization synthesis at 4 ℃ are repeated to complete linear amplification of the target DNA. The second step is the synthesis of specific target DNA: a DNA primer containing a sequence complementary to the downstream region of the target DNA and a sequencing primer sequence hybridizes and is extended, creating a template that can be amplified with the NGS anchor primer. Finally, amplification with PCR primers carrying a sample barcode incorporates the sample barcode and the anchor sequence, completing construction of the sequencing library. Because the invention replicates the sample linearly and repeatedly, the number of PCR cycles required for subsequent library amplification is greatly reduced: library construction can be completed starting from 1 genome (about 5 pg) with only 12 PCR cycles. By using low-temperature repeated synthesis, the background signal is reduced to two hundred thousand, and the sensitivity of the library can reach 0.01%. Preferably, a high-fidelity polymerase, such as KAPA HiFi polymerase, is used to further reduce background signals, as shown in Table 1.
Compared with the current commercial library building kit or technology, the invention can achieve the library quantity of second-generation sequencing and the optimal sensitivity with the least PCR cycles, thereby being greatly helpful for identifying and clarifying low-frequency nucleic acid variation in a sample possibly containing a small amount of variant sequences in a normal sequence background and identifying the low-frequency variation in a sequencing error background.
When the probe of a structural stem-loop primer is 5nt to 8nt long, the primer avoids the formation of primer dimers and captures the 3' end of short DNA at low temperature. In the invention, once the stem-loop primer probe is lengthened to 16 nt, it can hybridize to a complementary site within a short DNA fragment strand and initiate DNA synthesis. A stem-loop primer with a probe optimized in this way both avoids primer dimers and captures fragmented DNA, and can be used for repeated replication in multiplex PCR. Linear primers lack this property and therefore cannot be used for repeated replication in multiplex PCR. The invention compares the sensitivity of single-probe omega primers, stem-loop primers, and linear primers in capturing fragmented DNA, and shows that a 16nt probe sequence can detect 1.6 copies in a sample. Further tests of omega primers with different probe lengths show that the capture sensitivity of 14-16nt probes reaches 1.6 copies, that of a 12nt probe is 33 copies, and that of a 10nt probe falls to 3300 copies. HiSeq and MiSeq sequencing of libraries prepared with omega primers or stem-loop primers demonstrated feasibility, see example 3.
After sequencing reads are obtained from second-generation sequencing, bioinformatics analysis must be performed on the sequencing data to extract useful information from the large volume of data. In the application of the invention, a FASTQ file is first obtained from the MiSeq or HiSeq sequencing result; the quality of the sequencing result is then evaluated with quality-control software such as FastQC or Trimmomatic, and sequencing reads with quality lower than QV30 are removed. The sequences in the FASTQ file are aligned to the target reference sequences (e.g., Kras, Braf and EGFR) using BWA, Bowtie or an R software program, and the alignment results are used to find the repeat regions or variations of each sequence and their reference positions. Identical variations found across all sequence reads of a particular target fragment are counted together, the variation frequency and the noise frequency are calculated, and whether a variation is a positive or negative mutation is determined by QV30 analysis and by analysis of the difference between the variation frequency and the noise frequency.
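The comparison between variation frequency and noise frequency can be sketched as follows. This is a minimal illustration rather than the analysis pipeline of the invention: the function names, the example pileup and the `min_ratio` threshold are assumptions introduced here.

```python
from collections import Counter
from typing import Dict, List, Tuple


def base_frequencies(bases_at_position: str) -> Dict[str, float]:
    """Fraction of reads carrying each base at one reference position."""
    counts = Counter(bases_at_position)
    depth = sum(counts.values())
    return {base: n / depth for base, n in counts.items()}


def call_variants(
    freqs: Dict[str, float],
    ref_base: str,
    noise: Dict[str, float],
    min_ratio: float = 3.0,  # hypothetical threshold, not taken from the source
) -> List[Tuple[str, float]]:
    """Non-reference bases whose frequency is at least `min_ratio` times
    the background noise frequency recorded for that base."""
    calls = []
    for base, freq in freqs.items():
        if base == ref_base:
            continue
        if freq >= min_ratio * noise.get(base, 0.0):
            calls.append((base, freq))
    return sorted(calls)


if __name__ == "__main__":
    # Assumed pileup: 10,000 reads covering one position whose reference base is G.
    pileup = "G" * 9990 + "A" * 8 + "T" * 2
    noise = {"A": 0.0001, "C": 0.0001, "T": 0.0001}
    print(call_variants(base_frequencies(pileup), "G", noise))
```

A fixed ratio is the simplest possible decision rule; a statistical test on the read counts could be substituted without changing the overall structure.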
Identifying sequence variants includes aligning one or more sequencing reads to a reference sequence to identify differences between the two, and identifying junctions. Typically, alignment is by placing one sequence read along a reference sequence, sequentially scoring each sequence for a match or lack of match, and preferably repeatedly aligning the various positions along the reference sequence. Matches with the best scores were considered as successful pairings and represent an inference as to the degree of relationship between the sequencing read sequence and the reference sequence. The reference sequence to which the sequencing reads are compared is the target reference genome, which may be complete or incomplete. In some embodiments, the reference genome consists only of regions comprising the polynucleotide of interest, e.g., a consensus sequence derived from the reference genome or from sequencing reads analyzed. In some embodiments, the reference sequence consists of only a portion of the reference genome, or a region corresponding to one or more target sequences being analyzed.
In a typical alignment, a base in the reference sequence that is mismatched with the adjacent base in the sequencing read indicates that a substitution mutation has occurred at that position. Similarly, a deletion mutation is inferred when a sequence has a gap next to the corresponding base in its reference sequence, and an insertion mutation is inferred when a sequence has additional bases next to the corresponding bases in its reference sequence. In some embodiments, scoring an alignment involves assigning values for the probabilities of substitutions and insertions/deletions. When individual bases are aligned, a match or mismatch contributes to the alignment score according to the substitution probability, which can be, for example, 1 for a match and 0 for a mismatch. Gap penalties and substitution probabilities can be based on empirical knowledge or on a priori assumptions about how the sequence mutates. Their values influence the alignment produced. Examples of algorithms for performing the alignment include, but are not limited to, the Bowtie algorithm, the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler transform, and hash-function aligners such as Novoalign, ELAND, SOAP, and the like.
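The scoring scheme described above (1 for a match, 0 for a mismatch, plus a gap penalty) can be sketched with a small Needleman-Wunsch global alignment. The gap penalty of -1, the function names and the event labels are assumptions chosen for illustration; production aligners such as BWA or Bowtie use indexed, heuristic methods rather than this full dynamic-programming table.

```python
def needleman_wunsch(read, ref, match=1, mismatch=0, gap=-1):
    """Globally align `read` to `ref`; return (score, aligned_read, aligned_ref)."""
    rows, cols = len(read) + 1, len(ref) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if read[i - 1] == ref[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + step,  # align the two bases
                score[i - 1][j] + gap,       # extra base in the read
                score[i][j - 1] + gap,       # base missing from the read
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_read, out_ref = [], []
    i, j = len(read), len(ref)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if read[i - 1] == ref[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                out_read.append(read[i - 1])
                out_ref.append(ref[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_read.append(read[i - 1])
            out_ref.append("-")
            i -= 1
        else:
            out_read.append("-")
            out_ref.append(ref[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_read)), "".join(reversed(out_ref))


def classify(aligned_read, aligned_ref):
    """Label each differing column as a substitution, insertion or deletion."""
    events = []
    ref_pos = 0  # 0-based position in the ungapped reference
    for r, g in zip(aligned_read, aligned_ref):
        if g == "-":
            events.append(("insertion", ref_pos, r))
            continue
        if r == "-":
            events.append(("deletion", ref_pos, g))
        elif r != g:
            events.append(("substitution", ref_pos, g + ">" + r))
        ref_pos += 1
    return events
```

For example, aligning the read `ACT` to the reference `ACGT` places a gap opposite the reference `G`, which `classify` reports as a deletion at reference position 2.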
In one aspect, the invention provides a method for identifying low frequency variant sequences, and in view of the above, the DNA target fragment may be a single-stranded deoxynucleotide or a double-stranded deoxynucleotide. The target fragment may be an RNA target fragment, either single-stranded or double-stranded. When the target fragment is RNA, synthesis of the complementary strand of DNA is accomplished by a reverse transcription DNA polymerase in a reverse transcription synthesis system. The remaining steps follow the standard method of the present invention.
In another aspect, the same DNA target fragment can be primed simultaneously by a plurality of omega primers arranged in tandem. When these are used in combination with a DNA polymerase having strand displacement activity (such as Vent (-exo)), the utilization of the template is increased and the detection sensitivity is further improved.
In another aspect, primers can also be designed against the complementary strand of the DNA target fragment, so that the same target is amplified simultaneously by a plurality of omega primers. This increases the utilization of the template, further improves the detection sensitivity, allows the authenticity of a mutation to be verified further, and increases the reliability of low-frequency mutation calls.
In another aspect, primers designed against a DNA target fragment amplify the target when it is present in the sample but produce no nonspecific amplification when the target template is absent. For example, if a fusion gene fragment is present, it is amplified and occupies a certain sequencing depth; if the fusion gene fragment is absent from the sample, no nonspecific fragment is generated. This property of the OmegaPlex sequencing method saves sequencing depth. Hybridization enrichment methods, by contrast, inevitably bring nonspecific hybridization and amplification of homologous sequences, which wastes sequencing depth.
The primer probe of the stem-loop structure can shield the PCR primer site and initiate the short DNA segment. In one embodiment of the invention, the primer probe with a stem-loop structure is used to replace an omega primer to complete the introduction of the anchoring sequence and the sequencing primer site sequence. The experimental result of the invention proves that the stem-loop primer which is properly modified and designed can be used for repeated replication of the target fragment at low temperature. Stem-loop primers may also be used in the design planning practice of the invention, either in combination with or interchangeably with omega primers. The stem-loop primer design requires the addition of an additional 14-18nt base at the 5' end to form a sequence containing sufficient thermodynamically stable pairing, and in the same primer design, the stem-loop primer will typically be 14-20nt longer than the omega primer.
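The stem-pairing requirement described above can be checked with a short script. This is a sketch under stated assumptions: the helper names and the example sequence are hypothetical (the sequence is not one of the patent's primers), and a real design would also evaluate the thermodynamic stability of the stem, which this exact-complement check does not do.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def stem_pairs(primer: str, stem_len: int, probe_len: int) -> bool:
    """True if the 5'-most `stem_len` bases are exactly complementary to the
    `stem_len` bases lying immediately upstream of the 3' probe."""
    if len(primer) < 2 * stem_len + probe_len:
        return False
    probe_start = len(primer) - probe_len
    five_prime = primer[:stem_len].upper()
    before_probe = primer[probe_start - stem_len:probe_start]
    return five_prime == reverse_complement(before_probe)


if __name__ == "__main__":
    # Hypothetical primer: 14 nt stem, 6 nt loop, complementary stem, 16 nt probe.
    stem = "GCGTACCGATTCGA"
    probe = "ACGTTGCAAGGCTTAC"
    primer = stem + "TTTTTT" + reverse_complement(stem) + probe
    print(len(primer), stem_pairs(primer, stem_len=14, probe_len=16))
```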
The thermodynamic dynamic structure primer of the invention can be but is not limited to an omega primer and a stem-loop primer; the nucleotide sample may be single-stranded or double-stranded; the primer set may be multiple tandem, as in FIG. 3, or may be for either single strand of the double strand, as in FIG. 4. In a specific implementation, the reference sequence is a known reference sequence, a consensus sequence formed by aligning sequencing reads thereto
One of the specific applications of the present invention is in the diagnosis of cancerous hotspots in ctDNA in blood. The implementation of the invention, through the capture and amplification of DNA fragments in plasma, with a simplified process, utilizes one or more reactions, conveniently, rapidly, sensitively and accurately identifies the dynamic change process of the lung cancer oncogene, helps clinicians to discover carcinogenesis or drug resistance mutation with extremely low concentration and extremely low mutation abundance, and guides accurate medication in the clinical treatment of tumors, as shown in figure 5.
The second objective of the present invention is to provide an application of the above method, which comprises the following steps: any assay reagent or kit for detecting low frequency nucleic acid variations is prepared. The invention can be used alone for commercialization, and can also be used as a component of a specific application kit. Furthermore, it should be understood that various changes and modifications can be made by those skilled in the art after reading the above-described embodiments of the present invention, and equivalents also fall within the scope of the claims appended to the present application.
Compared with the prior art, the invention has the advantages that: the invention can achieve the library quantity and the best sensitivity of the second-generation sequencing with the least PCR cycles, the detection sensitivity for low-frequency DNA base variation can reach 0.01 percent, the requirement for the sample quantity is smaller, and the corresponding target fragment in the pg-grade sample can be detected, thereby being greatly helpful for identifying and clarifying low-frequency nucleic acid variation in a sample possibly containing a small amount of variant sequences in the normal sequence background and identifying the low-frequency variation in the sequencing error background.
Drawings
FIG. 1: flow chart of OmegaPlex NGS library construction;
FIG. 2: schematic of OmegaPlex repeated replication reducing the library background signal;
FIG. 3: multiple copies of the same target fragment made with multiple tandem omega primers;
FIG. 4: schematic of replication by an omega primer pair targeting the two complementary DNA strands respectively;
FIG. 5: OmegaPlex low-frequency variant detection flow chart;
FIG. 6: electrophoresis results for DNA of sonicated H1299 cells, A549 cells and Fragmentase-digested IMR-90 cells;
FIG. 7: comparison of the priming efficiency of omega primers, stem-loop primers and linear primers with different probe lengths;
FIG. 8: length distribution analysis of the OmegaPlex library by the conventional BioAnalyzer 2100 chip assay;
FIG. 9: evaluation of library quality by fluorescence capillary electrophoresis;
FIG. 10: improvement of the sensitivity and yield of the method by repeated replication;
FIG. 11: effect of amplification cycle number on library yield;
FIG. 12: effect of amplification cycle number on library yield;
FIG. 13: results of sequencing depth and sensitivity tests with the addition of internal cycling;
FIG. 14: detection of the low-frequency spiked-in SNP T2663 in the Her2_V777 region.
Detailed Description
The invention will be further explained with reference to the drawings.
Example 1: OmegaPlex low-frequency variation determination and analysis process
For fragment DNA capture, 3.5 μl of repeated-replication reaction solution containing omega primers was added to 2 μl of purified DNA sample (DNA sample amount: 5 pg to 100 ng), centrifuged and placed in a PCR instrument, and the following PCR program was run: 95 ℃ for 1 min; 5 cycles of capture extension (8 ℃ for 3 min, 55 ℃ for 15 sec); denaturation at 95 ℃ for 15 sec; 40-100 cycles of capture extension and denaturation; and a final extension at 72 ℃ for 10 min. Then 40 μl of target matching PCR reaction solution was added and mixed well, and the following program was run in a PCR thermal cycler: 95 ℃ for 2 min; 5 cycles of conversion extension (65 ℃ for 3 min, 72 ℃ for 1 min, 95 ℃ for 15 sec); and a final extension at 72 ℃ for 10 min. The PCR samples were purified using streptavidin-coupled magnetic beads (Life Technologies, DynaBeads T1/C1) and washed 2 times, and 20 μl of sample barcode amplification PCR reaction solution was added. After mixing well, the following program was run in a PCR thermal cycler: 95 ℃ for 2 min; 8-12 amplification cycles (65 ℃ for 15 sec, 72 ℃ for 15 sec, 95 ℃ for 15 sec); and a final extension at 72 ℃ for 10 min. An aliquot of the PCR product was taken, 2 μl of USB ExoSAP-IT was added, and the mixture was mixed well and reacted according to the manufacturer's instructions. The purified product was quantified using the Qubit 2.0 DNA Quantification Assay (Life Technologies), and the fragment size and distribution of the DNA product were checked for quality control by DNA agarose gel electrophoresis, Agilent BioAnalyzer 2100 chip analysis, or fluorescence capillary electrophoresis such as ABI 3730. This completes the construction of the DNA sequencing template for the second-generation sequencing sample, and a DNA product that passes quality control can be used directly for sequencing on the corresponding second-generation sequencer. The principle of OmegaPlex sequencing library construction is shown in FIG. 1, and the workflow is shown in FIG. 5.
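A thermal cycling program of this kind can be written down as nested data, which makes it easy to estimate the programmed hold time. The sketch below encodes one possible reading of the capture program above, in which 5 capture-extension cycles plus one denaturation step form a block that is repeated 40-100 times; that nesting, the class names and the omission of ramp times are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Hold:
    temp_c: float   # block temperature in degrees Celsius
    seconds: float  # programmed hold time


@dataclass
class Cycle:
    repeats: int
    steps: List[Union[Hold, "Cycle"]]


def hold_seconds(step: Union[Hold, Cycle]) -> float:
    """Total programmed hold time, ignoring ramping between temperatures."""
    if isinstance(step, Hold):
        return step.seconds
    return step.repeats * sum(hold_seconds(s) for s in step.steps)


def capture_program(outer_cycles: int) -> Cycle:
    """Capture program under the nesting assumed in the lead-in."""
    return Cycle(1, [
        Hold(95, 60),                                # initial denaturation
        Cycle(outer_cycles, [
            Cycle(5, [Hold(8, 180), Hold(55, 15)]),  # capture extension
            Hold(95, 15),                            # denaturation
        ]),
        Hold(72, 600),                               # final extension
    ])


if __name__ == "__main__":
    for n in (40, 100):
        print(n, hold_seconds(capture_program(n)) / 3600.0, "hours")
```

Under this reading the hold time scales almost linearly with the number of outer cycles, so the choice between 40 and 100 cycles is mainly a trade-off between run time and yield.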
After second-generation sequencing is finished, a FASTQ file is obtained from the MiSeq or HiSeq sequencing result, and quality-control software such as FastQC or Trimmomatic is used to evaluate the quality of the sequencing result and to remove reads with quality lower than QV30. The sequences in the file are aligned to the set of targeted reference sequences (e.g., Kras, Braf and EGFR) using BWA, Bowtie or an R software program to find the repeat regions or variations of each sequence and their reference positions. Identical variations found across all sequence reads of a particular target fragment are counted together, the variation frequency and the noise frequency are calculated, and whether a variation is a positive or negative mutation is determined by QV30 analysis and by analysis of the difference between the variation frequency and the noise frequency. The process may be automated with a computer program.
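As one way the quality-filtering step might be automated, the sketch below reads a FASTQ stream and keeps reads whose mean Phred quality is at least 30. Several points are assumptions: the text does not say whether QV30 is applied per base or per read, the Phred+33 encoding is assumed, and the function names are hypothetical. Dedicated tools such as Trimmomatic handle many more cases (paired reads, adapter trimming, sliding windows).

```python
import io
from typing import Iterator, List, TextIO, Tuple

Record = Tuple[str, str, str]  # (read name, sequence, quality string)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a FASTQ stream with four lines per record."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        sequence = handle.readline().rstrip()
        handle.readline()  # '+' separator line
        quality = handle.readline().rstrip()
        yield header[1:], sequence, quality


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in quality) / len(quality)


def filter_reads(handle: TextIO, min_quality: float = 30.0) -> List[Record]:
    """Keep reads whose mean quality is at least `min_quality`."""
    return [
        record for record in read_fastq(handle)
        if record[2] and mean_phred(record[2]) >= min_quality
    ]


if __name__ == "__main__":
    # Two made-up reads: 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
    demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n####\n")
    print(filter_reads(demo))
```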
The reaction solution and the DNA polymerase vary in different experiments depending on the purpose of the experiment, and in this example, the representative solution components are:
repeated-replication reaction solution containing omega primers: 10 μl of 2x JumpStart PCR buffer, 2 μl of 100 nM omega primer or mixed primers, 1 μl of JumpStart Taq, 1 μl of RNase A (0.05 μg), 2 μl of H2O; the omega primers are the primers disclosed in patent application PCT/CN2013/070525;
target matching PCR reaction solution: 15 μl of 2x PCR buffer, 0.2 μl of LS1980, 0.2 μl of LS1976, 0.6 μl of JumpStart Taq, 15 μl of H2O, containing a primer set matching the specific target;
sample barcode amplification PCR reaction solution: 15 μl of 2x PCR buffer, 0.3 μl of LS1985, 0.3 μl of LS1959, 0.6 μl of JumpStart Taq, 15 μl of H2O.
Example 2: DNA preparation, ultrasonication and enzyme fragmentation
In the examples of the present invention, DNA was extracted from plasma, tissue or cultured cells using the DNeasy Blood & Tissue Kit (Qiagen). Plasma DNA was extracted exactly according to the methods recommended in the manual. The procedure for extracting DNA from cultured cells was slightly modified from the supplier's manual.
The specific steps are as follows:
1. Resuspend the cells cultured in 100 mm dishes in 200 μl of PBS, transfer to 2 ml tubes, label each tube, and add 20 μl of proteinase K to each tube;
2. Add 200 μl of buffer AL, shake up and down for 15 sec, centrifuge briefly, and incubate at 56 ℃ for 10 min, shaking and centrifuging once every 3 min;
3. Sonicate with an ultrasonic disruptor at intensity 40, 15 sec on/45 sec off, for 2 min in total;
4. Add 200 μl of ethanol (96-100%) to each tube, shake up and down for 15 sec, and centrifuge;
5. Transfer the mixture to a filter column placed in a 2 ml collection tube and centrifuge at 6000 g for 1 min;
6. Place the filter column in a new 2 ml collection tube, add 500 μl of buffer AW1, centrifuge at 6000 g for 1 min, and discard the flow-through and the collection tube;
7. Place the filter column in a new 2 ml collection tube, add 500 μl of buffer AW2, centrifuge at 14000 rpm for 3 min, and discard the flow-through and the collection tube;
8. Place the filter column in a low-DNA-binding centrifuge tube, add 50 μl of H2O, incubate at room temperature for 1 min, and centrifuge at 6000 g for 1 min;
9. Take 2 μl and measure the concentration with Qubit;
10. Run 2% agarose gel electrophoresis at 100 V for 40 min, loading 1-2 μl.
Then 5 μg of the obtained product was treated with Fragmentase endonuclease. The treatment system was: endonuclease buffer, 1 μl of Fragmentase and 8 μl of DNA sample, incubated at 37 ℃ for 30 min. The reaction was stopped with EDTA (2.5 μl of 0.5 M EDTA added to a 10 μl system). The digested sample was brought to 200 μl with PBS, 20 μl of proteinase K was added, and the sample was mixed well and centrifuged. The DNA fragments obtained after repurification with the DNeasy Blood & Tissue Kit had an average size of 700-1000 bp, as shown in FIG. 6. In FIG. 6: 1, 2 ng of H1299 cell DNA fragments; 2, 2 ng of A549 cell DNA fragments; 3, 2 ng of IMR-90 cell DNA fragments; fragment length range: 500-1,000 bp.
Example 3: comparison of priming efficiency of omega primers, stem-loop primers and Linear primers of different Probe lengths
To determine the priming sensitivity and priming efficiency of different primers and of omega primers with different probe lengths, we used H1299 cell fragment DNA at copy numbers of 1) 3,300; 2) 330; 3) 33; 4) 6.6; and 5) 1.6 as the template, and amplified and built libraries from the sequence adjacent to the EGFR G719N position using different primers and omega primers with different probe lengths. The primer sequences used are shown in Table 5.
Table 5: primer sequences used in example 3
AATGATACGGCGACCACCGAGATCT is an Illumina second generation sequencing platform P5 anchoring sequence in omega primer sequences; ACACTCTTTCCCTACACGACGCTCTTCCGATCT is Read1 sequencing primer site; CAAGCAGAAGACGGCATACGAGAT is the P7 anchor sequence; GTGACTGGAGTTCAGACGTGTGCTCT is the Read2 sequencing primer site.
In this example, the repeated-replication reaction solution was: 10 μl of 2x JumpStart PCR buffer, 2 μl of a single 100 nM omega primer, stem-loop primer or linear primer with a probe of the given length, 1 μl of JumpStart Taq, 1 μl of RNase A (0.05 μg), 2 μl of H2O;
target matching PCR reaction solution: 15 μl of 2x PCR buffer, 0.2 μl of LS1984, 0.2 μl of LS2060, 0.6 μl of JumpStart Taq, 15 μl of H2O, containing a primer set matching the specific target.
Repeated-replication reaction solution containing the respective primer was added to 2 μl of DNA sample of each copy number, centrifuged and placed in a PCR instrument, and the following PCR program was run: 95 ℃ for 1 min; 40 cycles of capture extension (8 ℃ for 3 min, 55 ℃ for 15 sec); 72 ℃ for 10 min. Then 40 μl of target matching PCR reaction solution was added and the following program was run in a PCR thermal cycler: 95 ℃ for 2 min; 40 amplification cycles (65 ℃ for 3 min, 72 ℃ for 1 min, 95 ℃ for 15 sec); and a final extension at 72 ℃ for 10 min. The products were analyzed by 2% agarose gel electrophoresis at 100 V for 30 min; the results are shown in FIG. 7. In FIG. 7, the samples are fragment DNA of H1299 cells at different copy numbers, loaded as follows: 1) 3,300 copies; 2) 330 copies; 3) 33 copies; 4) 6.6 copies; 5) 1.6 copies.
In order to compare the sensitivity of each primer objectively, this example does not use repeated replication but instead uses 40 cycles at low temperature to determine primer sensitivity. The results in FIG. 7 show that the probe length of the omega primers has a large influence on the capture of the target DNA. Omega primers with 14-16 nt probes reach single-copy-level sensitivity. Sensitivity is already significantly reduced for omega primers with a 12 nt probe. The sensitivity of the 10 nt probe is 330 copies, which has no practical value. Both stem-loop and linear primers with 16 nt probes achieved single-copy-level sensitivity. Thus, both can be used interchangeably with omega primers in a well-designed multiplex PCR reaction with a single target or a small number of targets. However, as the number of targets increases, the difficulty of design increases in a logarithmic progression, so that linear primers no longer have practical experimental significance and, in particular, cannot be used for repeated replication of templates at low temperature. In the present invention, the stem-loop primer and the omega primer can be used interchangeably or in combination. One disadvantage of the stem-loop primer is that its 5' end must pair complementarily with the sequence in front of the 3' end probe, so that extra bases are needed to form the double-stranded stem structure; for the same design it needs 10-20 more bases than the omega primer. This adds cost when detecting multiple targets and affects the overall quality of the primers, because primer synthesis is a complex chemical polymerization process and the longer the primer, the more errors are introduced.
Example 4: stem-loop primer for construction and sequencing analysis of second-generation sequencing library of Kras G12N fragment
In the present invention, as a test of the principle of the OmegaPlex method, we used a stem-loop primer to assemble and synthesize a sequencing library at a single target gene site (Kras G12NG13N) in the DNA of H1299 cells and A549 cells. The primer sequences used are shown in Table 6. Primer LS1953 is designed according to the stem-loop primer principle (Applied Biosystems, Inc, PCT/CN2013/070525); the underlined sequences can pair with each other at low temperature to form a stable double-stranded fold, so that the whole primer has a stem, loop and probe structure.
Table 6: primer sequences used in example 4
In this example, the repeated-replication reaction solution was: 10 μl of 2x JumpStart PCR buffer, 2 μl of 100 nM LS1953 stem-loop primer, 1 μl of JumpStart Taq, 1 μl of RNase A (0.05 μg), 2 μl of H2O;
target matching PCR reaction solution: 15 μl of 2x PCR buffer, 0.2 μl of LS1980, 0.2 μl of LS1976, 0.6 μl of JumpStart Taq, 15 μl of H2O;
sample barcode amplification PCR reaction solution: 15 μl of 2x PCR buffer, 0.3 μl of LS1985, 0.3 μl of sample barcode primer LS1959-LS1963, 0.6 μl of JumpStart Taq, 15 μl of H2O;
sample 1: 50ng H1299 cell debris DNA;
sample 2: 50ng A549 cell fragment DNA;
sample 3: 100ng H1299 cell fragment DNA +1ng A549 cell fragment DNA;
sample 4: 100ng H1299 cell fragment DNA +0.1ng A549 cell fragment DNA;
sample 5: 100ng H1299 cell fragment DNA +0.01ng A549 cell fragment DNA;
To 2 μl of the purified DNA sample or sample mixture was added 3.5 μl of repeated-replication reaction solution containing the stem-loop primer; the mixture was centrifuged and placed in a PCR instrument, and the following PCR program was run: 95 ℃ for 1 min; 40 cycles of capture extension (8 ℃ for 3 min, 55 ℃ for 15 sec); denaturation at 95 ℃ for 15 sec; and a final extension at 72 ℃ for 10 min. Then 40 μl of target matching PCR reaction solution was added and mixed well, and the following program was run in a PCR thermal cycler: 95 ℃ for 2 min; 5 cycles of conversion extension (65 ℃ for 3 min, 72 ℃ for 1 min, 95 ℃ for 15 sec); and a final extension at 72 ℃ for 10 min. The PCR samples were purified using streptavidin-conjugated magnetic beads (Life Technologies, DynaBeads T1/C1) and washed 2 times, and 20 μl of sample barcode amplification PCR reaction solution was added. After mixing well, the following program was run in a PCR thermal cycler: 95 ℃ for 2 min; 15 amplification cycles (65 ℃ for 15 sec, 72 ℃ for 15 sec, 95 ℃ for 15 sec); and a final extension at 72 ℃ for 10 min. An aliquot of the PCR product was taken, 2 μl of USB ExoSAP-IT was added, and the mixture was mixed well and reacted according to the manufacturer's instructions. The purified product was quantified using the Qubit 2.0 DNA Quantification Assay (Life Technologies).
Length distribution analysis of the H1299 Kras G12N library fragments with the BioAnalyzer 2100 DNA fragment assay chip showed very good specificity. Libraries constructed by conventional ligation methods typically contain small amounts of single-stranded primers and high-molecular-weight nonspecific bands, as shown in FIG. 8B, whereas the OmegaPlex method gives a library with a lower background signal; see FIG. 8A.
HiSeq rapid chip analysis yielded 1.2-1.4 million sequencing reads per library, with a matching rate to the reference sequence of more than 83% and a sequencing depth of more than 900,000; the results are shown in Table 2. Table 3 lists the frequencies determined for Kras G12N in the DNA of A549 cells and H1299 cells. The G12 site of H1299 cell DNA is predominantly GGT, but CGT, TGT and AGT are also present at low frequencies. The G12 site of A549 cell DNA is also predominantly GGT, but has 4.75% CGT, 4.62% TGT and 38.08% AGT. When H1299 cell DNA was mixed with 1% of A549 cell DNA, the CGT frequency measured in the mixture was 0.36%, 0.05% more than with pure H1299 DNA, which corresponds to the 0.0475% contributed by 1% of A549 cell DNA. The TGT frequency measured in the mixture was 1.95%, 0.08% more than with pure H1299 cell DNA, which corresponds to the 0.0462% contributed by 1% of A549 cell DNA. The AGT frequency measured in the mixture was 3.86%, 0.53% more than with pure H1299 cell DNA, which corresponds to the 0.38% contributed by 1% of A549 cell DNA. This confirms that a resolution of 0.05% can be measured by the OmegaPlex sequencing method. This example uses Taq DNA polymerase, which has a relatively high synthesis error rate; replacing it with a high-fidelity polymerase such as KAPA HiFi can further reduce the background signal.
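The expected contributions quoted above follow from multiplying the spike-in fraction by the allele frequency in the spiked-in DNA. The short sketch below reproduces that arithmetic; the function and variable names are illustrative, and the input percentages are those given in the text for A549 cell DNA.

```python
def expected_contribution(minor_fraction: float, allele_percent: float) -> float:
    """Allele frequency (in %) contributed to a mixture by a minor sample.

    minor_fraction: share of the minor sample in the mixture (0.01 for 1%).
    allele_percent: frequency of the allele in the minor sample, in %.
    """
    return minor_fraction * allele_percent


# Frequencies (%) at the Kras G12 codon of A549 cell DNA, from the text.
A549_G12_PERCENT = {"CGT": 4.75, "TGT": 4.62, "AGT": 38.08}

if __name__ == "__main__":
    for codon, percent in A549_G12_PERCENT.items():
        print(codon, round(expected_contribution(0.01, percent), 4))
```

The calculation treats the mixing ratio by DNA mass as equal to the ratio of genome copies, which assumes comparable genome content per nanogram in both cell lines.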
TABLE 3 OmegaPlex sequencing analysis of the Kras G12N fragment
Example 5: comparison of DNA quality of OmegaPlex library by agarose gel electrophoresis, Bioanalyzer 2100, fluorescence capillary electrophoresis
Agarose gel electrophoresis and the Bioanalyzer 2100 are commonly used for quality control of second-generation sequencing libraries, but the procedures are laborious, time-consuming and low in throughput. Here the PCR primers are fluorescently labeled, and the OmegaPlex product is evaluated in finer detail by fluorescence capillary electrophoresis. Compared with the former two methods, this method not only provides higher analytical precision but also offers automated operation, low cost, high throughput and fast measurement. FIG. 9 shows the results of the OmegaPlex analysis, including the quantification and distribution of the different bands.
Example 6: increased library yield by repeated replication
H1299 cell fragment DNA at copy numbers of 1) 330; 2) 33; 3) 6.6; and 4) 3.3 was used as the template, and the sequence adjacent to the EGFR G719N site was amplified and built into a library using an omega primer with a 16 nt probe. The primer sequences used are shown in Table 7. In this test, the effects of repeated replication and of conventional one-time replication on library yield were compared. The specific procedure was the same as described in Example 1 except for the conditions described below.
Table 7: primer sequences used in example 6
The capture extension conditions for conventional replication were: 95 ℃ for 1min, 40 cycles of capture extension (8 ℃ for 3min, 55 ℃ for 30 sec), 72 ℃ for 10 min;
the capture extension conditions for duplicate replication were: 95 ℃ for 1min, 20 cycles of capture extension (8 ℃ for 3min, 55 ℃ for 30 sec, 95 ℃ for 15 sec), 72 ℃ for 10 min.
The results show that a sample of 33 copies can be detected in the conventional replication mode, while a sensitivity of 3.3 copies can be achieved in the repeated replication mode. In both modes, library yield correlates positively with the number of templates. Twenty rounds of repeated replication increased the yield nearly 10-fold; see FIG. 10.
Example 7: increasing the amplification consistency of libraries by internal circulation
This example examines how the number of target matching PCR cycles and the number of sample barcode amplification PCR cycles affect library yield and quality, and how the repeated replication temperature relates to yield, in order to find the optimal combination. The primer sequences used are shown in Tables 8a and 8b.
Table 8 a: OmegaPlex-1 primer set
Table 8 b: LPM-314 primer set
The procedure was as follows. For fragment DNA capture, 3.5 μl of repeated-replication reaction solution containing omega primers was added to 2 μl (5 ng) of IMR-90 cell fragment DNA sample, centrifuged and placed in a PCR instrument, and the following PCR program was run: 95 ℃ for 1 min; 5 cycles of capture extension (35 ℃ for 3 min, 55 ℃ for 15 sec); denaturation at 95 ℃ for 15 sec; 100 cycles of capture extension and denaturation; and a final extension at 72 ℃ for 10 min. Then 40 μl of target matching PCR reaction solution was added and mixed well, and the following program was run in a PCR thermal cycler: 95 ℃ for 2 min; n cycles of conversion extension (65 ℃ for 3 min, 72 ℃ for 1 min, 95 ℃ for 15 sec); and a final extension at 72 ℃ for 10 min. The PCR samples were purified using streptavidin-conjugated magnetic beads (DynaBeads T1/C1, Life Technologies) and washed 2 times, and 20 μl of sample barcode amplification PCR reaction solution was added. After mixing well, the following program was run in a PCR thermal cycler: 95 ℃ for 2 min; m amplification cycles (65 ℃ for 15 sec, 72 ℃ for 15 sec, 95 ℃ for 15 sec); and a final extension at 72 ℃ for 10 min. An aliquot of the PCR product was taken, 2 μl of USB ExoSAP-IT was added, and the mixture was mixed well and reacted according to the manufacturer's instructions. The purified product was quantified using the Qubit 2.0 DNA Quantification Assay (Life Technologies), and the fragment size and distribution of the DNA product were checked for quality control by DNA agarose gel electrophoresis or fluorescence capillary electrophoresis such as ABI 3730. Here n is 5, 10 or 15 cycles, and m is 5, 10 or 15 cycles.
In this example, the repeated-replication reaction solution was: 10 μl of 2x JumpStart PCR buffer, 2 μl of 100 nM OmegaPlex-1, 1 μl of JumpStart Taq, 1 μl of RNase A (0.05 μg), 2 μl of H2O;
target matching PCR reaction solution: 5x HiFi buffer, 0.3 μl of LS1980 [biotin], 0.3 μM LPM-314, 0.6 μl of KAPA HiFi, 1 μl of dNTP (10 mM), 24 μl of H2O, containing a primer set matching the specific targets;
sample barcode amplification PCR reaction solution: 6 μl of 5x HiFi buffer, 0.3 μl of LS1985, 0.3 μl of sample barcode primer, 0.6 μl of KAPA HiFi, 1 μl of dNTP (10 mM), 24 μl of H2O.
Taking the Illumina platform as an example, the required sample concentration is 2 ng/μl. The experimental results show that the OmegaPlex yield after the minimum of 10 cycles (5 matching cycles + 5 amplification cycles) is 1.75 ng/μl in a total volume of 20 μl, which already meets the loading requirements. Increasing the total number of PCR cycles also increased the library yield; see FIG. 11. The samples were also analyzed by fluorescence capillary electrophoresis: within the tested range, the cycle number had little influence on the yield of each fragment and the differences can be ignored; see FIG. 12. Each additional PCR cycle always raises the background level, so controlling the total number of PCR cycles is one of the effective ways to reduce the background. In this test, with 5 ng of DNA sample, amplification for 10-15 cycles gave a DNA fragment yield of 1.75 ng/μl to 8.11 ng/μl, which fully meets the requirements of subsequent quality control and sequencing.
In addition, adding low-temperature internal cycling during repeated replication also has a beneficial effect on the sequencing depth and sensitivity of the library; see FIG. 13. Sample 1 and sample 13 were the same sample, 5 ng of H1299 cell fragment DNA. Sample 1 underwent 100 repeated replication cycles (16 ℃ for 1 sec, 55 ℃ for 10 sec, 95 ℃ for 15 sec), while sample 13 underwent 100 repeated replication cycles with 5 internal cycles of 16 ℃ to 55 ℃ in each cycle [(16 ℃ for 1 sec, 55 ℃ for 10 sec) x 5 cycles, then 95 ℃ for 15 sec]. Fluorescence capillary electrophoresis analysis of the samples shows that the homogeneity of sample 13, with internal cycling, is superior to that of sample 1, without internal cycling.
Example 8: OmegaPlex sequencing accuracy and depth
Using the same reaction conditions and the same primer mixture as in Example 7 (primer sequences are shown in Table 8a and Table 8b), we prepared libraries from DNA samples of IMR-90 cells, mixed DNA samples of IMR-90 and A549 cells, and plasma DNA samples, using 5 pairing cycles and 15 barcode amplification cycles. The prepared libraries were sequenced on a MiSeq, and the results are shown in Table 4.
FASTQ files were obtained from the MiSeq run, and FastQC quality analysis gave QV30 > 85%. Between 500,000 and 2.3 million valid sequencing reads were obtained from each of the 4 libraries; the match rate to the reference sequence exceeded 31% except for the plasma DNA sample, and the average sequencing depth exceeded 10,000, see Table 4. The sequencing depth of the Her2_V777L fragment exceeded 80,000. The noise frequency statistics of the 4 samples were as follows: average background frequency of N>G substitutions: 0.0050% (0.0045%, 0.0066%, 0.0040%, 0.0048%); average background frequency of N>C substitutions: 0.0050% (0.0041%, 0.0072%, 0.0045%, 0.0044%); average background frequency of N>T substitutions: 0.0064% (0.0047%, 0.0116%, 0.0047%, 0.0048%); average background frequency of N>A substitutions: 0.0054% (0.0039%, 0.0100%, 0.0040%, 0.0038%). The mutation frequency of the first base immediately adjacent to the omega probe was abnormal and significantly higher than the mean, so this site was excluded from the overall background statistics. Using the IMR-90 sample as a control, the A549 SNP T2663, spiked in at low frequency, could be detected. SNP T2663 was detected at frequencies of 0.032% and 0.03%, see FIG. 14. Although 9 pairs of omega primers targeting EML4-ALK fusion gene fragments were included in the experiment, no matching fragment was found in any sample.
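The averages quoted above can be rechecked from the four per-sample values given in parentheses. The following sketch does only that arithmetic; the dictionary and function names are illustrative assumptions, and the values are the percentages reported in the text.

```python
from statistics import mean

# Per-sample background frequencies in percent, copied from the text
# (four samples per substitution type).
BACKGROUND_PERCENT = {
    "N>G": [0.0045, 0.0066, 0.0040, 0.0048],
    "N>C": [0.0041, 0.0072, 0.0045, 0.0044],
    "N>T": [0.0047, 0.0116, 0.0047, 0.0048],
    "N>A": [0.0039, 0.0100, 0.0040, 0.0038],
}


def mean_background(per_sample_percent):
    """Average the per-sample background frequency for each substitution type."""
    return {change: mean(values) for change, values in per_sample_percent.items()}


averages = mean_background(BACKGROUND_PERCENT)
```

The recomputed means agree with the reported figures to within rounding at the fourth decimal place. All four averages lie well below the 0.03% level at which SNP T2663 was detected.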
Table 4: results of OmegaPlex sequencing of cellular and plasma DNA
[Table 4 is provided as an image (BDA0001077239810000201) in the original publication.]
It should be understood that those skilled in the art may make various changes and modifications to the present invention after reading the above teachings of its embodiments, and that such equivalents also fall within the scope of the present invention as defined by the appended claims.

Claims (8)

1. A method for specifically capturing and repeatedly replicating low frequency DNA base variations for non-diagnostic purposes, the method comprising the steps of:
(1) performing thermal denaturation on DNA, hybridizing the target DNA by using a primer mixture with a thermodynamic dynamic structure, performing extension replication by using DNA polymerase with the target DNA as a template, and repeating the denaturation and hybridization processes to finish repeated replication of the template;
(2) specifically extending and tailing the repeatedly replicated secondary DNA fragments by using an oligonucleotide matched with the 3' end of the determination target, and introducing a common sequence at the 3' end of the secondary product;
(3) performing PCR amplification by using a primer containing a sequencing bar code to complete the construction of a sequencing library;
(4) performing high-throughput parallel sequencing on the sequencing library to generate a plurality of sequencing reads;
(5) identifying sequence differences between the sequencing reads and the reference sequence;
(6) determining a sequence variant as a sequence difference that occurs at a frequency of 0.01% or more from a plurality of reads obtained from said nucleic acid sample;
wherein, the primer with the thermodynamic dynamic structure in the step (1) is an omega primer with a probe length of 12nt-16nt, or a stem-loop primer with a probe length of 12nt-16nt, or a combination of the omega primer with a probe length of 12nt-16nt and the stem-loop primer with a probe length of 12nt-16 nt.
2. The method of claim 1, wherein: the 5' end of the primer with the thermodynamic dynamic structure in the step (1) contains a specific sequence combination, which is at least one of an anchoring sequence, a sample barcode sequence or a sequencing primer target sequence required by high-throughput parallel sequencing.
3. The method of claim 1, wherein: the DNA polymerase in the step (1) is high-fidelity DNA polymerase or the combination of the high-fidelity DNA polymerase and high-efficiency polymerase.
4. The method of claim 1, wherein: the hybridization temperature in the step (1) is in the range of 4 ℃ to 35 ℃, and is alternately carried out with the temperature of more than 50 ℃.
5. The method of claim 1, wherein: the repeated replication of the step (1) is a process of completing the hybridization and extension of the primer and the DNA at a low temperature and then performing thermal denaturation at a high temperature; or high temperature heat denaturation after multiple cycles at two or more low temperatures, wherein the repeated replication is one or more repetitions.
6. The method of claim 1, wherein: the primer with the thermodynamic dynamic structure in the step (1) covers two or more specific target fragments, and the coverage is completed in a serial mode.
7. The method of claim 1, wherein: the primer with thermodynamic dynamic structure in step (1) covers a specific double-stranded target fragment for one strand, or for its complementary strand, or for both.
8. Use of the method according to any one of claims 1 to 7, characterized in that: preparing a test reagent or kit set for detecting low-frequency DNA base variation for non-diagnostic purposes.
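Steps (5) and (6) of claim 1, comparing reads with a reference and keeping differences at a frequency of 0.01% or more, can be illustrated with a minimal sketch. It is not the patented analysis pipeline: it assumes the reads are already aligned to the reference without gaps (no insertions or deletions), treats "N" as a no-call, and uses 0-based positions. The function name and data layout are hypothetical.

```python
from collections import Counter


def call_variants(reference, aligned_reads, min_frequency=0.0001):
    """Return {(position, ref_base, alt_base): frequency} for base differences
    whose frequency among the reads covering that position is at least
    min_frequency (0.0001 corresponds to 0.01%).

    Assumes each read is gap-free and starts at reference position 0.
    """
    depth = Counter()       # reads with a called base at each position
    alt_counts = Counter()  # reads carrying each non-reference base
    for read in aligned_reads:
        for pos, (ref_base, read_base) in enumerate(zip(reference, read)):
            if read_base == "N":  # skip no-calls
                continue
            depth[pos] += 1
            if read_base != ref_base:
                alt_counts[(pos, ref_base, read_base)] += 1
    calls = {}
    for (pos, ref_base, alt_base), count in alt_counts.items():
        frequency = count / depth[pos]
        if frequency >= min_frequency:
            calls[(pos, ref_base, alt_base)] = frequency
    return calls
```

With a reference of "ACGT", 9,998 matching reads and 2 reads of "ACTT", the G>T difference at position 2 has a frequency of 0.02% and is reported; a single differing read among 20,000 (0.005%) falls below the threshold and is not. A practical pipeline would also need to separate true variants from the background error frequencies discussed in Example 8.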
CN201610662853.0A 2016-08-12 2016-08-12 Method for specifically capturing and repeatedly copying low-frequency DNA base variation and application Active CN106282161B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201610662853.0A CN106282161B (en) 2016-08-12 2016-08-12 Method for specifically capturing and repeatedly copying low-frequency DNA base variation and application
PCT/CN2016/095818 WO2018028001A1 (en) 2016-08-12 2016-08-18 Method for specifically capturing and repeatedly replicating low-frequency dna base variation and use thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610662853.0A CN106282161B (en) 2016-08-12 2016-08-12 Method for specifically capturing and repeatedly copying low-frequency DNA base variation and application

Publications (2)

Publication Number Publication Date
CN106282161A CN106282161A (en) 2017-01-04
CN106282161B true CN106282161B (en) 2020-10-30

Family

ID=57669343

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610662853.0A Active CN106282161B (en) 2016-08-12 2016-08-12 Method for specifically capturing and repeatedly copying low-frequency DNA base variation and application

Country Status (2)

Country Link
CN (1) CN106282161B (en)
WO (1) WO2018028001A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106282161B (en) * 2016-08-12 2020-10-30 成都诺恩基因科技有限公司 Method for specifically capturing and repeatedly copying low-frequency DNA base variation and application
CN110699426B (en) * 2019-01-02 2022-01-28 上海臻迪基因科技有限公司 Gene target region enrichment method and kit
CN111440846B (en) * 2020-04-09 2020-12-18 江苏先声医学诊断有限公司 Position anchoring bar code system for nanopore sequencing library building
CN111534569A (en) * 2020-05-29 2020-08-14 安徽安龙基因科技有限公司 Oligonucleotide primer, kit and application
CN114250269A (en) * 2021-12-28 2022-03-29 上海市肺科医院 Probe composition, second-generation sequencing library based on probe composition and application of second-generation sequencing library
CN115125314A (en) * 2022-06-01 2022-09-30 四川大学华西医院 Detection method for base heterozygosis in heterogeneous drug-resistant bacteria

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104120184A (en) * 2014-07-28 2014-10-29 成都诺恩生物科技有限公司 Method for measuring short chain RNA by amplifying length polymorphism of DNA fragment
CN104153004A (en) * 2014-08-11 2014-11-19 上海美吉生物医药科技有限公司 Database-building method for amplicon sequencing

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020115082A1 (en) * 2000-09-01 2002-08-22 Vince Phillips Methods and compositions for polynucleotide analysis using generic molecular beacons
JP5126877B2 (en) * 2007-06-26 2013-01-23 独立行政法人理化学研究所 Method for detecting single nucleotide variants
CN102618651B (en) * 2012-01-19 2014-06-18 成都诺恩生物科技有限公司 Omega structure oligonucleotide primer for detecting short chain ribonucleic acid (RNA) and application thereof
AU2014362227B2 (en) * 2013-12-11 2021-05-13 Accuragen Holdings Limited Compositions and methods for detecting rare sequence variants
CN106282161B (en) * 2016-08-12 2020-10-30 成都诺恩基因科技有限公司 Method for specifically capturing and repeatedly copying low-frequency DNA base variation and application

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MicroRNA-derived Fragment Length Polymorphism Assay;Xie et al;《SCIENTIFIC REPORTS》;20150320;1-8 *

Also Published As

Publication number Publication date
CN106282161A (en) 2017-01-04
WO2018028001A1 (en) 2018-02-15

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 610041 B6 501, 88 Keyuan South Road, hi tech Zone, Chengdu, Sichuan.

Applicant after: Chengdu Nuoen Gene Technology Co Ltd

Address before: 610041 B6 501, 88 Keyuan South Road, hi tech Zone, Chengdu, Sichuan.

Applicant before: Chengdu Nuoen Biological Technology Co., Ltd.

GR01 Patent grant