CN116121243A - DNA tag and application thereof - Google Patents

Info

Publication number
CN116121243A
Authority
CN
China
Prior art keywords
dna
sequence
sequencing
nucleic acid
adapter
Legal status
Pending
Application number
CN202310265316.2A
Other languages
Chinese (zh)
Inventor
柴相花
甄贺富
袁玉英
张现东
张爱萍
张红云
刘娜
尹烨
Current Assignee
BGI Shenzhen Co Ltd
Original Assignee
BGI Shenzhen Co Ltd
Application filed by BGI Shenzhen Co Ltd
Priority to CN202310265316.2A
Publication of CN116121243A

Classifications

    • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C07H21/04 Compounds containing two or more mononucleotide units having separate phosphate or polyphosphate groups linked by saccharide radicals of nucleoside groups, e.g. nucleic acids with deoxyribosyl as saccharide radical
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C40B40/06 Libraries containing nucleotides or polynucleotides, or derivatives thereof
    • C40B50/06 Biochemical methods, e.g. using enzymes or whole viable microorganisms
    • C12Q2600/156 Polymorphic or mutational markers

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Biochemistry (AREA)
  • Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Genetics & Genomics (AREA)
  • Analytical Chemistry (AREA)
  • Microbiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Immunology (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Physics & Mathematics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • General Chemical & Material Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Pathology (AREA)
  • Hospice & Palliative Care (AREA)
  • Oncology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

There is provided a DNA tag for detecting minor variations, the tag having a sequence selected from at least one of: (1) HHATHHHTCACCHHATHHH; or (2) HHHTAHHTAHHHTAHH, wherein H represents A, T or C.

Description

DNA tag and application thereof
PRIORITY INFORMATION
None
Technical Field
The invention relates to the field of biological sequencing, in particular to a DNA tag, a DNA adaptor, a method for constructing a sequencing library, a sequencing library and a sequencing method.
Background
The rapid development of high-throughput sequencing technology has brought genome-level research into a new era. It can be used not only for large-scale genome sequencing but also for gene expression analysis, identification of small non-coding RNAs, and the like. In medicine, high-throughput sequencing removes the throughput limits of disease research, making multi-level, comprehensive studies of diseases possible and providing an effective means for their prevention, diagnosis and treatment. In genome research, gene expression research and medical genetic testing, DNA sequencing, DNA molecule quantification, RNA abundance analysis and similar measurements are of great significance. However, high-throughput sequencing requires PCR amplification of sample DNA/RNA before sequencing, and PCR generally suffers from amplification bias and amplification errors. In addition, depending on the sequencing platform and sequencing environment, errors also arise during sequencing itself, so that about 1% of bases cannot be identified correctly. These errors limit the detection of rare and low-frequency variants.
The unique molecular identifier (UMI) technique records the original DNA/RNA information of a sample by attaching a random synthetic sequence (typically 5-12 bp) to the end of each DNA/RNA fragment as a unique tag that identifies that fragment. As early as 2011, Isaac Kinde, Jian Wu et al. used a similar unique identifier (UID) technique to detect rare mutations. In 2012, to achieve absolute quantification of the relative abundance of two or more molecules in a single sample, Teemu Kivioja et al. used UMIs for the first time to count the absolute numbers of multiple molecules. Michael W. Schmitt et al. then combined UMIs with duplex sequencing (DS) to detect very rare mutations. In 2014, Scott R. Kennedy, Michael W. Schmitt et al. published detailed protocols for efficient DS adaptor synthesis, library preparation, target enrichment and data analysis, and in 2015 Michael W. Schmitt et al. used DS to detect rare mutations in the ABL1 gene.
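The UMI principle described above (reads sharing a tag derive from one original molecule, so a per-position vote across them suppresses PCR and sequencing errors) can be illustrated with a minimal Python sketch. It assumes reads are already aligned to the same coordinates and paired with their UMI; the function names, the minimum family size of 3 and the 0.7 agreement cutoff are hypothetical choices for illustration, not parameters from this patent. Duplex sequencing would additionally require the consensus sequences of the two strands' families to agree, which is not shown.

```python
from collections import Counter, defaultdict

def group_by_umi(reads):
    """Group (umi, sequence) pairs into families keyed by UMI."""
    families = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)
    return dict(families)

def consensus(seqs, min_agreement=0.7):
    """Per-position majority vote over equal-length reads of one family.

    A position becomes 'N' when its most common base is supported by
    fewer than min_agreement of the family's reads.
    """
    out = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count / len(column) >= min_agreement else "N")
    return "".join(out)

def call_families(reads, min_family_size=3, min_agreement=0.7):
    """Return {umi: consensus sequence} for families with enough reads."""
    return {
        umi: consensus(seqs, min_agreement)
        for umi, seqs in group_by_umi(reads).items()
        if len(seqs) >= min_family_size
    }

# Toy example: four reads share one UMI; one of them carries an error.
reads = [
    ("ACATTCA", "GATTACA"),
    ("ACATTCA", "GATTACA"),
    ("ACATTCA", "GATTTCA"),  # error at position 5
    ("ACATTCA", "GATTACA"),
    ("TTCAACT", "GATTGCA"),  # singleton family, dropped
]
print(call_families(reads))  # {'ACATTCA': 'GATTACA'}
```

The single erroneous base is outvoted three to one, while the singleton family is discarded because one read cannot be checked against anything.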
However, methods for detecting trace variants in the genome still need further development.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems existing in the prior art.
The inventors have developed a system for detecting and verifying trace mutations in the genome based on an original UMI sequence design. The system can detect mutations at frequencies as low as 0.01%, enabling early screening for cancer, neurodegenerative diseases, cardiovascular diseases and other conditions induced by the accumulation of mutations in somatic cells, stem cells and the like.
In a first aspect of the invention, the invention proposes a DNA tag. According to an embodiment of the invention, the tag has a sequence selected from at least one of the following: (1) HHATHHHTCACCHHATHHH (SEQ ID NO: 10); and (2) HHHTAHHTAHHHTAHH (SEQ ID NO: 11), wherein H represents A, T or C. The tag according to embodiments of the invention enables detection and verification of trace mutations (mutation frequency as low as 0.01%), which is important for early screening of cancer, neurodegenerative diseases, cardiovascular diseases and other diseases induced by the accumulation of mutations in somatic cells, stem cells and the like.
In a second aspect of the invention, the invention provides a DNA adaptor. According to an embodiment of the invention, the DNA adaptor contains the DNA tag described above. Constructing a sequencing library with this DNA adaptor and sequencing the library allows trace mutations to be detected, with high sensitivity for trace or rare mutations at mutation frequencies as low as 0.01%. The DNA adaptor is therefore of significant value for early screening of cancer, neurodegenerative diseases, cardiovascular diseases and other diseases induced by the accumulation of mutations in somatic cells, stem cells and the like.
In a third aspect of the invention, the invention provides the use of the DNA tag and the DNA adaptor described above for detecting trace variants. The tag and adaptor can be used to detect and verify trace mutations (mutation frequency as low as 0.01%), which is important for early screening of cancer, neurodegenerative diseases, cardiovascular diseases and other diseases induced by the accumulation of mutations in somatic cells, stem cells and the like.
In a fourth aspect of the invention, the invention provides a method of constructing a sequencing library. According to an embodiment of the invention, the method comprises subjecting nucleic acid molecules to which the DNA adaptor described above has been ligated to an enrichment process to obtain a sequencing library. A sequencing library constructed by this method can be used to detect trace variants at mutation frequencies as low as 0.01%.
In a fifth aspect of the invention, the invention provides a sequencing library. According to an embodiment of the invention, the sequencing library is obtained by the method of constructing a sequencing library described above. High-throughput sequencing of this library can detect mutation frequencies as low as 0.01%, enabling early screening of cancer, neurodegenerative diseases, cardiovascular diseases and other diseases induced by the accumulation of mutations in somatic cells, stem cells and the like.
In a sixth aspect of the invention, the invention provides a sequencing method. According to an embodiment of the invention, the method comprises sequencing the sequencing library described above and analyzing the resulting data. The sequencing method enables detection and verification of low-frequency mutations; depending on the UMI strategy used and the sequencing depth, the detectable mutation frequency can reach 0.01%, so the method can be applied effectively to early screening of cancer, neurodegenerative diseases, cardiovascular diseases and other diseases induced by the accumulation of mutations in somatic cells, stem cells and the like.
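Why sequencing depth matters at a 0.01% frequency can be shown with simple binomial arithmetic. This sketch assumes that each unique consensus molecule independently carries the variant with probability equal to the variant allele frequency and that consensus errors are negligible; the minimum number of supporting molecules `k` is a hypothetical calling threshold, not a value from this patent.

```python
import math

def prob_at_least_k(n_molecules: int, vaf: float, k: int) -> float:
    """P(X >= k) where X ~ Binomial(n_molecules, vaf).

    X models how many independent consensus molecules carry the variant.
    """
    p_below = sum(
        math.comb(n_molecules, i) * vaf**i * (1 - vaf) ** (n_molecules - i)
        for i in range(k)
    )
    return 1.0 - p_below

vaf = 1e-4  # 0.01% variant allele frequency
for n in (10_000, 50_000, 100_000):
    print(n, round(n * vaf, 1), round(prob_at_least_k(n, vaf, k=2), 3))
```

Under this simple model, 10,000 unique molecules yield only one expected variant-carrying molecule, and the chance of seeing at least two is about 0.26; at 50,000 molecules it rises to about 0.96. The count that matters is unique molecules after UMI collapsing, not raw reads.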
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
FIG. 1 is a flow chart of the overall analysis of a trace variation detection system according to one embodiment of the present invention;
FIG. 2 is a flow chart of a data analysis process according to one embodiment of the invention;
FIG. 3 shows purification and quantification of PCR products and their Sanger sequencing validation according to one embodiment of the present invention;
FIG. 4 shows 2100 Bioanalyzer results for an adaptor prepared with the "T-addition" strategy according to one embodiment of the invention;
FIG. 5 shows 2100 Bioanalyzer results for an adaptor prepared with the anchor strategy according to one embodiment of the invention;
FIG. 6 shows 2100 Bioanalyzer results for an adaptor prepared with the cleavage strategy according to one embodiment of the invention;
FIG. 7 shows 2100 Bioanalyzer results for a sequencing library according to one embodiment of the invention;
FIG. 8 is a cumulative depth distribution of a sample according to one embodiment of the invention;
FIG. 9 is a depth distribution of a sample according to one embodiment of the invention;
FIG. 10 is a distribution of UMI sequence groups of a sample according to one embodiment of the present invention; and
FIG. 11 shows the result of constructing duplex consensus sequences according to one embodiment of the invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
It should be noted that the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or the number of technical features indicated. Thus, a feature defined by "first" or "second" may explicitly or implicitly include one or more such features. Further, in the description of the present invention, unless otherwise indicated, "a plurality" means two or more.
For the nucleic acids mentioned in the description and claims of the invention, those skilled in the art will understand that either or both of the complementary strands are in fact included. For convenience, although only one strand is shown in most cases in the present description and claims, the complementary strand is also disclosed. For example, a reference to SEQ ID NO: 1 also covers its complement. Those skilled in the art will also appreciate that one strand may be used to detect the other and vice versa.
DNA tag
In a first aspect of the present invention, the present invention proposes a DNA tag for detecting trace variants. According to an embodiment of the invention, the tag has a sequence selected from at least one of the following: (1) HHATHHHTCACCHHATHHH; and (2) HHHTAHHTAHHHTAHH, wherein H represents A, T or C. The tag according to embodiments of the invention enables detection and verification of trace mutations (mutation frequency as low as 0.01%), which is important for early screening of cancer, neurodegenerative diseases, cardiovascular diseases and other diseases induced by the accumulation of mutations in somatic cells, stem cells and the like.
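The two degenerate patterns can be checked and sized programmatically. This is a sketch, assuming only the IUPAC convention stated in the text (H = A, T or C); the helper names are hypothetical. It translates each pattern into a regular expression, tests candidate tags against it, and counts how many distinct tags each pattern encodes.

```python
import re

# IUPAC codes used by the tag patterns; H = A, C or T (anything but G).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "H": "[ACT]"}

TAG_PATTERNS = {
    "tag1": "HHATHHHTCACCHHATHHH",  # pattern (1)
    "tag2": "HHHTAHHTAHHHTAHH",     # pattern (2)
}

def pattern_to_regex(pattern: str):
    """Translate a degenerate pattern into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in pattern))

def matches_tag(seq: str, pattern: str) -> bool:
    """True if seq has the pattern's length and fits it at every position."""
    return pattern_to_regex(pattern).fullmatch(seq) is not None

def tag_diversity(pattern: str) -> int:
    """Number of distinct tags the pattern encodes (3 choices per H)."""
    return 3 ** pattern.count("H")

for name, pattern in TAG_PATTERNS.items():
    print(name, len(pattern), tag_diversity(pattern))
```

Both patterns contain ten H positions, so each encodes 3^10 = 59,049 distinct tags; neither pattern allows a G anywhere.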
DNA adaptor
In a second aspect of the invention, the invention provides a DNA adaptor. According to an embodiment of the invention, the DNA adaptor contains the DNA tag described above. Constructing a sequencing library with this DNA adaptor and sequencing the library allows trace mutations to be detected, with high sensitivity for trace or rare mutations at mutation frequencies as low as 0.01%. The DNA adaptor is therefore of significant value for early screening of cancer, neurodegenerative diseases, cardiovascular diseases and other diseases induced by the accumulation of mutations in somatic cells, stem cells and the like.
According to a further embodiment of the invention, the adaptor has a dT sticky end. This allows the adaptor to be ligated quickly and efficiently to the gene fragments to be sequenced by T-A ligation.
According to a specific embodiment of the present invention, the DNA adaptor further comprises an anchor sequence formed between the dT sticky end and the tag sequence. When the anchor oligo and the tag oligo are annealed, the two sequences pair base by base up to the protruding 3'-terminal T of the anchor oligo. In molecular cloning, an overhanging base added to a blunt end is relatively unstable, and the reaction has a certain failure rate. Here, the two oligos (the anchor oligo carrying one extra dT base) are simply annealed: complementary pairing directly produces the protruding dT end, and no additional ligation reaction is needed because the two oligos pair one to one. Introducing the dT through the anchor sequence is therefore more efficient and robust than the common approach of adding a dT to a blunt 3' end.
According to a specific example of the invention, the anchor sequence has the nucleotide sequence shown in SEQ ID NO: 1: CTATGTCGATGC (SEQ ID NO: 1). The anchor sequence according to embodiments of the present invention does not pair stringently with sequences other than its complement and is not prone to self-ligation. In addition, dDTP contains no dCTP, so the extension reaction terminates wherever the template requires a C (that is, opposite a template G), which effectively protects the complementary structure of the anchor sequence from being disrupted.
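Why omitting dCTP halts extension can be sketched with a toy model: the polymerase adds the complement of each template base in turn and stalls at the first base whose matching nucleotide is absent. It assumes dDTP follows IUPAC notation (D = A, G or T), lists the template in the order the polymerase reads it, and ignores misincorporation; the function name and the example template (an instance of tag pattern (2) followed by arbitrary bases) are hypothetical. Because H positions never contain G, fill-in opposite such a tag would not stall inside the tag under this model.

```python
from typing import AbstractSet

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
DDTP = frozenset("AGT")       # IUPAC D = A, G or T: the mix lacks dCTP
ALL_DNTPS = frozenset("ACGT")

def extend(template: str, available: AbstractSet[str]) -> str:
    """Bases added opposite `template`, read in the polymerase's direction.

    Synthesis stops at the first template base whose complementary
    nucleotide is missing from `available` (no misincorporation modelled).
    """
    added = []
    for base in template:
        needed = COMPLEMENT[base]
        if needed not in available:
            break  # e.g. a template G needs dCTP, which dDTP lacks
        added.append(needed)
    return "".join(added)

# Hypothetical template: an instance of tag pattern (2) followed by "GCA".
template = "ACTTACATATCATACT" + "GCA"
print(extend(template, DDTP))       # TGAATGTATAGTATGA
print(extend(template, ALL_DNTPS))  # copies all 19 bases
```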
According to an embodiment of the present invention, the dT sticky end is formed at the 3' end of the DNA tag, so that the adaptor can be ligated quickly and efficiently to fragments to be sequenced that carry a 3'-A overhang.
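The T-A pairing rule can be expressed as a small compatibility check: two sticky ends anneal when they have the same overhang type and one overhang is the reverse complement of the other. This is a sketch with hypothetical names; it models base pairing only, not the 5' phosphate and ligase that the ligation itself also needs. It also shows a well-known side benefit of T-A ligation: A-tailed fragments cannot pair with each other, and dT-tailed adaptors cannot pair with each other.

```python
from dataclasses import dataclass

_COMP = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMP)[::-1]

@dataclass(frozen=True)
class End:
    kind: str      # "3'" or "5'" overhang, or "blunt"
    overhang: str  # single-stranded bases written 5'->3' ("" for blunt)

def can_anneal(a: End, b: End) -> bool:
    """Two ends pair if they have the same overhang type and their
    overhangs are reverse complements of each other."""
    return a.kind == b.kind and a.overhang == revcomp(b.overhang)

A_TAILED_FRAGMENT = End("3'", "A")
DT_ADAPTOR = End("3'", "T")

print(can_anneal(A_TAILED_FRAGMENT, DT_ADAPTOR))         # True
print(can_anneal(A_TAILED_FRAGMENT, A_TAILED_FRAGMENT))  # False
print(can_anneal(DT_ADAPTOR, DT_ADAPTOR))                # False
```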
According to a specific embodiment of the invention, the adaptor carrying the anchor sequence is obtained by gradient annealing, dDTP extension, and ethanol purification with nick repair, performed in sequence. The specific steps are as follows:
1. the gradient annealing comprises the following specific steps:
1) Dilute each oligo to 150 μM in ddH2O (or OAB buffer) according to the molar amount stated on the tube, then mix 12 μl of each of the three sequences in equal volumes (see Table 1);
Table 1:
[Table 1 is provided as an image in the original and is not reproduced here.]
Note: experiments showed that incorporating the dT during synthesis of the anchor oligo gives better stability and efficiency than adding the dT after the adaptor has been prepared. Therefore, when preparing the adaptor carrying the anchor sequence, the dT is incorporated during oligo synthesis.
2) Placing the sample in a PCR instrument for annealing reaction;
3) After the reaction is completed, store the mixture at -20 ℃ and label it pre-Mix-ac;
2. dDTP extension, the specific steps include:
1) Add the following reagents to 35 μl of pre-Mix-ac and mix by pipetting; the reaction is shown in Table 2:
Table 2:
pre-Mix-ac 35μl
10×Blue buffer 5μl
dDTP (dATP, dGTP, dTTP; 25mM each) 5μl
Klenow (3’→5’ exo-) (5U/μl) 5μl
In Total 50μl
2) Incubating for 1h at 37 ℃;
3) Purify with ethanol and elute in 50 μl ddH2O;
4) Store at -20 ℃ and label ac-Adapter-1.T.1.
3. Nick repair comprises the following steps:
1) Take 45 μl of ac-Adapter-1.T.1, add the following reagents and mix by pipetting; the reaction is shown in Table 3:
Table 3:
ac-Adapter-1.T.1 45μl
2x Rapid ligation buffer 50μl
T4 DNA Ligase (600U/μl) 5μl
In Total 100μl
2) Incubate at 37 ℃ for 30 min.
4. Purify with ethanol and elute in 30 μl ddH2O; dilute 1 μl for 2100 Bioanalyzer analysis;
5. Store at -20 ℃ after the reaction.
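The volume arithmetic behind these steps can be sketched as follows. It relies only on 1 μM = 1 pmol/μl; the 30 nmol tube amount is a hypothetical label value, and the helper names are illustrative. For example, the Table 3 components (45 + 50 + 5 μl) sum to 100 μl, which also puts the 2x ligation buffer at 1x.

```python
def resuspension_volume_ul(amount_nmol: float, target_um: float) -> float:
    """Volume (μl) that brings `amount_nmol` of oligo to `target_um` (μM).

    1 μM equals 1 pmol/μl, so V(μl) = 1000 * amount(nmol) / concentration(μM).
    """
    return 1000.0 * amount_nmol / target_um

def pooled_concentrations_um(stock_um: float, volumes_ul: list) -> list:
    """Per-oligo concentration (μM) after pooling oligos at the same stock."""
    total = sum(volumes_ul)
    return [stock_um * v / total for v in volumes_ul]

def reaction_total_ul(components: dict) -> float:
    """Sum of component volumes (μl) in a reaction set-up."""
    return sum(components.values())

# Hypothetical tube label: 30 nmol of oligo, to be resuspended at 150 μM.
print(resuspension_volume_ul(30, 150))               # 200.0
# Three oligos, 12 μl each at 150 μM (step 1 of gradient annealing).
print(pooled_concentrations_um(150, [12, 12, 12]))   # [50.0, 50.0, 50.0]
# Component volumes from Table 3.
table3 = {"ac-Adapter-1.T.1": 45, "2x Rapid ligation buffer": 50, "T4 DNA Ligase": 5}
print(reaction_total_ul(table3))                     # 100
```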
According to a specific example of the present invention, the DNA adaptor further includes a cleavage sequence formed at the end of the DNA tag, the cleavage sequence carrying a restriction enzyme recognition site suited to generating a dT sticky end. The endonuclease cuts the sense strand 8 bases downstream of the recognition site and the antisense strand 7 bases downstream of it, forming a cohesive end with a single protruding 3' dT base. An adaptor carrying the cleavage sequence thus forms a more stable 3'-T overhang.
According to still another specific example of the present invention, the cleavage sequence is an HphI-specific recognition site. After the site is recognized and cut by HphI, a dT sticky end is generated at the 3' end of the DNA adaptor, so that the adaptor can be ligated quickly and efficiently to the fragments to be sequenced.
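The 8/7 cut geometry can be sketched as a site scanner. The recognition sequence GGTGA(8/7) is taken from standard restriction-enzyme references for HphI, not from this text, and should be verified before use; the function name and example sequence are hypothetical. The sketch only scans the given strand in the GGTGA orientation and shows that the 8th base after the site becomes the single 3' overhang, so that base must be T to leave a dT end.

```python
HPHI_SITE = "GGTGA"  # HphI recognition sequence, GGTGA(8/7)

def hphi_cuts(top: str) -> list:
    """Cut positions for each GGTGA on the given (top) strand.

    Returns (top_cut, bottom_cut, overhang) with 0-based offsets in top-strand
    coordinates: the top strand is cut 8 nt past the site and the bottom strand
    7 nt past it, so the upstream product keeps a 1-nt 3' overhang equal to
    top[bottom_cut:top_cut]. Sites in the opposite orientation (TCACC on this
    strand) are not scanned.
    """
    cuts = []
    start = top.find(HPHI_SITE)
    while start != -1:
        site_end = start + len(HPHI_SITE)
        top_cut, bottom_cut = site_end + 8, site_end + 7
        if top_cut <= len(top):  # skip sites too close to the end to be cut
            cuts.append((top_cut, bottom_cut, top[bottom_cut:top_cut]))
        start = top.find(HPHI_SITE, start + 1)
    return cuts

# Hypothetical sequence: site, 7 arbitrary bases, then T as the 8th base.
print(hphi_cuts("CCGGTGAAACCGGTTACGT"))  # [(15, 14, 'T')]
```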
According to still another specific example of the present invention, the adaptor carrying the HphI recognition site as its cleavage sequence is obtained by gradient annealing, dDTP extension and HphI digestion, performed in sequence. Specifically, it can be prepared in two modes: a short adaptor followed by PCR after cleavage, and a long, PCR-free (PF) adaptor after cleavage.
The post-cleavage short (PCR) mode is as follows:
1. Gradient annealing comprises the following steps:
1) Dilute each oligo to 100 μM in ddH2O (or OAB buffer) according to the molar amount stated on the tube, then mix 20 μl of each in equal volumes;
2) Place the sample in a PCR instrument for the annealing reaction;
3) After the reaction is completed, store the mixture at -20 ℃ and label it pre-Mix-S.
2. dDTP extension comprises the following steps:
1) Take 35 μl of pre-Mix-S, add the following reagents and mix by pipetting; the reaction is shown in Table 4:
Table 4:
pre-Mix-S 35μl
10×Blue buffer 5μl
dDTP (dATP, dGTP, dTTP; 25mM each) 5μl
Klenow (3’→5’ exo-) 5μl
In Total 50μl
2) Incubate at 37 ℃ for 1 h;
3) Purify with ethanol, elute in 20 μl ddH2O, and dilute 1 μl for 2100 Bioanalyzer high-sensitivity analysis;
4) Store at -20 ℃ and label pre-Adapter-S.
3. HphI digestion comprises the following steps:
1) Combine pre-Adapter-S with the other reagents at the volumes listed in Table 5 and mix:
Table 5:
[Table 5 is provided as an image in the original and is not reproduced here.]
2) Incubate at 37 ℃ for 16 h, then inactivate at 65 ℃ for 20 min;
3) Purify with ethanol, elute in 30 μl ddH2O, and dilute 1 μl for 2100 Bioanalyzer high-sensitivity analysis;
4) Store at -20 ℃ after the reaction.
The post-cleavage long, PCR-free (PF) mode is as follows:
1. Gradient annealing comprises the following steps:
1) Dilute each oligo to 100 μM in ddH2O (or OAB buffer) according to the molar amount stated on the tube, then mix 20 μl of each in equal volumes;
2) Place the sample in a PCR instrument for the annealing reaction;
3) After the reaction is completed, store the mixture at -20 ℃ and label it pre-Mix-L57.
2. dDTP extension comprises the following steps:
1) Take 35 μl of pre-Mix-L57, add the following reagents and mix by pipetting; the reaction is shown in Table 6:
Table 6:
pre-Mix-L57 35μl
10×Blue buffer 5μl
dDTP (dATP, dGTP, dTTP; 250nM each) 5μl
Klenow (3’→5’ exo-) 5μl
In Total 50μl
2) Incubate at 37 ℃ for 1 h;
3) Purify with ethanol, elute in 20 μl ddH2O, and dilute 1 μl for 2100 Bioanalyzer high-sensitivity analysis;
4) Store at -20 ℃ and label pre-Adapter-L57.
3. HphI digestion comprises the following steps:
1) Combine pre-Adapter-L57 with the other reagents at the volumes listed in Table 7 and mix:
Table 7:
[Table 7 is provided as an image in the original and is not reproduced here.]
2) Incubate at 37 ℃ for 16 h, then inactivate at 65 ℃ for 20 min;
3) Purify with ethanol, elute in 30 μl ddH2O, and dilute 1 μl for 2100 Bioanalyzer high-sensitivity analysis;
4) Store at -20 ℃ after the reaction.
Use of DNA tags and DNA adaptors in detecting trace variants
In a third aspect of the invention, the invention provides the use of the DNA tag and the DNA adaptor described above for detecting trace variants. The tag and adaptor according to embodiments of the invention enable detection and verification of trace variants (mutation frequency as low as 0.01%). In scientific research they provide a reliable means of studying such variants, for example in measuring somatic mitochondrial mutation rates, detecting rare DNA variants (such as new susceptibility loci), accurately counting DNA/RNA copy numbers by single-molecule counting, and studying genetic diseases and aging (such as detecting aging-related methylation sites). They are also important for early screening of cancer, neurodegenerative diseases, cardiovascular diseases and other diseases induced by the accumulation of mutations in somatic cells, stem cells and the like.
Method for constructing sequencing library
In a fourth aspect of the invention, the invention provides a method of constructing a sequencing library. According to an embodiment of the invention, the method comprises subjecting nucleic acid molecules to which the DNA adaptor described above has been ligated to an enrichment process to obtain a sequencing library. A sequencing library constructed by this method can be used to detect trace variants at mutation frequencies as low as 0.01%.
Specifically, according to an embodiment of the present invention, the nucleic acid molecules are obtained as follows: (1) PCR-amplify the nucleic acid sample to be tested to obtain nucleic acid sample fragments; (2) add an A to the 3' ends of the fragments (A-tailing); (3) ligate the DNA adaptor described above to the fragments obtained in step (2) to obtain nucleic acid molecules carrying the DNA adaptor.
According to a further embodiment of the invention, when the sample fragments are ligated to a DNA adaptor that has only the dT sticky end, or that has an anchor sequence between the dT sticky end and the tag sequence, enrichment is performed by PCR. The specific steps are as follows:
1) Experimental preparation. Compile a PCR reaction table according to the experimental task list and the number of samples;
2) Template addition. Add the DNA samples to a 96-well PCR plate following the layout of the PCR reaction table: 3 μL per well for batch samples and 5 μL per well for re-amplified samples. Check that the DNA information matches the PCR reaction table and pipette onto the bottom or wall of the well. Seal with sealing film, centrifuge at 2000 rpm for 30 s, and check that the samples have reached the bottom of the wells before use;
3) Mix dispensing. Dispense the prepared mix into the reaction plate: 22 μL per well for batch samples and 20 μL per well for re-amplified samples, holding the tip above the liquid while adding the mix. Cover with a rubber mat, centrifuge at 1500 rpm for 30 s, and immediately start cycling on the PCR instrument;
4) Cycling on the PCR instrument;
5) Product detection. Briefly centrifuge the amplified products at 2000 rpm for 30 s and transfer them to the electrophoresis room for detection. If the products cannot be tested promptly, store them at 4 °C.
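The plate setup in steps 1) to 3) is essentially bookkeeping: each sample gets a well of a 96-well plate and a fixed pair of volumes (3 μL template + 22 μL mix for batch samples, 5 μL + 20 μL for re-amplified samples, 25 μL per well either way). A minimal sketch of such a PCR reaction table follows; the column-wise well order, the 10% mix overage, and all names are illustrative assumptions, not part of the protocol.

```python
from dataclasses import dataclass

# (template uL, mix uL) per well, from steps 2) and 3); both sum to 25 uL.
VOLUMES = {"batch": (3, 22), "reamp": (5, 20)}
ROWS = "ABCDEFGH"


@dataclass
class Well:
    well: str
    sample: str
    kind: str
    template_ul: int
    mix_ul: int


def well_name(index: int) -> str:
    """Map 0..95 to A1..H12, filling column-wise (A1, B1, ..., H1, A2, ...)."""
    if not 0 <= index < 96:
        raise ValueError("a 96-well plate holds at most 96 reactions")
    return f"{ROWS[index % 8]}{index // 8 + 1}"


def build_reaction_table(samples):
    """samples: list of (sample_id, kind) with kind in {'batch', 'reamp'}."""
    table = []
    for i, (sample_id, kind) in enumerate(samples):
        template_ul, mix_ul = VOLUMES[kind]
        table.append(Well(well_name(i), sample_id, kind, template_ul, mix_ul))
    return table


def total_mix_ul(table, overage=0.10):
    """Mix volume to prepare, including a hypothetical pipetting overage."""
    return sum(w.mix_ul for w in table) * (1 + overage)


if __name__ == "__main__":
    samples = [(f"S{i:02d}", "batch") for i in range(1, 10)] + [("R01", "reamp")]
    table = build_reaction_table(samples)
    for w in table:
        print(w.well, w.sample, w.kind, w.template_ul, w.mix_ul)
    print(f"mix to prepare: {total_mix_ul(table):.1f} uL")
```

Filling column-wise is one common choice because an 8-channel pipette loads a whole column at once; a row-wise order would only change `well_name`.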
According to another embodiment of the present invention, after the adapter carrying the cleavage sequence is ligated to the sample to be tested, enrichment can be performed by the PCR enrichment method described above. According to yet another specific example of the present invention, when the adapter carrying the cleavage sequence is prepared by the long-sequence (PCR-Free) cleavage scheme described above, the enrichment step may be omitted after the adapter is ligated to the sample to be tested.
According to a specific example of the present invention, the nucleic acid molecules to which the DNA adapter is attached are purified before the enrichment treatment, for example by magnetic bead purification. Purification removes the enzymes and buffers carried over from the ligation step, eliminating their interference with the subsequent enrichment and markedly improving the success rate and efficiency of enriching the ligation products.
Sequencing library
In a fifth aspect, the invention provides a sequencing library. According to an embodiment of the invention, the sequencing library is obtained by the method of constructing a sequencing library described above. When this library is subjected to high-throughput sequencing, the lowest detectable mutation frequency can reach 0.01%, enabling early screening of cancer, neurodegenerative diseases, cardiovascular diseases and other conditions caused by the accumulation of mutations in somatic cells, stem cells and the like.
Sequencing method
In a sixth aspect, the invention provides a sequencing method. According to an embodiment of the invention, the method comprises sequencing the sequencing library described above and analyzing the resulting data. The sequencing method according to embodiments of the invention enables the detection and verification of low-frequency mutations and can be applied to early screening of cancer, neurodegenerative diseases, cardiovascular diseases and other conditions caused by the accumulation of mutations in somatic cells, stem cells and the like.
According to a specific embodiment of the invention, sequencing is performed on the HiSeq 2500 platform. High-throughput sequencing on the HiSeq 2500 greatly reduces cost and ensures the stability of experimental data and analysis results; more importantly, depending on the sequencing depth and the UMI strategy used, the lowest detectable mutation frequency can reach 0.01%.
According to a specific example of the present invention, the data analysis processing flow refers to fig. 2, and is specifically described as follows:
1) Data preprocessing. Preprocess the raw sequencing data: filter low-quality reads, extract the UMI (unique molecular identifier) adapter sequences, and collect read and UMI sequence statistics;
2) Alignment. Align the preprocessed reads to the reference sequence using BWA (v0.5.9-r16);
3) Alignment filtering. Compute statistics on the alignment results and filter them;
4) Sorting. Sort the alignment results using samtools (v0.1.16);
5) Single-strand consensus construction. Build single-strand consensus sequences from each UMI family (the reads sharing a UMI);
6) Sorting. Sort the single-strand consensus sequences using samtools (v0.1.16);
7) Duplex consensus construction. Build duplex consensus sequences from the complementary single-strand consensus sequences within the UMI families;
8) Sorting. Sort the duplex consensus sequences using samtools (v0.1.16);
9) Filtering and sorting. Filter the duplex consensus sequences using samtools (v0.1.16) and sort the filtered results;
10) Local realignment. Locally realign the duplex consensus sequences using GATK (v2.4-9);
11) Mutation analysis. Analyze and tabulate mutation information according to the set mutation-rate threshold.
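Steps 5) and 7) are the core of the duplex strategy. A minimal sketch, assuming reads have already been grouped into UMI families and trimmed to aligned, equal-length sequences in reference orientation: the single-strand consensus takes a per-position majority vote within one family, and the duplex consensus keeps a base only where the two complementary families agree. The 0.7 agreement threshold, the `N` mask, and the function names are hypothetical; the original pipeline works on SAM/BAM records, and its exact consensus rules are not specified here.

```python
from collections import Counter


def sscs(reads, min_fraction=0.7):
    """Single-strand consensus: per-position majority vote within one UMI family.

    Positions where no base reaches `min_fraction` of the family become 'N'.
    """
    if not reads:
        raise ValueError("empty family")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("reads must be aligned to equal length")
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_fraction else "N")
    return "".join(consensus)


def dcs(forward_sscs, reverse_sscs):
    """Duplex consensus: keep a base only where both strand consensuses agree."""
    if len(forward_sscs) != len(reverse_sscs):
        raise ValueError("consensus sequences must have equal length")
    return "".join(
        f if f == r and f != "N" else "N"
        for f, r in zip(forward_sscs, reverse_sscs)
    )


forward_family = ["ACGTA", "ACGTA", "ACCTA", "ACGTA"]  # one late PCR error at index 2
reverse_family = ["ACGTT", "ACGTA", "ACGTA"]           # 2/3 agreement at index 4
print(sscs(forward_family))  # ACGTA
print(sscs(reverse_family))  # ACGTN
print(dcs(sscs(forward_family), sscs(reverse_family)))  # ACGTN
```

The single late-cycle error in the forward family is outvoted, and the weakly supported last position is masked rather than guessed.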
In summary, with the DNA tag, the DNA adapter, the method of constructing a sequencing library, the sequencing library and the sequencing method according to embodiments of the present invention, low-frequency mutations can be detected and verified, with a lowest detectable mutation frequency of 0.01%. The invention can therefore be applied to early screening of cancer, neurodegenerative diseases, cardiovascular diseases and other conditions caused by the accumulation of mutations in somatic cells, stem cells and the like.
The underlying rationale is as follows. The invention uses a specific library-preparation and analysis strategy: the prepared adapter, whose sequence contains 10 degenerate bases, is ligated to the sample DNA so that each molecule receives its own specific tag sequence. Ligation yields the original sequencing templates; each end of a template carries a 19-base molecular tag, so the left and right ends together carry 38 tag bases. Each degenerate base has 3 possible values, so the 20 degenerate bases of a template (10 per end) give 3^20 = 3,486,784,401, or approximately 3.5 billion, possible combinations. This ensures that each original template is unique in the original library. When the original library is PCR-amplified, each template gives rise to two complementary molecular families derived from its two strands: a forward family and a reverse family. Based on this library-preparation and sequencing strategy, false-positive mutation sites can be excluded during analysis as follows:
1) The mutation occurs only once, or a small number of times, within a molecular family, and the complementary family does not show the same mutation. This indicates a random error: a replication error introduced in a later PCR cycle, or a base-calling error by the HiSeq instrument. It also shows that the sample has no mutation at this position;
2) The mutation is present in most members of one molecular family but absent from the complementary family. This suggests a replication error introduced during the first PCR cycle;
3) The mutation is present in most members of a molecular family, and the corresponding mutation also appears in the complementary family. This indicates that the mutation is real.
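The tag arithmetic and the three rules above can be made concrete. The sketch below first counts how many distinct tags a degenerate pattern encodes (H = A, C or T, so 3 choices per position), applied to the tag of claim 1, and then classifies a candidate variant from the fraction of reads carrying it in the forward family and in the complementary reverse family. The 0.7 "majority" cutoff and the label strings are hypothetical; the text gives no numeric thresholds.

```python
# Number of bases each IUPAC symbol can stand for (H = A, C or T).
DEGENERACY = {"A": 1, "C": 1, "G": 1, "T": 1, "H": 3, "N": 4}


def tag_diversity(pattern: str) -> int:
    """Number of distinct sequences encoded by a degenerate pattern."""
    total = 1
    for symbol in pattern:
        total *= DEGENERACY[symbol]
    return total


def classify_variant(forward_fraction: float, reverse_fraction: float,
                     majority: float = 0.7) -> str:
    """Apply rules 1) to 3) to one candidate site.

    forward_fraction / reverse_fraction: share of reads carrying the candidate
    base in the forward family and in its complementary reverse family.
    """
    in_forward = forward_fraction >= majority
    in_reverse = reverse_fraction >= majority
    if in_forward and in_reverse:
        return "true mutation"                         # rule 3)
    if in_forward or in_reverse:
        return "first-cycle PCR error"                 # rule 2)
    return "random error (no mutation at this site)"   # rule 1)


tag = "HHHTAHHTAHHHTAHH"             # claim 1: 10 degenerate positions
print(tag_diversity(tag))            # 59049 (3^10) per tagged end
print(tag_diversity(tag) ** 2)       # 3486784401 (3^20) for both ends
print(classify_variant(0.95, 0.90))  # true mutation
print(classify_variant(0.95, 0.00))  # first-cycle PCR error
```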
The scheme of the present invention is explained below with reference to examples. Those skilled in the art will appreciate that the following examples are illustrative only and should not be construed as limiting the scope of the invention. Where specific techniques or conditions are not indicated in the examples, they were carried out according to techniques or conditions described in the literature in the art (see, for example, J. Sambrook et al., Molecular Cloning: A Laboratory Manual, 3rd edition, translated by Huang Peitang et al., Science Press) or according to the product instructions. Reagents or instruments whose manufacturer is not indicated are conventional, commercially available products, for example those available from Illumina.
In the examples of the present invention, the target region was PCR-amplified from 2 DNA samples. After the specific differing bases of each were confirmed by Sanger sequencing, the products were mixed at molar ratios of 1:1, 1:100, 1:1000 and 1:10000 to give 4 groups, which were then tested in turn with the three UMI strategies; details are shown in Table 8.
Table 8: [image not reproduced]
The target region is shown in Table 9.
For example, the target sequence is DRB1*01:01:01 (the sequence of this allele serves as the reference sequence of the DRB1 gene, and the sequence shown below is the portion of this allele within the target region):
ATGGTGTGTCTGAAGCTCCCTGGAGGCTCCTGCATGACAGCGCTGACAGTGACACTGATGGTGCTGAGCTCCCCACTGGCTTTGGCTGGGGACACCCGAC (SEQ ID NO:4).
Table 9:
Gene Exon Start position End position Sequence length
DRB1 Exon 1 211 310 100 bp
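Table 9 places SEQ ID NO:4 at target-region positions 211 to 310 (1-based and inclusive, which is what the 100 bp length implies), and the positions reported in Tables 21 to 24 fall inside that window. The short sketch below converts such a position into an index into SEQ ID NO:4; the function name is illustrative. With this mapping, the bases at positions 243 to 253 agree with the Ref column of those tables.

```python
SEQ_ID_NO_4 = (
    "ATGGTGTGTCTGAAGCTCCCTGGAGGCTCCTGCATGACAGCGCTGACAGT"
    "GACACTGATGGTGCTGAGCTCCCCACTGGCTTTGGCTGGGGACACCCGAC"
)
START, END = 211, 310  # Table 9, assumed 1-based and inclusive


def ref_base(position: int) -> str:
    """Reference base of SEQ ID NO:4 at a target-region position."""
    if not START <= position <= END:
        raise ValueError(f"{position} is outside {START}-{END}")
    return SEQ_ID_NO_4[position - START]


assert len(SEQ_ID_NO_4) == END - START + 1  # 100 bp, as in Table 9
for pos in (247, 251, 252):                 # the predetermined mutation sites
    print(pos, ref_base(pos))               # 247 A / 251 C / 252 G
```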
Each step is described below in turn, following the overall analysis flowchart of the trace-variant detection system (see FIG. 1).
1. The specific steps of DNA extraction are as follows:
(1) Add 20 μL of proteinase K solution to a 1.5 mL centrifuge tube;
(2) Add 200 μL of blood sample to the tube;
(3) Add 200 μL of buffer AL to the tube and vortex for 15 s to mix thoroughly;
(4) Incubate in a 56 °C water bath for 10 min;
(5) Briefly centrifuge in a microcentrifuge to bring all liquid to the bottom of the tube;
(6) Add 200 μL of absolute ethanol, vortex for 15 s to mix, and briefly centrifuge in a microcentrifuge to bring all liquid to the bottom of the tube;
(7) Carefully transfer all the liquid from the previous step into a purification column without wetting the rim, centrifuge at 8000 rpm for 1 min in a high-speed centrifuge, discard the collection tube, and replace it with a new one;
(8) Carefully open the cap, add 500 μL of buffer AW1 without wetting the rim, centrifuge at 8000 rpm for 1 min in a high-speed centrifuge, discard the collection tube, and replace it with a new one;
(9) Open the cap, add 500 μL of buffer AW2, and centrifuge at 14000 rpm for 3 min in a high-speed centrifuge;
(10) Discard the collection tube, replace it with a new one, and centrifuge at 14000 rpm for 1 min in a high-speed centrifuge;
(11) Discard the collection tube, place the purification column in a 1.5 mL centrifuge tube, air-dry for 3 min, add 50 μL of buffer AE or ultrapure water, stand at room temperature for 5 min, centrifuge at 8000 rpm for 1 min in a high-speed centrifuge, discard the purification column, and cap the centrifuge tube;
(12) Measure the OD values on a NanoDrop 2000 and record the results;
(13) Label the extracted DNA and store it at -20 °C.
2. The PCR amplification comprises the following specific steps:
(1) Designing a primer;
Through bioinformatic analysis of the regions upstream and downstream of the target region, specific and conserved regions are identified as candidate regions for primer design, and the primers are designed according to standard primer-design principles. To improve data utilization, the PCR amplicon should be as short as possible while still covering the target region.
Following these design principles, the primer sequences finally determined for the target region are shown in Table 10.
Table 10:
Gene Exon Forward primer Reverse primer Amplicon length
DRB1 Exon 1 CCCTGGAGGCTCCTG (SEQ ID NO:5) CACCCRCAATGTGCA (SEQ ID NO:6) 75 bp
(2) Amplify the DNA sample by PCR using a high-fidelity PCR enzyme and the designed primers to enrich the target sequence.
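The reverse primer in Table 10 contains the IUPAC ambiguity code R (A or G), so locating primer binding sites on a template needs degenerate matching on both strands. A minimal sketch that scans SEQ ID NO:4 for exact, IUPAC-aware matches; mismatches, 3'-end stability and melting temperature, which real primer design also weighs, are deliberately ignored, and the function names are illustrative.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
    "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

SEQ_ID_NO_4 = (
    "ATGGTGTGTCTGAAGCTCCCTGGAGGCTCCTGCATGACAGCGCTGACAGT"
    "GACACTGATGGTGCTGAGCTCCCCACTGGCTTTGGCTGGGGACACCCGAC"
)


def revcomp(seq: str) -> str:
    """Reverse complement, including IUPAC ambiguity codes."""
    return seq.translate(COMPLEMENT)[::-1]


def matches(primer: str, window: str) -> bool:
    """True if every template base is allowed by the primer's IUPAC code."""
    return all(base in IUPAC[code] for code, base in zip(primer, window))


def find_primer_sites(template: str, primer: str):
    """Return (0-based offset, strand) for every exact IUPAC-aware match."""
    n, rc = len(primer), revcomp(primer)
    hits = []
    for i in range(len(template) - n + 1):
        window = template[i:i + n]
        if matches(primer, window):
            hits.append((i, "+"))
        if matches(rc, window):
            hits.append((i, "-"))
    return hits


print(find_primer_sites(SEQ_ID_NO_4, "CCCTGGAGGCTCCTG"))  # [(17, '+')]
print(revcomp("CACCCRCAATGTGCA"))                         # TGCACATTGYGGGTG
```

Offset 17 in SEQ ID NO:4 corresponds to target-region position 228 under the coordinates of Table 9.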
3. Purification and quantification of the PCR products, followed by Sanger sequencing verification (see FIGS. 3a and 3b);
4. End repair: take more than 200 ng of product for end repair, then purify;
5. 3'-end dA addition: add "A" to the 3' ends, then purify;
6. UMI adapter ligation. The specific steps are as follows:
(1) Adapter preparation. The three strategies shown in FIG. 1 are described in turn:
I. Add-T strategy, i.e., adding dT at the 3' end. The specific steps are as follows:
1) The gradient annealing comprises the following specific steps:
a) Based on the molar amount labeled on each oligonucleotide tube, dilute with ddH2O (or OAB buffer) to 100 μM, then mix equal volumes (20 μL each), as shown in Table 11;
Table 11: [image not reproduced]
b) Place the mixture in a PCR instrument for the annealing reaction;
c) After the reaction, store at -20 °C and label as pre-Mix-T.
2) dNTP extension. The specific steps are as follows:
a) Add the reagents in Table 12 to 35 μL of pre-Mix-T and mix thoroughly by pipetting:
Table 12:
pre-Mix-T 35 μL
10× Blue buffer 5 μL
dNTPs (25 mM each) 5 μL
Klenow (3'→5' exo-) (5 U/μL) 5 μL
Total 50 μL
b) Incubate at 37 °C for 1 h;
c) Purify with ethanol and elute in 42 μL of ddH2O.
3) dT addition. The specific steps are as follows:
a) Add the reagents in Table 13 to the product of the previous step.
Table 13:
the product of the last step 42μl
10×Blue buffer 5μl
dTTP(10mM) 1μl
Klenow(3’→5’exo - )(5U/μl) 2μl
In Total 50μl
b) Incubate at 37 °C for 30 min.
4) Purify with ethanol and dissolve in 30 μL of ddH2O; dilute 1 μL for Agilent 2100 Bioanalyzer analysis (see FIG. 4).
5) After the reaction, store at -20 °C and label as dT-Adapter-T.
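Two small calculations recur in the adapter preparation above: resuspending an oligonucleotide to a working concentration from the molar amount labeled on its tube (V = n / c), and checking the final concentration of each component in a reaction such as Table 12 (C_final = C_stock × V_added / V_total). A minimal sketch with a hypothetical 10 nmol oligo; the function names and the 50 μM duplex value for pre-Mix-T (two 100 μM strands mixed 1:1, assuming complete annealing) are illustrative assumptions.

```python
def resuspension_volume_ul(amount_nmol: float, target_um: float) -> float:
    """Volume (uL) that brings `amount_nmol` of oligo to `target_um` micromolar.

    V = n / c; with n in nmol and c in umol/L this gives V(uL) = n * 1000 / c.
    """
    if target_um <= 0:
        raise ValueError("target concentration must be positive")
    return amount_nmol * 1000.0 / target_um


def final_concentrations(components, total_ul):
    """Apply C_final = C_stock * V_added / V_total to every component.

    components: {name: (stock concentration, volume in uL)}; each result keeps
    the units of its stock. Raises if the volumes do not add up to total_ul.
    """
    added = sum(volume for _, volume in components.values())
    if abs(added - total_ul) > 1e-9:
        raise ValueError(f"volumes add up to {added} uL, not {total_ul} uL")
    return {name: stock * volume / total_ul
            for name, (stock, volume) in components.items()}


# Table 12 (dNTP extension, 50 uL total).
TABLE_12 = {
    "pre-Mix-T duplex (uM)": (50, 35),  # assumed duplex concentration
    "Blue buffer (x)": (10, 5),
    "dNTPs (mM each)": (25, 5),
    "Klenow exo- (U/uL)": (5, 5),
}

if __name__ == "__main__":
    print(resuspension_volume_ul(10, 100))            # 100.0 (strategies I and III)
    print(round(resuspension_volume_ul(10, 150), 1))  # 66.7 (strategy II)
    for name, conc in final_concentrations(TABLE_12, 50).items():
        print(f"{name}: {conc:g}")
```

The volume check makes a mistyped table total fail loudly instead of silently skewing every final concentration.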
II. Add-anchor strategy. The specific steps are as follows:
1) Gradient annealing. The specific steps are as follows:
a) Based on the molar amount labeled on each tube, dilute with ddH2O (or OAB buffer) to 150 μM, then mix equal volumes (12 μL each), as detailed in Table 1;
b) Place the mixture in a PCR instrument for the annealing reaction;
c) After the reaction, store at -20 °C and label as pre-Mix-ac;
2) dNTP extension. The specific steps are as follows:
a) Add the reagents in Table 2 to 35 μL of pre-Mix-ac and mix thoroughly by pipetting;
b) Incubate at 37 °C for 1 h;
c) Purify with ethanol and elute in 50 μL of ddH2O.
d) Store at -20 °C and label as ac-Adapter-1.T.1.
3) Ethanol purification and nick sealing. The specific steps are as follows:
a) Take 45 μL of ac-Adapter-1.T.1, add the reagents in Table 14, and mix thoroughly by pipetting;
Table 14:
ac-Adapter-1.T.1 45 μL
2x Rapid ligation buffer 50μl
T4 DNA Ligase(600U/μl) 5μl
In Total 50μl
b) Incubate at 37 °C for 30 min.
4) Purify with ethanol and dissolve in 30 μL of ddH2O; dilute 1 μL for Agilent 2100 Bioanalyzer analysis (see FIG. 5).
5) After the reaction, store at -20 °C and label as ac-Adapter.
III. Cleavage strategy (HphI digestion), comprising a short-sequence scheme (S) and a long-sequence scheme (L), i.e., a PCR scheme and a PCR-Free scheme, respectively. The specific steps are as follows:
1) Gradient annealing. The specific steps are as follows:
a) Based on the molar amount labeled on each tube, dilute with ddH2O (or OAB buffer) to 100 μM, then mix equal volumes (20 μL each);
The primers for the short-sequence scheme are shown in Table 15:
Table 15: [image not reproduced]
The primers for the long-sequence scheme are shown in Table 16:
Table 16: [image not reproduced]
b) Place the mixtures in a PCR instrument for the annealing reaction;
c) After the reaction, store at -20 °C and label as pre-Mix-S and pre-Mix-L57, respectively.
2) dNTP extension. The specific steps are as follows:
a) Add the reagents in Table 17 to pre-Mix-S and pre-Mix-L57, respectively, and mix thoroughly by pipetting;
Table 17:
pre-Mix-S/pre-Mix-L57 35 μL
10× Blue buffer 5 μL
dNTPs (25 mM each) 5 μL
Klenow(3’→5’exo-)(5U/μl) 5μl
In Total 50μl
b) Incubate at 37 °C for 1 h;
c) Purify with ethanol and elute in 20 μL of ddH2O.
d) Store at -20 °C and label as pre-Adapter-S and pre-Adapter-L57, respectively.
3) HphI digestion. The specific steps are as follows:
a) Prepare the reactions shown in Table 18 and Table 19, respectively;
Table 18: [image not reproduced]
Table 19: [image not reproduced]
b) Incubate at 37 °C for 16 h, then inactivate at 65 °C for 20 min.
4) Purify with ethanol and dissolve in 30 μL of ddH2O; dilute 1 μL for Agilent 2100 Bioanalyzer analysis (see FIGS. 6a and 6b).
5) After the reaction, store at -20 °C and label as Adapter-S and Adapter-L, respectively.
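Strategy III relies on HphI being a Type IIS enzyme that cuts outside its recognition site. Assuming the commonly published specificity GGTGA(8/7), the top strand is cut 8 nt and the bottom strand 7 nt downstream of the site, so the fragment carrying the site keeps a single-base 3' overhang; when that base is T, the product has the kind of 3'-dT sticky end described for the adapter. The sketch below locates top-strand GGTGA sites in a hypothetical sequence and reports the cut positions and overhang base. The actual adapter oligonucleotides (Tables 15 and 16) are not reproduced here, and sites on the opposite strand (TCACC on the top strand) are ignored for brevity.

```python
import re

SITE = "GGTGA"
TOP_OFFSET, BOTTOM_OFFSET = 8, 7  # GGTGA(8/7), assumed HphI specificity


def hphi_cuts(top: str):
    """Return (top_cut, bottom_cut, overhang) for each top-strand GGTGA site.

    Cut positions are 0-based indices between bases: a cut at k separates
    top[k - 1] from top[k]. `overhang` is the single 3' overhang base left on
    the fragment that carries the recognition site.
    """
    results = []
    for m in re.finditer(f"(?={SITE})", top):  # lookahead also finds overlaps
        site_end = m.start() + len(SITE)
        top_cut = site_end + TOP_OFFSET
        bottom_cut = site_end + BOTTOM_OFFSET
        if top_cut > len(top):
            continue  # site too close to the end to be cut
        results.append((top_cut, bottom_cut, top[bottom_cut:top_cut]))
    return results


demo = "AAGGTGACCCCCCCTGGG"  # hypothetical: the 8th base after the site is T
print(hphi_cuts(demo))       # [(15, 14, 'T')]
```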
(2) Ligation of the prepared UMI adapters
(3) Magnetic bead purification
7. PCR enrichment (omitted in the long-sequence cleavage scheme, i.e., PCR-Free), followed by magnetic bead purification
8. Library quantification by Agilent 2100 Bioanalyzer analysis (see FIGS. 7a to 7d) and qPCR; the qPCR results are shown in Table 20. The library is then ready for sequencing.
Table 20: [image not reproduced]
9. PE sequencing
10. Data analysis
Owing to space limitations, only selected results are presented below.
1) Preprocess the PE90 data from the HiSeq 2500 sequencing platform and extract the UMI sequences.
2) Trim the primer sequences and align the reads (BWA v0.5.9-r16);
3) Process the alignment results and compute statistics; the cumulative depth distribution and the depth distribution of the sample are shown in FIGS. 8 and 9, respectively (owing to space limitations, only the results for UMI-LT57-1 are shown);
4) Sort the processed alignments (samtools v0.1.16);
5) Construct single-strand consensus sequences; the UMI family distribution of the sample is shown in FIG. 10 (UMI-LT57-1 only);
6) Sort the single-strand consensus sequences (samtools v0.1.16);
7) Construct duplex consensus sequences and store them in SAM format; a screenshot of the result is shown in FIG. 11 (UMI-LT57-1 only);
8) Sort, filter and re-sort (samtools v0.1.16);
9) Local realignment (GATK v2.4-9);
10) Mutation analysis; the statistics are shown in Tables 21 to 24, limited to the region containing the predetermined mutation sites.
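Step 1) extracts UMI sequences from the reads. A minimal sketch, assuming the 16-base tag of claim 1 (HHHTAHHTAHHHTAHH) occupies the first 16 bases of each read; the real read layout (for example, any anchor or dT base between tag and insert) is not specified in this section, so the layout, the function names, and the decision to reject reads with G at a degenerate position are all assumptions. Real pipelines also key families on alignment position and pair the UMIs of both ends for duplex grouping; this sketch keys on a single-end UMI only.

```python
import re
from collections import Counter

# Tag of claim 1; H stands for A, C or T (never G).
TAG_PATTERN = "HHHTAHHTAHHHTAHH"
TAG_REGEX = re.compile(TAG_PATTERN.replace("H", "[ACT]"))


def extract_umi(read: str):
    """Return (umi, insert) for a read that starts with a valid tag, else None.

    Only the 10 degenerate (H) positions are kept as the UMI; the fixed TA
    spacers carry no molecular information and double as a sanity check.
    """
    tag = read[:len(TAG_PATTERN)]
    if TAG_REGEX.fullmatch(tag) is None:
        return None
    umi = "".join(base for base, code in zip(tag, TAG_PATTERN) if code == "H")
    return umi, read[len(TAG_PATTERN):]


def umi_family_sizes(reads):
    """Count reads per UMI, skipping reads without a valid tag."""
    sizes = Counter()
    for read in reads:
        parsed = extract_umi(read)
        if parsed is not None:
            sizes[parsed[0]] += 1
    return sizes


print(extract_umi("ACTTACATATTCTAACGGTACC"))  # ('ACTCATTCAC', 'GGTACC')
print(extract_umi("GCTTACATATTCTAACGGTACC"))  # None: G at a degenerate position
```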
Table 21: UMI-LT57-1 mutation information analysis results table
Chr Ref Pos Total_Depth Eff_Depth Total_Mut A_Mut_Fre T_Mut_Fre C_Mut_Fre G_Mut_Fre
D_ref C 243 22612 22546 0 0->0.0000 0->0.0000 0->0.0000 0->0.0000
D_ref A 244 22615 22450 2 0->0.0000 0->0.0000 1->0.0000 1->0.0000
D_ref T 245 22616 22410 1 1->0.0000 0->0.0000 0->0.0000 0->0.0000
D_ref G 246 22617 22550 2 0->0.0000 1->0.0000 0->0.0000 0->0.0000
D_ref A 247 22620 22416 18128 0->0.0000 0->0.0000 0->0.0000 18128->0.8087
D_ref C 248 22621 22533 0 0->0.0000 0->0.0000 0->0.0000 0->0.0000
D_ref A 249 22612 22296 2 0->0.0000 0->0.0000 0->0.0000 1->0.0000
D_ref G 250 22498 22440 0 0->0.0000 0->0.0000 0->0.0000 0->0.0000
D_ref C 251 22403 22123 17802 0->0.0000 17802->0.8047 0->0.0000 0->0.0000
D_ref G 252 22393 22180 17846 0->0.0000 17845->0.8046 1->0.0000 0->0.0000
D_ref C 253 22391 22335 0 0->0.0000 0->0.0000 0->0.0000 0->0.0000
Remarks: all predetermined mutation sites were detected; they are A247G, C251T and G252T (shown in bold in the original table). Column headings: Chr, reference sequence identifier; Ref, reference base; Pos, position on the reference sequence; Total_Depth, total depth; Eff_Depth, effective depth; Total_Mut, total number of mutant bases; A_Mut_Fre, T_Mut_Fre, C_Mut_Fre and G_Mut_Fre, the number of bases mutated to A, T, C or G, respectively, followed (after "->") by that number as a fraction of the effective depth. The following three tables use the same format.
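Per the remarks, each *_Mut_Fre cell is the count of reads showing that alternative base divided by the effective depth (Eff_Depth), printed as `count->fraction` to four decimal places. A minimal sketch that recomputes cells from the counts in Tables 21, 22 and 24; the function name is illustrative.

```python
def mut_fre(count: int, eff_depth: int) -> str:
    """Format a mutation-frequency cell as in Tables 21-24: 'count->fraction'."""
    if eff_depth <= 0:
        raise ValueError("effective depth must be positive")
    return f"{count}->{count / eff_depth:.4f}"


print(mut_fre(18128, 22416))  # 18128->0.8087  (Table 21, Pos 247, A>G)
print(mut_fre(17802, 22123))  # 17802->0.8047  (Table 21, Pos 251, C>T)
print(mut_fre(591, 12683))    # 591->0.0466    (Table 22, Pos 247, A>G)
print(mut_fre(2, 5241))       # 2->0.0004      (Table 24, Pos 251, C>T)
```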
Table 22: UMI-LT57-2 mutation information analysis results table
Chr Ref Pos Total_Depth Eff_Depth Total_Mut A_Mut_Fre T_Mut_Fre C_Mut_Fre G_Mut_Fre
D_ref C 243 12877 12827 0 0->0.0000 0->0.0000 0->0.0000 0->0.0000
D_ref A 244 12878 12734 0 0->0.0000 0->0.0000 0->0.0000 0->0.0000
D_ref T 245 12878 12649 1 1->0.0001 0->0.0000 0->0.0000 0->0.0000
D_ref G 246 12880 12830 0 0->0.0000 0->0.0000 0->0.0000 0->0.0000
D_ref A 247 12884 12683 591 0->0.0000 0->0.0000 0->0.0000 591->0.0466
D_ref C 248 12884 12829 0 0->0.0000 0->0.0000 0->0.0000 0->0.0000
D_ref A 249 12885 12672 0 0->0.0000 0->0.0000 0->0.0000 0->0.0000
D_ref G 250 12884 12817 0 0->0.0000 0->0.0000 0->0.0000 0->0.0000
D_ref C 251 12882 12785 587 0->0.0000 587->0.0459 0->0.0000 0->0.0000
D_ref G 252 12882 12762 587 0->0.0000 587->0.0460 0->0.0000 0->0.0000
D_ref C 253 12882 12823 0 0->0.0000 0->0.0000 0->0.0000 0->0.0000
Table 23: UMI-LT57-3 mutation information analysis results table [image not reproduced]
Table 24: UMI-LT57-4 mutation information analysis result table
Chr Ref Pos Total_Depth Eff_Depth Total_Mut A_Mut_Fre T_Mut_Fre C_Mut_Fre G_Mut_Fre
D_ref C 243 5273 5252 0 0->0.0000 0->0.0000 0->0.0000 0->0.0000
D_ref A 244 5273 5199 0 0->0.0000 0->0.0000 0->0.0000 0->0.0000
D_ref T 245 5273 5193 0 0->0.0000 0->0.0000 0->0.0000 0->0.0000
D_ref G 246 5286 5247 0 0->0.0000 0->0.0000 0->0.0000 0->0.0000
D_ref A 247 5288 5187 1 0->0.0000 0->0.0000 0->0.0000 1->0.0002
D_ref C 248 5288 5258 0 0->0.0000 0->0.0000 0->0.0000 0->0.0000
D_ref A 249 5288 5161 1 0->0.0000 0->0.0000 0->0.0000 1->0.0002
D_ref G 250 5288 5261 0 0->0.0000 0->0.0000 0->0.0000 0->0.0000
D_ref C 251 5288 5241 2 0->0.0000 2->0.0004 0->0.0000 0->0.0000
D_ref G 252 5288 5246 1 0->0.0000 1->0.0002 0->0.0000 0->0.0000
D_ref C 253 5288 5253 0 0->0.0000 0->0.0000 0->0.0000 0->0.0000
The analysis results show that the detected mutation frequencies correspond well to the sample mixing ratios, and the predetermined mutation sites are correctly detected even at a mixing ratio of 1:10000. Therefore, the UMI sequences designed in this system can detect mutations at a frequency of 0.01%.
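The link between the 1:10000 mix and the 0.01% figure is simple arithmetic: the expected fraction of the minor allele is minor / (minor + major) = 1/10001, or about 0.0001. A minimal sketch for the four mixing ratios of the examples, assuming both samples are homozygous at the site and amplify equally (neither assumption is stated in the text); the function name is illustrative.

```python
from fractions import Fraction


def expected_minor_fraction(minor: int, major: int) -> Fraction:
    """Expected allele fraction of the minor component in a minor:major mix."""
    if minor < 0 or major < 0 or minor + major == 0:
        raise ValueError("amounts must be non-negative and not both zero")
    return Fraction(minor, minor + major)


for major in (1, 100, 1000, 10000):
    f = expected_minor_fraction(1, major)
    print(f"1:{major} -> {float(f):.6f} ({float(f) * 100:.4f}%)")
```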
Industrial applicability
The invention can be applied to the detection and verification of low-frequency mutations, with a lowest detectable mutation frequency of 0.01%, and can therefore be used for early screening of cancer, neurodegenerative diseases, cardiovascular diseases and other conditions caused by the accumulation of mutations in somatic cells, stem cells and the like.
Although specific embodiments of the invention have been described in detail, those skilled in the art will appreciate that numerous modifications and substitutions of details are possible in light of all the teachings disclosed, and such modifications fall within the scope of the present invention. The full scope of the invention is given by the appended claims and any equivalents thereof.
In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "illustrative embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Claims (19)

1. A DNA tag comprising the sequence:
HHHTAHHTAHHHTAHH,
wherein H represents A, T or C.
2. A DNA adaptor comprising the DNA tag of claim 1.
3. The DNA adaptor of claim 2, wherein the adaptor has a cohesive end dT.
4. The DNA adapter of claim 3, further comprising:
an anchor sequence formed between the sticky end dT and the tag sequence.
5. The DNA adapter of claim 4, wherein the anchor sequence has the nucleotide sequence set forth in SEQ ID NO:1.
6. A DNA adaptor according to claim 3, wherein the sticky end dT is formed at the 3' end of the DNA tag.
7. The DNA adapter of claim 5, wherein said adapter is obtained by gradient annealing, dNTP extension, and ethanol purification followed by nick sealing, in this order.
8. The DNA adapter of claim 3, further comprising:
an enzyme digestion sequence formed at the end of the DNA tag,
wherein the cleavage sequence carries a restriction enzyme recognition site suitable for generating the cohesive end dT.
9. The DNA adapter of claim 8, wherein the cleavage sequence is an HphI specific recognition site.
10. The DNA adapter of claim 9, wherein the adapter is obtained by gradient annealing, dNTP extension, and HphI cleavage, in that order.
11. Use of the DNA tag of claim 1 and the DNA adapter of any one of claims 2-10 for detecting trace-level variants.
12. A method of constructing a sequencing library, characterized in that a nucleic acid molecule to which the DNA adapter of any one of claims 2 to 10 is attached is subjected to enrichment treatment so as to obtain a sequencing library.
13. The method of claim 12, wherein the nucleic acid molecule is obtained by:
(1) Performing PCR amplification on the nucleic acid sample to be detected so as to obtain a nucleic acid sample fragment;
(2) Subjecting the nucleic acid sample fragment to 3' end addition A treatment;
(3) Ligating the DNA adaptor of any one of claims 2 to 10 to the nucleic acid sample fragment obtained in step (2) so as to obtain the nucleic acid molecule to which the DNA adaptor of any one of claims 2 to 10 is ligated.
14. The method of claim 12, wherein the enrichment treatment is achieved by PCR enrichment.
15. The method of claim 12, further comprising, prior to the enriching, purifying the nucleic acid molecule to which the DNA adapter of any one of claims 2-10 is attached.
16. The method of claim 15, wherein the purification treatment is performed by magnetic bead purification.
17. A sequencing library obtained by the method of any one of claims 12 to 16.
18. A method of sequencing comprising sequencing and data analysis of the sequencing library of claim 17.
19. The method of claim 18, wherein the sequencing is performed on a HiSeq 2500 platform.
CN202310265316.2A 2017-04-27 2017-04-27 DNA tag and application thereof Pending CN116121243A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310265316.2A CN116121243A (en) 2017-04-27 2017-04-27 DNA tag and application thereof

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
PCT/CN2017/082281 WO2018195878A1 (en) 2017-04-27 2017-04-27 Dna identifier and use thereof
CN201780083033.9A CN110168087B (en) 2017-04-27 2017-04-27 DNA tag and application thereof
CN202310265316.2A CN116121243A (en) 2017-04-27 2017-04-27 DNA tag and application thereof

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201780083033.9A Division CN110168087B (en) 2017-04-27 2017-04-27 DNA tag and application thereof

Publications (1)

Publication Number Publication Date
CN116121243A true CN116121243A (en) 2023-05-16

Family

ID=63917812

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201780083033.9A Active CN110168087B (en) 2017-04-27 2017-04-27 DNA tag and application thereof
CN202310265316.2A Pending CN116121243A (en) 2017-04-27 2017-04-27 DNA tag and application thereof

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201780083033.9A Active CN110168087B (en) 2017-04-27 2017-04-27 DNA tag and application thereof

Country Status (2)

Country Link
CN (2) CN110168087B (en)
WO (1) WO2018195878A1 (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012008831A1 (en) * 2010-07-13 2012-01-19 Keygene N.V. Simplified de novo physical map generation from clone libraries
EP3058091B1 (en) * 2013-10-18 2020-03-25 The Broad Institute, Inc. Spatial and cellular mapping of biomolecules in situ by high-throughput sequencing
PL3089822T3 (en) * 2013-12-30 2022-09-19 Atreca, Inc. Analysis of nucleic acids associated with single cells using nucleic acid barcodes
CN104946639B (en) * 2015-07-01 2017-10-31 益善生物技术股份有限公司 Build the primer and method and kit of gene mutation sequencing library
WO2017044893A1 (en) * 2015-09-11 2017-03-16 The Broad Institute, Inc. Dna microscopy
CN106048009B (en) * 2016-06-03 2020-02-18 人和未来生物科技(长沙)有限公司 Label joint for ultralow frequency gene mutation detection and application thereof

Also Published As

Publication number Publication date
CN110168087A (en) 2019-08-23
WO2018195878A1 (en) 2018-11-01
CN110168087B (en) 2023-11-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination