CN110938625A - Tag sequence, linker sequence, kit and third-generation sequencing and library building method for third-generation sequencing - Google Patents

Publication number
CN110938625A
Authority
CN
China
Prior art keywords
bases
sequence
tag sequence
sequencing
generation sequencing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811115000.0A
Other languages
Chinese (zh)
Inventor
黄标
骆备
黄金
吴传文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Bgi Medical Laboratory Co Ltd
Original Assignee
Wuhan Bgi Medical Laboratory Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Bgi Medical Laboratory Co Ltd filed Critical Wuhan Bgi Medical Laboratory Co Ltd
Priority to CN201811115000.0A priority Critical patent/CN110938625A/en
Publication of CN110938625A publication Critical patent/CN110938625A/en
Pending legal-status Critical Current

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing
    • C: CHEMISTRY; METALLURGY
    • C40: COMBINATORIAL TECHNOLOGY
    • C40B: COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B50/00: Methods of creating libraries, e.g. combinatorial synthesis
    • C40B50/06: Biochemical methods, e.g. using enzymes or whole viable microorganisms

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Biochemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Molecular Biology (AREA)
  • Zoology (AREA)
  • Engineering & Computer Science (AREA)
  • Wood Science & Technology (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Analytical Chemistry (AREA)
  • Biotechnology (AREA)
  • Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • General Chemical & Material Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

A tag sequence, a linker sequence, a kit, and a third-generation sequencing and library construction method for third-generation sequencing. The tag sequence consists of several consecutive bases, at least some of which are methylated bases. The linker sequence includes the tag sequence described above. The methylated tag sequence can be used to split third-generation sequencing data that cannot be split by a conventional splitting method, which greatly improves the splitting rate of third-generation sequencing data; the overall splitting rate can reach about 85%.

Description

Tag sequence, linker sequence, kit and third-generation sequencing and library building method for third-generation sequencing
Technical Field
The invention relates to the technical field of third-generation sequencing, and in particular to a tag sequence, a linker sequence, a kit, and a third-generation sequencing and library construction method for third-generation sequencing.
Background
Third-generation sequencing (such as Pacbio platform sequencing) is based on the principle of sequencing by synthesis and uses an SMRT (single-molecule real-time fluorescence sequencing technology) chip as the carrier for the sequencing reaction. Genomic DNA is fragmented into many small fragments for sequencing, which are prepared as droplets and dispersed into different ZMW (zero-mode waveguide) nanopores. When the polymerization reaction occurs at the bottom of a ZMW nanopore, nucleotides labeled with different fluorophores are held by the polymerase in the fluorescence detection region of the pore, and the base composition of the template DNA can be determined from the type and duration of the fluorescence.
Each SMRT chip on the Pacbio platform has 1 million ZMW sequencing wells and generates 5-15 G of data on average. For species with small genomes, however, the amount of data required is small (less than 1 G), so different molecular tags (also called "tag sequences") are usually added to each sample, the samples are mixed and sequenced together, and the sequences of each sample are finally split out by means of the tag sequences.
To make full use of the sequencing data of second-generation platforms, scientists designed DNA tag sequences (barcodes): known tag sequences are attached to both ends of a DNA library, samples carrying different tag sequences are mixed and sequenced together, and the sequencing data are then split to the corresponding samples according to the tag sequences. Because second-generation sequencing can only read and identify the four bases A, T, G and C, tag sequences are designed by randomly combining these four bases, and the number of bases in the tag sequence differs between sequencing platforms. For example, the BGIseq500 platform typically adds a 10-base tag sequence (the four bases A, T, C and G, randomly distributed) at each end of the library.
Tag sequence design for third-generation sequencing follows the approach used on second-generation platforms: tag sequences (for example, tags 16 bases long) are designed from the four bases A, T, G and C and attached to both ends of the library, and the sequencing data are then split according to the tag sequences.
As shown in FIG. 1, the linker for third-generation sequencing is a circular linker, with a 16-base tag sequence (barcode) located between the insert and the linker. When the insert of the library is short, the library can be read repeatedly, the tag sequence can be identified repeatedly, and the data can be assigned to the corresponding sub-library according to the tag sequences at the two ends of the library. When the insert is long, the polymerase may fail to read the tag sequences, and most of the data cannot be split into the sub-libraries, which wastes data. At present, the insert length of third-generation sequencing libraries is approximately 5-8 kb, so the read-length advantage of third-generation sequencing (currently 15-20 kb) is not fully exploited. At the same time, owing to the design of the tag sequence, the splitting rate is only about 60-70%, so 30-40% of the data is wasted; this implicitly increases the sequencing cost and limits the development of third-generation sequencing platforms.
Disclosure of Invention
The invention provides a tag sequence, a linker sequence, a kit, and a third-generation sequencing and library construction method that improve the splitting rate of third-generation sequencing data.
According to a first aspect, in one embodiment there is provided a tag sequence for third generation sequencing, the tag sequence consisting of a plurality of contiguous bases, at least some of which are methylated bases.
In a preferred embodiment, all bases of at least one type are methylated bases; preferably, all bases of only one type are methylated bases; more preferably, all of the adenine bases are 6-methyladenine (6mA); alternatively, all of the cytosine bases may be 4-methylcytosine (4mC) or 5-methylcytosine (5mC).
Preferably, the tag sequence consists of 6 to 20 bases; preferably, the tag sequence consists of 16 bases.
According to a second aspect, in one embodiment there is provided a third generation sequencing adaptor sequence comprising a tag sequence and a further sequence linked to said tag sequence, said tag sequence consisting of a plurality of contiguous bases, at least some of said bases being methylated bases.
In a preferred embodiment, all bases of at least one type in the tag sequence are methylated bases; preferably, all bases of only one type in the tag sequence are methylated bases; more preferably, all of the adenine bases in the tag sequence are 6-methyladenine (6mA); alternatively, all of the cytosine bases in the tag sequence may be 4-methylcytosine (4mC) or 5-methylcytosine (5mC).
Preferably, the tag sequence consists of 6 to 20 bases; preferably, the tag sequence consists of 16 bases.
According to a third aspect, in one embodiment there is provided a kit for third generation sequencing, the kit comprising the linker sequence of the second aspect; optionally, a reagent component for library construction is also included.
According to a fourth aspect, there is provided in one embodiment the use of a tag sequence of the first aspect or an adaptor sequence of the second aspect in the construction of a third generation sequencing library.
According to a fifth aspect, in one embodiment there is provided a method of third generation sequencing library construction, the method comprising ligating nucleic acid fragments to be ligated using an adaptor sequence of the second aspect to form a sequencing library carrying the adaptor sequence described above.
As a preferred technical solution, the method further comprises: before ligation of the adaptor sequence, subjecting the nucleic acid fragments to be ligated to end repair, or to end repair and A-tailing, to form nucleic acid fragments suitable for ligation to the adaptor sequence; and, after ligation of the adaptor sequence, digesting the unligated nucleic acid fragments and unligated adaptor sequences with digestive enzymes.
According to a sixth aspect, there is provided in one embodiment a third generation sequencing method, the method comprising:
third-generation sequencing library construction, which comprises ligating the adaptor sequence of the second aspect to the nucleic acid fragments to be ligated to form a sequencing library carrying the adaptor sequence; and
performing third-generation on-machine sequencing of the sequencing library.
As a preferred technical solution, the third-generation on-machine sequencing is Pacbio platform sequencing.
The methylated tag sequence can be used to split third-generation sequencing data that cannot be split by a conventional splitting method, which greatly improves the splitting rate of third-generation sequencing data; the overall splitting rate can reach about 85%.
Drawings
FIG. 1 is a structural diagram of a third generation sequencing library according to the present invention, in which a circular linker is connected to both ends, and a tag sequence (barcode) in the linker is located between an insert (insert) and the circular linker.
FIG. 2 is a schematic diagram of the base-reading principle of third-generation sequencing using a tag sequence containing 6-methyladenine (6mA) in an example of the present invention, with time on the abscissa and fluorescence intensity on the ordinate. When a methylated base (6mA) is encountered (upper panel), the fluorescence signal lasts longer and is weaker; when the base carries no modification (lower panel), the signal is relatively strong and short.
Detailed Description
The present invention will be described in further detail with reference to the following detailed description and accompanying drawings. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, those skilled in the art will readily recognize that some of the features may be omitted or replaced with other elements, materials, methods in different instances.
Furthermore, the features, operations, or characteristics described in the specification may be combined in any suitable manner to form various embodiments. Also, the steps or actions in the method descriptions may be reordered or interchanged, as will be apparent to one of ordinary skill in the art. Thus, the various sequences in the specification and drawings are for the purpose of describing certain embodiments only and do not imply a required order, unless it is otherwise stated that a certain order must be followed.
The invention relates to the technical field of third-generation single-molecule sequencing, and aims to raise the overall splitting rate of valid samples to over 85% by designing a methylated linker.
Because third-generation sequencing is a single-molecule real-time fluorescence sequencing technology, methylation modifications can be read directly during sequencing by detecting the fluorescence intensity and duration associated with polymerase activity. No additional treatment of the library is required during library construction or sequencing, so methylation modifications are obtained directly.
Specifically, while obtaining ultra-long reads, third-generation single-molecule sequencing technology can directly detect base modifications from changes in the kinetics of the polymerase reaction during sequencing. The principle is that the polymerase takes a certain time to synthesize each base; when the template base is modified, the polymerase slows down, like a vehicle meeting a roadblock. As a result, the IPD (interpulse duration) ratio, i.e., the ratio of the distance between two adjacent pulse peaks at the modified base to the corresponding distance for the reference sequence, is greater than 1, from which the modification at that position can be inferred. As shown in FIG. 2, in the upper panel the A base that pairs with T on the template is 6-methyladenine (6mA), and the polymerase slows down on encountering it, so the distance between the two adjacent pulse peaks of A and T becomes longer; in the lower panel the A base that pairs with T on the template is unmodified, so the distance between the two adjacent pulse peaks is normal. Whether a site is a methylation site can therefore be judged from the ratio of the distance between adjacent pulse peaks to the corresponding distance for the reference sequence.
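The ratio test described above can be illustrated with a minimal Python sketch. The function names, the toy interpulse durations and the cutoff of 2.0 are assumptions made for illustration only; the description itself states only that a ratio greater than 1 indicates a modification and does not specify a working threshold.

```python
def ipd_ratios(observed_ipds, reference_ipds):
    """Per-position ratio of observed to reference interpulse duration."""
    if len(observed_ipds) != len(reference_ipds):
        raise ValueError("observed and reference must have the same length")
    return [obs / ref for obs, ref in zip(observed_ipds, reference_ipds)]


def call_modified_positions(observed_ipds, reference_ipds, threshold=2.0):
    """Return 0-based positions whose IPD ratio exceeds the threshold.

    The threshold is a hypothetical cutoff chosen for this sketch; the
    description only says that a ratio above 1 suggests a modification.
    """
    ratios = ipd_ratios(observed_ipds, reference_ipds)
    return [i for i, ratio in enumerate(ratios) if ratio > threshold]


# Toy interpulse durations in seconds (invented values, not measured data).
observed = [0.21, 0.19, 0.95, 0.20, 1.10]
reference = [0.20, 0.20, 0.20, 0.20, 0.20]

print(call_modified_positions(observed, reference))
```

With these toy values, only the third and fifth positions show a clearly lengthened interval relative to the reference, so they are the ones flagged as candidate methylation sites.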
It should be noted that, because the principle of the invention is that a methylation modification on a base slows the polymerase down like a roadblock, any methylation modification of any base can be used in the invention to achieve the same purpose and the same technical effect. The methylation modification is therefore not limited to 6-methyladenine (6mA) and may also be, for example, 4-methylcytosine (4mC) or 5-methylcytosine (5mC). Likewise, methylation of any one or more of the four bases A, T, G and C can be used in the invention.
Accordingly, in one embodiment of the present invention, a tag sequence for third generation sequencing is provided, the tag sequence consisting of a plurality of consecutive bases, at least some of which are methylated bases.
In the present invention, the "tag sequence" refers to a molecular tag that can distinguish different sample sources, and in the third generation of single molecule sequencing technology, refers to a molecular tag that can distinguish each single molecule sample source.
In the present invention, there is no limitation on the number and type of methylated bases: all bases may be methylated or only some bases may be methylated, and all base types (i.e., the four bases A, T, G and C) may be methylated or only one, two or three base types may be methylated. In a preferred embodiment, all bases of at least one type (i.e., one, two or three types) are methylated bases; for example, all A bases are methylated, all T bases are methylated, all G bases are methylated, all C bases are methylated, all A and T bases are methylated, all G and C bases are methylated, or all A, T and G bases are methylated. In a preferred embodiment of the present invention, all bases of only one type are methylated bases; more preferably, all of the adenine bases (A) are 6-methyladenine (6mA); alternatively, all of the cytosine bases are 4-methylcytosine (4mC) or 5-methylcytosine (5mC).
In the present invention, the length of the tag sequence is not particularly limited, and usually the tag sequence consists of 6 to 20 bases. In a preferred embodiment of the invention, the tag sequence consists of 16 bases.
In one embodiment of the present invention, a third generation sequencing adaptor sequence is provided, the adaptor sequence comprising a tag sequence and other sequences linked to the tag sequence, the tag sequence consisting of a plurality of contiguous bases, at least some of the bases being methylated bases.
It is to be noted that all of the above description of the tag sequence applies to the tag sequence portion of the linker sequence. The "other sequence" linked to the tag sequence may be any sequence, preferably a sequencing platform linker sequence, more preferably a Pacbio sequencing platform linker sequence; such sequences are well known in the art. The connection and positional relationship between the "other sequence" and the tag sequence are not particularly limited. The tag sequences may be located at both ends of the other sequence to form a reverse complementary structure; that is, two mutually reverse complementary tag sequences are attached to the two ends of the other sequence, with the other sequence lying between them. At least one of the two reverse complementary tag sequences carries the methylation modification described above, and preferably both do. In other embodiments, the tag sequence may be located within the other sequence, i.e., the tag sequence separates the other sequence into two or more segments, and the tag sequences may or may not be reverse complementary.
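The reverse complementary arrangement can be checked with a short Python sketch. It uses SEQ ID NO: 1 from the Sequence Listing and assumes, for illustration, that the tag occupies the first and last 16 bases of the 77-base linker; the helper names are hypothetical.

```python
# Translation table for complementing DNA bases (lower and upper case).
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def split_linker(linker, tag_len=16):
    """Split a linker into (5' tag, other sequence, 3' tag).

    Assumes the tag occupies tag_len bases at each end of the linker.
    """
    if len(linker) < 2 * tag_len:
        raise ValueError("linker is shorter than two tag lengths")
    return linker[:tag_len], linker[tag_len:-tag_len], linker[-tag_len:]


def tags_are_reverse_complementary(linker, tag_len=16):
    """True if the 3' tag is the reverse complement of the 5' tag."""
    five_prime, _, three_prime = split_linker(linker, tag_len)
    return reverse_complement(five_prime) == three_prime


# SEQ ID NO: 1 from the Sequence Listing (77 bases).
SEQ_ID_1 = (
    "cacatatcagagtgcgatctctctcttttcctcctcctccgttgttgttgttgagagaga"
    "tcgcactctgatatgtg"
)

five_prime_tag, other_sequence, three_prime_tag = split_linker(SEQ_ID_1)
print(five_prime_tag, three_prime_tag)
print(tags_are_reverse_complementary(SEQ_ID_1))
```

Because the two end tags pair with each other, the linker can fold into a hairpin with the "other sequence" as the loop, which is consistent with the circular linker shown in FIG. 1.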
In one embodiment of the invention, a kit for third generation sequencing is provided, the kit comprising a linker sequence of the invention.
In addition to the linker sequences of the invention, the kits of the invention may also include components of reagents for the construction of libraries, such as enzymes for disrupting genomic DNA, e.g., Tn5 transposase, and the like; enzymes for repairing the disrupted DNA, such as T4 polynucleotide kinase and the like; a ligase for ligating the linker sequence and the nucleic acid fragment to be ligated, such as T4 DNA ligase and the like; a digestive enzyme for digesting unligated nucleic acid fragments and unligated linker sequences, and the like.
In one embodiment of the invention there is provided the use of a tag sequence of the invention or an adaptor sequence of the invention in the construction of a third generation sequencing library, in particular the use of a tag sequence of the invention in the preparation of an adaptor sequence of a third generation sequencing library.
In one embodiment, the invention provides a third-generation sequencing library construction method, which comprises ligating the adaptor sequence of the invention to the nucleic acid fragments to be ligated to form a sequencing library carrying the adaptor sequence. The nucleic acid fragments to be ligated are generally DNA fragments, preferably long fragments, for example fragments of 15-20 kb or 5-8 kb. Such DNA fragments may be obtained by fragmenting a genome, by reverse transcription of mRNA, or the like.
For fragmented DNA, before ligation of the linker sequence, the fragments may be end-repaired, or end-repaired and A-tailed, to form nucleic acid fragments suitable for ligation to the linker sequence. In addition, to prevent unligated nucleic acid fragments and unligated linker sequences from affecting subsequent sequencing, they may be digested with a digestive enzyme after the linker sequence has been ligated.
In one embodiment of the present invention, a third-generation sequencing method is provided, the method comprising: third-generation sequencing library construction, which comprises ligating the adaptor sequence of the invention to the nucleic acid fragments to be ligated to form a sequencing library carrying the adaptor sequence; and performing third-generation on-machine sequencing of the sequencing library, such as Pacbio platform sequencing. The sequencing data are then split according to sample source.
The tag sequences on existing third-generation sequencing linkers carry no methylation modification; when the data are split by the conventional method, the splitting rate reaches only about 65%. The conventional splitting method generally analyzes the off-machine data of the whole chip and assigns reads to sub-sample libraries according to the tag sequences at both ends (generally 16 bp); if the tag sequence at only one end is read, the system directly treats the read as an unmatched sequencing read. During sequencing, however, the activity of the sequencing polymerase is often insufficient to read through the whole insert, so only the tag sequence at one end is read. Such sequencing data are classified as invalid information, which is why the splitting rate of the samples is only about 65%.
In the method of the invention, the tag sequence on the third-generation sequencing linker is methylated. After sequencing, the data are first split by the conventional method: about 65% of the data can be split, and about 35% is classified as "invalid information". The sequencing reads in which only the tag sequence at one end was read are then screened using the methylated tag sequence, and these data are split again according to the methylation sites, so that the originally wasted "invalid information" is put to secondary use; with this second round of splitting, the overall splitting rate can reach about 85%. Specifically, taking 6mA in the tag sequence as an example, when 6mA is present on the DNA, the color and duration of the fluorescence differ from those of a conventional unmodified base. When the synthetic tag sequences are designed, the 6mA modification is added to all (or some) of the A bases. Because the positions and number of A bases differ between tag sequences, each tag sequence carries its own unique 6mA information, and the data that originally could not be split correctly can be reused according to the number and positions of the 6mA bases. Similarly, for any methylated base, the positions and number of methylated bases differ between tag sequences, so each tag sequence carries its own unique methylated-base information.
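The two rounds of splitting can be sketched in Python as follows. This is only an illustrative model under stated assumptions, not the software actually used: the read representation (a dict with `left_tag`, `right_tag` and `methylated_positions`), the sample names and the function names are all hypothetical, both end tags are assumed to be reported in the same orientation, and every A in a tag is assumed to be 6mA. The two tags are the first 16 bases of SEQ ID NO: 1 and SEQ ID NO: 2.

```python
def methylation_signature(tag):
    """0-based positions of A in the tag, assuming every A is 6mA."""
    return tuple(i for i, base in enumerate(tag.lower()) if base == "a")


def build_signature_index(tags):
    """Map each methylation signature to its sample; reject collisions."""
    index = {}
    for sample, tag in tags.items():
        signature = methylation_signature(tag)
        if signature in index:
            raise ValueError(f"tags share a 6mA signature: {sample}")
        index[signature] = sample
    return index


def demultiplex(reads, tags):
    """Assign reads to samples in two rounds.

    Round 1 (conventional): both end tags are read and match one sample.
    Round 2 (rescue): the detected 6mA positions match a tag signature.
    """
    by_tag = {tag: sample for sample, tag in tags.items()}
    by_signature = build_signature_index(tags)
    assigned, unassigned = {}, []
    for read in reads:
        left = by_tag.get(read.get("left_tag"))
        right = by_tag.get(read.get("right_tag"))
        if left is not None and left == right:
            assigned[read["id"]] = (left, "conventional")
            continue
        positions = tuple(read.get("methylated_positions", ()))
        sample = by_signature.get(positions)
        if sample is not None:
            assigned[read["id"]] = (sample, "methylation")
        else:
            unassigned.append(read["id"])
    return assigned, unassigned


# Hypothetical sample names; tags are the first 16 bases of SEQ ID NO: 1 and 2.
TAGS = {"sample_1": "cacatatcagagtgcg", "sample_2": "acacacagactgtgag"}

READS = [
    {"id": "r1", "left_tag": "cacatatcagagtgcg", "right_tag": "cacatatcagagtgcg"},
    {"id": "r2", "left_tag": "acacacagactgtgag", "right_tag": None,
     "methylated_positions": (0, 2, 4, 6, 8, 14)},
    {"id": "r3", "left_tag": None, "right_tag": None},
]

assigned, unassigned = demultiplex(READS, TAGS)
print(assigned)
print(unassigned)
```

In this toy run, the read with both tags is assigned in the conventional round, the read with only one tag is rescued by its 6mA positions, and the read with no tag information remains unassigned.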
The invention can be applied to the various products of the current third-generation sequencing platform (Pacbio platform), including de novo sequencing of animals and plants, de novo sequencing of microorganisms, 16S full-length sequencing, full-length transcriptome sequencing, PCR product sequencing and the like. The invention makes sequencing on the third-generation platform more flexible and reduces the cost of library construction and sequencing, and therefore has broad market prospects.
The technical solutions of the present invention are described in detail below by way of comparative examples and examples, which should be understood as being merely exemplary and not limiting the scope of the present invention.
Comparative example 1
Commercial library construction kit used: SMRTbell™ Template Prep Kit (500 bp-20 kb), part no. 100-991-900, Pacific Biosciences, 10 reactions per kit (lot: 0101995217). Purification magnetic beads: Vazyme VAHTS™ DNA Clean Beads (N411).
The experimental procedure was as follows:
(1) the tag sequence-bearing linkers published by the official Pacbio platform were synthesized as shown in Table 1 below, in which the underlined portions are the tag sequences and the other portions are the sequencing platform linker sequence portions.
TABLE 1 linkers with tag sequences
[Table 1 is provided as an image in the original publication (BDA0001810318180000091, BDA0001810318180000101).]
The tagged linkers in Table 1 were annealed in the annealing system and annealing conditions shown in Table 2 below.
TABLE 2 annealing System and annealing reaction conditions
[Table 2 is provided as an image in the original publication (BDA0001810318180000102).]
(2) The genomic DNA was sheared using a Covaris g-TUBE, the samples were purified with 0.45x Pacbio magnetic beads, and the concentration and fragment size distribution were then measured.
(3) The sheared DNA was digested with digestive enzyme 7.
(4) Nicks in the DNA were repaired with a damage repair enzyme, and the sample was then purified with 0.45x Pacbio magnetic beads.
(5) Ligation was performed using T4 DNA ligase, 10x Template Prep Buffer, ATP Lo and the annealed linkers carrying tag sequences.
(6) After linker ligation was complete, the T4 DNA ligase was heat-inactivated, digestive enzyme 3 and digestive enzyme 7 were added, and the sample was purified with 0.45x Pacbio magnetic beads.
(7) The separated and purified products were checked for concentration and fragment size, and then pooled according to the rules.
(8) After the sequencing primer was annealed and the sequencing polymerase was bound, Pacbio platform sequencing was performed.
Data splitting was performed after sequencing; the results are shown in Table 3 below:
TABLE 3
[Table 3 is provided as an image in the original publication (BDA0001810318180000111).]
The data in Table 3 show that, for 4 bacterial samples processed according to the conventional published Pacbio protocol, 32.7% of the data could not be split (invalid data), giving a splitting rate of 67.3%.
Example 1
Commercial library construction kit used: SMRTbell™ Template Prep Kit (500 bp-20 kb), part no. 100-991-900, Pacific Biosciences, 10 reactions per kit (lot: 0101995217). Purification magnetic beads: Vazyme VAHTS™ DNA Clean Beads (N411).
The experimental procedure was as follows:
(1) Linkers with methylated tag sequences were synthesized. The linker sequences are those shown in Table 1 and differ from Comparative Example 1 only in that the A bases in the tag sequence (underlined portion) are 6mA-modified bases. The linkers were annealed according to the annealing system and annealing reaction conditions shown in Table 2.
(2) The genomic DNA was sheared using a Covaris g-TUBE, the sample was purified with 0.6x Vazyme magnetic beads, and the concentration and fragment size distribution were then measured.
(3) The sheared DNA was digested with digestive enzyme 7.
(4) Nicks in the DNA were repaired with a damage repair enzyme, and the DNA was then purified with 0.6x Vazyme magnetic beads.
(5) Ligation was performed using T4 DNA ligase, 10x Template Prep Buffer, ATP Lo and the annealed linkers carrying tag sequences.
(6) After linker ligation was complete, the T4 DNA ligase was heat-inactivated, digestive enzyme 3 and digestive enzyme 7 were added, the sample was purified with 0.6x Vazyme magnetic beads, and fragment size selection and purification were then performed.
(7) The separated and purified products were checked for concentration and fragment size, and then pooled according to the rules.
(8) After the sequencing primer was annealed and the sequencing polymerase was bound, Pacbio platform sequencing was performed.
Data splitting was performed after sequencing; the results are shown in Table 4 below:
TABLE 4
[Table 4 is provided as an image in the original publication (BDA0001810318180000121, BDA0001810318180000131).]
The data in Table 4 show that, for 4 bacterial samples prepared with tag sequences carrying the 6mA methylation modification, 12% of the data could not be split, so the effective splitting rate is 88%.
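The comparison between Comparative Example 1 and Example 1 comes down to simple arithmetic, sketched below in Python. Only the two unsplit percentages (32.7% and 12%) come from the text; the function and variable names are illustrative.

```python
def splitting_rate(unsplit_percent):
    """Splitting rate (%) given the percentage of data that could not be split."""
    return 100.0 - unsplit_percent


# Unsplit percentages reported for Comparative Example 1 and Example 1.
conventional_rate = splitting_rate(32.7)
methylated_rate = splitting_rate(12.0)

# Improvement in percentage points attributable to the methylated tags.
gain_points = methylated_rate - conventional_rate

print(f"conventional: {conventional_rate:.1f}%")
print(f"methylated:   {methylated_rate:.1f}%")
print(f"gain:         {gain_points:.1f} percentage points")
```

On these figures, the methylated tag sequences recover roughly 20.7 percentage points of data that the conventional splitting method discards.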
The present invention has been described in terms of specific examples, which are provided to aid understanding of the invention and are not intended to be limiting. For a person skilled in the art to which the invention pertains, several simple deductions, modifications or substitutions may be made according to the idea of the invention.
SEQUENCE LISTING
<110> Wuhan BGI Medical Laboratory Co., Ltd.
<120> tag sequence for third generation sequencing, linker sequence, kit and third generation sequencing and library building method
<130>18I26828
<160>16
<170>PatentIn version 3.3
<210>1
<211>77
<212>DNA
<213> Artificial sequence
<400>1
cacatatcag agtgcgatct ctctcttttc ctcctcctcc gttgttgttg ttgagagaga 60
tcgcactctg atatgtg 77
<210>2
<211>77
<212>DNA
<213> Artificial sequence
<400>2
acacacagac tgtgagatct ctctcttttc ctcctcctcc gttgttgttg ttgagagaga 60
tctcacagtc tgtgtgt 77
<210>3
<211>77
<212>DNA
<213> Artificial sequence
<400>3
cacgcacaca cgcgcgatct ctctcttttc ctcctcctcc gttgttgttg ttgagagaga 60
tcgcgcgtgt gtgcgtg 77
<210>4
<211>77
<212>DNA
<213> Artificial sequence
<400>4
acagtcgagc gctgcgatct ctctcttttc ctcctcctcc gttgttgttg ttgagagaga 60
tcgcagcgct cgactgt 77
<210>5
<211>77
<212>DNA
<213> Artificial sequence
<400>5
acacacgcga gacagaatct ctctcttttc ctcctcctcc gttgttgttg ttgagagaga 60
ttctgtctcg cgtgtgt 77
<210>6
<211>77
<212>DNA
<213> Artificial sequence
<400>6
acgcgctatc tcagagatct ctctcttttc ctcctcctcc gttgttgttg ttgagagaga 60
tctctgagat agcgcgt 77
<210>7
<211>77
<212>DNA
<213> Artificial sequence
<400>7
acactagatc gcgtgtatct ctctcttttc ctcctcctcc gttgttgttg ttgagagaga 60
tacacgcgat ctagtgt 77
<210>8
<211>77
<212>DNA
<213> Artificial sequence
<400>8
ctcactacgc gcgcgtatct ctctcttttc ctcctcctcc gttgttgttg ttgagagaga 60
tacgcgcgcg tagtgag 77
<210>9
<211>77
<212>DNA
<213> Artificial sequence
<400>9
cgcatgacac gtgtgtatct ctctcttttc ctcctcctcc gttgttgttg ttgagagaga 60
tacacacgtg tcatgcg 77
<210>10
<211>77
<212>DNA
<213> Artificial sequence
<400>10
catagagaga tagtatatct ctctcttttc ctcctcctcc gttgttgttg ttgagagaga 60
tatactatct ctctatg 77
<210>11
<211>77
<212>DNA
<213> Artificial sequence
<400>11
cacacgcgcg ctatatatct ctctcttttc ctcctcctcc gttgttgttg ttgagagaga 60
tatatagcgc gcgtgtg 77
<210>12
<211>77
<212>DNA
<213> Artificial sequence
<400>12
tcacgtgctc actgtgatct ctctcttttc ctcctcctcc gttgttgttg ttgagagaga 60
tcacagtgag cacgtga 77
<210>13
<211>77
<212>DNA
<213> Artificial sequence
<400>13
acacactcta tcagatatct ctctcttttc ctcctcctcc gttgttgttg ttgagagaga 60
tatctgatag agtgtgt 77
<210>14
<211>77
<212>DNA
<213> Artificial sequence
<400>14
cacgacacga cgatgtatct ctctcttttc ctcctcctcc gttgttgttg ttgagagaga 60
tacatcgtcg tgtcgtg 77
<210>15
<211>77
<212>DNA
<213> Artificial sequence
<400>15
ctatacatag tgatgtatct ctctcttttc ctcctcctcc gttgttgttg ttgagagaga 60
tacatcacta tgtatag 77
<210>16
<211>77
<212>DNA
<213> Artificial sequence
<400>16
cactcacgtg tgatatatct ctctcttttc ctcctcctcc gttgttgttg ttgagagaga 60
tatatcacac gtgatgt 77
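
The layout of the listed sequences can be inspected with a short script. The following is a minimal sketch, not part of the original listing: it assumes that the first 16 bases of an entry are the tag (the preferred tag length stated in the claims) and that the last 16 bases close the hairpin stem. The names `split_adaptor` and `tail_mirrors_tag` are hypothetical helpers introduced only for illustration. It is applied here to SEQ ID NO: 1 and SEQ ID NO: 2.

```python
# SEQ ID NO: 1 and 2, copied from the listing with spaces and position counters removed.
SEQ_1 = ("cacatatcagagtgcgatctctctcttttcctcctcctccgttgttgttgttgagagaga"
         "tcgcactctgatatgtg")
SEQ_2 = ("acacacagactgtgagatctctctcttttcctcctcctccgttgttgttgttgagagaga"
         "tctcacagtctgtgtgt")

TAG_LEN = 16  # assumption: the tag occupies the first 16 bases of each entry

_COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a lowercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_adaptor(seq: str, tag_len: int = TAG_LEN) -> tuple[str, str, str]:
    """Split an entry into (leading tag, middle segment, trailing segment)."""
    return seq[:tag_len], seq[tag_len:-tag_len], seq[-tag_len:]


def tail_mirrors_tag(seq: str, tag_len: int = TAG_LEN) -> bool:
    """True if the trailing segment is the reverse complement of the leading tag."""
    tag, _, tail = split_adaptor(seq, tag_len)
    return tail == reverse_complement(tag)


for name, seq in (("SEQ ID NO: 1", SEQ_1), ("SEQ ID NO: 2", SEQ_2)):
    tag, middle, tail = split_adaptor(seq)
    print(name, len(seq), tag, len(middle), tail, tail_mirrors_tag(seq))
```

Under these assumptions, both entries split into a 16-base tag, a 45-base middle segment that is identical between the two, and a 16-base tail that is the reverse complement of the tag, which is consistent with a self-complementary hairpin adaptor.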

Claims (10)

1. A tag sequence for third generation sequencing, wherein said tag sequence consists of a plurality of contiguous bases, at least some of said bases being methylated bases.
2. The tag sequence of claim 1, wherein all bases of at least one type among said bases are methylated bases; preferably, all bases of only one type among said bases are methylated bases; more preferably, all adenine bases among said bases are 6-methyladenine (6mA); alternatively, all cytosine bases among said bases are 4-methylcytosine (4mC) or 5-methylcytosine (5mC).
3. The tag sequence of claim 1, wherein the tag sequence consists of 6 to 20 bases; preferably, the tag sequence consists of 16 bases.
4. An adaptor sequence for third generation sequencing, wherein the adaptor sequence comprises a tag sequence and other sequences linked to the tag sequence, the tag sequence consists of a plurality of consecutive bases, and at least some of the bases are methylated bases.
5. The adaptor sequence of claim 4, wherein all bases of at least one type in the tag sequence are methylated bases; preferably, all bases of only one type in the tag sequence are methylated bases; more preferably, all adenine bases in the tag sequence are 6-methyladenine (6mA); alternatively, all cytosine bases in the tag sequence are 4-methylcytosine (4mC) or 5-methylcytosine (5mC).
6. The adaptor sequence of claim 4, wherein the tag sequence consists of 6 to 20 bases; preferably, the tag sequence consists of 16 bases.
7. A kit for third generation sequencing, comprising the adaptor sequence of any one of claims 4 to 6; optionally, the kit further comprises reagent components for library construction.
8. Use of a tag sequence according to any one of claims 1 to 3 or an adaptor sequence according to any one of claims 4 to 6 in the construction of a third generation sequencing library.
9. A third generation sequencing library construction method, wherein the method comprises ligating the adaptor sequence of any one of claims 4-6 to nucleic acid fragments to be ligated, to form a sequencing library carrying the adaptor sequence;
preferably, the method further comprises: before ligating the adaptor sequence, subjecting the nucleic acid fragments to be ligated to end repair, or to end repair and A-tailing, to form nucleic acid fragments suitable for ligation to the adaptor sequence; and after ligating the adaptor sequence, digesting the unligated nucleic acid fragments and unligated adaptor sequences with a digestive enzyme.
10. A third generation sequencing method, the method comprising:
constructing a third generation sequencing library, comprising ligating the adaptor sequence of any one of claims 4-6 to nucleic acid fragments to be ligated, to form a sequencing library carrying the adaptor sequence; and
sequencing the sequencing library on a third-generation sequencing instrument;
preferably, the third-generation sequencing is performed on a PacBio platform.
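
The tag constraints recited in claims 1 to 3 can be expressed as a small bookkeeping check. The following is a minimal sketch under stated assumptions, not the applicant's method of synthesis or detection: the function name `methylate_tag`, the per-position label format, and the choice of the first 16 bases of SEQ ID NO: 1 as the example tag are all hypothetical and introduced only for illustration.

```python
# Methylated forms per base, as listed in claims 2 and 5.
ALLOWED_FORMS = {"a": {"6mA"}, "c": {"4mC", "5mC"}}


def methylate_tag(tag: str, base: str = "a", form: str = "6mA") -> list[str]:
    """Label every occurrence of one base type in a tag as methylated.

    Returns one label per position: the methylated form for the chosen base,
    and the plain uppercase letter for every other base.
    """
    tag = tag.lower()
    if set(tag) - set("acgt"):
        raise ValueError("tag must contain only a, c, g, t")
    if not 6 <= len(tag) <= 20:
        raise ValueError("tag length must be 6 to 20 bases (claim 3)")
    if form not in ALLOWED_FORMS.get(base, set()):
        raise ValueError(f"{form} is not a listed methylated form of {base!r}")
    if base not in tag:
        raise ValueError("tag would contain no methylated base (claim 1)")
    return [form if b == base else b.upper() for b in tag]


# Example tag: assumed to be the first 16 bases of SEQ ID NO: 1.
labels = methylate_tag("cacatatcagagtgcg")
print(" ".join(labels))
```

With the defaults, every adenine in the example tag is labelled 6mA and the other bases are left unmodified, which corresponds to the "more preferably" option of claim 2; passing `base="c"` with `form="4mC"` or `form="5mC"` corresponds to the cytosine alternative.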

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811115000.0A CN110938625A (en) 2018-09-25 2018-09-25 Tag sequence, linker sequence, kit and third-generation sequencing and library building method for third-generation sequencing

Publications (1)

Publication Number Publication Date
CN110938625A true CN110938625A (en) 2020-03-31

Family

ID=69904530

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811115000.0A Pending CN110938625A (en) 2018-09-25 2018-09-25 Tag sequence, linker sequence, kit and third-generation sequencing and library building method for third-generation sequencing

Country Status (1)

Country Link
CN (1) CN110938625A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107058300A (en) * 2016-12-22 2017-08-18 哈尔滨博泰生物科技有限公司 Tagged adaptor for constructing single-molecule sequencing library templates and use thereof
CN107475403A (en) * 2017-09-14 2017-12-15 深圳因合生物科技有限公司 Method and kit for detecting circulating tumor DNA from peripheral blood cell-free DNA, and method for analyzing the sequencing results

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
John Beaulaurier et al.: "Metagenomic binning and association of plasmids with bacterial host genomes using DNA methylation", Nat Biotechnol *
Cao Chenxia et al.: "Application of third-generation sequencing technology in microbial research", Microbiology China (微生物学通报) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112760371A (en) * 2021-03-09 2021-05-07 上海交通大学 Primer, kit and analysis method for detecting MUC1 gene mutation
CN113403388A (en) * 2021-07-21 2021-09-17 上海交通大学 Primer, kit and analysis method for ADTKD gene mutation detection
CN116625999A (en) * 2023-05-17 2023-08-22 中国科学院苏州生物医学工程技术研究所 Method for evaluating effective sample loading of zero-mode waveguide hole and application thereof
CN116625999B (en) * 2023-05-17 2023-11-28 中国科学院苏州生物医学工程技术研究所 Method for evaluating effective sample loading of zero-mode waveguide hole and application thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code
Ref country code: HK
Ref legal event code: DE
Ref document number: 40015745
Country of ref document: HK
RJ01 Rejection of invention patent application after publication
Application publication date: 20200331