CN112048543A - Novel sequencing method of plasmid or DNA fragment with high flux, low cost and high base accuracy - Google Patents

Novel sequencing method of plasmid or DNA fragment with high flux, low cost and high base accuracy

Info

Publication number
CN112048543A
Authority
CN
China
Prior art keywords
sequence
dna
sequencing
transposase
primer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010503754.4A
Other languages
Chinese (zh)
Inventor
赖锦盛
高翔
宋伟彬
赵海铭
莫伟鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Agricultural University
Original Assignee
China Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Agricultural University
Publication of CN112048543A

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1093 General methods of preparing gene libraries, not provided for in other subgroups
    • C CHEMISTRY; METALLURGY
    • C40 COMBINATORIAL TECHNOLOGY
    • C40B COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B50/00 Methods of creating libraries, e.g. combinatorial synthesis
    • C40B50/06 Biochemical methods, e.g. using enzymes or whole viable microorganisms

Abstract

The invention discloses a novel high-throughput, low-cost sequencing method with high base accuracy for plasmids or DNA fragments. Construction of the sequencing library used in the method comprises the following steps: 1) fragmenting the target DNA with a transposase complex to obtain a DNA fragmentation product, wherein the transposase complex comprises a transposase and a transposable element, and the transposable element consists of two adapters named A and B; A consists of a1 and c1, where a1 is obtained by joining, in order, a sequence named primer 1, a barcode sequence named barcode sequence 1 and the recognition sequence of the transposase, and c1 is the complement of the recognition sequence; B consists of b1 and c1, where b1 is obtained by joining, in order, a sequence named primer 2, a barcode sequence named barcode sequence 2 and the recognition sequence of the transposase; 2) carrying out PCR amplification on the DNA fragmentation product to obtain a PCR product, which is the sequencing library. The method has high application value.

Description

Novel sequencing method of plasmid or DNA fragment with high flux, low cost and high base accuracy
Technical Field
The invention relates to a novel high-throughput, low-cost and high-accuracy sequencing method for plasmids or DNA fragments, in the technical field of high-throughput sequencing.
Background
DNA sequencing is crucial to many areas of biological research and medical diagnostics. Traditional Sanger sequencing was developed 40 years ago; its subsequent optimization and automation made it possible to start and complete draft genome sequences of humans, the model plant Arabidopsis thaliana, and many other important species such as chimpanzee, mouse, rice, sorghum and maize. In addition, large-scale EST and full-length cDNA clone sequencing has been completed in several important species using Sanger sequencing. However, the cost of Sanger sequencing remains relatively high and its throughput falls far short of demand, yet it is still widely used today to obtain the sequence of any given single plasmid clone or DNA fragment. For example, in association analysis studies, target genes from hundreds to thousands of individuals, or plasmid clones, are sequenced by Sanger sequencing.
In the past decade, massively parallel or next-generation (second-generation) sequencing, represented by Illumina technology, has made great progress, and the sequencing cost per base pair has fallen rapidly. Second-generation sequencing has brought genomic analysis into the "personal genome" era, but it has limitations: its short read length leads to poor or ambiguous alignment of repeat elements, a limited ability to span InDels or structural variants (SVs), and alignment errors. In recent years, single-molecule sequencing platforms such as PacBio and Nanopore have emerged; although their base quality is lower, they yield longer reads. These second- and third-generation sequencing technologies are still under active development. Notably, the longest Nanopore read reported for human DNA exceeds 1 Mb. Despite their advantages, all next-generation sequencing technologies share a common feature: millions of DNA fragments are sequenced simultaneously and the identity of individual fragments cannot be traced. Many research and medical diagnostic applications, however, still require each clone or DNA fragment to be traced precisely.
Therefore, there is an urgent need for a sequencing strategy that is cheaper and higher-throughput than traditional Sanger sequencing and yields longer sequences than second-generation sequencing.
Disclosure of Invention
The invention provides a novel high-throughput, low-cost, high-accuracy sequencing method for plasmids or DNA fragments (HITAC-seq), which comprises a novel barcode strategy and a corresponding sequence assembly tool. The sequencing system achieves high-throughput, low-cost sequencing with high base accuracy. Based on the barcode technology, the system can efficiently distinguish and identify the sequencing sample of each plasmid or DNA fragment and generate the corresponding sequence for each sample. The method can be applied to genotyping of agricultural materials, detection of genetic disease sites in medicine, precision medicine, individual site diagnosis and similar fields.
The invention first provides a method for constructing a sequencing library by using Tn5 transposase to introduce index sequences into the sequencing fragments, the sequence of each individual sample then being obtained by assembly, wherein the method comprises A1) and A2):
A1) fragmenting the target DNA with a transposase complex to obtain a DNA fragmentation product;
the transposase complex comprises a transposase and a transposable element, wherein the transposable element consists of two adapters named A and B;
A consists of single-stranded DNAs named a1 and c1; a1 is obtained by joining, in order, a sequence named primer 1, a barcode sequence named barcode sequence 1 and the recognition sequence of the transposase, and barcode sequence 1 is a random sequence; c1 is the complement of the recognition sequence;
B consists of single-stranded DNAs named b1 and c1; b1 is obtained by joining, in order, a sequence named primer 2, a barcode sequence named barcode sequence 2 and the recognition sequence of the transposase, and barcode sequence 2 is a random sequence;
A2) carrying out PCR amplification on the DNA fragmentation product to obtain a PCR product, which is the sequencing library.
The invention also provides another method for constructing a sequencing library, which is the following I or II:
I. A method for constructing a sequencing library, wherein there is one sample DNA to be tested, the method comprising the following B1) and B2):
B1) fragmenting the DNA of the sample to be tested with a transposase complex to obtain a DNA fragmentation product;
the transposase complex comprises a transposase and a transposable element, wherein the transposable element consists of two adapters named A and B;
A consists of single-stranded DNAs named a1 and c1; a1 is obtained by joining, in order, a sequence named primer 1, a barcode sequence named barcode sequence 1 and the recognition sequence of the transposase, and barcode sequence 1 is a random sequence; c1 is the complement of the recognition sequence;
B consists of single-stranded DNAs named b1 and c1; b1 is obtained by joining, in order, a sequence named primer 2, a barcode sequence named barcode sequence 2 and the recognition sequence of the transposase, and barcode sequence 2 is a random sequence;
B2) carrying out PCR amplification on the DNA fragmentation product to obtain a PCR product, which is the sequencing library;
II. A method for constructing a sequencing library, wherein there are n sample DNAs to be tested and n is a natural number greater than or equal to 1, the method comprising the following C1) and C2):
C1) fragmenting the DNA of m samples to be tested with n transposase complexes, respectively, to obtain n DNA fragmentation products; m is a natural number less than or equal to n;
the transposase complex comprises a transposase and a transposable element, wherein the transposable element consists of two adapters named A and B;
A consists of single-stranded DNAs named a1 and c1; a1 is obtained by joining, in order, a sequence named primer 1, a barcode sequence named barcode sequence 1 and the recognition sequence of the transposase, and barcode sequence 1 is a random sequence; c1 is the complement of the recognition sequence;
B consists of single-stranded DNAs named b1 and c1; b1 is obtained by joining, in order, a sequence named primer 2, a barcode sequence named barcode sequence 2 and the recognition sequence of the transposase, and barcode sequence 2 is a random sequence;
among the n transposase complexes, at least one of barcode sequence 1 and barcode sequence 2 differs between any two transposase complexes;
C2) the following C2a) or C2b):
C2a) mixing the n DNA fragmentation products to obtain a mixed fragmentation product; carrying out PCR amplification on the mixed fragmentation product to obtain a PCR product, which is the sequencing library;
C2b) carrying out PCR amplification on each of the n DNA fragmentation products to obtain n PCR products; mixing the n PCR products to obtain a mixture, which is the sequencing library.
In II, the transposase complexes used for the DNAs of the respective samples to be tested are different.
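The requirement that any two transposase complexes differ in at least one of barcode sequence 1 and barcode sequence 2 can be checked mechanically before an experiment. The following is a minimal sketch, not part of the disclosed method; the function name and the representation of a complex as a (barcode 1, barcode 2) tuple are assumptions made for illustration.

```python
from itertools import combinations


def complexes_are_distinguishable(complexes):
    """Return True if every pair of complexes differs in barcode 1 or barcode 2.

    `complexes` is a list of (barcode1, barcode2) tuples, one per transposase
    complex (a hypothetical representation chosen for this sketch).
    """
    for (a1, b1), (a2, b2) in combinations(complexes, 2):
        # Two complexes collide only when BOTH barcodes are identical.
        if a1 == a2 and b1 == b2:
            return False
    return True


if __name__ == "__main__":
    ok = [("AAAAA", "CCCCC"), ("AAAAA", "GGGGG"), ("TTTTT", "CCCCC")]
    print(complexes_are_distinguishable(ok))
```

Sharing barcode 1 between two complexes is acceptable as long as barcode 2 differs, which is why the check compares the pair as a whole.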
The primer 1 or its complementary sequence and the primer 2 or its complementary sequence can be used to amplify the sequencing library.
In the above method, the transposase may be Tn5 transposase. The amino acid sequence of the Tn5 transposase can be sequence 1 in the sequence table.
The ratio of the transposase complex to the target DNA may be from 12.5 ng:1 ng to 0.1 ng:1 pg. Further, the ratio may be from 12.5 ng:1 ng to 0.5 ng:10 pg. Still further, the ratio may be from 12.5 ng:1 ng to 2.5 ng:100 pg.
Specifically, the ratio of the transposase complex to the target DNA may be 12.5 ng:1 ng.
In II, the ratio of the transposase complex to the sample DNA to be sequenced refers to the ratio of each transposase complex to its corresponding sample DNA.
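Because the stated ratios mix nanogram and picogram units, it can help to normalize them to a single mass ratio (ng of complex per ng of DNA). The sketch below does only that unit conversion; the function and constant names are illustrative assumptions.

```python
# Conversion factors to nanograms.
UNIT_TO_NG = {"ng": 1.0, "pg": 1e-3}


def mass_ratio(complex_amount, complex_unit, dna_amount, dna_unit):
    """Return ng of transposase complex per ng of target DNA."""
    complex_ng = complex_amount * UNIT_TO_NG[complex_unit]
    dna_ng = dna_amount * UNIT_TO_NG[dna_unit]
    return complex_ng / dna_ng


if __name__ == "__main__":
    for args in [(12.5, "ng", 1, "ng"), (2.5, "ng", 100, "pg"),
                 (0.5, "ng", 10, "pg"), (0.1, "ng", 1, "pg")]:
        print(args, round(mass_ratio(*args), 3))
```

Expressed this way, the four quoted ratios correspond to 12.5, 25, 50 and 100 ng of complex per ng of DNA, so each successive range narrows toward the 12.5:1 value used in the example.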
The reaction temperature for fragmentation may be 37 ℃ to 55 ℃ and the reaction time may be 5-30 min, for example 55 ℃ for 10 min or 37 ℃ for 30 min.
The fragmentation reaction can be stopped with 5× stop buffer. The 5× stop buffer consists of a solvent and solutes: the solvent is water, and the solutes and their concentrations are 10% (mass ratio) SDS and 250 mM EDTA. The reaction can be terminated by incubating at 55-60 ℃ for 15-25 min (for example 20 min).
In the above method, the sequence of primer 1 may be positions 1-30 of sequence 4.
The recognition sequence of the transposase may be positions 36-54 of sequence 4.
The sequence of primer 2 may be positions 1-30 of sequence 5.
The lengths of barcode sequences 1 and 2 can be adjusted as needed.
In the above method, barcode sequence 1 and barcode sequence 2 may each have a length of m1), m2) or m3):
m1) 5-30 bp;
m2) 5-10 bp;
m3) 5 bp.
In the above method, the sequence of a1 may be sequence 4 in the sequence table, and the sequence of b1 may be sequence 5 in the sequence table.
In sequences 4 and 5, n is any one of a, t, c and g.
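The position ranges above imply a three-part layout for a1 and b1: primer (positions 1-30), barcode, recognition sequence (positions 36-54). A minimal sketch of that layout follows. It assumes the barcode occupies positions 31-35 (inferred from the gap between the stated ranges and the 5 bp length option), uses a placeholder primer that is purely hypothetical, and uses the commonly cited 19-nt Tn5 mosaic-end sequence as a stand-in for the recognition sequence, which the text here does not spell out.

```python
PRIMER_LEN = 30       # positions 1-30, as stated in the text
BARCODE_LEN = 5       # positions 31-35, inferred for the 5 bp barcode option
RECOGNITION_LEN = 19  # positions 36-54, as stated in the text

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement of a DNA string (used to derive c1 from the recognition sequence)."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_adapter(adapter):
    """Split a 54-nt a1/b1 strand into (primer, barcode, recognition sequence)."""
    expected = PRIMER_LEN + BARCODE_LEN + RECOGNITION_LEN
    if len(adapter) != expected:
        raise ValueError(f"expected {expected} nt, got {len(adapter)}")
    primer = adapter[:PRIMER_LEN]
    barcode = adapter[PRIMER_LEN:PRIMER_LEN + BARCODE_LEN]
    recognition = adapter[PRIMER_LEN + BARCODE_LEN:]
    return primer, barcode, recognition


if __name__ == "__main__":
    placeholder_primer = "ACGT" * 7 + "AC"        # hypothetical 30-nt primer
    assumed_recognition = "AGATGTGTATAAGAGACAG"   # assumed Tn5 mosaic end
    a1 = placeholder_primer + "GATTC" + assumed_recognition
    primer, barcode, recognition = split_adapter(a1)
    print(barcode, reverse_complement(recognition))
```

Slicing by fixed offsets only works for the 5 bp barcode option; longer barcodes would shift the recognition sequence and would need the lengths passed in as parameters.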
In one embodiment of the invention, there are 11 a1 sequences, namely adapterA-1 to adapterA-11 in Table 1, and 11 b1 sequences, namely adapterB-1 to adapterB-11 in Table 1.
In the method, the primers used for PCR amplification consist of a forward primer and a reverse primer, the forward primer contains the primer 1 and a primer sequence for second-generation sequencing, and the reverse primer contains the primer 2 and a primer sequence for second-generation sequencing.
The reverse primer may also contain a barcode sequence named barcode sequence 3, which barcode sequence 3 is a random sequence.
Barcode sequence 3 may have a length of n1), n2) or n3):
n1) 6-30 bp;
n2) 6-10 bp;
n3) 6 bp.
The forward primer may be the single-stranded DNA shown as sequence 6 in the sequence table and the reverse primer may be the single-stranded DNA shown as sequence 7 in the sequence table, where n is any one of a, t, c and g.
In one embodiment of the present invention, there are 101 reverse primers, namely PRIMER-R-1 to PRIMER-R-101 in Table 6.
The method may further comprise the step of further amplifying the PCR product. The products of the further amplification are the sequencing library. The further amplification can be completed by adopting two single-stranded DNAs shown as sequences 8 and 9 in a sequence table to carry out PCR amplification.
The PCR amplification and the further amplification can be performed in the same reaction system, which may be the reaction system shown in Table 5.
The reaction conditions for the PCR amplification and the further amplification may be: 72 ℃ for 3 min; 98 ℃ for 30 s; 15 cycles of 98 ℃ for 15 s, 60 ℃ for 30 s and 72 ℃ for 3 min; and 72 ℃ for 5 min.
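As a rough aid for planning, the thermocycling conditions can be written down as data and the total hold time computed. This sketch assumes that the 15 cycles apply to the 98 ℃ / 60 ℃ / 72 ℃ steps taken together, which is one reading of the conditions quoted above; ramp times between temperatures are ignored.

```python
# Each block is (number of repeats, [(temperature in deg C, hold time in seconds), ...]).
# The grouping of the 15 cycles is an assumption about how the conditions are meant.
PROGRAM = [
    (1, [(72, 180)]),                         # initial 72 C hold
    (1, [(98, 30)]),                          # initial denaturation
    (15, [(98, 15), (60, 30), (72, 180)]),    # cycled steps
    (1, [(72, 300)]),                         # final extension
]


def total_hold_seconds(program):
    """Sum the hold times of all steps, multiplied by their repeat counts."""
    return sum(repeats * sum(seconds for _temp, seconds in steps)
               for repeats, steps in program)


if __name__ == "__main__":
    seconds = total_hold_seconds(PROGRAM)
    print(f"{seconds} s = {seconds / 60:.2f} min of hold time")
```

Under this reading the program holds for a little over an hour, most of it in the 3 min extension step of each cycle.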
The present invention provides a DNA sequencing method comprising: constructing a sequencing library of one or more sample DNAs to be tested by the above method for constructing a sequencing library, and then sequencing the library to complete the sequencing of the one or more sample DNAs.
The DNA sequencing method of the present invention may further comprise sample classification of the obtained sequencing data using the barcode sequences 1, 2 and 3.
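Sample classification by barcode sequences 1, 2 and 3 amounts to looking up a three-part key for every read pair. The sketch below is illustrative only and is not the HITAC-assembler. It assumes that read 1 begins with barcode 1 followed by the 19-nt recognition sequence, that read 2 begins with barcode 2 followed by the recognition sequence, and that barcode 3 is delivered as a separate index read; none of these layout details is stated in the text, and all names are hypothetical.

```python
from collections import defaultdict

BARCODE_LEN = 5       # 5 bp option for barcode sequences 1 and 2
RECOGNITION_LEN = 19  # length of the transposase recognition sequence


def demultiplex(read_pairs, sample_sheet):
    """Group read pairs by (barcode 1, barcode 2, barcode 3).

    read_pairs:   iterable of (index_read, read1, read2) strings.
    sample_sheet: dict mapping (barcode1, barcode2, barcode3) -> sample name.
    Returns a dict of sample name -> list of (read1, read2) with the barcode
    and recognition sequence trimmed off. Unassigned pairs are dropped.
    """
    bins = defaultdict(list)
    skip = BARCODE_LEN + RECOGNITION_LEN
    for index_read, read1, read2 in read_pairs:
        key = (read1[:BARCODE_LEN], read2[:BARCODE_LEN], index_read)
        sample = sample_sheet.get(key)  # exact match only, no error tolerance
        if sample is None:
            continue
        bins[sample].append((read1[skip:], read2[skip:]))
    return dict(bins)


if __name__ == "__main__":
    me = "AGATGTGTATAAGAGACAG"  # assumed recognition sequence
    sheet = {("ACGTA", "TTGCA", "AAGGCC"): "plasmid_01"}
    pairs = [("AAGGCC", "ACGTA" + me + "GGGG", "TTGCA" + me + "CCCC")]
    print(demultiplex(pairs, sheet))
```

A production tool would also tolerate sequencing errors in the barcodes; exact matching is used here to keep the example short.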
The DNA sequencing method of the present invention may further comprise analyzing the sequencing data using a data processing system and obtaining a sequence corresponding to each sample.
The invention also provides any one of the following products:
x1) the transposase complex;
x2) kit comprising the transposase complex and primers for the PCR amplification;
x3) the data processing system.
The kit can be used for constructing a sequencing library and can also be used for DNA sequencing.
The kit may consist of the transposase complex and the primers for PCR amplification.
The data processing system (HITAC-assembler) integrates published software with its own scripts to split HITAC-seq data according to the Tn5 barcode sequences, perform quality control and assemble the data, and finally selects a complete and accurate sequence for each sample.
The invention also provides the use of the product in any one of the following Y1) to Y16):
Y1) preparing a product for constructing a sequencing library;
Y2) preparing a DNA sequencing product;
Y3) preparing a product for identifying the genotype of a plant or animal;
Y4) preparing a product for identifying or assisting in identifying disease-causing sites;
Y5) preparing a product for diagnosing or assisting in diagnosing human genetic diseases;
Y6) preparing a product for treating or assisting in treating human genetic diseases;
Y7) preparing a product for preventing or assisting in preventing human genetic diseases;
Y8) constructing a sequencing library;
Y9) DNA sequencing;
Y10) pooled sequencing of DNA samples;
Y11) identifying the genotype of a plant or animal;
Y12) identifying or assisting in identifying disease-causing sites;
Y13) diagnosing or assisting in diagnosing human genetic diseases;
Y14) treating or assisting in treating human genetic diseases;
Y15) preventing or assisting in preventing human genetic diseases;
Y16) sequencing plasmids and DNA fragments;
or
the use of the method for constructing a sequencing library, or of the DNA sequencing method, in any one of Y9) to Y16) above.
The product may be a kit.
The method for constructing a sequencing library with barcoded Tn5 transposase, and the sequencing method based on it (HITAC-seq), use barcoded hyperactive Tn5 to insert barcodes into the plasmid or DNA fragment to be tested, so that multiple DNA samples can be sequenced simultaneously and each sample can be identified by its specific barcode. The barcoded samples are pooled to construct libraries that can be sequenced on any second-generation sequencing platform. The technology can be used to sequence plasmid samples and DNA fragments and markedly reduces sequencing cost. For a 2 kb plasmid insert sequenced at 100× depth (at which assembly accuracy and fragment integrity exceed 99%), and allowing for the proportion of the total data that such reads account for, about 200 Mb of total data is needed. On existing sequencing platforms the sequencing cost for 96 samples is therefore only 12 yuan and the library construction cost is 20 yuan, an average of only 0.33 yuan per sample, whereas first-generation sequencing costs at least 20 yuan for the sequencing alone. For first-generation sequencing the cost multiplies as the fragment length increases, but the cost increase for HITAC-seq is negligible. For example, a 4 kb fragment requires at least 4 first-generation sequencing reactions and at least 40 yuan, whereas HITAC-seq requires only 0.46 yuan for sequencing plus library construction. The advantage of HITAC-seq grows with the number of samples, and labor costs are also reduced because the method applies a high-throughput approach to both library construction and sequencing.
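The per-sample figures quoted above can be reproduced with simple arithmetic. The sketch below assumes that the sequencing cost scales linearly with insert length from the 2 kb reference point (12 yuan for 96 samples) and that the 20 yuan library construction cost is fixed per batch of 96; this model is an assumption made here, chosen because it reproduces both quoted values.

```python
SAMPLES_PER_BATCH = 96
SEQ_COST_AT_2KB = 12.0   # yuan for 96 samples with 2 kb inserts (from the text)
LIBRARY_COST = 20.0      # yuan per batch of 96 samples (from the text)


def per_sample_cost(insert_kb):
    """Average HITAC-seq cost per sample in yuan, under a linear sequencing-cost model."""
    sequencing = SEQ_COST_AT_2KB * insert_kb / 2.0
    return (sequencing + LIBRARY_COST) / SAMPLES_PER_BATCH


if __name__ == "__main__":
    for kb in (2, 4):
        print(f"{kb} kb insert: {per_sample_cost(kb):.2f} yuan per sample")
```

Doubling the insert length doubles only the sequencing share of the cost, which is why the per-sample figure rises from about 0.33 to about 0.46 yuan rather than doubling.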
HITAC-seq avoids both the sensitivity of first-generation sequencing to sequence structure and the short read length of second-generation sequencing. When a plasmid fragment or PCR product contains a special structure such as a poly(A) tract or has a high GC content, a first-generation sequencing reaction is usually interrupted and the sequence cannot be read. Because HITAC-seq is based on second-generation sequencing and its library is built by randomly fragmenting the fragment under test, this problem is avoided. In addition, HITAC-seq assembles the data before alignment, so sequences thousands of bases long can be obtained, which overcomes the inability of short second-generation reads to resolve insertions, deletions and structural variations of dozens of bases.
The core concept of HITAC-seq is to label the hyperactive Tn5 transposon so that a specific tag is added to the library of each individual plasmid or DNA fragment sample, allowing high-throughput parallel sequencing. The libraries of all plasmids or DNA fragments can then be pooled and sequenced on any next-generation sequencing platform. With the workflow of barcode introduction by Tn5 transposition, pooling and PCR amplification, a library can be made ready for sequencing within 3 hours. Subsequent assembly produces sequences that are longer and more accurate than those from Sanger sequencing or second-generation sequencing. Compared with traditional Sanger sequencing, HITAC-seq significantly improves throughput and reduces cost. The HITAC-assembler is also provided to analyze HITAC-seq data; it performs data splitting, quality control and sequence assembly, yielding sample sequences of high accuracy and integrity.
The present invention develops a new sequencing system called HITAC-seq, which includes a completely new barcode strategy and corresponding sequence assembly tools. The system achieves high-throughput, low-cost sequencing with high base accuracy. Based on the barcode technology, it can efficiently distinguish and identify the sequencing sample of each plasmid or DNA fragment and generate the corresponding sequence for each. HITAC-seq uses barcoded hyperactive Tn5 transposase to fragment and tag the sequencing samples, so that a sub-library is constructed for each plasmid or DNA fragment sample, and the samples are then sequenced in a highly parallel pool on an NGS platform. The HITAC-assembler separates the resulting data by the barcode added to each sample and assembles them to obtain a final sequence of very high accuracy. The invention demonstrates the use of HITAC-seq to sequence plasmid clones and DNA fragments of different lengths from different species.
Drawings
FIG. 1 is a flow chart of HITAC-seq.
FIG. 2 shows the sequencing analysis of 96 cDNA clones. (a) Distribution of HITAC-seq accuracy, where accuracy is the percentage of nucleotides in the sequenced sequence that are identical to the reference sequence, relative to all nucleotides in the reference sequence; the sequenced sequence has the same length as the reference sequence, and the ordinate is the number of samples. (b) Relationship between the GC content of the cDNA clone sequences and HITAC-seq sequencing accuracy. (c) Sequencing of sequences containing a poly(A) structure by HITAC-seq and Sanger sequencing. Sanger-1 to Sanger-4 are 4 replicates of Sanger sequencing.
FIG. 3 shows the results of sequencing two yeast DNA fragments with HITAC-seq and Sanger sequencing, respectively. (a) Sequencing result for the 195 bp DNA fragment. (b) Sequencing result for the 4235 bp DNA fragment.
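The accuracy measure used for FIG. 2(a) is a position-by-position identity between an assembled sequence and a reference of the same length. A minimal sketch of that definition follows; the function name is illustrative, and how the original analysis handled alignment before comparison is not stated here.

```python
def accuracy_percent(assembled, reference):
    """Percentage of reference positions at which the assembled sequence is identical.

    Both sequences must have the same, non-zero length, as in the definition
    given for FIG. 2(a). Comparison is case-insensitive.
    """
    if not reference:
        raise ValueError("reference sequence is empty")
    if len(assembled) != len(reference):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(assembled.upper(), reference.upper()))
    return 100.0 * matches / len(reference)


if __name__ == "__main__":
    print(accuracy_percent("ACGTACGT", "ACGTACGA"))
```

Requiring equal lengths keeps the sketch faithful to the stated definition; sequences with insertions or deletions would need to be aligned first.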
Detailed Description
The present invention is described in further detail below with reference to specific embodiments, which are given for the purpose of illustration only and are not intended to limit the scope of the invention. The experimental procedures in the following examples are conventional unless otherwise specified. Materials, reagents, instruments and the like used in the following examples are commercially available unless otherwise specified. In the quantitative tests in the following examples, three replicates were set up and the results were averaged. In the following examples, unless otherwise specified, the 1st position of each nucleotide sequence in the sequence listing is the 5' terminal nucleotide of the corresponding DNA, and the last position is the 3' terminal nucleotide of the corresponding DNA.
Tn5 storage buffer (pH 7.5) consists of a solvent and solutes: the solvent is 50 mM Tris-HCl, and the solutes and their concentrations are 100 mM NaCl, 0.1 mM EDTA, 1 mM DTT and 50% (volume ratio) glycerol; pH 7.5.
Annealing buffer consists of a solvent and solutes: the solvent is 10 mM Tris-HCl, and the solutes and their concentrations are 1 mM EDTA and 25 mM NaCl; pH 8.0.
Coupling buffer consists of 29.2% (volume ratio) ddH2O, 30.5% (volume ratio) glycerol and 40.3% (volume ratio) Tn5 storage buffer (without NaCl). The Tn5 storage buffer (without NaCl) consists of a solvent and solutes: the solvent is 50 mM Tris-HCl, and the solutes and their concentrations are 0.1 mM EDTA, 1 mM DTT and 50% (volume ratio) glycerol.
Tn5 dilution buffer consists of 20% (volume ratio) Tn5 storage buffer (pH 7.5), 14% (volume ratio) annealing buffer and 33% (volume ratio) coupling buffer.
The 5× tag buffer consists of a solvent and solutes: the solvent is water, and the solutes and their concentrations are 50 mM TAPS, 25 mM MgCl2 and 50% (mass ratio) DMF; pH 8.5.
The 5× stop buffer consists of a solvent and solutes: the solvent is water, and the solutes and their concentrations are 10% (mass ratio) SDS and 250 mM EDTA.
Example 1 HITAC-seq System
Preparation of Tn5 transposase
The Tn5 transposase is obtained by using an escherichia coli expression system for expression, the amino acid sequence of the Tn5 transposase is a sequence 1 in a sequence table, and the coding gene of the Tn5 transposase is a DNA molecule shown as a sequence 2 in the sequence table.
Preparation of transposase Complex
1. Adapter preparation
Adapter (linker) sequences, i.e. single-stranded DNAs, were synthesized: Tn-common, adapterA-1 to adapterA-11 and adapterB-1 to adapterB-11. The sequence of Tn-common is shown as sequence 3 in the sequence table; its 5' end carries a phosphorylation modification and its sequence is complementary to the recognition sequence of Tn5 transposase. The sequence of adapterA-1 to adapterA-11 is
Figure BDA0002525789660000063
(sequence 4 in the sequence table), where N is any one of A, T, C and G; the specific sequences of adapterA-1 to adapterA-11 are shown in Table 1. The sequence of adapterB-1 to adapterB-11 is
Figure BDA0002525789660000062
(sequence 5 in the sequence table), where N is any one of A, T, C and G; the specific sequences of adapterB-1 to adapterB-11 are shown in Table 1. In adapterA-1 to adapterA-11 and adapterB-1 to adapterB-11, the bold nucleotide sequence is the recognition sequence of Tn5 transposase, the underlined sequence is the barcode sequence used for sequence labeling, and the sequence that is neither bold nor underlined is the sequence of the second-generation sequencing Y adapter; adapterA-1 to adapterA-11 differ from adapterB-1 to adapterB-11.
TABLE 1 linker sequences
Figure BDA0002525789660000061
Figure BDA0002525789660000071
Each adapter in Table 1 was dissolved in annealing buffer to a concentration of 100 μM single-stranded DNA. The Tn-common solution was mixed in equal volume (25 μL each) with each of the adapterA-1 to adapterA-11 and adapterB-1 to adapterB-11 solutions to give 22 mixtures, which were reacted under the conditions in Table 2 to give 22 adapter products: 11 adapter A products, each formed from one of adapterA-1 to adapterA-11 and Tn-common, and 11 adapter B products, each formed from one of adapterB-1 to adapterB-11 and Tn-common. Each of the 11 adapter A products was then mixed in equal volume with each of the 11 adapter B products, covering all pairwise combinations, to give 121 adapter mixtures.
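The 121 adapter mixtures are simply all pairings of the 11 adapter A products with the 11 adapter B products. The sketch below enumerates them and hands one mixture to each sample; the sample names and the `assign_mixtures` helper are hypothetical, introduced only to illustrate the bookkeeping.

```python
from itertools import product

ADAPTER_A = [f"adapterA-{i}" for i in range(1, 12)]
ADAPTER_B = [f"adapterB-{i}" for i in range(1, 12)]

# All pairwise combinations: 11 x 11 = 121 adapter mixtures.
MIXTURES = list(product(ADAPTER_A, ADAPTER_B))


def assign_mixtures(samples):
    """Give each sample its own adapter mixture, in enumeration order."""
    if len(samples) > len(MIXTURES):
        raise ValueError(f"at most {len(MIXTURES)} samples can be distinguished")
    return dict(zip(samples, MIXTURES))


if __name__ == "__main__":
    plan = assign_mixtures([f"sample_{i:03d}" for i in range(1, 97)])
    print(len(MIXTURES), plan["sample_001"], plan["sample_096"])
```

Because every (A, B) pair is distinct, any two samples in the plan differ in at least one of the two barcodes.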
TABLE 2 reaction conditions
Figure BDA0002525789660000072
2. Preparation of transposase Complex
Reaction systems were prepared in sterilized PCR tubes according to Table 3, each containing one adapter mixture, mixed thoroughly and reacted at 30 ℃ for 1 hour. The reaction products were designated B Tn5 (Barcoded Tn5); each contains a transposase complex, giving 121 kinds of B Tn5 in total. Each B Tn5 was diluted 32-fold with Tn5 dilution buffer; the resulting dilutions were named DB Tn5, giving 121 kinds of DB Tn5 in total, which were stored at -20 ℃.
TABLE 3 reaction System
Figure BDA0002525789660000081
Construction of library
1. Treatment of DNA samples with tagged transposase
A reaction system was prepared according to Table 4 and incubated at 55 ℃ for 10 min to obtain a fragmentation product, the Tn5 transposase fragmenting the sample DNA to be sequenced (the DNA template may be a DNA fragment from a plant, animal or human, or plasmid DNA). In the reaction system, the ratio of the transposase complex to the sample DNA to be sequenced is 12.5 ng:1 ng. For the fragmentation reaction, the DB Tn5 used for each sample DNA can be chosen at random from the 121 DB Tn5 obtained above; if there are several samples, a different DB Tn5 is used for each, and each sample DNA yields its own fragmentation product. This example can therefore fragment the DNA of at most 121 samples to be sequenced, giving fragmentation products for 121 sample DNAs.
TABLE 4 reaction System
Figure BDA0002525789660000082
2. Pooling
2.1 Add 2.5 μL of 5× stop buffer to the fragmentation product obtained in step 1, mix by pipetting, centrifuge briefly and incubate at 55 ℃ for 20 min;
2.2 After step 2.1, the product can be purified directly with 1× DNA Clean Beads (Vazyme, Cat. No. N411) to obtain a purified product; alternatively, 5 μL of each of several sample DNA fragmentation products can be pooled and then purified with 1× DNA Clean Beads that have been thoroughly mixed and equilibrated to room temperature to obtain a purified product.
3. Enrichment by PCR
A PCR reaction system was prepared according to the reaction system shown in Table 5.
TABLE 5 PCR reaction System
Figure BDA0002525789660000083
In Table 5, PRIMER-R-X denotes any one of PRIMER-R-1 to PRIMER-R-101 in Table 6, and each reaction system contains one kind of PRIMER-R-X. The sequence of PRIMER-F is sequence 6 in the sequence listing. The sequence of PRIMER-R-X is CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (sequence 7 in the sequence listing), where each N is A, T, C or G; the specific sequences of PRIMER-R-1 to PRIMER-R-101 are shown in Table 6. The underlined sequence in PRIMER-F is identical to positions 1-30 of adapterA-1-10, and the underlined sequence in PRIMER-R-1 is identical to positions 1-30 of adapterB-1-10. The sequences of PPM-F and PPM-R are sequences 8 and 9 in the sequence listing, respectively; this primer pair can further amplify the PCR products of PRIMER-F and PRIMER-R-X.
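The relationship between these primers can be checked with a short Python sketch. It fills the NNNNNN index of sequence 7 and verifies that the 3' ends of PRIMER-F and PRIMER-R-X match positions 1-30 of sequences 4 and 5 of the sequence listing. Two points are assumptions: that the underlined regions (not visible in this text) are the 3'-terminal 30 nt, and that sequences 4 and 5 are the adapter A and adapter B templates. The function name and the index used in the example are hypothetical and are not taken from Table 6.

```python
PRIMER_F = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"  # sequence 6
PRIMER_R_TEMPLATE = (
    "CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"  # sequence 7
)
ADAPTER_A_1_30 = "CTCTTTCCCTACACGACGCTCTTCCGATCT"  # positions 1-30 of sequence 4
ADAPTER_B_1_30 = "CTGGAGTTCAGACGTGTGCTCTTCCGATCT"  # positions 1-30 of sequence 5
PPM_F = "AATGATACGGCGACCACCGAGATC"  # sequence 8
PPM_R = "CAAGCAGAAGACGGCATACGAGAT"  # sequence 9

def make_primer_r(index):
    """Return PRIMER-R-X with the 6-nt index inserted in place of NNNNNN."""
    index = index.upper()
    if len(index) != 6 or set(index) - set("ACGT"):
        raise ValueError("index must be 6 nt of A, C, G or T")
    return PRIMER_R_TEMPLATE.replace("NNNNNN", index)

if __name__ == "__main__":
    primer_r = make_primer_r("ACGTAC")  # hypothetical index, not from Table 6
    # 3' ends of the enrichment primers anneal to the adapter primer regions
    print(PRIMER_F.endswith(ADAPTER_A_1_30), primer_r.endswith(ADAPTER_B_1_30))
    # PPM-F / PPM-R correspond to the 5' ends of PRIMER-F / PRIMER-R-X
    print(PRIMER_F.startswith(PPM_F), primer_r.startswith(PPM_R))
```

All four checks hold for the sequences as printed in the sequence listing, which is consistent with PPM-F and PPM-R being able to re-amplify the enrichment products.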
Table 6: PCR primer sequences
Figure BDA0002525789660000091
Figure BDA0002525789660000101
Figure BDA0002525789660000111
A PCR reaction is carried out on each reaction system to obtain a PCR product. The same PCR conditions are used for all reactions:
Figure BDA0002525789660000112
The PCR products are subjected to gel electrophoresis, and DNA fragments of 200-500 bp are selected according to the gel result to obtain the sequencing library for the DNA of one or more samples to be sequenced. The resulting sequencing library can then be sequenced and the data analyzed, so that the DNA of several samples is sequenced simultaneously.
IV. Sequence assembly
To process HITAC-seq data, the inventors also developed a one-stop tool named HITAC-assembler, which splits the data obtained from pooled sequencing into individual samples according to their unique barcode sequences and assembles each sample separately. It can also effectively remove contamination from a sample (sample contamination or non-specific amplification products). HITAC-assembler is a user-friendly tool that integrates up-to-date software: Trimmomatic (raw data processing), BWA-MEM (alignment and filtering of clone sequencing data) and SPAdes (data assembly). The k-mer size is set to 111 for clone sequencing, 31 for DNA fragments shorter than 200 bp, and 51 for DNA fragments longer than 200 bp.
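The splitting step and the k-mer rule can be sketched in Python as follows. This is not the HITAC-assembler code. It assumes that read 1 and read 2 each begin with the 5-nt barcode (positions 31-35 of sequences 4 and 5) followed by the 19-nt transposase recognition sequence (positions 36-54), that barcodes are matched exactly, and that a fragment of exactly 200 bp uses k = 51 (the text does not specify this boundary). All function and variable names are hypothetical.

```python
from collections import defaultdict

ME = "AGATGTGTATAAGAGACAG"  # positions 36-54 of sequences 4 and 5
BARCODE_LEN = 5             # positions 31-35 of sequences 4 and 5

def demultiplex(read_pairs, barcode_to_sample):
    """Group (read1, read2) pairs by their barcode pair and trim barcode + ME."""
    trim = BARCODE_LEN + len(ME)
    by_sample = defaultdict(list)
    for r1, r2 in read_pairs:
        # Assumption: a valid read starts with barcode followed by the ME sequence
        if not (r1[BARCODE_LEN:].startswith(ME) and r2[BARCODE_LEN:].startswith(ME)):
            continue
        sample = barcode_to_sample.get((r1[:BARCODE_LEN], r2[:BARCODE_LEN]))
        if sample is None:
            continue  # unknown barcode pair, e.g. contamination
        by_sample[sample].append((r1[trim:], r2[trim:]))
    return dict(by_sample)

def choose_kmer(is_clone, fragment_bp=None):
    """k-mer size from the text: 111 for clones, 31 below 200 bp, 51 otherwise."""
    if is_clone:
        return 111
    if fragment_bp is None:
        raise ValueError("fragment length is required for DNA fragments")
    return 31 if fragment_bp < 200 else 51  # exactly 200 bp -> 51 is an assumption

if __name__ == "__main__":
    pairs = [("AAAAA" + ME + "GATTACA", "CCCCC" + ME + "TTGACC")]
    print(demultiplex(pairs, {("AAAAA", "CCCCC"): "sample_1"}))
    print(choose_kmer(True), choose_kmer(False, 195), choose_kmer(False, 4235))
```

Read pairs whose barcode pair is not in the lookup table are dropped, which is one simple way to discard cross-sample contamination before assembly.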
The method of this embodiment, in which a sequencing library is constructed and the next-generation sequencing data are assembled to obtain the sample sequence, is denoted HITAC-seq.
The HITAC-seq workflow is shown in FIG. 1.
Example 2. Sequencing of clones using the HITAC-seq of Example 1
The samples to be tested are bacterial cultures of 96 maize cDNA clones.
Construction of the cDNA library: maize cDNA was inserted into the pDONR222 vector (Invitrogen) by homologous recombination according to the CloneMiner II cDNA Library Construction Kit; each maize cDNA plasmid library is a mixture of multiple recombinant plasmids containing cDNA of the same seed. The maize seed cDNA plasmid library was introduced into Escherichia coli, which was cultured to obtain recombinant E. coli containing the maize cDNA plasmid library, i.e. a cDNA bacterial culture library. 96 maize cDNA clones were selected at random from this library and their plasmids were extracted.
The 96 maize cDNA plasmids were then used as the sample DNA to be sequenced and fragmented by the method of step 1 of part III of Example 1. For each maize cDNA plasmid, one DB Tn5 was selected at random from the 121 DB Tn5 obtained in part II of Example 1, with a different DB Tn5 used for each plasmid, giving 96 fragmentation products. 5 μL of each of the 96 fragmentation products was pooled and purified with DNA Clean Beads to obtain a purified product. PCR amplification was then carried out by the PCR enrichment method of part III of Example 1 to obtain the PCR product, i.e. the cDNA sequencing library of the 96 maize clones.
The PCR product was sequenced on the NovaSeq platform (other second-generation sequencing platforms are equally applicable), yielding 2.2 G of data, of which 39.41% aligned to the E. coli genome, 33.04% to the pDONR222 vector, and 27.55% to the maize genome.
Sequences aligned to the E. coli genome and the pDONR222 vector were filtered out, and the remaining sequences were analyzed with the HITAC-assembler pipeline. The sequencing data were first split into samples according to barcode sequence; the data were distributed evenly among the 96 samples. The split data were then assembled. Comparison of the assembled sequences with the reference sequences showed that each assembled sequence aligned accurately to its corresponding sample. HITAC-seq gave higher and more stable accuracy and was more stable with respect to GC content. All the cDNA clones contain an 18-bp poly(A) tract, and randomly selected cDNA clones were also sequenced by Sanger sequencing. In Sanger sequencing, the poly(A) tract led to mismatches and even termination (~99.52% identity and ~96.32% coverage). In contrast, HITAC-seq performed well, giving 100% accuracy and coverage.
The results of sequencing analysis of 96 maize cDNA clones are shown in FIG. 2.
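For readers who want to reproduce figures such as identity and coverage, the following Python sketch shows one simple way to compute them for an assembled sequence placed on its reference without gaps. The definitions used here (identity as matching bases over covered bases, coverage as covered bases over reference length) are assumptions for illustration; the text does not state how the reported values were calculated.

```python
def identity_and_coverage(reference, assembled, offset=0):
    """Gapless comparison of an assembled sequence placed at `offset` on the reference.

    Returns (identity, coverage) as fractions between 0 and 1.
    """
    ref = reference.upper()
    asm = assembled.upper()
    start = max(offset, 0)
    end = min(len(ref), offset + len(asm))
    covered = max(end - start, 0)
    if covered == 0 or not ref:
        return 0.0, 0.0
    # Count positions where the assembled base equals the reference base
    matches = sum(ref[i] == asm[i - offset] for i in range(start, end))
    return matches / covered, covered / len(ref)

if __name__ == "__main__":
    identity, coverage = identity_and_coverage("ACGTACGTAC", "CGTACGTT", offset=1)
    print(f"identity={identity:.3f} coverage={coverage:.3f}")
```

A real comparison would normally use an aligner that handles insertions and deletions; the gapless version is kept only to make the two ratios explicit.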
Example 3. Sequencing of yeast DNA fragments using the HITAC-seq of Example 1
The yeast DNA fragment shown in sequence 10 of the sequence listing (195 bp in length) was amplified by PCR to obtain the DNA fragment to be sequenced. After purification, the DNA fragment to be sequenced was treated separately with two randomly selected DB Tn5 of Example 1 to obtain two fragmentation products. 5 μL of each of the two fragmentation products was pooled and purified with DNA Clean Beads to obtain a purified product. PCR amplification was then carried out by the PCR enrichment method of part III of Example 1 to obtain a PCR product, which was sequenced on the NovaSeq platform. For comparison, the PCR-amplified DNA fragment to be sequenced was also sequenced by Sanger sequencing.
Based on the read coverage of each nucleotide in the sequencing results, HITAC-seq achieved 99% coverage with 100% accuracy, whereas first-generation (Sanger) sequencing achieved 96% coverage with 99% accuracy.
Following the same procedure, the DNA fragment shown in sequence 10 was replaced by the DNA fragment shown in sequence 11 (4235 bp in length), and PCR amplification was carried out to obtain the DNA fragment to be sequenced. After purification, the DNA fragment to be sequenced was treated separately with two randomly selected DB Tn5 of Example 1 to obtain two fragmentation products. 5 μL of each of the two fragmentation products was pooled and purified with DNA Clean Beads to obtain a purified product. PCR amplification was then carried out by the PCR enrichment method of part III of Example 1 to obtain a PCR product, which was sequenced on the NovaSeq platform. For comparison, the PCR-amplified DNA fragment to be sequenced was also sequenced by Sanger sequencing.
The results show that HITAC-seq achieved 99% coverage with 100% accuracy, whereas first-generation sequencing achieved 97% coverage with 99% accuracy. The average read length of first-generation sequencing is 800 bp, so for a DNA fragment of 4235 bp at least 5 additional primers must be designed for sequencing, whereas HITAC-seq determines the whole sequence in a single run.
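The primer count above follows from simple arithmetic, sketched here in Python. The calculation assumes non-overlapping reads of exactly the 800-bp average length, which gives a lower bound; in practice reads overlap, so more primers may be needed. The function name is hypothetical.

```python
import math

def sanger_reads_needed(fragment_bp, read_bp=800):
    """Minimum number of non-overlapping Sanger reads needed to span a fragment."""
    return math.ceil(fragment_bp / read_bp)

if __name__ == "__main__":
    for length in (195, 4235):
        reads = sanger_reads_needed(length)
        # The first read uses an existing primer; every further read needs a new one
        print(length, "bp:", reads, "reads,", reads - 1, "additional primers")
```

For the 4235-bp fragment this gives 6 reads, i.e. 5 primers beyond the first, in line with the figure stated in the text; the 195-bp fragment fits in a single read.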
HITAC-seq was evaluated in terms of cost, accuracy and coverage; the results show that HITAC-seq is significantly better than Sanger sequencing in cost (Table 7), accuracy and coverage.
TABLE 7 Comparison of HITAC-seq and Sanger sequencing for DNA fragments of different lengths
Figure BDA0002525789660000121
Figure BDA0002525789660000131
The cost of HITAC-seq was calculated from the data volume required on the Illumina sequencing platform at an average sequencing depth of 50× per DNA sample; coverage and accuracy were calculated at the same data volume. The cost of Sanger sequencing comprises the sequencing primers and the sequencing reactions required to sequence the fragment completely.
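The data volume implied by a 50× depth can be estimated with the short Python sketch below. The number of bases is simply depth times fragment length. The conversion to read pairs assumes paired-end reads of 150 bp, a read length that is not stated in the text, and it ignores bases spent on barcodes, the transposase recognition sequence, and reads lost to host or vector sequence. Function names are hypothetical.

```python
def bases_needed(fragment_bp, depth=50):
    """Sequenced bases required to cover a fragment at the given average depth."""
    return fragment_bp * depth

def read_pairs_needed(fragment_bp, depth=50, read_bp=150):
    """Read pairs required, assuming paired-end reads of read_bp each (assumption)."""
    total = bases_needed(fragment_bp, depth)
    per_pair = 2 * read_bp
    return -(-total // per_pair)  # ceiling division on integers

if __name__ == "__main__":
    for length in (195, 4235):
        print(length, "bp:", bases_needed(length), "bases,",
              read_pairs_needed(length), "read pairs")
```

Because the required data volume scales linearly with fragment length, many short fragments or plasmids can share a single sequencing run.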
The results of sequencing two DNA fragments using HITAC-seq and Sanger, respectively, are shown in FIG. 3.
Example 4. Sequencing of mammalian or human DNA fragments using the HITAC-seq of Example 1
The applicability of HITAC-seq was tested on two human genes, MRE11 (SEQ ID NO: 12) and ESCO1 (SEQ ID NO: 13). Sequence 12 and sequence 13 were amplified by PCR to obtain the DNA fragments to be sequenced, which were then purified. A reaction system was prepared according to Table 4 and incubated at 55 ℃ for 10 min to obtain a fragmentation product, the sample DNA to be sequenced being fragmented by Tn5 transposase; treatment with DB Tn5 of Example 1 gave two fragmentation products. 2.5 μL of 5× stop buffer was added to each fragmentation product, which was mixed by pipetting, centrifuged, and incubated at 55 ℃ for 20 min. The product can then be purified directly with 1× DNA Clean Beads (Vazyme, cat. no. N411) to obtain a purified product; alternatively, 10 μL of the fragmentation product of each sample DNA to be sequenced can be pooled and then purified with 1× DNA Clean Beads (20 μL) that have been mixed thoroughly and equilibrated to room temperature, to obtain a purified product. PCR amplification was then carried out by the PCR enrichment method of part III of Example 1 to obtain a PCR product, which was sequenced on the NovaSeq platform.
The sequencing data were split with HITAC-assembler according to the DB Tn5 barcode sequences introduced, and then assembled to obtain the sequences. Comparison of the resulting sample sequences with the reference sequences gave a coverage close to 100% and an accuracy of 100%, showing that HITAC-seq is applicable to the sequencing of human genes.
TABLE 8 Sequencing results of the human genes MRE11 and ESCO1 obtained with HITAC-seq
Figure BDA0002525789660000132
<110> China Agricultural University
<120> novel sequencing method of plasmid or DNA fragment with high throughput, low cost and high base accuracy
<160> 13
<170> PatentIn version 3.5
<210> 1
<211> 476
<212> PRT
<213> Artificial sequence (Artificial sequence)
<400> 1
Met Ile Thr Ser Ala Leu His Arg Ala Ala Asp Trp Ala Lys Ser Val
1 5 10 15
Phe Ser Ser Ala Ala Leu Gly Asp Pro Arg Arg Thr Ala Arg Leu Val
20 25 30
Asn Val Ala Ala Gln Leu Ala Lys Tyr Ser Gly Lys Ser Ile Thr Ile
35 40 45
Ser Ser Glu Gly Ser Lys Ala Met Gln Glu Gly Ala Tyr Arg Phe Ile
50 55 60
Arg Asn Pro Asn Val Ser Ala Glu Ala Ile Arg Lys Ala Gly Ala Met
65 70 75 80
Gln Thr Val Lys Leu Ala Gln Glu Phe Pro Glu Leu Leu Ala Ile Glu
85 90 95
Asp Thr Thr Ser Leu Ser Tyr Arg His Gln Val Ala Glu Glu Leu Gly
100 105 110
Lys Leu Gly Ser Ile Gln Asp Lys Ser Arg Gly Trp Trp Val His Ser
115 120 125
Val Leu Leu Leu Glu Ala Thr Thr Phe Arg Thr Val Gly Leu Leu His
130 135 140
Gln Glu Trp Trp Met Arg Pro Asp Asp Pro Ala Asp Ala Asp Glu Lys
145 150 155 160
Glu Ser Gly Lys Trp Leu Ala Ala Ala Ala Thr Ser Arg Leu Arg Met
165 170 175
Gly Ser Met Met Ser Asn Val Ile Ala Val Cys Asp Arg Glu Ala Asp
180 185 190
Ile His Ala Tyr Leu Gln Asp Lys Leu Ala His Asn Glu Arg Phe Val
195 200 205
Val Arg Ser Lys His Pro Arg Lys Asp Val Glu Ser Gly Leu Tyr Leu
210 215 220
Tyr Asp His Leu Lys Asn Gln Pro Glu Leu Gly Gly Tyr Gln Ile Ser
225 230 235 240
Ile Pro Gln Lys Gly Val Val Asp Lys Arg Gly Lys Arg Lys Asn Arg
245 250 255
Pro Ala Arg Lys Ala Ser Leu Ser Leu Arg Ser Gly Arg Ile Thr Leu
260 265 270
Lys Gln Gly Asn Ile Thr Leu Asn Ala Val Leu Ala Glu Glu Ile Asn
275 280 285
Pro Pro Lys Gly Glu Thr Pro Leu Lys Trp Leu Leu Leu Thr Ser Glu
290 295 300
Pro Val Glu Ser Leu Ala Gln Ala Leu Arg Val Ile Asp Ile Tyr Thr
305 310 315 320
His Arg Trp Arg Ile Glu Glu Phe His Lys Ala Trp Lys Thr Gly Ala
325 330 335
Gly Ala Glu Arg Gln Arg Met Glu Glu Pro Asp Asn Leu Glu Arg Met
340 345 350
Val Ser Ile Leu Ser Phe Val Ala Val Arg Leu Leu Gln Leu Arg Glu
355 360 365
Ser Phe Thr Pro Pro Gln Ala Leu Arg Ala Gln Gly Leu Leu Lys Glu
370 375 380
Ala Glu His Val Glu Ser Gln Ser Ala Glu Thr Val Leu Thr Pro Asp
385 390 395 400
Glu Cys Gln Leu Leu Gly Tyr Leu Asp Lys Gly Lys Arg Lys Arg Lys
405 410 415
Glu Lys Ala Gly Ser Leu Gln Trp Ala Tyr Met Ala Ile Ala Arg Leu
420 425 430
Gly Gly Phe Met Asp Ser Lys Arg Thr Gly Ile Ala Ser Trp Gly Ala
435 440 445
Leu Trp Glu Gly Trp Glu Ala Leu Gln Ser Lys Leu Asp Gly Phe Leu
450 455 460
Ala Ala Lys Asp Leu Met Ala Gln Gly Ile Lys Ile
465 470 475
<210> 2
<211> 1428
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 2
atgattacca gtgcactgca tcgtgcggcg gattgggcga aaagcgtgtt ttctagtgct 60
gcgctgggtg atccgcgtcg taccgcgcgt ctggtgaatg ttgcggcgca actggccaaa 120
tatagcggca aaagcattac cattagcagc gaaggcagca aagccatgca ggaaggcgcg 180
tatcgtttta ttcgtaatcc gaacgtgagc gcggaagcga ttcgtaaagc gggtgccatg 240
cagaccgtga aactggccca ggaatttccg gaactgctgg caattgaaga taccacctct 300
ctgagctatc gtcatcaggt ggcggaagaa ctgggcaaac tgggtagcat tcaggataaa 360
agccgtggtt ggtgggtgca tagcgtgctg ctgctggaag cgaccacctt tcgtaccgtg 420
ggcctgctgc atcaagaatg gtggatgcgt ccggatgatc cggcggatgc ggatgaaaaa 480
gaaagcggca aatggctggc cgctgctgca acttcgcgtc tgagaatggg cagcatgatg 540
agcaacgtga ttgcggtgtg cgatcgtgaa gcggatattc atgcgtatct gcaagataaa 600
ctggcccata acgaacgttt tgtggtgcgt agcaaacatc cgcgtaaaga tgtggaaagc 660
ggcctgtatc tgtatgatca cctgaaaaac cagccggaac tgggcggcta tcagattagc 720
attccgcaga aaggcgtggt ggataaacgt ggcaaacgta aaaaccgtcc ggcgcgtaaa 780
gcgagcctga gcctgcgtag cggccgtatt accctgaaac agggcaacat taccctgaac 840
gcggtgctgg ccgaagaaat taatccgccg aaaggcgaaa ccccgctgaa atggctgctg 900
ctgaccagcg agccggtgga aagtctggcc caagcgctgc gtgtgattga tatttatacc 960
catcgttggc gcattgaaga atttcacaaa gcgtggaaaa cgggtgcggg tgcggaacgt 1020
cagcgtatgg aagaaccgga taacctggaa cgtatggtga gcattctgag ctttgtggcg 1080
gtgcgtctgc tgcaactgcg tgaatctttt actccgccgc aagcactgcg tgcgcagggc 1140
ctgctgaaag aagcggaaca cgttgaaagc cagagcgcgg aaaccgtgct gaccccggat 1200
gaatgccaac tgctgggcta tctggataaa ggcaaacgca aacgcaaaga aaaagcgggc 1260
agcctgcaat gggcgtatat ggcgattgcg cgtctgggcg gctttatgga tagcaaacgt 1320
accggcattg cgagctgggg tgcgctgtgg gaaggttggg aagcgctgca aagcaaactg 1380
gatggctttc tggccgcgaa agacctgatg gcgcagggca ttaaaatc 1428
<210> 3
<211> 19
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 3
ctgtctctta tacacatct 19
<210> 4
<211> 54
<212> DNA
<213> Artificial sequence (Artificial sequence)
<220>
<221> misc_feature
<222> (31)..(35)
<223> n is a, c, g, or t
<400> 4
ctctttccct acacgacgct cttccgatct nnnnnagatg tgtataagag acag 54
<210> 5
<211> 54
<212> DNA
<213> Artificial sequence (Artificial sequence)
<220>
<221> misc_feature
<222> (31)..(35)
<223> n is a, c, g, or t
<400> 5
ctggagttca gacgtgtgct cttccgatct nnnnnagatg tgtataagag acag 54
<210> 6
<211> 58
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 6
aatgatacgg cgaccaccga gatctacact ctttccctac acgacgctct tccgatct 58
<210> 7
<211> 64
<212> DNA
<213> Artificial sequence (Artificial sequence)
<220>
<221> misc_feature
<222> (25)..(30)
<223> n is a, c, g, or t
<400> 7
caagcagaag acggcatacg agatnnnnnn gtgactggag ttcagacgtg tgctcttccg 60
atct 64
<210> 8
<211> 24
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 8
aatgatacgg cgaccaccga gatc 24
<210> 9
<211> 24
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 9
caagcagaag acggcatacg agat 24
<210> 10
<211> 195
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 10
gattacgctg ctcagtgcgg cgcgccagga cgtaaatggt cgccgaattg gggttctata 60
gtatacacag agcgaaacca ttcaaggacg gtgcatctat caagatcgac agggacaata 120
acgccattga actcctcacc tttgaaaaaa agtagtccgt ctattacctg aggcgcgcca 180
cttctaaata agcga 195
<210> 11
<211> 4224
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 11
atgctaggtt tgcgaactca tggtttagac aggtatgaac attatattcg tcgcccgtcg 60
gattttggca aacttgagct gcaagattgg ttgaatcata agtcattccg agtttccccc 120
aatttattaa ttgattcaag caccacacgt gagtggaatg aacctgagct tttttatcaa 180
aataccgagg atgaaacttg ggtgagacct tgtgtagggc caaagctaga accatcaatg 240
atgatgctca gatatcacga ttcgaatatt ggtcaaatgc ctcaattttg ttatccaatc 300
tcaagtccga taaattttaa accagtatta aaatacattt tgcaagaacg ttctgaactg 360
tcagacggct tccctcaaaa gtataacaca ctaataggta gtctttttga cattgataaa 420
aacccagaaa ccttagatga ttcagatata gaagcattgg atgacataga aatgagcagt 480
gacagcggta atgttaaaga accaaaaatt gaattgcagg cgctggaaga aatccaacaa 540
aagcatttca gtttaatagt atccaacaat ggaatctttc aaacaggtag cacttcaata 600
acatacatac agtctggcat atctggcagc atagctataa aacccaacaa cgttgcaatt 660
ttaatattac tcactcaacc aagtggtcac ttattgtcta ttttaccgct tgatgacggt 720
aaagagacat atttgctaca atattggaac ctgggacaaa aaggtcagtg gaacataatc 780
aagcaccaaa acgagaagca gtttgttctt atacataagg aactaggcat ttgcaaattt 840
tttgaattcc atttaccatt tacttttcaa ttagtaaaca atttaacatt gaccgattcc 900
gtgattatga acggatcctt tttcccaaca aattacactg atttagatcc ctatttcatt 960
atatttataa cagccataag atatgaaagg atagtctact ttgtcataga atggaacaac 1020
aacgaaataa agaaaaaaga ggtatatcaa ttgacagtat ttgatggtga gaagactaat 1080
atgacaatac ccattggact aaatgcatgt ttagtcgaaa cacccctaaa gttctcttta 1140
gtttctgcaa atcaaattat gtcaggagag actgaattcc actcattcca attgaaggct 1200
ctcaagggaa tcaagtcatt ttttccagct cctttattgt tattaaaatt acaagaacta 1260
cacccacata catttaaaaa attccaatat tgtaccataa tatcctcctc gacaggaaat 1320
atttgttttt gcgtcaccga acgatcgaca atagtaaatg gtaatttaaa attttacgag 1380
ctaactaggt tcaaaggatt gaaatccatt tcaccactac cttcaaatcc gataaattta 1440
gactccagat cctcaagtta tgtactggtg gtcataagtt ttagtagaac attagagcta 1500
acattatctc tagaagattt gagatgttta gataaaaaag acgttattaa gcctttgaaa 1560
aatatcacct tcaagcacac aattgatagt tccacagagg agaactctca aattttagca 1620
tttacgtctt ctaaatttta taacacacac acaggctcta acatcaatga cacgagaaat 1680
tctcaagttt ggcttacctc accaaatgca ataactcaac cttgcattga ttataaactc 1740
aggaaaactc atcaacttat ccacttaaag caatttcaaa tttttagaca tcttaggata 1800
tggaaatgta agaaccttga tattgctctg ttacagagac ttggaataaa ccagtcaaat 1860
accgaaagtt cgttaatttt tgcgaccgac gctgtttcta acaacagaat atttttatta 1920
gatttaacta tgacaacgac aatcgataat gatgatcccg ttcaaggact gataaatata 1980
gaagatttac tatgtgatac tgaaaacgaa actatccttc taaattttac gaaaaataac 2040
ttgattcaag taactaggga tacgatatac atcgatccca ttggtgggga caaagagttg 2100
cgtaagattt ctccaggttg ggaattcgaa aacgttacat acaatgatgg tattttaata 2160
gtttggaatg ctgggcttgg ctgtgtctct tatattgaaa atatagacgc tgttgatgag 2220
tctggcgcct tagtttcaaa cttaagcagc agcaaaggca tgagcaagtt cttcaagcag 2280
ttaggaactg tcacgagtgt caattttcaa atcaaagagt ctacagatga tccaaccaag 2340
tatgacattt ggatcctttt accagactgt gtcattcgta cacccttttc tgactggatt 2400
agtgattcac ttgatttttc tgatgtgtat attttgagtg ttcagcaggc gttaataaac 2460
ggcccttatt tttgctctct cgattatgaa tcatattttg aggtgcacac tttacagaac 2520
aactgcttca aaaaaggatc cagatgtaca agcagagtta attttcaagg gaaagatatt 2580
aaatttagaa gctttggtgt gaatcaatgt ttggcattta gtgcatttga aatttttgtc 2640
atcaatttaa cgccaattca cgacagtaga gaattggatt tttacaagct aaaattaccg 2700
cacctaggta ataacaattc aattcttgaa gtctgtccgg acatagaaaa caaccaatta 2760
ttcatactct actctgatgg cttaagaatc ctcgaactat catacctaac atcaaataat 2820
ggaaatttct tattaaaatc tacaagaagc aaaaataaga aatttttata tctagacaaa 2880
ataaatcgaa tgctggtatt gaatcaggac ttgagggaat gggagtgtat tagactatca 2940
gatggtaagg cagttggttt agattctcaa cttcttaagg atgattctga agaaatccta 3000
gaaataaagg aattaccaat agcaacagag gacaatcctt tagaaaagaa aactgtatta 3060
ttgatctctt ttactagctc attaaaacta gttttattaa ctgctgcaaa aaacaaaatt 3120
tccaatcaaa taatagattc gtataaactt gacaattcaa gactcctcaa tcatttggtc 3180
attactccca gaggtgaaat attctttctg gattataaag ttatgggcac cgataacgaa 3240
atgtccttta acaagttgaa agtcacaaaa cattgcattg accaggagga gagaaataat 3300
acgactttgc ggctcacttt ggaaacccgg tttacattta agagttggag tacagttaag 3360
acgtttactg tagtaggcga taatatcatt gctaccacaa atatgggcga aaagctctac 3420
ttgattaagg atttctcttc atcatctgac gaatcgagaa gggtgtatcc cttggaaatg 3480
tatcctgatt caaaagttca aaagataata ccattaaatg aatgctgctt tgttgttgcc 3540
gcttactgtg gaaataggaa cgatttagat tcaagattaa ttttttactc tttacccacc 3600
atcaaagttg ggcttaataa cgaaacaggc agcctaccag atgaatatgg gaatgggaga 3660
gtcgacgaca tattcgaggt tgactttcct gaaggatttc aatttggcac tatggctttg 3720
tatgatgttt tacatggtga gaggcacgta aatcgttaca gcgaaggcat acggtcggag 3780
aatgacgaag cagaggttgc cctaaggcag cgtagaaatt tactactctt ttggcgaaac 3840
cactcttcta caccaaaacc ttcactacgc cgagccgcca ctatagtata tgaggatcat 3900
gtatcatccc gttattttga ggatataagt tctatattag gaagcactgc aatgagaact 3960
aaaagactat ctccctataa tgcggtagca ttggacaagc ctattcaaga tattagttac 4020
gatcccgcag tacaaacttt atatgtgcta atggcagatc aaacaattca caaattcggc 4080
aaggacaggt tgccttgcca ggacgaatac gaaccaagat ggaattctgg ctatttggtt 4140
tcaagaaggt caatagttaa atctgacctc atctgtgagg ttgggttatg gaaccttagc 4200
gataactgca agaacacagt ataa 4224
<210> 12
<211> 2127
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 12
atgagtactg cagatgcact tgatgatgaa aacacattta aaatattagt tgcaacagat 60
attcatcttg gatttatgga gaaagatgca gtcagaggaa atgatacgtt tgtaacactc 120
gatgaaattt taagacttgc ccaggaaaat gaagtggatt ttattttgtt aggtggtgat 180
ctttttcatg aaaataagcc ctcaaggaaa acattacata cctgcctcga gttattaaga 240
aaatattgta tgggtgatcg gcctgtccag tttgaaattc tcagtgatca gtcagtcaac 300
tttggtttta gtaagtttcc atgggtgaac tatcaagatg gcaacctcaa catttcaatt 360
ccagtgttta gtattcatgg caatcatgac gatcccacag gggcagatgc actttgtgcc 420
ttggacattt taagttgtgc tggatttgta aatcactttg gacgttcaat gtctgtggag 480
aagatagaca ttagtccggt tttgcttcaa aaaggaagca caaagattgc gctatatggt 540
ttaggatcca ttccagatga aaggctctat cgaatgtttg tcaataaaaa agtaacaatg 600
ttgagaccaa aggaagatga gaactcttgg tttaacttat ttgtgattca tcagaacagg 660
agtaaacatg gaagtactaa cttcattcca gaacaatttt tggatgactt cattgatctt 720
gttatctggg gccatgaaca tgagtgtaaa atagctccaa ccaaaaatga acaacagctg 780
ttttatatct cacaacctgg aagctcagtg gttacttctc tttccccagg agaagctgta 840
aagaaacatg ttggtttgct gcgtattaaa gggaggaaga tgaatatgca taaaattcct 900
cttcacacag tgcggcagtt tttcatggag gatattgttc tagctaatca tccagacatt 960
tttaacccag ataatcctaa agtaacccaa gccatacaaa gcttctgttt ggagaagatt 1020
gaagaaatgc ttgaaaatgc tgaacgggaa cgtctgggta attctcacca gccagagaag 1080
cctcttgtac gactgcgagt ggactatagt ggaggttttg aacctttcag tgttcttcgc 1140
tttagccaga aatttgtgga tcgggtagct aatccaaaag acattatcca ttttttcagg 1200
catagagaac aaaaggaaaa aacaggagaa gagatcaact ttgggaaact tatcacaaag 1260
ccttcagaag gaacaacttt aagggtagaa gatcttgtaa aacagtactt tcaaaccgca 1320
gagaagaatg tgcagctctc actgctaaca gaaagaggga tgggtgaagc agtacaagaa 1380
tttgtggaca aggaggagaa agatgccatt gaggaattag tgaaatacca gttggaaaaa 1440
acacagcgat ttcttaaaga acgtcatatt gatgccctcg aagacaaaat cgatgaggag 1500
gtacgtcgtt tcagagaaac cagacaaaaa aatactaatg aagaagatga tgaagtccgt 1560
gaggctatga ccagggccag agcactcaga tctcagtcag aggagtctgc ttctgccttt 1620
agtgctgatg accttatgag tatagattta gcagaacaga tggctaatga ctctgatgat 1680
agcatctcag cagcaaccaa caaaggaaga ggccgaggaa gaggtcgaag aggtggaaga 1740
gggcagaatt cagcatcgag aggagggtct caaagaggaa gagcagacac tggtctggag 1800
acttctaccc gtagcaggaa ctcaaagact gctgtgtcag catctagaaa tatgtctatt 1860
atagatgcct ttaaatctac aagacagcag ccttcccgaa atgtcactac taagaattat 1920
tcagaggtga ttgaggtaga tgaatcagat gtggaagaag acatttttcc taccacttca 1980
aagacagatc aaaggtggtc cagcacatca tccagcaaaa tcatgtccca gagtcaagta 2040
tcgaaagggg ttgattttga atcaagtgag gatgatgatg atgatccttt tatgaacact 2100
agttctttaa gaagaaatag aagataa 2127
<210> 13
<211> 2523
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 13
atgatgtcca ttcaggagaa atcaaaagag aattcctcca aagttactaa aaaaagtgac 60
gataagaatt cagaaacaga aattcaggat tctcaaaaga atctagcaaa aaaatcaggt 120
ccaaaggaga ctataaaatc acaggctaaa tcttccagtg aaagtaaaat aaatcagcca 180
gaattggaaa cacgcatgag tacaaggtca tcaaaggcag catctaatga taaagctact 240
aaatccatta ataaaaatac ggtgactgtg aggggatatt cacaagaatc tacaaaaaag 300
aaattatctc agaaaaaatt agtacatgaa aaccctaaag caaatgaaca gcttaaccgg 360
agatcacaaa ggctacaaca attaacagag gtttcaagaa ggtcgttacg cagtagagaa 420
attcagggtc aagttcaagc agttaaacag agtttgccac caactaaaaa agagcagtgt 480
agcagtactc agagtaaatc taataaaaca agtcaaaaac atgtgaagag aaaagtactg 540
gaagtaaagt ctgactctaa agaagatgaa aatctagtaa ttaatgaagt aataaattct 600
cccaaaggga aaaaacgcaa ggtagaacat cagacagctt gtgcttgtag ttctcaatgc 660
acgcaaggat ctgaaaagtg tcctcagaag actactagaa gagacgaaac gaaacctgtg 720
cctgtaactt ctgaggtgaa aagatcaaaa atggctactt cagtggtccc gaaaaagaat 780
gagatgaaga agtcggttca tacacaagtg aatactaaca caacactccc aaaaagtcca 840
cagccatcag tgcctgaaca aagtgataat gagctggagc aagcaggaaa gagcaaacga 900
ggtagtattc tccagctctg tgaagaaatt gctggtgaaa ttgagtcaga taatgtagag 960
gtaaaaaagg aatcttcaca aatggaaagt gtaaaggaag aaaagcccac agaaataaaa 1020
ttggaagaga ccagtgttga aagacaaata cttcatcaga aggaaacaaa tcaggatgtg 1080
caatgtaatc gttttttccc aagtagaaaa acaaagcctg tgaaatgtat actaaatgga 1140
ataaacagct cagccaagaa gaactccaac tggactaaaa ttaaactctc aaaatttaac 1200
tctgtgcagc acaataagtt ggactctcaa gtttccccta aattaggctt attacgaacc 1260
agtttttcac caccagcttt agaaatgcat catccagtga ctcaaagtac gtttttaggg 1320
acaaagctac atgatagaaa tataacttgc cagcaggaaa aaatgaaaga aattaattct 1380
gaagaagtga aaattaatga tattacagta gaaattaata aaaccacaga aagggctcct 1440
gaaaattgtc atttggccaa tgagataaaa ccttctgacc caccattgga taatcagatg 1500
aaacattctt ttgattcagc atcaaataag aatttcagcc aatgtttgga atccaagcta 1560
gaaaacagtc cagtggaaaa tgttactgct gcttcgactc tgctcagtca agcaaaaatt 1620
gatacaggag agaataaatt tccaggttca gctccccaac agcatagtat tctcagtaac 1680
cagacatcta aaagcagtga taacagggag acaccacgaa atcattcttt gcctaagtgt 1740
aattcccatt tggagataac aattccaaag gacttgaaac taaaagaagc agagaaaact 1800
gatgaaaaac agttgattat agatgcagga caaaaaagat ttggagcagt ttcttgtaat 1860
gtttgtggaa tgctgtatac agcttcaaat ccagaagatg aaacacagca tctgcttttc 1920
cacaaccagt ttataagtgc tgttaaatat gtgggctgga agaaagaaag aattctggct 1980
gaataccctg atggcaggat aataatggtt cttcctgaag acccaaagta tgccctgaaa 2040
aaggttgacg agattagaga gatggttgac aatgatttag gttttcaaca ggctccacta 2100
atgtgctatt ccagaactaa aacacttctc ttcatttcca atgacaaaaa agtagttggc 2160
tgcctaattg cggaacatat ccaatggggc tacagagtta tagaagagaa acttccagtt 2220
atcaggtcag aagaagaaaa agtcagattt gaaaggcaaa aagcctggtg ctgctcaaca 2280
ttaccagagc ctgcaatctg cgggatcagt cgaatatggg tattcagcat gatgcgtcgg 2340
aagaaaattg cttctcgcat gattgaatgc ctaaggagta actttatata tggctcatat 2400
ttgagcaaag aagaaattgc tttctcagat cccactcctg atggaaagct gtttgcaaca 2460
cagtactgtg gcactggtca atttctggta tataatttta ttaatggaca gaatagcacg 2520
taa 2523

Claims (10)

1. A method of constructing a sequencing library, comprising A1) and A2):
A1) fragmenting target DNA with a transposase complex to obtain a DNA fragmentation product;
the transposase complex comprises a transposase and a transposable element, wherein the transposable element consists of two adapters designated A and B;
the A consists of single-stranded DNAs designated a1 and c1; the a1 is obtained by joining, in order, a sequence designated primer 1, a barcode sequence designated barcode sequence 1, and the recognition sequence of the transposase, the barcode sequence 1 being a random sequence; the c1 is the complement of the recognition sequence;
the B consists of single-stranded DNA designated b1 and the c1; the b1 is obtained by joining, in order, a sequence designated primer 2, a barcode sequence designated barcode sequence 2, and the recognition sequence of the transposase, the barcode sequence 2 being a random sequence;
A2) carrying out PCR amplification on the DNA fragmentation product to obtain a PCR product, the PCR product being the sequencing library.
2. The following method I or II:
I. A method for constructing a sequencing library, wherein there is 1 sample DNA to be tested, the method comprising the following B1) and B2):
B1) fragmenting the DNA of the sample to be tested with a transposase complex to obtain a DNA fragmentation product;
the transposase complex comprises a transposase and a transposable element, wherein the transposable element consists of two adapters designated A and B;
the A consists of single-stranded DNAs designated a1 and c1; the a1 is obtained by joining, in order, a sequence designated primer 1, a barcode sequence designated barcode sequence 1, and the recognition sequence of the transposase, the barcode sequence 1 being a random sequence; the c1 is the complement of the recognition sequence;
the B consists of single-stranded DNA designated b1 and the c1; the b1 is obtained by joining, in order, a sequence designated primer 2, a barcode sequence designated barcode sequence 2, and the recognition sequence of the transposase, the barcode sequence 2 being a random sequence;
B2) carrying out PCR amplification on the DNA fragmentation product to obtain a PCR product, the PCR product being the sequencing library;
II. A method for constructing a sequencing library, wherein there are n sample DNAs to be tested, n being a natural number greater than or equal to 1, the method comprising the following C1) and C2):
C1) fragmenting the DNA of m samples to be tested with n transposase complexes, respectively, to obtain n DNA fragmentation products; m is a natural number less than or equal to n;
the transposase complex comprises a transposase and a transposable element, wherein the transposable element consists of two adapters designated A and B;
the A consists of single-stranded DNAs designated a1 and c1; the a1 is obtained by joining, in order, a sequence designated primer 1, a barcode sequence designated barcode sequence 1, and the recognition sequence of the transposase, the barcode sequence 1 being a random sequence; the c1 is the complement of the recognition sequence;
the B consists of single-stranded DNA designated b1 and the c1; the b1 is obtained by joining, in order, a sequence designated primer 2, a barcode sequence designated barcode sequence 2, and the recognition sequence of the transposase, the barcode sequence 2 being a random sequence;
among the n transposase complexes, any two transposase complexes differ in at least one of the barcode sequence 1 and the barcode sequence 2;
C2) the following C2a) or C2b):
C2a) mixing the n DNA fragmentation products to obtain a mixed fragmentation product; carrying out PCR amplification on the mixed fragmentation product to obtain a PCR product, the PCR product being the sequencing library;
C2b) carrying out PCR amplification on the n DNA fragmentation products separately to obtain n PCR products; mixing the n PCR products to obtain a mixture, the mixture being the sequencing library.
3. The method according to claim 1 or 2, characterized in that: the transposase is Tn5 transposase;
and/or the ratio of the transposase complex to the sample DNA to be sequenced is from 12.5 ng:1 ng to 0.1 ng:1 pg.
4. The method according to any one of claims 1-3, wherein: the sequence of the primer 1 is positions 1-30 of sequence 4;
and/or the recognition sequence of the transposase is positions 36-54 of sequence 4;
and/or the sequence of the primer 2 is positions 1-30 of sequence 5;
and/or the barcode sequences 1 and 2 each have a length of m1), m2) or m3):
m1) 5-30 bp;
m2) 5-10 bp;
m3) 5 bp.
5. The method according to any one of claims 1-4, wherein: the primers used for PCR amplification consist of a forward primer and a reverse primer, wherein the forward primer comprises the primer 1 and a primer sequence for second-generation sequencing, and the reverse primer comprises the primer 2 and a primer sequence for second-generation sequencing.
6. The method according to any one of claims 1-5, wherein: the reverse primer also contains a barcode sequence designated barcode sequence 3, which barcode sequence 3 is a random sequence.
7. The method according to any one of claims 1-6, wherein: the forward primer is single-stranded DNA shown as a sequence 6 in a sequence table, the reverse primer is single-stranded DNA shown as a sequence 7 in the sequence table, and n is any one of a, t, c and g.
8. A DNA sequencing method, comprising: constructing a sequencing library of one or more DNA samples to be sequenced by the method of any one of claims 1 to 7, and then sequencing the sequencing library to complete the sequencing of the one or more DNA samples.
9. Any one of the following products:
X1) the transposase complex as defined in any one of claims 1-7;
X2) a kit comprising the transposase complex as defined in any one of claims 1-7 and the primers for PCR amplification as defined in claim 6 or 7;
X3) a data processing system capable of reading the data obtained by the method of claim 8 and assembling, according to the barcode sequences, the data derived from the same sample, to obtain the sequencing result of the DNA of the sample to be sequenced.
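The data processing system of X3) groups reads by barcode before assembling each sample. The sketch below illustrates only the grouping step and assumes a hypothetical read layout in which read 1 begins with barcode sequence 1 and read 2 begins with barcode sequence 2; the real offsets depend on the primers and sequencing chemistry, and the function and sample names are illustrative.

```python
from collections import defaultdict


def demultiplex(read_pairs, pair_to_sample, barcode_len=5):
    """Group paired reads by their (barcode 1, barcode 2) combination.

    read_pairs: iterable of (read1, read2) strings; under the assumed layout
        each read starts with its barcode.
    pair_to_sample: dict mapping (barcode1, barcode2) to a sample name.
    Returns (reads grouped by sample with barcodes trimmed, unassigned pairs).
    """
    by_sample = defaultdict(list)
    unassigned = []
    for read1, read2 in read_pairs:
        key = (read1[:barcode_len], read2[:barcode_len])
        sample = pair_to_sample.get(key)
        if sample is None:
            # Barcode pair not in the sample sheet: keep the reads aside.
            unassigned.append((read1, read2))
        else:
            by_sample[sample].append((read1[barcode_len:], read2[barcode_len:]))
    return dict(by_sample), unassigned
```

Each per-sample group can then be passed to an assembler of choice; assembly itself is outside the scope of this sketch.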
10. Use of the product of claim 9 in any one of the following Y1)-Y16):
Y1) preparing a product for constructing a sequencing library;
Y2) preparing a product for DNA sequencing;
Y3) preparing a product for identifying the genotype of a plant or animal;
Y4) preparing a product for identifying or assisting in identifying disease-causing sites;
Y5) preparing a product for diagnosing or assisting in diagnosing human genetic diseases;
Y6) preparing a product for treating or assisting in treating human genetic diseases;
Y7) preparing a product for preventing or assisting in preventing human genetic diseases;
Y8) constructing a sequencing library;
Y9) DNA sequencing;
Y10) pooled sequencing of DNA samples;
Y11) identifying the genotype of a plant or animal;
Y12) identifying or assisting in identifying disease-causing sites;
Y13) diagnosing or assisting in diagnosing human genetic diseases;
Y14) treating or assisting in treating human genetic diseases;
Y15) preventing or assisting in preventing human genetic diseases;
Y16) sequencing plasmids and DNA fragments;
or
use of the method of any one of claims 1-8 in any one of the above Y9)-Y16).
CN202010503754.4A 2019-06-06 2020-06-05 Novel sequencing method of plasmid or DNA fragment with high flux, low cost and high base accuracy Pending CN112048543A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2019104907504 2019-06-06
CN201910490750 2019-06-06

Publications (1)

Publication Number Publication Date
CN112048543A true CN112048543A (en) 2020-12-08

Family

ID=73609166

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010503754.4A Pending CN112048543A (en) 2019-06-06 2020-06-05 Novel sequencing method of plasmid or DNA fragment with high flux, low cost and high base accuracy

Country Status (1)

Country Link
CN (1) CN112048543A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103938277A (en) * 2014-04-18 2014-07-23 中国科学院北京基因组研究所 Trace DNA-based next-generation sequencing library construction method
CN105671644A (en) * 2016-02-26 2016-06-15 武汉冰港生物科技有限公司 Preparation method of genome mixing sequencing library
CN109196115A (en) * 2016-03-01 2019-01-11 通用测序技术公司 Nucleic acid target source is tracked in the method and kit for nucleic acid sequencing
CN109750086A (en) * 2017-11-06 2019-05-14 深圳华大智造科技有限公司 The construction method in single stranded circle library
CN109811045A (en) * 2017-11-22 2019-05-28 深圳华大智造科技有限公司 The construction method of high-throughput unicellular overall length transcript profile sequencing library and its application

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DARREN A. CUSANOVICH et al.: "Multiplex Single Cell Profiling of Chromatin Accessibility by Combinatorial Cellular Indexing", 《SCIENCE》 *
JIAN WU et al.: "SALP, a new single-stranded DNA library preparation method especially useful for the high-throughput characterization of chromatin openness states", 《BMC GENOMICS》 *
XIANG GAO et al.: "Highly-parallel Indexed Tagmentation-reads Assembled Consensus sequencing enables high-throughput cost-effective sequencing of plasmids and DNA fragments with identity", 《JOURNAL OF GENETICS AND GENOMICS》 *
WEI JIANHE (魏建和) et al.: 《中药生物技术》 (Biotechnology of Traditional Chinese Medicine), 28 February 2017, 中国中医药出版社 (China Press of Traditional Chinese Medicine) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115948363A (en) * 2022-08-26 2023-04-11 武汉影子基因科技有限公司 Tn5 transposase mutant and preparation method and application thereof
CN115948363B (en) * 2022-08-26 2024-02-27 武汉影子基因科技有限公司 Tn5 transposase mutant and preparation method and application thereof

Similar Documents

Publication Publication Date Title
Agarwal et al. Advances in molecular marker techniques and their applications in plant sciences
ES2393318T3 (en) Strategies for the identification and detection of high performance polymorphisms
ES2357549T3 (en) STRATEGIES FOR THE IDENTIFICATION AND DETECTION OF HIGH PERFORMANCE OF POLYMORPHISMS.
van Orsouw et al. Complexity reduction of polymorphic sequences (CRoPS™): a novel approach for large-scale polymorphism discovery in complex genomes
CN108368503A (en) Method for controlled dn A fragmentations
CN108611398A (en) Genotyping is carried out by new-generation sequencing
CN108486266B (en) Molecular marker of corn chloroplast genome and application of molecular marker in variety identification
Montgomery et al. Sex-specific markers for waterhemp (Amaranthus tuberculatus) and Palmer amaranth (Amaranthus palmeri)
WO2013106807A1 (en) Scalable characterization of nucleic acids by parallel sequencing
CN108486265B (en) Method for identifying type of male sterile cytoplasm of corn based on KASP technology
CN101948919B (en) Kit used for paternity test of giant pandas
JP2018042548A (en) Method of making dna library and genome dna analyzing method using dna library
CN112048543A (en) Novel sequencing method of plasmid or DNA fragment with high flux, low cost and high base accuracy
JP2023542581A (en) Primer compositions, kits, methods and uses thereof for detecting Y-SNP haplogroups by next generation sequencing technology
Moolhuijzen et al. The complexity of Rhipicephalus (Boophilus) microplus genome characterised through detailed analysis of two BAC clones
Peterson et al. Sequencing plant genomes
CN108699591A (en) Method for the genetic mutation for detecting Viral Hemorrhagic septicemia virus
Won et al. Identification of repetitive DNA sequences in the Chrysanthemum boreale genome
Singh et al. Next-generation sequencing technologies: approaches and applications for crop improvement
CN113811618B (en) Sequencing library construction based on methylated DNA target region, system and application
CN114774409A (en) Secondary sequencing detection system based on 224 InDel and 57 SNP sites
Jiang et al. Old can be new again: HAPPY whole genome sequencing, mapping and assembly
CN113046466A (en) SNP loci significantly associated with wheat powdery mildew resistance and application thereof in genetic breeding
Khan et al. Plant molecular breeding: way forward through next-generation sequencing
Whitehouse Genes and Genomes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination