CN112680431B - DNA library preparation method based on fragmenting enzyme - Google Patents


Info

Publication number
CN112680431B
Authority
CN
China
Prior art keywords
dna
enzyme
fragmenting
library
reaction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110299643.0A
Other languages
Chinese (zh)
Other versions
CN112680431A (en
Inventor
赵曼曼
申冕
位小丫
张清仪
郑紫君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou inshore protein Technology Co.,Ltd.
Original Assignee
Wujiang Novoprotein Scientific Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wujiang Novoprotein Scientific Inc filed Critical Wujiang Novoprotein Scientific Inc
Priority to CN202110299643.0A priority Critical patent/CN112680431B/en
Publication of CN112680431A publication Critical patent/CN112680431A/en
Application granted granted Critical
Publication of CN112680431B publication Critical patent/CN112680431B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention provides a novel fragmenting enzyme that can efficiently and randomly break a genome into fragments of about 50-150 bp. The invention also provides a one-step method for rapidly constructing a DNA library. In the library construction method, the steps of DNA fragmentation (of genomic DNA or other large dsDNA sequences), end A-tailing and linker ligation are integrated into a single one-tube reaction, and the DNA library is then obtained through purification and enrichment. The one-step library construction method can efficiently and quickly yield a high-quality DNA sequencing library.

Description

DNA library preparation method based on fragmenting enzyme
Technical Field
The invention relates to the field of biological sequencing, in particular to a DNA library preparation method based on a fragmenting enzyme.
Background
Next-generation sequencing (NGS), also known as high-throughput sequencing, can determine a large number of nucleic acid sequences in a single run. It is widely used in genome and transcriptome research and has many applications in medicine, forensic science and other fields. Because of its very high sequencing capacity, it is important in both scientific research and clinical practice, and it has brought revolutionary change to research. Sequencing hundreds of thousands to millions of DNA molecules at a time makes it possible to analyze the genetic information of a species comprehensively, from DNA to RNA, and has raised human understanding of nature and of ourselves to a new level. As sequencing throughput has increased, sequencing costs have fallen and the technology has come into wide use.
The rapid development of next-generation sequencing has greatly increased sequencing throughput, which requires upstream sample processing to be as simple and rapid as possible in order to improve the efficiency of the whole NGS workflow. For next-generation sequencing, the workflow, and DNA library work in particular, can basically be divided into two steps: first, library construction; second, on-machine sequencing of the library.
The first step, library construction, generally comprises random fragmentation of DNA, end repair, linker ligation, PCR amplification of the linker-ligated DNA fragments, and quality control, so that the library meets the requirements for loading onto the sequencer. Specifically, because of limitations of the Illumina sequencing strategy itself, the read length cannot be too long, so long genomic DNA is fragmented by a physical or enzymatic method; if fragmentation produces ends that are not blunt, an enzyme is used to fill them in; after filling in, an A base tail is added to the 3' end by an enzyme; an adaptor sequence (one part being the primer sequence required for sequencing, the other the primer sequence required for library amplification) is then added using the principle of complementary pairing; finally, PCR amplification and enrichment are carried out so that the concentration of the DNA sample meets the loading requirement of the sequencer.
The second step is on-machine sequencing of the constructed library using a sequencer from Illumina, Ion Torrent or the like. Illumina's HiSeq series is one of the high-throughput sequencing platforms and carries out more than 80% of nucleic acid sequencing work worldwide. Illumina sequencing uses the sequencing-by-synthesis principle: correct complementary base pairs form during base extension, and the nucleic acid fragments are determined rapidly and at high throughput by detecting the different fluorescent labels carried by the bases.
The key initial step in genome sequencing is the preparation of the nucleic acid sample during sequencing library construction. The quality of the sequencing library greatly affects the quality of sequencing. In existing methods, library quality is affected to varying degrees at different steps: first, fragmenting the genome by physical or enzymatic methods introduces variation; second, library construction uses many reagents and complex reactions, so results differ markedly between samples; finally, the tedious operating steps and lengthy reaction times of library construction affect library quality.
In view of the above, there is a need in the art for a library construction method for sequencing that is efficient, rapid, easy to operate, and simple in its steps.
Disclosure of Invention
The invention aims to provide a library construction method for sequencing that is efficient, rapid, easy to operate and simple in its steps.
The invention aims to provide a method for rapidly preparing a DNA library in one step and an application thereof. In a first aspect of the present invention, there is provided a fragmenting enzyme, said fragmenting enzyme being a fusion protein formed by fusing an H1 histone element and a FokI protein element, said fusion protein having the structure of formula I or II:
Z0―L1―Z1―L2―Z2 (I)
Z0―L1―Z2―L2―Z1 (II)
wherein,
Z0 is absent or selected from: a signal peptide, a tag sequence, or a combination thereof;
Z1 is an H1 histone element;
Z2 is a FokI protein element;
L1 is absent or a linker sequence;
L2 is absent or a linker sequence;
"-" is a bond.
In another preferred embodiment, the H1 histone element has the amino acid sequence as shown in the 11-214 position of SEQ ID NO. 5.
In another preferred embodiment, the FokI protein element has the amino acid sequence as shown in position 215-410 of SEQ ID NO. 5.
In another preferred embodiment, the amino acid sequence of the fragmenting enzyme is shown as positions 1-410 of SEQ ID No. 5 or positions 11-410 of SEQ ID No. 5.
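The element order of formulas I and II can be illustrated with a minimal Python sketch. The function name `assemble_fusion` and the placeholder labels are hypothetical and stand in for real amino acid sequences; absent elements (Z0, L1, L2) are modeled as empty strings. This only illustrates the layout and is not the inventors' construction procedure.

```python
def assemble_fusion(z1: str, z2: str, formula: str = "I",
                    z0: str = "", l1: str = "", l2: str = "") -> str:
    """Concatenate elements from N- to C-terminus per formula I or II.

    Formula I:  Z0-L1-Z1-L2-Z2
    Formula II: Z0-L1-Z2-L2-Z1
    Empty strings model elements that are absent.
    """
    if formula == "I":
        parts = [z0, l1, z1, l2, z2]
    elif formula == "II":
        parts = [z0, l1, z2, l2, z1]
    else:
        raise ValueError("formula must be 'I' or 'II'")
    return "".join(parts)


# Placeholder labels, not real sequences (illustration only).
print(assemble_fusion("<H1>", "<FokI>", formula="I", z0="<tag>"))
print(assemble_fusion("<H1>", "<FokI>", formula="II", l2="<linker>"))
```

With Z1 as the H1 histone element and Z2 as the FokI element, formula I places H1 N-terminal to FokI, and formula II reverses that order.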
In another preferred embodiment, the fragmenting enzyme has an activity of randomly cleaving DNA.
In another preferred embodiment, the fragmenting enzyme can randomly cleave genomic DNA into DNA fragments of length d.
In another preferred embodiment, d is 30-250bp, preferably 50-200bp, more preferably 50-150 bp.
In another preferred embodiment, the fragmenting enzyme is Hifase, which has an amino acid sequence as shown in positions 11-410 of SEQ ID NO. 5.
In another preferred embodiment, the fragmenting enzyme further comprises a tag sequence for facilitating expression and/or purification.
In another preferred embodiment, the tag sequence comprises a 6His tag.
In a second aspect of the present invention, there is provided a method of fragmenting DNA, comprising the steps of:
(a) providing a fragmenting enzyme according to the first aspect of the present invention;
(b) mixing the fragmenting enzyme with a DNA sample for reaction to obtain a DNA fragment;
wherein the length of the obtained DNA fragment is 30-250bp, preferably 50-200bp, more preferably 50-150 bp.
In another preferred embodiment, the cell is selected from the group consisting of: a somatic cell.
In another preferred embodiment, the cell is a human embryonic kidney cell (293T).
In another preferred example, in the step (b), the reaction conditions are 30-35 ℃ for 30-60 min.
In another preferred embodiment, in step (b), the ratio of fragmenting enzyme to DNA sample is selected from 0.005-1.5 μg : 1-1000 ng.
In another preferred embodiment, in step (b), the mass ratio of fragmenting enzyme to DNA sample is 50:1 to 1:10, preferably 20:1 to 1:2.
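The enzyme-to-DNA mass ratio can be checked with a short calculation. The sketch below is illustrative only: the function names and the labels "preferred", "allowed" and "outside" are hypothetical, and the ranges are taken from the mass ratios stated above (50:1 to 1:10, preferably 20:1 to 1:2). Enzyme mass is given in μg and DNA mass in ng, so the enzyme mass is converted to ng first.

```python
def mass_ratio(enzyme_ug: float, dna_ng: float) -> float:
    """Return enzyme mass divided by DNA mass, both expressed in ng."""
    if enzyme_ug <= 0 or dna_ng <= 0:
        raise ValueError("masses must be positive")
    return (enzyme_ug * 1000.0) / dna_ng


def classify_ratio(ratio: float) -> str:
    """Place a mass ratio in the stated ranges (labels are hypothetical)."""
    if 0.5 <= ratio <= 20.0:    # 20:1 to 1:2
        return "preferred"
    if 0.1 <= ratio <= 50.0:    # 50:1 to 1:10
        return "allowed"
    return "outside"


# Hypothetical input amounts chosen only to exercise the function.
for enzyme_ug, dna_ng in [(1.0, 200.0), (0.2, 1000.0), (1.5, 1.0)]:
    r = mass_ratio(enzyme_ug, dna_ng)
    print(f"{enzyme_ug} ug enzyme : {dna_ng} ng DNA -> {r:g}:1 ({classify_ratio(r)})")
```

Note that the absolute-amount embodiment (0.005-1.5 μg enzyme, 1-1000 ng DNA) and the mass-ratio embodiment are stated separately, so some combinations of allowed absolute amounts fall outside the stated mass-ratio range.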
In another preferred embodiment, in step (b), the 5' end of the DNA fragment obtained contains a sticky end.
In another preferred embodiment, the length of the sticky end is selected from 3-5 nt, preferably 3, 4, 5 nt.
In another preferred example, in the step (b), the nucleotide at the 5' end of the DNA fragment obtained carries a phosphate group.
In another preferred embodiment, the amino acid sequence of the fragmenting enzyme is shown as positions 1-410 of SEQ ID No. 5 or positions 11-410 of SEQ ID No. 5.
In a third aspect of the invention, there is provided a use of a fragmenting enzyme according to the first aspect of the invention for preparing reagents for constructing a DNA sequencing library; or a reagent for fragmenting a DNA sample.
In another preferred embodiment, the fragmentation is random fragmentation.
In a fourth aspect of the present invention, there is provided a method for constructing a DNA sequencing library, comprising the steps of:
(a) providing a genomic DNA sample;
(b) subjecting the DNA sample to a fragmentation reaction by a fragmentation enzyme according to the first aspect of the invention to obtain a first mixture containing DNA fragments;
(c) adding linkers to both ends of the DNA fragments in the first mixture in (b) to obtain a second mixture containing DNA fragments whose ends are provided with linkers;
(d) amplifying the DNA fragment with the adaptor at the end obtained in the previous step by using a specific primer, thereby obtaining the DNA sequencing library.
In another preferred embodiment, steps (b) and (c) are carried out simultaneously or sequentially in the same reaction system.
In another preferred embodiment, in steps (b) and (c), in the same reaction system, the content ratio of fragmenting enzyme : DNA sample : linker is selected from 0.005-1.5 μg : 1-1000 ng : 2-30 μmol.
In another preferred embodiment, in steps (b) and (c), in the same reaction system, the content ratio of DNA sample : linker is selected from 1-1000 ng : 2-30 μmol.
In another preferred example, step (d) is preceded by step (d1): purifying the second mixture obtained in step (c) with magnetic beads to obtain purified DNA fragments with linkers at their ends.
In another preferred example, in step (b), the fragmentation reaction conditions include: 30-35 ℃ for 30-60 min.
In another preferred embodiment, in step (b), the DNA sample has a length of 46.71 to 248.96 Mb.
In another preferred embodiment, in step (b), the length of the DNA fragment (i.e., DNA fragmentation product) is 30-250bp, preferably 50-200bp, more preferably 50-150 bp.
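The nested length ranges can be expressed as a small lookup. This is a minimal sketch: the tier labels and function names are hypothetical, and the example fragment lengths are invented inputs rather than measured data.

```python
# Ranges from the text, ordered from narrowest to broadest (bp, inclusive).
LENGTH_TIERS = [
    ("more preferred", 50, 150),
    ("preferred", 50, 200),
    ("broadest", 30, 250),
]


def length_tier(bp: int) -> str:
    """Return the narrowest stated range that contains a fragment length."""
    for name, low, high in LENGTH_TIERS:
        if low <= bp <= high:
            return name
    return "outside"


def fraction_within(lengths, low=50, high=150):
    """Fraction of fragment lengths inside [low, high]."""
    lengths = list(lengths)
    if not lengths:
        raise ValueError("no fragment lengths given")
    return sum(low <= bp <= high for bp in lengths) / len(lengths)


# Invented example lengths, for illustration only.
example = [60, 120, 180, 300]
print([length_tier(bp) for bp in example])
print(fraction_within(example))
```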
In another preferred embodiment, in step (b), the method further comprises performing terminal modification on the DNA fragment.
In another preferred embodiment, the terminal modification comprises: filling in the ends and adding an A base to the 3' end.
In another preferred embodiment, the terminal modification is performed by a DNA polymerase.
In another preferred embodiment, the DNA polymerase is Taq DNA polymerase.
In another preferred embodiment, in step (b), the mass ratio of fragmenting enzyme to DNA sample is selected from 0.005-1.5 μg : 1-1000 ng.
In another preferred embodiment, the mass ratio of fragmenting enzyme to DNA sample is 50:1 to 1:10, preferably 20:1 to 1:2.
In another preferred embodiment, the content ratio of DNA sample to linker is 1-1000 ng : 2-30 μmol.
In another preferred embodiment, in step (c), the sequence of the linker is selected from the group consisting of:
Novo i5 linker:
AATGATATTCCGGCGACCGAGATCTAACACACTCTTTCCCTACACGACGCTCTTGT (SEQ ID NO: 1);
Novo i7 linker:
CAAGAGCGTCGTCGGCATACTTCTCCGTGAGATGTGACTGGAGTTCAGACGTGTGCTCTTTCCGTC (SEQ ID NO: 2).
In another preferred embodiment, the reaction conditions for adding the linker are selected from: 15-25 ℃ for 15-25 min.
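The temperature and time windows given for fragmentation (30-35 ℃, 30-60 min) and linker addition (15-25 ℃, 15-25 min) can be captured in a small data structure. The sketch below is an assumption-laden illustration: the class and function names are hypothetical, and the total assumes the two reactions run one after the other, whereas the text also allows them to run simultaneously. Purification and PCR enrichment are not included.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class StepWindow:
    """Allowed temperature (Celsius) and duration (minutes) for one step."""
    name: str
    temp_c: Tuple[float, float]
    minutes: Tuple[float, float]

    def accepts(self, temp_c: float, minutes: float) -> bool:
        t_low, t_high = self.temp_c
        m_low, m_high = self.minutes
        return t_low <= temp_c <= t_high and m_low <= minutes <= m_high


FRAGMENTATION = StepWindow("fragmentation", (30, 35), (30, 60))
LINKER_ADDITION = StepWindow("linker addition", (15, 25), (15, 25))


def total_minutes(steps: Iterable[StepWindow]) -> Tuple[float, float]:
    """Shortest and longest combined duration if steps run sequentially."""
    steps = list(steps)
    return (sum(s.minutes[0] for s in steps),
            sum(s.minutes[1] for s in steps))


print(FRAGMENTATION.accepts(32, 45))
print(LINKER_ADDITION.accepts(30, 20))
print(total_minutes([FRAGMENTATION, LINKER_ADDITION]))
```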
In another preferred embodiment, in step (c), the reaction of adding the linker is performed by a DNA ligase.
In another preferred embodiment, the DNA ligase is selected from the group consisting of: T4 DNA ligase, T7 DNA ligase, T3 DNA ligase.
In another preferred example, in the step (d), the specific primer is a primer that specifically amplifies a DNA fragment with a linker at the end in the second mixture.
In another preferred embodiment, the specific primer is selected from the group consisting of:
Primer Novo i5:
AATGATATTCCGGCGACCGAGATCTAACACACTCTTTCCCTAC (SEQ ID NO: 3);
Primer Novo i7:
CAAGCAGAATTCCGGCATACTTCTCCGTGAGATGTGACTGGAGC (SEQ ID NO: 4).
In another preferred embodiment, the amplification comprises PCR amplification, Q-PCR, and/or RT-PCR.
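The listed linker and primer sequences can be inspected with a few lines of Python. The sketch below checks two properties computed directly from SEQ ID NO: 1-3: whether primer Novo i5 is a 5' prefix of the Novo i5 linker, and how many bases at the 5' end of the Novo i7 linker are complementary to the bases just upstream of the last 3' base of the Novo i5 linker. Reading that complementary stretch as an annealed stem with a single unpaired 3' base is an interpretation of the sequences, not a statement made in the text; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

NOVO_I5_LINKER = "AATGATATTCCGGCGACCGAGATCTAACACACTCTTTCCCTACACGACGCTCTTGT"
NOVO_I7_LINKER = ("CAAGAGCGTCGTCGGCATACTTCTCCGTGAGATGTGACTGGAGTTCAGAC"
                  "GTGTGCTCTTTCCGTC")
PRIMER_I5 = "AATGATATTCCGGCGACCGAGATCTAACACACTCTTTCCCTAC"


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def stem_length(i5: str, i7: str) -> int:
    """Count bases at the 5' end of i7 that pair with i5, skipping i5's last base.

    The last 3' base of i5 is left out of the comparison and treated as an
    unpaired overhang (an assumption made for this sketch).
    """
    k = 0
    while (k < min(len(i5) - 1, len(i7))
           and reverse_complement(i7[:k + 1]) == i5[-(k + 2):-1]):
        k += 1
    return k


print("primer i5 is a prefix of linker i5:", NOVO_I5_LINKER.startswith(PRIMER_I5))
print("complementary stem length:", stem_length(NOVO_I5_LINKER, NOVO_I7_LINKER))
print("unpaired 3' base of linker i5:", NOVO_I5_LINKER[-1])
```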
In a fifth aspect of the present invention, there is provided a kit for constructing a DNA sequencing library, the kit comprising:
(Z1) a fragmenting enzyme according to the first aspect of the present invention;
(Z2) other reagents for constructing DNA sequencing libraries;
(Y) instructions describing the method of use.
In another preferred embodiment, the other reagents for constructing a DNA sequencing library include:
(a) Taq DNA polymerase, T4 DNA ligase;
(b) magnetic beads;
(c) a reaction buffer;
(d) a linker sequence.
In another preferred example, the method of use comprises the following step: constructing a sequencing library using the method according to the fourth aspect of the invention.
In another preferred embodiment, the kit further comprises: a specific primer that specifically binds to the linker sequence.
In a sixth aspect of the invention, there is provided a nucleic acid molecule encoding a fragmenting enzyme according to the first aspect of the invention.
In a seventh aspect of the invention, there is provided a vector comprising a nucleic acid molecule according to the sixth aspect of the invention.
In an eighth aspect of the invention, there is provided a host cell comprising a vector according to the seventh aspect of the invention, or having integrated into its chromosome an exogenous nucleic acid molecule according to the sixth aspect of the invention, or expressing a fragmenting enzyme according to the first aspect of the invention.
In another preferred embodiment, the cell is an isolated cell, and/or the cell is a genetically engineered cell.
In a ninth aspect of the present invention, there is provided a method of preparing a protein, the method comprising the steps of:
(a) culturing a host cell according to the eighth aspect of the invention to obtain a fragmenting enzyme according to the first aspect of the invention.
In another preferred example, the method further comprises the steps of: (b) purifying the fragmenting enzyme obtained in step (a).
The beneficial effects of the invention include:
The DNA library rapidly prepared in one step with the new fragmenting enzyme is consistent with libraries from the traditional method in terms of fragment coverage distribution; the library yield (the ratio of DNA sample amount to sequencing library) shows a good linear relationship; and the library construction efficiency (success rate, time cost and the like) is high. The one-tube library construction method provides a new rapid library construction scheme for next-generation sequencing.
It is to be understood that within the scope of the present invention, the above-described features of the present invention and those specifically described below (e.g., in the examples) may be combined with each other to form new or preferred embodiments. For reasons of space, these combinations are not described one by one here.
Drawings
FIG. 1 shows the expression vector of Hifase and the result of SDS-PAGE electrophoresis. Wherein, 1A shows a schematic diagram of an expression vector of Hifase, and 1B shows an SDS-PAGE electrophoresis chart of Hifase.
FIG. 2 shows a schematic diagram of the one-step rapid DNA library construction of the present invention, wherein: the Novo i5 adapter comprises the P5, i5 and Rd1 SP fragments; the Novo i7 adapter comprises the P7, i7 index and Rd2 SP fragments; the index is used to distinguish different samples; Rd1 SP and Rd2 SP are the Read 1 and Read 2 sequencing primer binding regions; and "NN" denotes the 4 bases filled in at the 5' cohesive end.
FIG. 3 shows fragmentation of the genome by the fragmenting enzyme Hifase for different times. From left to right, the lanes show the fragment sizes after the genome sample was fragmented for 5 min, 10 min, 15 min, 20 min, 30 min, 40 min and 60 min; lane M is the DNA molecular weight marker.
FIG. 4 shows the agarose gel results of one-step rapid DNA library construction, wherein: lane 1: Hifase fragmentation result; lanes 2 and 4: result of the one-tube reaction (fragmentation, end A-tailing and linker addition); lanes 3 and 5: results of PCR enrichment of lanes 2 and 4, respectively.
FIG. 5 shows the quality control results of library construction using 200 ng of DNA as the starting amount. In the figure, the abscissa represents the fragment size (bp), the ordinate represents the fluorescence intensity (FU), and each peak represents the concentration of DNA fragments distributed over a certain length range. The peaks at 35 bp and 10380 bp are the peaks of the high-sensitivity DNA markers, and the peak at 250-400 bp is the DNA fragment library (in duplicate) obtained by library construction in the example.
FIG. 6 shows the amino acid sequence of the Hifase enzyme of the present invention (with His-tag, SEQ ID No: 5) in which positions 11-214 are the H1 histone portion; 215-410 is the FokI portion.
Detailed Description
After extensive and intensive study, the present inventors constructed a novel fragmenting enzyme, Hifase, which is a fusion protein of H1 histone and FokI protein. The fragmenting enzyme can efficiently and randomly break a genome into fragments of about 50-150 bp and is suitable for subsequent DNA library construction. The invention also provides a one-step method for constructing a DNA sequencing library based on the fragmenting enzyme. Specifically, the steps of DNA sample fragmentation, end repair, end A-tailing and linker ligation are integrated into one step and carried out as a one-tube reaction. A high-quality genomic DNA library is then obtained through purification and enrichment. With this library construction method, a stable, high-quality library can be obtained in only 1.5 hours, and the fragment coverage distribution of the DNA library is concentrated. The present invention has been completed on the basis of this finding.
Terms
Before the present invention is described, it is to be understood that this invention is not limited to the particular methodology and experimental conditions described, as such methodologies and conditions may vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present invention will be limited only by the appended claims.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
As used herein, the term "about" when used in reference to a specifically recited value means that the value may vary by no more than 1% from the recited value. For example, as used herein, the expression "about 100" includes 99 and 101 and all values in between (e.g., 99.1, 99.2, 99.3, 99.4, etc.).
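The 1% tolerance in this definition can be written as a one-line check. The function name `is_about` is hypothetical and the sketch only restates the definition above.

```python
def is_about(value: float, stated: float, tolerance: float = 0.01) -> bool:
    """True if value differs from the stated value by no more than 1%."""
    return abs(value - stated) <= tolerance * abs(stated)


for candidate in (98, 99, 99.1, 100, 101, 102):
    print(candidate, is_about(candidate, 100))
```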
As used herein, the term "comprising" or "includes" can be open, semi-closed, or closed. In other words, the term also includes "consisting essentially of …" and "consisting of …".
The three-letter codes and the one-letter codes for amino acids used in the present invention are as described in J. Biol. Chem., 243, p. 3558 (1968).
As used herein, the term "optional" or "optionally" means that the subsequently described event or circumstance may, but need not, occur.
"sequence identity" as referred to herein means the degree of identity between two nucleic acid or two amino acid sequences when optimally aligned and compared with appropriate mutations such as substitutions, insertions or deletions. The sequence identity between a sequence described in the present invention and a sequence with which it is identical may be at least 85%, 90% or 95%, preferably at least 95%. Non-limiting examples include 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 100%.
Enzyme
As used herein, "the fragmenting enzyme of the present invention", "fragmenting enzyme Hifase", "Hifase enzyme" are used interchangeably and all refer to the novel endonuclease described in the first aspect of the present invention which is useful as a fragmenting enzyme for preparing libraries. It is to be understood that the term also includes active fragments, fusion proteins or muteins of the fragmenting enzyme of the present invention having the same fragmentation function as the Hifase enzyme, in particular a random fragmentation function.
Preferably, the present invention provides a fragmenting enzyme, Hifase, the core part of which is a fusion protein formed by an H1 histone element and a FokI protein element (a linker may or may not be added between the two), such as a fusion protein of about 400 amino acids in length without a linker. Preferably, the fusion protein has an amino acid sequence as shown in positions 11-410 of SEQ ID NO. 5.
MGSSHHHHHHSSGLVPRGSHMTENSTSAPAAKPKRAKASKKSTDHPKYSDMIVAAIQAEKNRAGSSRQ SIQKYIKSHYKVGENADSQIKLSIKRLVTTGVLKQTKGVGASGSFRLAKSDEPKKSVAFKKTKKEIKKVATPKKAS KPKKAASKAPTKKPKATPVKKAKKKLAATPKKAKKPKTVKAKPVKASKPKKAKPVKPKAKSSAKRAGKKK QLVKSE LEEKKSELRHKLKYVPHEYIELIEIARNSTQDRILEMKVMEFFMKVYGYRGKHLGGSRKPDGAIYTVGSPIDYGVI VDTKAYSGGYNLPIGQADEMQRYVEENQTRNKHINPNEWWKVYPSSVTEFKFLFVSGHFKGNYKAQLTRLNHITNC NGAVLSVEELLIGGEMIKAGTLTLEEVRRKFNNGEINF (SEQ ID NO:5)
Wherein positions 11-214 of SEQ ID NO. 5 are the H1 histone portion, and positions 215-410 are the FokI portion.
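The residue ranges quoted for SEQ ID NO. 5 are 1-based and inclusive, so their lengths and the matching Python slices can be derived as follows. This is a minimal sketch: the helper names and region labels are hypothetical, and the FokI range 215-410 is taken from the description of FIG. 6 and the preferred embodiments.

```python
def span_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive residue range."""
    return end - start + 1


def to_slice(start: int, end: int) -> slice:
    """Convert a 1-based inclusive range into a 0-based Python slice."""
    return slice(start - 1, end)


REGIONS = {
    "H1 histone element": (11, 214),
    "FokI element": (215, 410),
    "positions 11-410": (11, 410),
    "positions 1-410": (1, 410),
}

for name, (start, end) in REGIONS.items():
    print(f"{name}: {span_length(start, end)} residues")

# Slicing a short stand-in string; a real sequence would be used the same way.
print("ABCDEFGHIJ"[to_slice(3, 5)])
```

The two element lengths add up to the 400 residues of positions 11-410, in line with the description of a fusion protein of about 400 amino acids without a linker.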
In another preferred embodiment, the amino acid sequence of the His-tagged Hifase protein is shown in SEQ ID NO:5, and the nucleotide sequence thereof is shown in SEQ ID NO:6, wherein the underlined parts (positions 31-645) are the coding sequences for the H1 histone proteins, and positions 646-1236 are the coding sequences for the FokI element.
ATGGGCAGCAGCCATCATCATCATCATCACAGCAGCGGCCTGGTGCCGCGCGGCAGCCATATGACGGA AAATAGCACCAGCGCCCCAGCCGCCAAACCAAAACGCGCCAAAGCCAGCAAGAAGAGCACCGACCACCCAAAATAC AGCGACATGATCGTGGCCGCCATCCAAGCCGAGAAAAATCGCGCCGGTAGCAGCCGCCAGAGCATCCAGAAATACA TCAAGAGCCACTATAAGGTGGGCGAGAACGCCGACAGCCAGATCAAGCTGAGCATCAAGCGCCTCGTTACCACGGG CGTTCTGAAACAGACCAAAGGCGTTGGCGCGAGCGGTAGCTTTCGTCTGGCCAAAAGTGATGAGCCGAAGAAGAGC GTGGCCTTCAAGAAGACGAAGAAGGAGATCAAGAAGGTGGCCACCCCGAAAAAAGCCAGTAAACCGAAGAAAGCGG CGAGCAAAGCGCCGACCAAGAAACCGAAAGCGACGCCGGTGAAGAAGGCGAAGAAAAAACTGGCGGCCACGCCGAA AAAGGCCAAGAAGCCGAAGACCGTGAAAGCCAAACCGGTGAAGGCGAGCAAACCGAAAAAAGCGAAGCCGGTTAAG CCGAAAGCCAAAAGCAGCGCCAAACGCGCCGGCAAGAAAAAAGCC CAGCTGGTGAAAAGCGAACTGGAAGAGAAGA AGAGTGAGCTGCGCCACAAACTCAAGTACGTTCCGCACGAGTACATCGAACTCATCGAGATCGCCCGCAATAGCAC GCAAGATCGCATTCTGGAAATGAAAGTGATGGAGTTTTTTATGAAAGTGTACGGCTACCGCGGCAAACATCTGGGC GGCAGTCGCAAACCAGATGGTGCCATTTACACGGTTGGCAGCCCAATCGACTACGGCGTGATCGTTGACACCAAGG CGTACAGCGGCGGCTACAATCTGCCGATCGGCCAAGCCGATGAGATGCAGCGTTACGTGGAAGAGAACCAGACCCG CAATAAACACATTAATCCAAACGAATGGTGGAAAGTGTACCCGAGCAGCGTGACCGAGTTCAAGTTTCTCTTCGTG AGCGGCCATTTCAAGGGCAACTACAAGGCGCAACTGACCCGCCTCAACCACATTACCAACTGCAACGGCGCCGTGC TGAGCGTTGAAGAGCTGCTGATCGGTGGCGAAATGATCAAAGCCGGTACGCTCACGCTGGAAGAAGTTCGCCGCAA GTTTAACAACGGCGAGATCAACTTTTAA(SEQ ID NO:6)。
The fragmenting enzyme Hifase of the present invention has many advantages and is well suited for use as a fragmenting enzyme in library construction. Its characteristics are as follows:
(1) the Hifase reaction is enzyme-dependent and is not limited by the concentration or amount of the substrate (i.e., the genome); only the amount of enzyme added needs to be considered in the reaction;
(2) the fragmentation time of Hifase shortens as the amount of enzyme increases (0.005-1.3 μg);
(3) Hifase can randomly break a genome into small fragments within a defined length range (about 50-150 bp), which is very suitable for library construction;
(4) Hifase does not recognize specific sequence sites and has no GC bias, so it can cut the genome randomly and is suitable for fragmenting all genomes (including the genomes and DNA sequences of complex organisms);
(5) Hifase can completely break the added genomic DNA into small fragments, thereby largely avoiding the loss of important sequence fragments;
(6) the reaction conditions of Hifase are relatively permissive, which makes it practical;
(7) the 5' end of a DNA fragment randomly cut by Hifase is a sticky end (4 nt in length) rather than a blunt end;
(8) the 5' end of a DNA fragment randomly cut by Hifase carries a phosphate group, so no kinase needs to be added to the reaction.
In particular, because Hifase randomly cuts DNA into fragments with sticky ends, the additional end-repair enzyme required in conventional library construction does not need to be added to the one-step library construction reaction. In the one-step reaction, fill-in of the ends and A-tailing can be achieved with Taq polymerase alone, which simplifies the reaction system, reduces the number of reagents, and greatly shortens the reaction time.
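The end chemistry described here (a staggered cut leaving a 4 nt 5' overhang, followed by polymerase fill-in and A-tailing) can be modeled on strings. The sketch below is a simplified model under stated assumptions, not the enzyme's actual mechanism: the duplex is represented by its top strand, the cut position is chosen by the caller, and the function names are hypothetical. Because both recessed 3' ends are extended across the overhang, the 4 overhang bases appear in both filled-in products.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def cut_and_fill(top: str, pos: int, overhang: int = 4):
    """Model a staggered cut with 5' overhangs, then fill in both ends.

    The top strand is cut after `pos` bases and the bottom strand `overhang`
    bases further along (in top-strand coordinates). Filling in the recessed
    3' ends yields two blunt duplexes, returned as their top strands.
    """
    if not 0 < pos < len(top) - overhang:
        raise ValueError("cut position leaves no room for the overhang")
    left = top[:pos + overhang]   # bottom strand templated the extra bases
    right = top[pos:]             # top-strand overhang is now double-stranded
    return left, right


def a_tail(top: str):
    """Return (top, bottom) strands, 5'->3', each with one added 3' A."""
    return top + "A", reverse_complement(top) + "A"


left, right = cut_and_fill("AAAACCCCGGGGTTTT", pos=6)
print(left, right)
print(a_tail(right))
```

In this model each product ends up blunt and then carries a single 3' A on both strands, which is the substrate form expected for ligation to a linker with a complementary single-base overhang.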
In particular, carrying out the A-tailing reaction with Taq polymerase greatly improves the reaction efficiency, preferably to 90%, more preferably to 100%. The improved A-tailing efficiency in turn raises the efficiency of the subsequent ligation to the linker (adapter) from less than 30% in the traditional method to about 80%, improving the reaction success rate.
Fusion proteins
As used herein, "fusion protein of the invention" and "recombinant fusion protein" each refer to a fusion protein according to the first aspect of the invention.
Specifically, the fusion protein of the invention comprises a structure shown in formula I or formula II:
Z0―L1―Z1―L2―Z2 (I)
Z0―L1―Z2―L2―Z1 (II)
wherein,
Z0 is absent or selected from: a signal peptide, a tag sequence, or a combination thereof;
Z1 is an H1 histone element;
Z2 is a FokI protein element;
L1 is absent or a linker sequence;
L2 is absent or a linker sequence;
"-" is a bond.
The term "fusion protein" as used herein also includes variants having the above-described activities. These variants include (but are not limited to): deletion, insertion and/or substitution of 1 to 3 (usually 1 to 2, more preferably 1) amino acids, and addition or deletion of one or several (usually up to 3, preferably up to 2, more preferably up to 1) amino acids at the C-terminus and/or N-terminus. For example, in the art, substitution with amino acids of similar or analogous properties generally does not alter the function of the protein. Likewise, the addition or deletion of one or several amino acids at the C-terminus and/or N-terminus generally does not alter the structure and function of the protein. In addition, the term also includes monomeric and multimeric forms of the polypeptides of the invention. The term also includes linear as well as non-linear polypeptides (e.g., cyclic peptides).
The invention also includes active fragments, derivatives and analogs of the above fusion proteins. As used herein, the terms "fragment," "derivative," and "analog" refer to a polypeptide that substantially retains the function or activity of a fusion protein of the invention. The polypeptide fragment, derivative or analogue of the present invention may be (i) a polypeptide in which one or more conserved or non-conserved amino acid residues (preferably conserved amino acid residues) are substituted, or (ii) a polypeptide having a substituent group in one or more amino acid residues, or (iii) a polypeptide in which a polypeptide is fused with another compound (such as a compound for increasing the half-life of the polypeptide, e.g., polyethylene glycol), or (iv) a polypeptide in which an additional amino acid sequence is fused with the polypeptide sequence (a fusion protein in which a tag sequence such as a leader sequence, a secretory sequence or 6His is fused). Such fragments, derivatives and analogs are within the purview of those skilled in the art in view of the teachings herein.
A preferred class of active derivatives refers to polypeptides formed by the replacement of up to 3, preferably up to 2, more preferably up to 1 amino acid with an amino acid of similar or analogous nature compared to the amino acid sequence of the present invention. These conservative variants are preferably produced by amino acid substitutions according to Table A.
TABLE A
[Table A: conservative amino acid substitutions; provided as an image in the original]
The invention also provides analogs of the fusion proteins of the invention. These analogs may differ from the polypeptides of the invention by amino acid sequence differences, by modifications that do not affect the sequence, or by both. Analogs also include analogs having residues other than the natural L-amino acids (e.g., D-amino acids), as well as analogs having non-naturally occurring or synthetic amino acids (e.g., beta, gamma-amino acids). It is to be understood that the polypeptides of the present invention are not limited to the representative polypeptides exemplified above.
In addition, modifications may be made to the fusion proteins of the invention. Modified (generally without altering primary structure) forms include: chemically derivatized forms of the polypeptide, such as acetylation or carboxylation, in vivo or in vitro. Modifications also include glycosylation, such as those resulting from glycosylation modifications in the synthesis and processing of the polypeptide or in further processing steps. Such modification may be accomplished by exposing the polypeptide to an enzyme that performs glycosylation, such as a mammalian glycosylase or deglycosylase. Modified forms also include sequences having phosphorylated amino acid residues (e.g., phosphotyrosine, phosphoserine, phosphothreonine). Also included are polypeptides modified to increase their resistance to proteolysis or to optimize solubility.
The term "polynucleotide encoding a fusion protein of the present invention" may include a polynucleotide encoding a fusion protein of the present invention, and may also include polynucleotides that additionally include coding and/or non-coding sequences.
The invention also relates to variants of the above polynucleotides which encode polypeptides or fusion proteins having the same amino acid sequence as the present invention, or fragments, analogs and derivatives thereof. These nucleotide variants include substitution variants, deletion variants and insertion variants. As is known in the art, an allelic variant is an alternative form of a polynucleotide, which may contain a substitution, deletion, or insertion of one or more nucleotides without substantially altering the function of the fusion protein encoded thereby.
The present invention also relates to polynucleotides which hybridize to the sequences described above and which have at least 50%, preferably at least 70%, and more preferably at least 80% identity between the two sequences. The present invention particularly relates to polynucleotides hybridizable under stringent conditions with the polynucleotides of the present invention. In the present invention, "stringent conditions" mean: (1) hybridization and washing at lower ionic strength and higher temperature, such as 0.2× SSC, 0.1% SDS, 60 °C; or (2) hybridization in the presence of a denaturant, such as 50% (v/v) formamide, 0.1% calf serum/0.1% Ficoll, 42 °C; or (3) hybridization that occurs only when the identity between the two sequences is at least 90%, preferably at least 95%.
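The identity thresholds above can be illustrated with a minimal Python sketch. It assumes the two sequences are already aligned to equal length and uses the alignment length as the denominator; other definitions of identity exist, and the function names are hypothetical, not part of the invention.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Gap characters ("-") never count as matches. The denominator is the
    alignment length, which is one of several conventions (assumption).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    pairs = zip(a.upper(), b.upper())
    matches = sum(1 for x, y in pairs if x == y and x != "-")
    return 100.0 * matches / len(a)


def identity_tier(identity: float) -> str:
    """Map an identity value onto the 50% / 70% / 80% thresholds in the text."""
    if identity >= 80.0:
        return "more preferred (>= 80%)"
    if identity >= 70.0:
        return "preferred (>= 70%)"
    if identity >= 50.0:
        return "minimum (>= 50%)"
    return "below threshold"


# Illustrative 10-mers that differ at the last two positions.
EXAMPLE_IDENTITY = percent_identity("ACGTACGTAC", "ACGTACGTTT")
EXAMPLE_TIER = identity_tier(EXAMPLE_IDENTITY)
```

In practice the alignment itself would come from a dedicated alignment tool; the sketch only shows how an identity figure relates to the stated thresholds.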
The fusion proteins and polynucleotides of the invention are preferably provided in isolated form, and more preferably, purified to homogeneity.
The full-length sequence of the polynucleotide of the present invention can be obtained by PCR amplification, recombination, or artificial synthesis. For PCR amplification, primers can be designed based on the nucleotide sequences disclosed herein, particularly open reading frame sequences, and the sequences can be amplified using commercially available cDNA libraries or cDNA libraries prepared by conventional methods known to those skilled in the art as templates. When the sequence is long, two or more PCR amplifications are often required, and then the amplified fragments are spliced together in the correct order.
Once the sequence of interest has been obtained, it can be obtained in large quantities by recombinant methods. This is usually done by cloning it into a vector, transferring it into a cell, and isolating the relevant sequence from the propagated host cell by conventional methods.
In addition, the sequence can be synthesized by artificial synthesis, especially when the fragment length is short. Generally, fragments with long sequences are obtained by first synthesizing a plurality of small fragments and then ligating them.
At present, DNA sequences encoding the proteins of the present invention (or fragments or derivatives thereof) have been obtained completely by chemical synthesis. The DNA sequence may then be introduced into various existing DNA molecules (or vectors, for example) and cells known in the art.
PCR-based methods for amplifying DNA/RNA are preferably used to obtain the polynucleotides of the invention. In particular, when it is difficult to obtain a full-length cDNA from a library, the RACE method (rapid amplification of cDNA ends) is preferably used. Primers for PCR can be appropriately selected based on the sequence information disclosed herein and synthesized by conventional methods. The amplified DNA/RNA fragments can be isolated and purified by conventional methods, such as gel electrophoresis.
Expression vector
The invention also relates to vectors comprising the polynucleotides of the invention, as well as genetically engineered host cells transformed with the vectors of the invention or the coding sequences of the fusion proteins of the invention, and methods for producing the polypeptides of the invention by recombinant techniques.
The polynucleotide sequences of the present invention may be used to express or produce recombinant fusion proteins by conventional recombinant DNA techniques. Generally, the following steps are performed:
(1) transforming or transducing a suitable host cell with a polynucleotide (or variant) of the invention encoding a fusion protein of the invention, or with a recombinant expression vector comprising the polynucleotide;
(2) culturing the host cell in a suitable medium;
(3) isolating and purifying the protein from the culture medium or the cells.
In the present invention, the polynucleotide sequence encoding the fusion protein may be inserted into a recombinant expression vector. The term "recombinant expression vector" refers to a bacterial plasmid, bacteriophage, yeast plasmid, plant cell virus, mammalian cell virus such as adenovirus, retrovirus, or other vectors well known in the art. Any plasmid or vector may be used as long as it can replicate and is stable in the host. An important feature of expression vectors is that they generally contain an origin of replication, a promoter, a marker gene and translation control elements.
Methods well known to those skilled in the art can be used to construct expression vectors containing the DNA sequences encoding the fusion proteins of the present invention and appropriate transcription/translation control signals. These methods include in vitro recombinant DNA techniques, DNA synthesis techniques, in vivo recombinant techniques, and the like. The DNA sequence may be operably linked to a suitable promoter in an expression vector to direct mRNA synthesis. Representative examples of such promoters are: lac or trp promoter of E.coli; a lambda phage PL promoter; eukaryotic promoters include CMV immediate early promoter, HSV thymidine kinase promoter, early and late SV40 promoter, LTRs of retrovirus, and other known promoters capable of controlling gene expression in prokaryotic or eukaryotic cells or viruses. The expression vector also includes a ribosome binding site for translation initiation and a transcription terminator.
Furthermore, the expression vector preferably comprises one or more selectable marker genes to provide phenotypic traits for selection of transformed host cells, such as dihydrofolate reductase, neomycin resistance and Green Fluorescent Protein (GFP) for eukaryotic cell culture, or tetracycline or ampicillin resistance for E.coli.
Vectors comprising the appropriate DNA sequences described above, together with appropriate promoter or control sequences, may be used to transform appropriate host cells to enable expression of the protein.
The host cell may be a prokaryotic cell, such as a bacterial cell; a lower eukaryotic cell, such as a yeast cell; or a higher eukaryotic cell, such as a mammalian cell. Representative examples are: bacterial cells such as Escherichia coli, Streptomyces and Salmonella typhimurium; fungal cells such as yeast; and plant cells (e.g., ginseng cells).
When the polynucleotide of the present invention is expressed in higher eukaryotic cells, transcription is enhanced if an enhancer sequence is inserted into the vector. Enhancers are cis-acting DNA elements, usually about 10 to 300 base pairs long, that act on a promoter to increase transcription of a gene. Examples include the 100 to 270 bp SV40 enhancer on the late side of the replication origin, the polyoma enhancer on the late side of the replication origin, and adenovirus enhancers.
It will be clear to one of ordinary skill in the art how to select appropriate vectors, promoters, enhancers and host cells.
Transformation of a host cell with recombinant DNA can be carried out using conventional techniques well known to those skilled in the art. When the host is a prokaryote, e.g., E. coli, competent cells capable of DNA uptake can be harvested after the exponential growth phase and treated by the CaCl2 method, the steps of which are well known in the art. Another method is to use MgCl2. If desired, transformation can also be carried out by electroporation. When the host is a eukaryote, the following DNA transfection methods may be used: calcium phosphate coprecipitation, and conventional mechanical methods such as microinjection, electroporation and liposome encapsulation.
The obtained transformant can be cultured by a conventional method to express the polypeptide encoded by the gene of the present invention. The medium used in the culture may be selected from various conventional media depending on the host cell used. The culturing is performed under conditions suitable for growth of the host cell. After the host cells have been grown to an appropriate cell density, the selected promoter is induced by suitable means (e.g., temperature shift or chemical induction) and the cells are cultured for an additional period of time.
The recombinant polypeptide in the above method may be expressed intracellularly or on the cell membrane, or secreted extracellularly. If necessary, the recombinant protein can be isolated and purified by various separation methods using its physical, chemical and other properties. These methods are well known to those skilled in the art. Examples of such methods include, but are not limited to: conventional renaturation treatment, treatment with a protein precipitant (such as salt precipitation), centrifugation, cell lysis by osmosis, sonication, ultracentrifugation, molecular sieve chromatography (gel filtration), adsorption chromatography, ion exchange chromatography, High Performance Liquid Chromatography (HPLC), and other various liquid chromatography techniques, and combinations thereof.
One-step library construction method
The invention provides a one-step rapid DNA library construction method based on the Illumina sequencing platform. The library construction method comprises the following steps: DNA (a genome or other DNA sequence) is mixed with the fragmenting enzyme of the present invention, Taq polymerase, T4 DNA ligase and the optimized reaction buffer, a one-tube reaction is carried out, and a high-quality DNA sequencing library is obtained through purification and enrichment.
In this library construction method, DNA fragmentation, end repair, A-tailing and linker ligation are carried out in a one-tube reaction. Combining these 4 independent steps into one greatly shortens the library construction time and simplifies the operation. With this method, a stable, high-quality library can be obtained in 1.5 hours.
The invention is suitable for efficient library conversion of genomic and large-fragment dsDNA samples of between 1 ng and 1 μg, and has the advantages of simple operation, stable performance and good overall sequencing results.
The fragmenting enzyme is particularly suitable for preparing high-quality DNA sequencing libraries for high-throughput sequencers. Combined with index (barcode) sequences, high-throughput sequencing of dozens or even hundreds of samples can be conveniently realized.
A high-throughput sequencer can complete millions or even hundreds of millions of sequencing reactions in a single run, and the resulting data often come from a mixture of dozens or even hundreds of samples. To distinguish these samples, the linkers in the library must be labelled: each sample carries a specific linker, and each linker contains a specific tag sequence, called the index (also called the barcode). The index is an essential component of the linker in high-throughput sequencing; it allows each sequencing read to be sorted back to the sample it belongs to. The position, typical length and number of indexes differ between sequencing platforms. For example, the index sequence of the Life Ion Torrent platform is usually 10-12 bp and is not fixed in length, whereas the Illumina sequencing platform uses an index sequence of 6 bp or 8 bp. In a specific embodiment of the invention, an index sequence of 8 bp is preferably used.
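How an index lets each read be assigned back to its sample can be sketched as follows. This is a minimal illustration, not the demultiplexing routine of any particular sequencer: the two 8 bp indexes, the sample names and the one-mismatch tolerance are all assumptions made for the example.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, index_to_sample, max_mismatches=1):
    """Group reads by sample using their index read.

    reads: iterable of (index_read, insert_read) tuples.
    A read is assigned only if exactly one known index lies within
    max_mismatches; otherwise it goes to "undetermined".
    """
    bins = defaultdict(list)
    for index_read, insert_read in reads:
        hits = [
            sample
            for index, sample in index_to_sample.items()
            if len(index) == len(index_read)
            and hamming(index, index_read) <= max_mismatches
        ]
        sample = hits[0] if len(hits) == 1 else "undetermined"
        bins[sample].append(insert_read)
    return dict(bins)


# Hypothetical 8 bp indexes; these are not the sequences of the invention.
INDEXES = {"ATCACGTT": "sample_A", "CGATGTCA": "sample_B"}
READS = [
    ("ATCACGTT", "AAAA"),  # exact match to sample_A
    ("CGATGTCT", "CCCC"),  # one mismatch from the sample_B index
    ("GGGGGGGG", "TTTT"),  # matches no known index
]
RESULT = demultiplex(READS, INDEXES)
```

Allowing a single mismatch only works when the chosen indexes differ from one another at several positions, which is why index sets are normally designed with a minimum pairwise distance.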
Kit for constructing sequencing library based on DNA sample
The invention also provides a kit for constructing a sequencing library by using the method, wherein the kit comprises:
(Z1) a fragmenting enzyme, Hifase;
(Z2) other reagents for constructing DNA sequencing libraries;
(Y) instructions describing the method of use.
In another preferred embodiment, the other reagents for constructing a DNA sequencing library include:
(a) Taq DNA polymerase and T4 DNA ligase;
(b) magnetic beads;
(c) a reaction buffer;
(d) a linker sequence.
In another preferred embodiment, the kit further comprises: a specific primer that specifically binds to the linker sequence.
A preferred kit comprises:
fragmenting enzyme Hifase (0.1 μg/μl, 0.2-13 μl);
Taq DNA polymerase (5 U/μl, 0.5 μl); T4 DNA ligase (2 U/μl, 2 μl);
magnetic beads: 50 μl of DNA Selection Beads (1×, Beads:DNA = 1:1);
Reaction buffer: 5× Reaction Buffer, which specifically contains 100 mM Tris-HCl (pH 8.0), 100 mM MgCl2, 750 mM KCl, 100 mM DTT, 10 μM dNTPs and ATP, and 25% PEG-8000.
Linker sequences: Novo i5 linker (10 μmol/L, 1-2 μl) shown in SEQ ID NO. 1 and Novo i7 linker (10 μmol/L, 1-2 μl) shown in SEQ ID NO. 2.
Specific primers: Novo i5 PCR primer (10 μmol/L, 2.5 μl) shown in SEQ ID NO. 3 and Novo i7 PCR primer (10 μmol/L, 2.5 μl) shown in SEQ ID NO. 4.
As used herein, "mM" means mmol/L and "μM" means μmol/L.
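For orientation, the working (1×) concentrations implied by a 5× stock can be worked out with a short Python sketch. The component values are taken from the buffer description above, with KCl read as 750 mM; the dNTP/ATP entry is left out because its wording is ambiguous, and the 50 μl reaction volume is an assumption made for the example, not a value stated for the buffer.

```python
# Stock (5x) composition of the Reaction Buffer as read from the text.
REACTION_BUFFER_5X = {
    "Tris-HCl pH 8.0 (mM)": 100.0,
    "MgCl2 (mM)": 100.0,
    "KCl (mM)": 750.0,
    "DTT (mM)": 100.0,
    "PEG-8000 (%)": 25.0,
}


def final_concentrations(stock: dict, fold: float) -> dict:
    """Concentrations after diluting a `fold`-times stock to 1x."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    return {name: value / fold for name, value in stock.items()}


def buffer_volume_ul(reaction_volume_ul: float, fold: float) -> float:
    """Volume of concentrated buffer needed for a given reaction volume."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    return reaction_volume_ul / fold


WORKING = final_concentrations(REACTION_BUFFER_5X, fold=5.0)
# Assumed 50 ul reaction, for illustration only.
BUFFER_UL = buffer_volume_ul(50.0, fold=5.0)
```

Under these assumptions the reaction runs at 20 mM Tris-HCl, 20 mM MgCl2, 150 mM KCl, 20 mM DTT and 5% PEG-8000, and a 50 μl reaction takes 10 μl of the 5× buffer.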
In particular embodiments, the amount of the fragmenting enzyme Hifase added increases with increasing genomic DNA concentration. Preferred recommended amounts of linker (Adapter) and Hifase for different amounts of DNA are shown in Table 1:
TABLE 1 Recommended amounts of linker (Adapter) and Hifase for 1 ng-1 μg of DNA
[Table 1 is provided as an image in the original publication and is not reproduced here.]
The main advantages of the invention include:
1) The novel fragmenting enzyme can efficiently and randomly break a genome into small fragments within a defined length range (about 50-150 bp), with a short reaction time and undemanding conditions, and is well suited to library construction.
2) In the one-step library construction reaction, the multiple end-repair enzymes required for conventional library construction are not needed; fill-in and A-tailing are achieved with Taq polymerase alone, which streamlines the reaction system and reduces the number of reagents used.
3) The library construction method integrates DNA fragmentation, end repair, A-tailing and linker ligation into a single one-tube reaction, which greatly shortens library construction time, simplifies the operation, avoids errors that can arise in multi-step handling, and reduces loss of starting material.
4) The library construction method is suitable for efficient library conversion of genomic and large-fragment dsDNA samples of between 1 ng and 1 μg, and has the advantages of simple operation, stable performance and good sequencing results.
The invention is further illustrated by the following examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Experimental procedures for which detailed conditions are not specified in the following examples generally follow conventional conditions, such as those described in Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory Press, 1989), or the manufacturer's recommendations. Unless otherwise indicated, percentages and parts are by weight.
Primers
TABLE 2 Linker and primer sequences of the invention
[Table 2 is provided as an image in the original publication and is not reproduced here.]
The underlined portion indicates the 8 bp index sequence. When multiple samples are sequenced together, the underlined index sequence can be replaced by a different index sequence for each sample, so that the samples can be distinguished.
Example 1: preparation and characterization of Hifase
The preparation of the recombinant fusion protein Hifase comprises the following steps:
1) construction of the Hifase expression plasmid: the DNA sequence shown in SEQ ID NO. 6 is obtained by whole-gene synthesis, NdeI and XhoI restriction sites are introduced by PCR amplification, the fragment is digested and recovered, ligated into the pET30a vector and transformed into DH5α, and a correct positive expression plasmid is identified by sequencing;
2) expression of Hifase: the correct plasmid obtained in step 1) is transformed into BL21 host bacteria, the plasmid-bearing host bacteria are inoculated into a shake flask for culture, 0.1-1 mM IPTG is added to induce expression when the culture reaches an OD of 0.2-0.6, and the cells are harvested after a further 2-4 hours of culture;
3) separation and purification of Hifase: the cells harvested in step 2) are disrupted, the inclusion bodies are collected and renatured to obtain correctly folded soluble protein, and high-purity recombinant fusion protein Hifase is obtained by ion-exchange and hydrophobic chromatography.
FIG. 1 shows the expression vector of Hifase and the result of SDS-PAGE electrophoresis. Wherein, 1A shows a schematic diagram of an expression vector of Hifase, and 1B shows an SDS-PAGE electrophoresis chart of Hifase.
Example 2: Genome fragmentation by the fragmenting enzyme Hifase at different reaction times
Genomic DNA was extracted from human 293T cells using a cell/tissue genomic DNA extraction kit (general, cat # GK0122), and the concentration of the extracted genomic DNA was determined using a NanoDrop. 160 ng of genomic DNA was taken for the fragmentation reaction. The reaction system is as follows (Table 3):
TABLE 3 Fragmentation reaction system
[Table 3 is provided as an image in the original publication and is not reproduced here.]
The 10× Hifase Buffer used in the reaction specifically comprises 100 mM Tris-HCl (pH 8.0), 100 mM MgCl2 and 500 mM KCl.
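As a rough illustration of setting up the 160 ng fragmentation reaction, the sketch below converts a measured DNA concentration into pipetting volumes. The 20 μl total volume, the 1 μl enzyme volume and the 80 ng/μl concentration are placeholders assumed for the example only; the actual volumes are those given in Table 3.

```python
def fragmentation_mix(
    dna_conc_ng_per_ul: float,
    dna_mass_ng: float = 160.0,      # amount of genomic DNA stated in the text
    total_volume_ul: float = 20.0,   # assumption, not taken from Table 3
    buffer_fold: float = 10.0,       # 10x Hifase Buffer
    enzyme_volume_ul: float = 1.0,   # assumption, not taken from Table 3
) -> dict:
    """Return pipetting volumes (ul) for one fragmentation reaction."""
    if dna_conc_ng_per_ul <= 0:
        raise ValueError("DNA concentration must be positive")
    dna_ul = dna_mass_ng / dna_conc_ng_per_ul
    buffer_ul = total_volume_ul / buffer_fold
    water_ul = total_volume_ul - dna_ul - buffer_ul - enzyme_volume_ul
    if water_ul < 0:
        raise ValueError("DNA is too dilute for this reaction volume")
    return {
        "dna_ul": dna_ul,
        "buffer_ul": buffer_ul,
        "enzyme_ul": enzyme_volume_ul,
        "water_ul": water_ul,
    }


# Hypothetical reading of 80 ng/ul for the extracted genomic DNA.
MIX = fragmentation_mix(dna_conc_ng_per_ul=80.0)
```

With these placeholder values the mix is 2 μl DNA, 2 μl 10× buffer, 1 μl enzyme and 15 μl water; the function raises an error when the DNA is too dilute to fit the chosen volume.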
The reaction was mixed thoroughly by gentle pipetting, and the tubes were incubated in a PCR instrument at 37 °C for 5 min, 10 min, 15 min, 20 min, 30 min, 40 min and 60 min, respectively. Fragmentation was then examined.
As shown in FIG. 3, after 30 min the fragmentation reaction essentially no longer proceeds, and the fragments are essentially about 100 bp (50-150 bp).
Example 3: one-step method for rapid DNA library construction
One-step library construction was performed using 200ng of genomic DNA (human 293T cell genome) as an initial sample. FIG. 2 shows a schematic diagram of rapid DNA library construction in one step of the present invention. The method specifically comprises the following steps:
1. sample and material preparation
a. Sample preparation: the starting sample is 1 ng-1 μg of genomic DNA (A260/A280 between 1.8 and 2.0) in Nuclease-Free Water or 10 mM Tris-HCl (pH 8.0) without EDTA.
b. Materials, reagents and instruments: freshly prepared 80% ethanol, purification magnetic beads, linker sequences, 5× Reaction Buffer, fragmenting enzyme Hifase, Taq DNA polymerase, T4 DNA ligase, dNTPs, etc. (all common commercial reagents); 0.2 ml EP tubes, magnetic rack, PCR instrument, DNA quality-control instrument, etc.
2. One-tube reaction for fragmentation, end A-tailing and linker ligation. The following reagents (Table 4) were added to prepare the reaction system:
TABLE 4 Reagent composition of the one-tube reaction
[Table 4 is provided as an image in the original publication and is not reproduced here.]
The 5× Reaction Buffer used in the reaction specifically comprises 100 mM Tris-HCl (pH 8.0), 100 mM MgCl2, 750 mM KCl, 100 mM DTT, 10 μM dNTPs and ATP, and 25% PEG-8000. Mix thoroughly by gentle pipetting, then briefly centrifuge to collect the reaction solution at the bottom of the tube. Place the samples in a PCR instrument and set up the reaction program as follows (Table 5):
TABLE 5 PCR program for the one-tube reaction
[Table 5 is provided as an image in the original publication and is not reproduced here.]
3. Size selection and purification of the one-tube reaction product
In this step the one-tube reaction product is purified using DNA fragment sorting and purification magnetic beads (Bomeger organisms, cat # BMSX), which removes non-productive material such as unligated linker sequences and linker dimers.
Purification procedure:
1) preparation: take the DNA fragment sorting and purification magnetic beads out of the refrigerator and equilibrate them at room temperature for at least 30 min. Prepare 80% ethanol;
2) vortex or invert the magnetic beads thoroughly to ensure they are fully mixed;
3) pipette 50 μl of DNA Selection Beads (1×, Beads:DNA = 1:1) into the one-tube reaction product, and incubate for 5 min at room temperature;
4) briefly centrifuge the PCR tube and place it on a magnetic rack to separate the beads from the liquid; after the solution clears (about 5 min), carefully remove the supernatant;
5) keeping the PCR tube on the magnetic rack throughout, add 100 μl of freshly prepared 80% ethanol to rinse the beads, incubate at room temperature for 30 sec, and carefully remove the supernatant;
6) repeat step 5);
7) keeping the PCR tube on the magnetic rack, open the lid and air-dry for no more than 5 min;
8) remove the PCR tube from the magnetic rack, add 21 μl of RNase-Free Water, mix well by gentle pipetting and leave at room temperature for 5 min (elute with TE Buffer if the purified product is to be stored). Briefly centrifuge the PCR tube, place it on the magnetic rack and let it stand; after the solution clears (about 5 min), carefully transfer 20 μl of the supernatant to a new PCR tube, leaving the beads behind. The 20 μl of purified product can be used directly as the template for library amplification.
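The timed parts of the purification procedure above can be tallied with a short Python sketch. The step durations are those stated in steps 3) to 8), with the drying step taken at its 5 min upper limit; the 30 min bead equilibration and all pipetting and centrifugation time are deliberately excluded, and the function names are hypothetical.

```python
# (description, minutes) for the timed steps of the bead clean-up.
PURIFICATION_STEPS = [
    ("bind DNA to beads at room temperature", 5.0),
    ("clear on magnetic rack, remove supernatant", 5.0),
    ("first 80% ethanol rinse", 0.5),
    ("second 80% ethanol rinse", 0.5),
    ("air-dry with lid open (upper limit)", 5.0),
    ("elute in 21 ul water at room temperature", 5.0),
    ("clear on magnetic rack, collect 20 ul", 5.0),
]


def total_minutes(steps) -> float:
    """Sum of the incubation and standing times, in minutes."""
    return sum(minutes for _, minutes in steps)


def bead_volume_ul(sample_volume_ul: float, ratio: float = 1.0) -> float:
    """Bead volume for a given sample volume at a beads:DNA volume ratio."""
    if sample_volume_ul <= 0 or ratio <= 0:
        raise ValueError("volume and ratio must be positive")
    return sample_volume_ul * ratio


TOTAL = total_minutes(PURIFICATION_STEPS)
BEADS = bead_volume_ul(50.0, ratio=1.0)
```

The timed steps add up to 26 min, and a 1:1 ratio with 50 μl of beads corresponds to a 50 μl sample; the sample volume is inferred from that ratio rather than stated directly.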
4. Library amplification
In this step the purified one-tube reaction product is enriched by PCR amplification (Table 6).
TABLE 6 Reagents for PCR amplification of the library
[Table 6 is provided as an image in the original publication and is not reproduced here.]
The reaction solution was gently mixed and briefly centrifuged to collect it at the bottom of the tube, and the reaction was carried out according to the following program (Table 7):
TABLE 7 PCR program for library amplification
[Table 7 is provided as an image in the original publication and is not reproduced here.]
The number of amplification cycles is set according to the starting amount of sample.
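Why smaller starting amounts need more cycles can be seen from the idealized exponential model of PCR, in which the yield after n cycles is input × (1 + E)^n for a per-cycle efficiency E. The sketch below is only that textbook model; the input and target masses are hypothetical, the masses refer to amplifiable linker-ligated product rather than to the genomic DNA input, and the cycle numbers actually recommended are those of Table 7.

```python
import math


def cycles_needed(input_ng: float, target_ng: float, efficiency: float = 0.9) -> int:
    """Smallest cycle number n with input * (1 + efficiency)**n >= target.

    Idealized model: constant efficiency, no plateau (assumption).
    """
    if input_ng <= 0 or target_ng <= 0:
        raise ValueError("masses must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    if target_ng <= input_ng:
        return 0
    return math.ceil(math.log(target_ng / input_ng) / math.log(1.0 + efficiency))


# Hypothetical amounts of amplifiable template, perfect doubling assumed.
LOW_INPUT_CYCLES = cycles_needed(1.0, 1000.0, efficiency=1.0)
HIGH_INPUT_CYCLES = cycles_needed(100.0, 1000.0, efficiency=1.0)
```

Real reactions fall short of perfect doubling and plateau at high product concentrations, so such an estimate is a lower bound to be checked against the library yield actually obtained.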
FIG. 4 shows the agarose gel results of one-step rapid DNA banking.
5. Library quality control
In general, the quality of the constructed library is evaluated by concentration measurement and length-distribution analysis. Concentration can be measured using a Qubit fluorometer with PicoGreen dye; the length distribution can be assessed using an Agilent 2100 Bioanalyzer with a DNA 1000 chip.
The experimental results (see FIG. 5) show that, in two replicate experiments, the DNA fragments obtained by the one-step library construction method of the present invention are between 250 and 400 bp, with a concentrated size and stable distribution, and meet the general criteria for an excellent library (a single, smooth peak that is close to a normal distribution, with a length range between 150 and 700 bp).
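The two quality-control readouts, concentration and length distribution, are often combined into a molar concentration for loading a sequencer. A minimal Python sketch of that conversion follows; it assumes the usual approximation of 660 g/mol per base pair of double-stranded DNA, and the example concentration and mean length are hypothetical, not measured values from this example.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # approximate mass of one dsDNA base pair (assumption)


def library_molarity_nm(conc_ng_per_ul: float, mean_length_bp: float) -> float:
    """Convert a mass concentration (ng/ul) to molarity (nM) for dsDNA.

    1 ng/ul equals 1e-3 g/L; dividing by the molar mass gives mol/L,
    and the factor 1e9 converts mol/L to nM, hence the overall 1e6.
    """
    if conc_ng_per_ul < 0 or mean_length_bp <= 0:
        raise ValueError("concentration must be >= 0 and length > 0")
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * mean_length_bp)


def within_size_criteria(low_bp: float, high_bp: float,
                         min_bp: float = 150.0, max_bp: float = 700.0) -> bool:
    """Check an observed size range against the 150-700 bp criterion in the text."""
    return min_bp <= low_bp <= high_bp <= max_bp


# Hypothetical library: 10 ng/ul with a mean fragment length of 330 bp.
MOLARITY_NM = library_molarity_nm(10.0, 330.0)
SIZE_OK = within_size_criteria(250.0, 400.0)
```

For the observed 250-400 bp range the size check passes, and the hypothetical 10 ng/μl library works out to roughly 46 nM.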
All documents referred to herein are incorporated by reference into this application as if each had been individually incorporated by reference. Furthermore, it will be appreciated that various changes or modifications may be made by those skilled in the art after reading the above teachings of the invention, and such equivalents may fall within the scope of the invention as defined in the appended claims.
Sequence listing
<110> Wujiang Novoprotein Scientific Inc
<120> DNA library preparation method based on fragmenting enzyme
<130> P2020-2818
<160> 6
<170> SIPOSequenceListing 1.0
<210> 1
<211> 56
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 1
aatgatattc cggcgaccga gatctaacac actctttccc tacacgacgc tcttgt 56
<210> 2
<211> 66
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 2
caagagcgtc gtcggcatac ttctccgtga gatgtgactg gagttcagac gtgtgctctt 60
tccgtc 66
<210> 3
<211> 43
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 3
aatgatattc cggcgaccga gatctaacac actctttccc tac 43
<210> 4
<211> 44
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 4
caagcagaat tccggcatac ttctccgtga gatgtgactg gagc 44
<210> 5
<211> 410
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<400> 5
Met Gly Ser Ser His His His His His His Ser Ser Gly Leu Val Pro
1 5 10 15
Arg Gly Ser His Met Thr Glu Asn Ser Thr Ser Ala Pro Ala Ala Lys
20 25 30
Pro Lys Arg Ala Lys Ala Ser Lys Lys Ser Thr Asp His Pro Lys Tyr
35 40 45
Ser Asp Met Ile Val Ala Ala Ile Gln Ala Glu Lys Asn Arg Ala Gly
50 55 60
Ser Ser Arg Gln Ser Ile Gln Lys Tyr Ile Lys Ser His Tyr Lys Val
65 70 75 80
Gly Glu Asn Ala Asp Ser Gln Ile Lys Leu Ser Ile Lys Arg Leu Val
85 90 95
Thr Thr Gly Val Leu Lys Gln Thr Lys Gly Val Gly Ala Ser Gly Ser
100 105 110
Phe Arg Leu Ala Lys Ser Asp Glu Pro Lys Lys Ser Val Ala Phe Lys
115 120 125
Lys Thr Lys Lys Glu Ile Lys Lys Val Ala Thr Pro Lys Lys Ala Ser
130 135 140
Lys Pro Lys Lys Ala Ala Ser Lys Ala Pro Thr Lys Lys Pro Lys Ala
145 150 155 160
Thr Pro Val Lys Lys Ala Lys Lys Lys Leu Ala Ala Thr Pro Lys Lys
165 170 175
Ala Lys Lys Pro Lys Thr Val Lys Ala Lys Pro Val Lys Ala Ser Lys
180 185 190
Pro Lys Lys Ala Lys Pro Val Lys Pro Lys Ala Lys Ser Ser Ala Lys
195 200 205
Arg Ala Gly Lys Lys Lys Gln Leu Val Lys Ser Glu Leu Glu Glu Lys
210 215 220
Lys Ser Glu Leu Arg His Lys Leu Lys Tyr Val Pro His Glu Tyr Ile
225 230 235 240
Glu Leu Ile Glu Ile Ala Arg Asn Ser Thr Gln Asp Arg Ile Leu Glu
245 250 255
Met Lys Val Met Glu Phe Phe Met Lys Val Tyr Gly Tyr Arg Gly Lys
260 265 270
His Leu Gly Gly Ser Arg Lys Pro Asp Gly Ala Ile Tyr Thr Val Gly
275 280 285
Ser Pro Ile Asp Tyr Gly Val Ile Val Asp Thr Lys Ala Tyr Ser Gly
290 295 300
Gly Tyr Asn Leu Pro Ile Gly Gln Ala Asp Glu Met Gln Arg Tyr Val
305 310 315 320
Glu Glu Asn Gln Thr Arg Asn Lys His Ile Asn Pro Asn Glu Trp Trp
325 330 335
Lys Val Tyr Pro Ser Ser Val Thr Glu Phe Lys Phe Leu Phe Val Ser
340 345 350
Gly His Phe Lys Gly Asn Tyr Lys Ala Gln Leu Thr Arg Leu Asn His
355 360 365
Ile Thr Asn Cys Asn Gly Ala Val Leu Ser Val Glu Glu Leu Leu Ile
370 375 380
Gly Gly Glu Met Ile Lys Ala Gly Thr Leu Thr Leu Glu Glu Val Arg
385 390 395 400
Arg Lys Phe Asn Asn Gly Glu Ile Asn Phe
405 410
<210> 6
<211> 1236
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 6
atgggcagca gccatcatca tcatcatcac agcagcggcc tggtgccgcg cggcagccat 60
atgacggaaa atagcaccag cgccccagcc gccaaaccaa aacgcgccaa agccagcaag 120
aagagcaccg accacccaaa atacagcgac atgatcgtgg ccgccatcca agccgagaaa 180
aatcgcgccg gtagcagccg ccagagcatc cagaaataca tcaagagcca ctataaggtg 240
ggcgagaacg ccgacagcca gatcaagctg agcatcaagc gcctcgttac cacgggcgtt 300
ctgaaacaga ccaaaggcgt tggcgcgagc ggtagctttc gtctggccaa aagtgatgag 360
ccgaagaaga gcgtggcctt caagaagacg aagaaggaga tcaagaaggt ggccaccccg 420
aaaaaagcca gtaaaccgaa gaaagcggcg agcaaagcgc cgaccaagaa accgaaagcg 480
acgccggtga agaaggcgaa gaaaaaactg gcggccacgc cgaaaaaggc caagaagccg 540
aagaccgtga aagccaaacc ggtgaaggcg agcaaaccga aaaaagcgaa gccggttaag 600
ccgaaagcca aaagcagcgc caaacgcgcc ggcaagaaaa aagcccagct ggtgaaaagc 660
gaactggaag agaagaagag tgagctgcgc cacaaactca agtacgttcc gcacgagtac 720
atcgaactca tcgagatcgc ccgcaatagc acgcaagatc gcattctgga aatgaaagtg 780
atggagtttt ttatgaaagt gtacggctac cgcggcaaac atctgggcgg cagtcgcaaa 840
ccagatggtg ccatttacac ggttggcagc ccaatcgact acggcgtgat cgttgacacc 900
aaggcgtaca gcggcggcta caatctgccg atcggccaag ccgatgagat gcagcgttac 960
gtggaagaga accagacccg caataaacac attaatccaa acgaatggtg gaaagtgtac 1020
ccgagcagcg tgaccgagtt caagtttctc ttcgtgagcg gccatttcaa gggcaactac 1080
aaggcgcaac tgacccgcct caaccacatt accaactgca acggcgccgt gctgagcgtt 1140
gaagagctgc tgatcggtgg cgaaatgatc aaagccggta cgctcacgct ggaagaagtt 1200
cgccgcaagt ttaacaacgg cgagatcaac ttttaa 1236

Claims (12)

1. A fragmenting enzyme, wherein the fragmenting enzyme is a fusion protein formed by fusing an H1 histone element and a FokI protein element, and the fusion protein has a structure shown in formula I:
Z0―Z1―L2―Z2 (I)
wherein,
Z0 is absent or selected from: a signal peptide, a tag sequence, or a combination thereof;
Z1 is an H1 histone element;
Z2 is a FokI protein element;
L2 is absent;
"―" is a bond; wherein
the H1 histone element has the amino acid sequence shown at positions 11-214 of SEQ ID NO. 5; and
the FokI protein element has the amino acid sequence shown at positions 215-410 of SEQ ID NO. 5.
2. The fragmenting enzyme according to claim 1, characterized in that the amino acid sequence of said fragmenting enzyme is shown at positions 1-410 of SEQ ID NO. 5 or at positions 11-410 of SEQ ID NO. 5.
3. A method of fragmenting DNA comprising the steps of:
(a) providing a fragmenting enzyme according to any one of claims 1-2;
(b) mixing the fragmenting enzyme with a DNA sample for reaction to obtain a DNA fragment;
wherein the length of the obtained DNA fragment is 30-250 bp.
4. The method of claim 3, wherein the DNA fragment obtained is 50-200bp in length.
5. The method of claim 3, wherein the DNA fragment obtained is 50-150bp in length.
6. Use of a fragmenting enzyme as claimed in any one of claims 1 to 2 for the preparation of reagents for the construction of DNA sequencing libraries; or a reagent for fragmenting a DNA sample.
7. A method for constructing a DNA sequencing library, comprising the steps of:
(a) providing a genomic DNA sample;
(b) subjecting the DNA sample to a fragmentation reaction by a fragmentation enzyme according to any of claims 1-2 to obtain a first mixture containing DNA fragments;
(c) adding linkers to both ends of the DNA fragments in the first mixture in (b) to obtain a second mixture containing DNA fragments whose ends are provided with linkers;
(d) amplifying the DNA fragments with the adapters at the ends obtained in the last step by using specific primers, thereby obtaining the DNA sequencing library, wherein the specific primers are primers for specifically amplifying the DNA fragments with the adapters at the ends in the second mixture.
8. The method of claim 7, wherein steps (b) and (c) are performed simultaneously or sequentially in the same reaction system.
9. A kit for constructing a DNA sequencing library, said kit comprising:
(Z1) the fragmenting enzyme of any one of claims 1-2;
(Y) instructions describing the method of use.
10. A nucleic acid molecule encoding the fragmenting enzyme of claim 1.
11. A vector comprising the nucleic acid molecule of claim 10.
12. A host cell comprising the vector of claim 11, or having integrated into its chromosome an exogenous nucleic acid molecule of claim 10, or expressing the fragmenting enzyme of claim 1.
CN202110299643.0A 2021-03-22 2021-03-22 DNA library preparation method based on fragmenting enzyme Active CN112680431B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110299643.0A CN112680431B (en) 2021-03-22 2021-03-22 DNA library preparation method based on fragmenting enzyme

Publications (2)

Publication Number Publication Date
CN112680431A CN112680431A (en) 2021-04-20
CN112680431B CN112680431B (en) 2021-06-11

Family

ID=75455743

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110299643.0A Active CN112680431B (en) 2021-03-22 2021-03-22 DNA library preparation method based on fragmenting enzyme

Country Status (1)

Country Link
CN (1) CN112680431B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090305419A1 (en) * 2008-05-28 2009-12-10 Sangamo Biosciences, Inc. Compositions for linking DNA-binding domains and cleavage domains
CN110418841A (en) * 2016-08-24 2019-11-05 Sangamo Therapeutics, Inc. Engineered target-specific nucleases
CN112430622A (en) * 2020-10-26 2021-03-02 Yangzhou University FokI and dCpf1 fusion protein expression vector and site-directed gene editing method mediated by same


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"FokI dimerization is required for DNA cleavage"; Jurate Bitinaite et al.; Proc. Natl. Acad. Sci. USA; 1998-12-31; Vol. 95; pp. 10570-10575 *
"Insertion and Deletion Mutants of FokI Restriction Endonuclease"; Yang-Gyun Kim et al.; The Journal of Biological Chemistry; 1994-12-31; Vol. 269, No. 50; pp. 31978-31982 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 215200 science and technology pioneer park complex building, Wujiang Economic Development Zone, Suzhou City, Jiangsu Province

Patentee after: Suzhou inshore protein Technology Co.,Ltd.

Address before: 215200 science and technology pioneer park complex building, Wujiang Economic Development Zone, Suzhou City, Jiangsu Province

Patentee before: WUJIANG NOVOPROTEIN SCIENTIFIC Inc.
