CN107586835B - Single-chain-linker-based construction method and application of next-generation sequencing library - Google Patents

Info

Publication number
CN107586835B
Authority
CN
China
Legal status
Active
Application number
CN201710978737.4A
Other languages
Chinese (zh)
Other versions
CN107586835A (en
Inventor
王进科
武剑
Current Assignee
Southeast University
Original Assignee
Southeast University
Application filed by Southeast University
Priority to CN201710978737.4A
Publication of CN107586835A
Application granted
Publication of CN107586835B

Abstract

The invention discloses a method for constructing a next-generation sequencing library based on a single-stranded linker, and applications thereof. The method comprises the following steps: (1) denaturing double-stranded DNA fragments or RNA/DNA hybrid fragments into single-stranded DNA; (2) ligating a single-stranded linker to the 3' end of the single-stranded DNA; (3) extending the linker-ligated single-stranded DNA with a DNA polymerase to form double-stranded DNA; (4) ligating a T linker or a Tn5 tag linker to the other end of the double-stranded DNA; (5) amplifying the double-stranded DNA carrying linkers at both ends by PCR to obtain a DNA library that can be sequenced by next-generation sequencing. The method can be used to construct next-generation sequencing libraries and determine DNA sequences, and also to identify open chromatin regions, detect gene expression, amplify trace nucleic acids, and so on. It is a new multifunctional method with broad application value in nucleic acid detection and analysis.

Description

Single-chain-linker-based construction method and application of next-generation sequencing library
Technical Field
The invention belongs to the technical field of biomedicine, and particularly relates to a method for constructing a next-generation sequencing (NGS) library based on a single-stranded linker, namely Single Strand Adapter Library Preparation (SALP), and applications thereof.
Background
Since next-generation sequencing (NGS) came onto the market in 2005, the technology has changed how scientific research is approached in basic, applied and clinical fields. With the continued development of new methods and computing power, NGS platforms have driven explosive growth of biological knowledge in recent years. As the most important application of NGS, resequencing of the human genome has greatly deepened our understanding of genetic diversity and of the relationship between health and disease. The biggest difference between NGS and traditional Sanger sequencing is that NGS requires the preparation of a sequencing library. As the data output of NGS platforms grows and the related hardware and software are continuously optimized, sequencing library preparation has become a bottleneck in applying the technology.
The existing standard library construction process is carried out entirely in vitro. Its main steps are DNA fragmentation (by sonication or enzymatic digestion), end blunting, A-tailing, Y-linker ligation (adaptor ligation), fragment size selection, and PCR amplification. The process is long and tedious, and many steps need to be optimized, so a large amount of sample is lost. In particular, to make the linker ligation step efficient, the DNA fragments must first be blunt-ended with a multi-enzyme mixture and then undergo an indispensable A-tailing step. Although many companies have introduced methods that combine or optimize these steps, as well as linkers similar to the Y linker (e.g., stem-loop linkers), the library construction technology has not fundamentally changed. In addition, because Y-type linkers are widely used in this approach, index-bearing PCR primers are introduced only in the final PCR step to distinguish different DNA samples, which are then pooled for sequencing in the same lane. Each DNA sample therefore has to go through the entire library construction process separately before pooling. This greatly increases operational complexity and the consumption of reagents and labor; it makes library construction expensive, easily introduces artificial bias between DNA samples during library construction, and hinders side-by-side comparison of sequencing results between samples.
To overcome these disadvantages of the standard process, an NGS library construction method based on the "cut and paste" activity of the Tn5 transposome was developed. In this method, two Mosaic End (ME) linkers containing primer annealing sites are first assembled with a hyperactive Tn5 transposase to form a transposome, which fragments the DNA and attaches the linkers to the 5' ends of the fragments (a process known as "cut and paste", or by the specialized term "tagmentation"). The DNA fragments are then PCR-amplified with specific primers for a small number of cycles, yielding a library compatible with high-throughput sequencing platforms (e.g., Illumina). However, in libraries prepared this way, a portion of the DNA fragments produced by the Tn5 transposome reaction (up to 50% in theory) carry the same linker at both ends. Only the remaining fragments (50% in theory) carry different linker sequences at their two ends, and only these can be amplified with two different primers and sequenced. Although suppression PCR can to some extent raise the proportion of sequenceable fragments during amplification, many DNA fragments in the library still cannot be sequenced, so a large amount of information is lost.
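The theoretical 50% figure can be checked with a small counting sketch. It assumes, as a simplification not stated in this document, that each fragment end independently receives one of two linkers (labelled A and B here) with equal probability; the function name is illustrative only.

```python
from fractions import Fraction
from itertools import product


def fraction_with_different_ends(linkers=("A", "B")):
    """Fraction of fragments whose two ends carry different linkers.

    Assumption: every (left end, right end) combination is equally likely.
    """
    # Enumerate all possible (left, right) linker combinations.
    pairs = list(product(linkers, repeat=2))
    # Only fragments with two different linkers can be amplified by a
    # pair of different primers.
    usable = [pair for pair in pairs if pair[0] != pair[1]]
    return Fraction(len(usable), len(pairs))


print(fraction_with_different_ends())
```

Under these assumptions the four combinations AA, AB, BA and BB are equally likely, and only AB and BA are usable.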
Mammalian growth and development are regulated by the constant interaction between DNA-binding proteins and chromatin. Chromatin restricts the binding of a transcription factor to its DNA binding site in a highly cell-specific manner; that is, the open state of chromatin determines whether a transcription factor can bind its site. Open chromatin regions give transcription factors the opportunity to bind the DNA binding sites located within them, and thus play an important role in the regulation of gene expression. Identifying the open state of chromatin is therefore of great significance for finding gene regulatory regions and analyzing the mechanisms that regulate gene expression. ATAC-seq (assay for transposase-accessible chromatin using sequencing), a technology developed in recent years on the basis of chromatin cutting by the Tn5 transposome, is a new method that captures open chromatin regions rapidly and sensitively, and it is widely used to study chromatin open states under different conditions. Studying the open state of chromatin can also greatly help in discovering important regulatory elements that cause disease. For example, by using ATAC-seq to identify and compare the chromatin open states of tissue samples from esophageal cancer patients and of esophageal cancer cell lines, the transcription factor AP1 was found to play a key role in the course of esophageal cancer, providing a new target for its treatment. However, ATAC-seq identifies open chromatin regions by cutting chromatin with the Tn5 transposome, so it shares the limitations of the Tn5 transposome-based library construction method described above. Because that method cannot determine the sequences of all DNA fragments, ATAC-seq may lose part of the open-region information during library construction.
Disclosure of Invention
Purpose of the invention: in view of the defects and shortcomings of the commonly used Y-linker NGS library construction method, the Tn5 transposome NGS library construction method, and the Tn5 transposome-based ATAC-seq technology for studying open chromatin regions, the invention provides a method for constructing a next-generation sequencing (NGS) library based on a Single Strand Adapter (SSA), namely Single Strand Adapter Library Preparation (SALP). The SALP method can be used to construct NGS libraries and determine DNA sequences, and also to identify open chromatin regions, detect gene expression (a function similar to RNA-seq), amplify trace nucleic acids, and so on. It is therefore a new multifunctional technology with broad application value in nucleic acid detection and analysis.
The invention also provides the use of the SALP method.
Technical scheme: to achieve the above object, the invention provides a method for constructing an NGS library based on a single-stranded linker (SSA), namely SALP, comprising the following steps:
(1) denaturing double-stranded DNA (dsDNA) fragments or RNA/DNA hybrid fragments into single-stranded DNA (ssDNA);
(2) ligating a single-stranded linker to the 3' end of the single-stranded DNA;
(3) extending the linker-ligated single-stranded DNA with a DNA polymerase to form double-stranded DNA;
(4) ligating a T linker or a Tn5 tag linker to the end of the double-stranded DNA that carries no single-stranded linker;
(5) amplifying the double-stranded DNA carrying linkers at both ends by PCR to obtain a DNA library that can be sequenced by next-generation sequencing (NGS).
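The five steps above can be followed on paper with a toy string-level model in Python. This is only a sketch under stated assumptions: the linker sequences in the usage line are hypothetical placeholders rather than sequences from this patent, a Taq-like polymerase that leaves a single 3' A is assumed, and enzymatic details (phosphates, blocked ends, random overhangs) are not modelled.

```python
# Toy model: DNA strands are plain strings written 5'->3'.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def salp_library_strands(fragment_top, ss_linker, t_linker):
    """Return one library strand (5'->3') per strand of the input duplex.

    ss_linker and t_linker are hypothetical linker strands.
    """
    # Step 1: denaturation gives the top strand and its reverse complement.
    single_strands = [fragment_top, revcomp(fragment_top)]
    library = []
    for ssdna in single_strands:
        # Step 2: the single-stranded linker is ligated to the 3' end.
        ligated = ssdna + ss_linker
        # Step 3: extension copies the ligated strand; the assumed
        # Taq-like polymerase adds one extra A at the 3' end of the copy.
        copied = revcomp(ligated) + "A"
        # Step 4: the T linker joins the end without a single-stranded
        # linker; its 3' T pairs with the A added in step 3.
        full_copy = copied + revcomp(t_linker)
        # Step 5: PCR would amplify this duplex; report the strand that
        # contains the original ssDNA.
        library.append(revcomp(full_copy))
    return library


for strand in salp_library_strands("AAGGCT", "GGAACC", "CTCTAG"):
    print(strand)
```

Each input duplex yields two library molecules, one per original strand, and each has the layout T linker, T, insert, single-stranded linker.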
Wherein the dsDNA fragments in step (1) may be any dsDNA fragments, including ultrasonically sheared DNA fragments, DNA fragments produced by various enzymatic cleavages, DNA fragments produced by transposome-based fragmentation, and naturally degraded DNA fragments. The dsDNA is genomic DNA (gDNA) cut into dsDNA fragments by any of these methods. When the method is used for library construction and high-throughput sequencing, it can serve whole-genome DNA sequencing analysis or sequencing analysis of specific target gDNA (such as gDNA in open chromatin regions).
Wherein the naturally degraded DNA fragments are circulating cell-free DNA (cfDNA) and circulating tumor DNA (ctDNA) that arise naturally in body fluids such as blood. When cfDNA and ctDNA are used for library construction, the amplified DNA can be used for high-throughput sequencing analysis and also for low-throughput detection, such as PCR amplification and mutation detection of specific genes or DNA fragments; in this case the library construction method serves as an amplification technique for trace nucleic acid samples.
Preferably, the transposome is assembled from Tn5 transposase and a Tn5 tag linker.
Further, when the double-stranded DNA fragments in step (1) are produced by transposome-based fragmentation, the linker ligated to the other end of the double-stranded DNA in step (4) is the Tn5 tag linker. When the library is built with transposomes, different DNA samples are treated with transposomes assembled from Tn5 and different Tn5 tag linkers, and the resulting dsDNA fragments from the different samples can be pooled into a single DNA fragment sample for denaturation, single-stranded linker ligation, extension and PCR amplification. This yields a sequencing mixture containing the different DNA samples, which can be distinguished by the tag sequences in their Tn5 tag linkers. This treatment greatly simplifies library construction for multiple samples, eliminates bias that the library construction steps may introduce, and facilitates comparative analysis of sequencing results between DNA samples.
Wherein the Tn5 tag linker has the following sequence structure: 5'-primer annealing site sequence-tag sequence (barcode)-ME sequence-3'. The ME sequence is double-stranded, while the primer annealing site sequence and the tag sequence may be single-stranded or double-stranded. The 3' end of the double-stranded ME sequence is a hydroxyl group and its 5' end is a phosphate group. One strand of the ME sequence is SEQ ID NO. 1: 5'-AGATGTGTATAAGAGACAG-3', and the complementary strand is SEQ ID NO. 2: 5'-P-CTGTCTCTTATACACATCT-3', where P represents a phosphate group.
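As a quick consistency check, the following Python sketch confirms that the two ME strands given above are reverse complements of each other and assembles one tag linker strand in the stated 5'-to-3' order. The primer site and barcode in the usage line are hypothetical placeholders, the helper names are illustrative, and the 5' phosphate is not modelled.

```python
ME_STRAND_1 = "AGATGTGTATAAGAGACAG"  # SEQ ID NO. 1
ME_STRAND_2 = "CTGTCTCTTATACACATCT"  # SEQ ID NO. 2 (5' phosphate omitted)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def build_tn5_tag_linker(primer_site, barcode):
    """Assemble 5'-primer site-barcode-ME-3' for the strand ending in SEQ ID NO. 1."""
    return primer_site + barcode + ME_STRAND_1


# The two ME strands should pair perfectly over all 19 nucleotides.
print(revcomp(ME_STRAND_1) == ME_STRAND_2)

# Hypothetical primer site and barcode, for illustration only.
print(build_tn5_tag_linker("ACACGACG", "GGTTCA"))
```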
Preferably, the RNA/DNA hybrid in step (1) is produced by a reverse transcription reaction, and the single-stranded DNA obtained by denaturing it is complementary DNA (cDNA). When ssDNA generated by denaturing RNA/DNA hybrids is used for library construction and high-throughput sequencing, the method can perform functions such as genome-wide gene expression detection, similar to RNA-seq.
Wherein the single-stranded linker in step (2) is a double-stranded oligonucleotide with one sticky end. The sticky end consists of several random nucleotides protruding from a 3' end; the other end of the linker is blunt, and its 3' end is blocked by a group such as an amino group.
Preferably, the 3' overhang of random nucleotides is 1-4 nucleotides long, most preferably 3 nucleotides.
Further, the 5' end at the sticky end carries a phosphate group, and the 5' end at the blunt end carries a hydroxyl group.
Wherein, in the ligation of step (2), the sticky end of the single-stranded linker first anneals (hybridizes) to the 3' end of the single-stranded DNA generated in step (1); a nucleic acid ligase then catalyzes the formation of a 3'-5' phosphodiester bond between the 5' phosphate group at the sticky end of the linker and the 3' hydroxyl group of the single-stranded DNA. The nucleic acid ligase is T4 DNA ligase.
Preferably, the single-stranded linker is a single-stranded tag linker; that is, a tag sequence is added to the double-stranded region of the single-stranded linker, adjacent to the random nucleotides protruding from the 3' end.
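The role of the random 3' overhang can be illustrated with a short Python sketch. It enumerates every overhang of a given length and finds the one that would pair with the 3' end of a given single-stranded DNA. The example sequence and function names are hypothetical, and perfect Watson-Crick pairing is assumed as a simplification.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def random_overhangs(length=3):
    """All possible overhang sequences of the given length (4**length)."""
    return ["".join(bases) for bases in product("ACGT", repeat=length)]


def matching_overhang(ssdna, length=3):
    """Overhang (5'->3') that pairs with the last `length` bases of ssdna."""
    return revcomp(ssdna[-length:])


# With the preferred length of 3 there are 64 overhangs, so a linker pool
# with fully random overhangs contains a match for any ssDNA 3' end.
overhangs = random_overhangs(3)
print(len(overhangs))
print(matching_overhang("ACGTTGCAGG") in overhangs)
```

For overhang lengths 1 to 4 the pool holds 4, 16, 64 and 256 distinct overhangs respectively.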
Wherein the DNA polymerase in step (3) may be any of various DNA polymerases. If it is ordinary Taq DNA polymerase, a protruding A base is naturally added at the 3' end of the double-stranded DNA generated in step (3), and the extension product can be used directly for T-linker ligation in step (4). If it is a high-fidelity DNA polymerase, the 3' end of the double-stranded DNA generated in step (3) has no protruding A base, so the extension product must be treated with ordinary Taq DNA polymerase or another enzyme of similar function to add a protruding A base at its 3' end for T-linker ligation in step (4).
Wherein the T linker in step (4) is a double-stranded oligonucleotide with a sticky end consisting of a single T base protruding from the 3' end; this T base can anneal to the A base protruding from the 3' end of the double-stranded DNA generated in step (3).
Wherein, in the ligation of step (4), the T base protruding from the 3' end of the T linker anneals to the A base protruding from the 3' end of the double-stranded DNA generated in step (3), and a nucleic acid ligase then catalyzes the formation of a 3'-5' phosphodiester bond between the T linker and the double-stranded DNA. The nucleic acid ligase is typically T4 DNA ligase.
Wherein the template for PCR amplification in step (5) is a double-stranded DNA fragment whose two ends carry either a single-stranded linker and a T linker, or a single-stranded linker and a Tn5 tag linker; these linkers provide the annealing sites for the PCR primers in step (5). PCR with a pair of primers that anneal to the sequences of the two linkers generates amplification products of the DNA fragments. The PCR products constitute a DNA library that can be sequenced by next-generation sequencing. The PCR primers are NGS-compatible primers, such as Illumina index primers.
The method for constructing a next-generation sequencing library is applied to genomic DNA sequencing, analysis of the chromatin open state of cells, gene expression detection, and trace nucleic acid amplification.
Wherein the specific steps for analyzing the chromatin open state of cells are as follows: (1) assembling transposomes from Tn5 and different Tn5 tag linkers; (2) collecting different cells, lysing the cell membrane with a mild lysis method while keeping the nuclei intact, and collecting the nuclei by centrifugation to remove cell membrane debris and cytoplasmic components; (3) treating the nuclei with the transposomes to produce chromatin fragments; (4) isolating and purifying the genomic double-stranded DNA fragments from the chromatin fragments; (5) denaturing the double-stranded DNA fragments into single-stranded DNA; (6) ligating a single-stranded linker to the 3' end of the single-stranded DNA; (7) extending the linker-ligated ssDNA with a DNA polymerase to make it double-stranded; (8) ligating a Tn5 tag linker to the end of the double-stranded DNA that carries no single-stranded linker; (9) amplifying the double-stranded DNA carrying linkers at both ends by PCR to obtain a DNA library that can be sequenced by NGS. When the library construction method is used to analyze the chromatin open state, the different cells include cells of different types that have not undergone any particular treatment, cells of different types given the same treatment, cells of the same type given different treatments, tumor tissue cells from different patients, and so on. If the different cells are tumor tissues from clinical patients, the method has important application value in personalized medicine and precision medicine.
When the library is built with transposomes, different cell samples are treated with transposomes assembled from Tn5 and different Tn5 tag linkers, and the resulting chromatin fragments from the different cell samples can be pooled into a single chromatin fragment sample for gDNA fragment purification, denaturation, single-stranded linker ligation, extension and PCR amplification. This yields a sequencing mixture containing the different cell samples, which can be distinguished by the tag sequences in their Tn5 tag linkers. This treatment greatly simplifies library construction for multiple cell samples, eliminates bias that the library construction steps may introduce, and facilitates comparative analysis of chromatin open states between cell samples.
Preferably, when the method is applied as a linear amplification technique for nucleic acid molecules, the sequence of the single-stranded linker, the T linker or the Tn5 tag linker contains a T7 promoter sequence; double-stranded DNA fragments carrying linkers at both ends are amplified by in vitro transcription and then converted back into DNA fragments by reverse transcription. The DNA fragments produced by reverse transcription can be used for high-throughput sequencing analysis or low-throughput detection analysis.
Wherein the single-stranded linker can also be converted into a single-stranded tag linker; that is, a tag sequence is added to the double-stranded region of the single-stranded linker, adjacent to the random nucleotides protruding from the 3' end. When single-stranded tag linkers are used for library construction, different DNA samples ligated to different single-stranded tag linkers can be pooled before the extension, T-linker ligation and PCR amplification steps. This treatment greatly simplifies library construction for multiple DNA samples, eliminates bias that the library construction steps may introduce, and facilitates comparative analysis of sequencing results between DNA samples.
In the present invention, (1) DNA sequencing analysis: the first approach is SALP library construction and sequencing based on "single-stranded linker + T linker" (FIG. 1A). The experimental procedure is: ① denature the dsDNA fragments into ssDNA (the dsDNA fragments may come from any source, such as ultrasonically sheared DNA, endonuclease-cleaved DNA, or naturally degraded DNA); ② ligate a single-stranded linker (SSA) to the 3' end of the ssDNA; ③ extend the linker-ligated ssDNA with a DNA polymerase to make it dsDNA; ④ ligate the T linker; ⑤ amplify by PCR the dsDNA carrying linkers at both ends (single-stranded linker and T linker) to obtain a DNA library that can be sequenced by NGS.
In this procedure, if several DNA samples are to be processed and sequenced together, the high-throughput version of the procedure (FIG. 2A) can be used. The experimental procedure is: ① denature the dsDNA fragments of each sample into ssDNA; ② ligate a different single-stranded tag linker (SBA) to the 3' end of the ssDNA of each sample, then pool the SBA-ligated ssDNA of all samples into one mixed DNA sample; ③ extend the linker-ligated ssDNA with a DNA polymerase to make it dsDNA; ④ ligate the T linker; ⑤ amplify by PCR the dsDNA carrying linkers at both ends (single-stranded tag linker and T linker) to obtain a DNA library that can be sequenced by NGS. In this procedure, DNA samples carrying different SBAs are pooled into one mixed DNA sample for extension, T-linker ligation and PCR amplification. This yields a sequencing mixture containing the different DNA samples, which can be distinguished by the tag sequences in their SBAs. This treatment greatly simplifies library construction for multiple DNA samples, eliminates bias that the library construction steps may introduce, and facilitates comparative analysis of sequence information between DNA samples.
(2) DNA sequencing analysis: the second approach is SALP library construction and sequencing based on "Tn5 linker + single-stranded linker" (FIG. 1B). The experimental procedure is: ① assemble transposomes from Tn5 and the Tn5 linker (TA); ② treat the DNA sample with the Tn5 transposomes; ③ denature the dsDNA fragments into ssDNA; ④ ligate a single-stranded linker to the 3' end of the ssDNA; ⑤ extend the linker-ligated ssDNA with a DNA polymerase to make it dsDNA; ⑥ amplify by PCR the dsDNA carrying linkers at both ends (Tn5 linker and single-stranded linker) to obtain a DNA library that can be sequenced by NGS.
In this procedure, if several DNA samples are to be processed and sequenced together, the high-throughput version of the procedure (FIG. 2B) can be used. The experimental procedure is: ① assemble transposomes from Tn5 and different Tn5 tag linkers (BTA); ② treat the different DNA samples with Tn5 transposomes containing different BTAs, then pool the treated samples into one DNA mixture; ③ denature the dsDNA fragments into ssDNA; ④ ligate a single-stranded linker to the 3' end of the ssDNA; ⑤ extend the linker-ligated ssDNA with a DNA polymerase to make it dsDNA; ⑥ amplify by PCR the dsDNA carrying linkers at both ends (Tn5 tag linker and single-stranded linker) to obtain a DNA library that can be sequenced by NGS. In this procedure, the different DNA samples treated with Tn5 transposomes are pooled into one mixed DNA sample for gDNA fragment purification, denaturation, single-stranded linker ligation, extension and PCR amplification. This yields a sequencing mixture containing the different DNA samples, which can be distinguished by the tag sequences in their Tn5 tag linkers. This treatment greatly simplifies library construction for multiple DNA samples, eliminates bias that the library construction steps may introduce, and facilitates comparative analysis of sequence information between DNA samples.
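How pooled samples might be separated again after sequencing can be sketched in Python. The read layout is an assumption made for illustration and is not specified in this document: each read is taken to start with the sample tag, followed by the ME sequence (SEQ ID NO. 1) and then the genomic insert. The sample names, tags and function name are hypothetical, and exact matching without error tolerance is used.

```python
ME_SEQUENCE = "AGATGTGTATAAGAGACAG"  # SEQ ID NO. 1


def demultiplex(reads, sample_tags):
    """Sort reads into per-sample bins by their leading tag + ME sequence.

    sample_tags maps a sample name to its tag (barcode) sequence.
    Reads matching no tag are collected under "unassigned".
    """
    bins = {name: [] for name in sample_tags}
    bins["unassigned"] = []
    for read in reads:
        for name, tag in sample_tags.items():
            prefix = tag + ME_SEQUENCE
            if read.startswith(prefix):
                # Keep only the insert that follows the linker sequence.
                bins[name].append(read[len(prefix):])
                break
        else:
            # No tag matched this read.
            bins["unassigned"].append(read)
    return bins


tags = {"sample_1": "ACGT", "sample_2": "TGCA"}  # hypothetical tags
reads = [
    "ACGT" + ME_SEQUENCE + "GGGATC",
    "TGCA" + ME_SEQUENCE + "CCCTAG",
    "TTTTTTTT",
]
print(demultiplex(reads, tags))
```

A production pipeline would normally tolerate sequencing errors in the tag; exact matching keeps the sketch short.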
(3) Chromatin openness analysis: for chromatin openness analysis with the SALP technique, the "Tn5 linker + single-stranded linker" SALP library construction approach is used (FIG. 1B). The experimental procedure is: ① assemble transposomes from Tn5 and the Tn5 linker (TA); ② collect cells, lyse the cell membrane with a mild lysis method while keeping the nuclei intact, and collect the nuclei by centrifugation to remove cell membrane debris and cytoplasmic components (the cytoplasmic components can be used for RNA isolation); ③ treat the nuclei with the transposomes to obtain chromatin fragments; ④ isolate and purify the gDNA from the chromatin fragments; ⑤ denature the dsDNA fragments into ssDNA; ⑥ ligate a single-stranded linker to the 3' end of the ssDNA; ⑦ extend the linker-ligated ssDNA with a DNA polymerase to make it dsDNA; ⑧ amplify by PCR the dsDNA carrying linkers at both ends (Tn5 linker and single-stranded linker) to obtain a DNA library that can be sequenced by NGS.
In this procedure, if several cell samples are to be processed and sequenced together, the high-throughput version of the procedure (FIG. 2C) can be used. The experimental procedure is: ① assemble transposomes from Tn5 and different Tn5 tag linkers (BTA); ② collect the different cells, lyse the cell membrane with a mild lysis method while keeping the nuclei intact, and collect the nuclei by centrifugation to remove cell membrane debris and cytoplasmic components (the cytoplasmic components can be used for RNA isolation); ③ treat the nuclei with the transposomes to obtain chromatin fragments, then pool the chromatin fragments of all cell samples into one chromatin mixture; ④ isolate and purify the gDNA from the chromatin fragments; ⑤ denature the dsDNA fragments into ssDNA; ⑥ ligate a single-stranded linker to the 3' end of the ssDNA; ⑦ extend the linker-ligated ssDNA with a DNA polymerase to make it dsDNA; ⑧ amplify by PCR the dsDNA carrying linkers at both ends (Tn5 tag linker and single-stranded linker) to obtain a DNA library that can be sequenced by NGS.
In this procedure, different cell samples are treated with transposomes assembled from Tn5 and different Tn5 tag linkers, and the resulting chromatin fragments from the different cell samples can be pooled into a single chromatin fragment sample for gDNA fragment purification, denaturation, single-stranded linker ligation, extension and PCR amplification. This yields a sequencing mixture containing the different cell samples, which can be distinguished by the tag sequences in their Tn5 tag linkers. This treatment greatly simplifies library construction for multiple cell samples, eliminates bias that the library construction steps may introduce, and facilitates comparative analysis of chromatin open states between cell samples.
When the high-throughput library construction method is used for chromatin openness analysis, the different cells include cells of different types that have not undergone any particular treatment, cells of different types given the same treatment, cells of the same type given different treatments, tumor tissue cells from different patients, and so on. If the different cells are tumor tissues from clinical patients, the method has important application value in personalized medicine and precision medicine. In addition, when the SALP technique is used to analyze open chromatin regions, different numbers of cells (e.g., 500 to 10^5 cells) can be used, and the method identifies the chromatin open state of cells with high sensitivity (down to 500 cells).
(4) Gene expression detection: when the SALP technique is used to detect gene expression, the "single-stranded linker + T linker" SALP library construction approach is used (FIG. 1A; FIG. 2B). Any RNA (e.g., mRNA) needs only one reverse transcription step to become an RNA/DNA hybrid, which is denatured to produce ssDNA (e.g., cDNA). The ssDNA then undergoes single-stranded linker ligation, extension, T-linker ligation and PCR amplification to form a sequenceable library. Analysis of the sequencing results reveals important information about each RNA in the original sample, such as its identity, abundance, splicing and editing. In particular, combining identity and abundance information makes it possible to evaluate gene expression levels and draw a gene expression profile, providing a function similar to RNA-seq. In addition, if a label is incorporated into the library during amplification, gene expression levels can also be evaluated and expression profiles drawn by hybridization to a gene chip. When used for gene expression detection, the SALP technique overcomes the drawbacks of the random primers used in existing RNA-seq technology: it yields longer and more complete cDNA sequences together with abundance information, and it also captures sequence information from the 3' end of the mRNA.
(5) Amplification of trace nucleic acids: when the SALP technique is used for amplification of trace amounts of nucleic acids, a SALP library building program based on "single-linker + T-linker" is used (FIG. 1A). The amplification of DNA fragments with linkers (single-stranded and T-linkers) attached to both ends can be performed in two ways. One is exponential amplification: that is, a pair of PCR primers capable of annealing to the adapters at both ends is used to perform PCR amplification of the DNA fragments with adapters at both ends in different cycles (the amplification process is the final step of SALP library preparation-PCR amplification, which is only applied for different purposes). Second, linear amplification: the T7 promoter sequences are respectively embedded into the single-chain connector and the T connector, the DNA fragments with the connectors connected at the two ends are subjected to in vitro transcription amplification at different time, and then reverse transcription is carried out to convert the DNA fragments into DNA fragments, namely linear amplification is finished. In this application, the library construction method becomes a linear amplification technique for nucleic acid molecules, and the amplified DNA can be used for high-throughput sequencing analysis (blood free DNA full sequencing analysis) or low-throughput detection analysis (hybridization, amplification, cloning, sequencing and other analysis of specific DNA fragments). Linear amplification is far less efficient than exponential amplification, but is particularly advantageous because the amplification process does not alter the relative proportions of molecules in the sample. The SALP database building technology is used for trace nucleic acid amplification and has extremely important application value in the fields of liquid biopsy (liquid biopsy), noninvasive prenatal gene detection (NIPT) and the like. 
The DNA detected in the fields is circulating free DNA (cfDNA), circulating tumor DNA (ctDNA), circulating fetal DNA (cffDNA) and the like which are generated by natural degradation in body fluid such as blood and the like, the DNA fragments are 100-500 bp in size, the content of the DNA fragments in the blood is low, and the DNA fragments are the most limiting factors for the current detection, but the DNA fragments are very suitable for the library construction and amplification of the SALP technology, and can be used for subsequent detection and analysis after fidelity amplification.
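The difference between exponential and linear amplification with respect to sample proportions can be illustrated numerically. The Python sketch below is a toy model and not part of the described method: it assumes two templates that start at equal abundance but differ slightly in amplification efficiency (the values 0.95 and 0.85, and the yields 950 and 850, are hypothetical). Under this assumption the bias compounds at every cycle in exponential amplification, whereas in linear amplification it is applied only once.

```python
def exponential_yield(copies, efficiency, cycles):
    """Copies after PCR, assuming each cycle multiplies by (1 + efficiency)."""
    return copies * (1.0 + efficiency) ** cycles


def linear_yield(copies, products_per_template):
    """Copies after linear amplification (e.g. in vitro transcription)."""
    return copies * products_per_template


def fraction_of_a(a, b):
    """Share of template A in a two-template mixture."""
    return a / (a + b)


# Hypothetical input: two templates at equal starting abundance (true share 0.5).
start_a = start_b = 100
cycles = 18  # cycle number used in the PCR program of Example 1

exp_fraction = fraction_of_a(
    exponential_yield(start_a, 0.95, cycles),
    exponential_yield(start_b, 0.85, cycles),
)
lin_fraction = fraction_of_a(
    linear_yield(start_a, 950),
    linear_yield(start_b, 850),
)

print(f"share of A after exponential amplification: {exp_fraction:.3f}")
print(f"share of A after linear amplification:      {lin_fraction:.3f}")
```

In this model the linear product stays much closer to the true 50:50 composition than the exponential product, which is the property the text relies on for faithful amplification of trace samples.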
Beneficial effects: the invention provides a method for constructing next-generation sequencing libraries based on a Single Strand Adapter (SSA), namely Single Strand Adapter Library Preparation (SALP), which has the following advantages:
(1) the method can be used for constructing a library of any DNA fragment without any modification treatment.
(2) Whether the library is built with a single-strand adapter and a T adapter or with a Tn5 tag adapter and a single-strand adapter, the SALP method allows different DNA or chromatin samples to be mixed at an early stage, with the subsequent library construction steps performed on the mixture. This greatly simplifies library construction, reduces reagent and labor consumption, and avoids artificial bias during library construction. It is highly advantageous for rapid parallel library construction from many samples and for comparative analysis of sequencing information between samples, and is especially suitable for the detection and analysis of large numbers of clinical samples in personalized and precision medicine.
(3) When the library constructed by the SALP method is subjected to next generation sequencing (SALP-sequencing, abbreviated as SALP-seq), the SALP-seq not only can capture chromatin open regions as efficiently as ATAC-seq, but also overcomes the weakness of ATAC-seq technology, so that more chromatin open regions are found than ATAC-seq.
(4) The SALP method provided by the invention can be used for constructing an NGS library and determining a DNA sequence, and can also be used for identifying a chromatin open region, determining a gene expression profile (similar to the function of RNA sequencing (RNA-seq)), linear (in vitro transcription) or exponential (PCR) amplification of trace nucleic acid (such as blood free DNA) and the like, so that the SALP method is a novel technology with multiple functions and wide application value in the fields of nucleic acid sequencing, nucleic acid detection and analysis and the like.
Drawings
FIG. 1 is a schematic diagram of the principle and workflow of SALP-based next-generation sequencing (NGS) library construction according to the present invention; wherein (A) shows the principle and workflow of SALP library construction for ordinary DNA fragments, i.e., DNA fragments produced by sonication or endonuclease digestion, naturally occurring DNA fragments, and the like; (B) shows the principle and workflow of SALP library construction for DNA fragments produced by Tn5 transposome cut-and-paste; the Tn5 tag adapter (BTA) used for SALP-seq comprises a 19-bp double-stranded transposase binding site (ME) and a single-stranded region carrying a tag sequence and a PCR primer annealing site; the Tn5-BTA complex (transposome) is used to fragment DNA or chromatin; the single-strand adapter (SSA) is a double-stranded oligonucleotide with a 3' overhang of several random bases;
FIG. 2 is a schematic diagram of high-throughput library construction based on the SALP NGS library construction method of the present invention; wherein (A) the single-strand tag adapter (SBA) is a tagged double-stranded oligonucleotide with a 3' overhang of several random bases; (B) high-throughput Tn5 transposome-based SALP library construction starting from DNA, in which different DNA samples are treated with different Tn5 tag adapters so that each sample is tagged; (C) high-throughput Tn5 transposome-based SALP library construction starting from cells, in which different cell samples are treated with different Tn5 tag adapters so that each sample is tagged; this workflow is intended for high-throughput analysis of the chromatin opening state of multiple cell samples;
FIG. 3 shows electropherograms validating the SALP library construction method of the present invention; wherein (A) HepG2 genomic DNA fragmented by the Tn5 transposome; compared with the starting DNA, the fragmented DNA appears as a smear, which was excised from the gel and recovered; (B) Illumina-compatible libraries constructed with SSAs having different numbers of random bases in the 3' overhang; the SSA with a 3-base random overhang gave the highest ligation efficiency; 1N-4N: SSAs with 1 to 4 random overhanging bases; (C) clone sequencing verification of the structure of the libraries prepared with the 4 different SSAs, from top to bottom: 1N to 4N adapters;
FIG. 4 is a schematic diagram of the library structure of the SALP library construction method of the present invention, an Illumina sequencing platform compatible SALP sequencing library structure;
FIG. 5 shows a comparison of SALP-seq and ATAC-seq in identifying the chromatin opening state of the GM12878 cell line according to the present invention; wherein (A) read density distribution, where read density is the number of reads in a 1-Mb window; (B) comparison of the numbers of peaks obtained by the two methods; (C) comparison of read densities within the overlapping peaks obtained by the two methods; (D) comparison of the FE of different types of peaks; ATAC: peaks obtained by ATAC-seq; SALP: peaks obtained by SALP-seq; Overlap: peaks shared by ATAC-seq and SALP-seq; Not Overlap: SALP-seq peaks that do not overlap with ATAC-seq peaks; (E) distribution of reads on chromosome 22; (F) UCSC genome browser view of the chromatin opening state identified by SALP-seq and by other methods in a selected region;
FIG. 6 shows a comparison of the chromatin opening states of four different cell lines by SALP-seq according to the present invention; wherein (A) distribution of reads around the TSS; (B) read density distribution, where read density is the number of reads in a 1-Mb window; (C) UCSC tracks showing SALP-seq peaks at a specific genomic position together with the chromatin opening markers H3K27Ac track and DNase Cluster;
FIG. 7 shows the identification of the chromatin opening state by SALP-seq with different numbers of cells according to the present invention; wherein (A) genome-wide read density statistics for different cell numbers; (B) UCSC genome browser view of the chromatin open regions identified by SALP-seq with different numbers of HepG2 cells, with the open regions identified by ENCODE shown for comparison; (C) SALP-seq peaks obtained with different cell numbers and the peaks overlapping between cell numbers; (D) comparison of the FE of SALP-seq peaks obtained with different cell numbers;
FIG. 8 shows the construction of NGS libraries from DNA disrupted by different fragmentation methods by the SALP library construction method; wherein (A) HindIII is used for enzyme digestion of HepG2 genome DNA to construct a sequencing library, the upper figure shows Hind III is used for enzyme digestion of HepG2 genome DNA, and the lower figure shows the prepared library; (B) constructing a sequencing library from the ultrasonically-broken HepG2 genomic DNA; the upper panel shows ultrasonically fragmented HepG2 genomic DNA, the lower panel shows the prepared library;
FIG. 9 shows a genome-wide comparison of the read density distribution of the HindIII library and the distribution of HindIII sites; read density and the number of HindIII sites were each calculated in 1-Mb windows across the whole genome;
FIG. 10 shows the genome-wide read density distribution of the library built from sonicated HepG2 genomic DNA according to the present invention; read density was calculated in 1-Mb windows across the whole genome;
FIG. 11 shows the experimental steps and results of cDNA preparation by the SALP method according to the present invention; wherein (A) schematic of the experimental steps for preparing cDNA by the SALP method; (B) results of cDNA preparation by the SALP method; M: molecular weight marker; T4+: T4 DNA ligase added to the SSA ligation reaction; T4-: no T4 DNA ligase added to the SSA ligation reaction; Blank: PCR negative control.
Detailed Description
The invention is further illustrated by the following figures and examples.
Example 1 SALP based on Tn5 transposome cleavage reaction
The experimental method comprises the following steps:
Cell culture: HepG2 cells were cultured in DMEM medium containing 10% (v/v) fetal bovine serum, 100 units/mL penicillin and 100 μg/mL streptomycin. Cells were cultured in an incubator at 37 °C with 5% (v/v) CO2. The cells were obtained from the Cell Resource Center of the Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences.
Preparation of genomic DNA: genomic DNA (genomic DNA, gDNA) of HepG2 cells was extracted by phenol chloroform extraction.
Adapter preparation: all oligonucleotides were synthesized by Shanghai biosynthesis (SEQ ID NO. 3-23 in Table 1). To prepare Tn5 tag adapters (BTAs), the tag and ME oligonucleotides were dissolved in ddH2O at 20 μM and mixed in equimolar amounts in a PCR tube. To prepare single-strand adapters (SSA), SSA-PN and SSA-PNrev were dissolved in ddH2O at 100 μM and mixed in equimolar amounts in a PCR tube. The oligonucleotide mixtures were denatured at 95 °C for 5 min and allowed to cool naturally to 25 °C.
Preparation of Tn5 transposomes: following the instructions for Robust Tn5 Transposase (Robustnique Corporation Ltd.), BTA (10 μM) was mixed with 2 μL of 10× TPS, 1 μL of Tn5 transposase and 13 μL of H2O. Tn5 transposomes were formed by incubation at 25 °C for 30 minutes. The transposomes can be stored at -20 °C until use.
Optimization of the SALP single-strand adapter: to prepare sequencing libraries compatible with the Illumina platform and to optimize adapter ligation efficiency, SSAs with random-base 3' overhangs of 4 different lengths (1 to 4 bases) were designed. For SSA ligation, 12.5 ng of tagmented HepG2 gDNA was denatured at 95 °C for 5 minutes and rapidly chilled on ice for 5 minutes. The denatured gDNA was ligated to SSA in a 10 μL reaction containing 1 μL of T4 DNA ligase (NEB, M0202L), 1× T4 DNA ligase buffer and 1 μL of SSA (5 μM) at 16 °C for 60 minutes. The reaction was then mixed with an equal volume of 2× rTaq mix (Takara) and incubated at 72 °C for 15 minutes. The product was purified with 1.2× Ampure XP beads (Beckman Coulter) and PCR-amplified in a 50 μL reaction containing 25 μL of Hot Start HiFi PCR Master Mix (NEB, M0543S), 1 μL of NEBNext Universal PCR Primer (10 μM) and 1 μL of NEBNext Index Primer (10 μM). The PCR program was: (i) 98 °C for 5 minutes; (ii) 18 cycles of 98 °C for 10 s, 65 °C for 30 s and 72 °C for 1 minute; (iii) 72 °C for 5 minutes. The PCR products were run on an agarose gel, and fragments of 300 to 1000 bp were recovered from the gel.
Cloning and sequencing: the resulting libraries were prepared for clone sequencing. The DNA recovered from the gel was mixed with an equal volume of 2× rTaq mix and incubated at 72 °C for 15 minutes. The purified DNA was cloned into the pMD19-T Simple vector (Takara, 6013), and the cloned DNA fragments were Sanger sequenced. Ten clones were picked for sequencing per SSA.
The experimental results are as follows:
DNA fragments generated by Tn5 transposome cut-and-paste can be conveniently used to construct NGS sequencing libraries. DNA was therefore first fragmented with the Tn5 transposome and an NGS library was constructed by the SALP method, for which a novel Tn5 tag adapter (BTA) was designed (FIG. 1B). With these tags, different samples can be mixed after Tn5 transposome cleavage. Because Tn5 has the cut-and-paste property, the transposome formed by Tn5 transposase and BTA efficiently fragmented HepG2 genomic DNA and ligated BTA to the DNA (FIG. 3A). The fragmented DNA was then denatured and ligated to a single-strand adapter (SSA), a double-stranded oligonucleotide with 1 to 4 random bases protruding at the 3' end. The adapter-ligated genomic DNA was extended with Taq polymerase and then PCR-amplified with primers annealing to BTA and SSA, respectively. The amplification results showed that the SSA with a 3-base overhang gave the highest ligation efficiency (FIG. 3B). To further verify the structure of the DNA library, the PCR products were cloned into a T-vector, and 40 clones were selected for colony PCR identification and Sanger sequencing. Colony PCR showed that the DNA fragments inserted into the T-vectors were all between 150 and 1000 bp in length (FIG. 3C). The sequencing results of the 40 clones (1N-1 to 1N-10, 2N-1 to 2N-10, 3N-1 to 3N-10 and 4N-1 to 4N-10) are shown in SEQ ID NO. 24-63; the libraries constructed with all four SSAs had the expected structure and were compatible with the Illumina sequencing platform (FIG. 4B). Based on these results, the SSA with a 3-base overhang (i.e., the SSA formed by annealing oligonucleotide SSA-PN of Table 1 with oligonucleotide SSA-PNrev-3N) was selected for subsequent SALP library construction experiments.
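As an illustration of what a random-base overhang implies, the short Python sketch below enumerates the overhang sequences present in an SSA pool with 1 to 4 random bases. It assumes that every random position is fully degenerate (any of A, C, G or T); this is an assumption made for illustration, and the function name is hypothetical.

```python
from itertools import product


def random_overhangs(n):
    """All possible overhang sequences of n fully degenerate bases."""
    return ["".join(bases) for bases in product("ACGT", repeat=n)]


# Pool complexity for the four SSA designs tested (1N to 4N).
for n in range(1, 5):
    print(f"{n}N overhang: {len(random_overhangs(n))} distinct sequences")
```

Under this assumption the pool complexity grows as 4^n, so a longer overhang offers more bases for pairing with the 3' end of a single-stranded fragment while diluting the share of adapter molecules that match any given end.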
TABLE 1 SALP oligonucleotide linkers and PCR primers
[Table 1 appears as an image in the original document.]
Note: 5'-P is for ligation; 3'-NH2 prevents undesired ligation.
Example 2 SALP-seq identification of chromatin opening states in different cell lines
The experimental method comprises the following steps:
Cell culture: HeLa, HepG2 and 293T cells were cultured in DMEM medium, and GM12878 cells were cultured in RPMI 1640 medium; both media contained 10% (v/v) fetal bovine serum, 100 units/mL penicillin and 100 μg/mL streptomycin. Cells were cultured in an incubator at 37 °C with 5% (v/v) CO2. The cells were obtained from the Cell Resource Center of the Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences.
Adapter preparation: all oligonucleotides were synthesized in Shanghai (Table 1). To prepare Tn5 tag adapters (BTAs), the tag and ME oligonucleotides were dissolved in ddH2O at 20 μM and mixed in equimolar amounts in a PCR tube. To prepare single-strand adapters (SSA), SSA-PN and SSA-PNrev were dissolved in ddH2O at 100 μM and mixed in equimolar amounts in a PCR tube. The oligonucleotide mixtures were denatured at 95 °C for 5 min and allowed to cool naturally to 25 °C.
Preparation of Tn5 transposomes: following the instructions for Robust Tn5 Transposase (Robustnique Corporation Ltd.), BTA (10 μM) was mixed with 2 μL of 10× TPS, 1 μL of Tn5 transposase and 13 μL of H2O. Tn5 transposomes were formed by incubation at 25 °C for 30 minutes. The transposomes can be stored at -20 °C until use.
Chromatin shearing: 100,000 cells each of GM12878, HeLa, HepG2 and 293T were counted. Cells were harvested by centrifugation at 500 g for 5 min at 4 °C and washed with 50 μL of pre-cooled PBS. The cells were resuspended in pre-cooled cell membrane lysis buffer (10 mM Tris-HCl, pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1% IGEPAL CA-630). Nuclei were collected by centrifugation at 500 g for 10 minutes at 4 °C. The Tn5 transposome tagmentation reaction was performed on the chromatin of 100,000 cells in a 30 μL reaction containing 20 μL of Tn5 transposomes, 3 μL of DMF and 1× LM buffer. After mixing, each reaction was incubated at 37 °C for 30 minutes and mixed every 10 minutes during incubation to improve shearing efficiency. A different Tn5 tag adapter (BTA) was used for each cell sample in the shearing reaction (see Table 2).
Library preparation: after the Tn5 transposome reactions, the chromatin from the four cell lines was pooled into one chromatin mixture. 1% SDS and 20 mg/mL Proteinase K (Sigma) were added to the mixture to final concentrations of 0.1% and 400 μg/mL, respectively. The mixture was incubated at 65 °C for 1 hour and brought to 200 μL with 1× TE buffer. The DNA was then extracted by phenol-chloroform extraction. The resulting DNA was subjected to SSA ligation, rTaq extension and Illumina-compatible index PCR amplification (same procedure as in Example 1). The amplification products were run on an agarose gel, and fragments of 150 to 1000 bp were recovered in order to capture more nucleosome-free sequences. Based on the optimization results, the SSA with a 3-base overhang was used for library preparation.
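The volumes of SDS and Proteinase K stock needed to reach the stated final concentrations depend on the volume of the chromatin mixture, which the text does not give. The Python sketch below works this out for a hypothetical 88 μL mixture; the function name and the sample volume are assumptions for illustration only. It solves for both stock volumes together, because each added stock also dilutes the other.

```python
def stock_volumes(sample_ul, stocks):
    """Volumes of each stock to add so that all final concentrations are met.

    stocks: list of (stock_conc, final_conc) pairs, each pair in matching units.
    Returns (list of stock volumes in uL, total final volume in uL).
    """
    ratios = [final / stock for stock, final in stocks]
    if sum(ratios) >= 1.0:
        raise ValueError("final concentrations cannot be reached with these stocks")
    # Each stock occupies the fraction final/stock of the final volume,
    # and the sample occupies the remainder.
    total_ul = sample_ul / (1.0 - sum(ratios))
    return [total_ul * r for r in ratios], total_ul


# Stocks from the text: 1% SDS -> 0.1%; 20 mg/mL Proteinase K -> 0.4 mg/mL.
# The 88 uL sample volume is hypothetical.
volumes, total = stock_volumes(88.0, [(1.0, 0.1), (20.0, 0.4)])
print(f"SDS: {volumes[0]:.1f} uL, Proteinase K: {volumes[1]:.1f} uL, total: {total:.1f} uL")
```

For the hypothetical 88 μL sample this gives about 10 μL of 1% SDS and 2 μL of 20 mg/mL Proteinase K in a final volume of about 100 μL.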
NGS sequencing: library NGS-L1 (tagged chromatin of four cell lines) was prepared by Illumina-compatible index PCR amplification (Table 3). This library was mixed with 3 other NGS libraries constructed with SALP, namely NGS-L2 (tagged chromatin of five cell numbers; Example 3), NGS-L3 (HindIII-digested gDNA; Example 4) and NGS-L4 (sonicated gDNA; Example 4), at a DNA mass ratio of 4:1:1:1 (NGS-L1:NGS-L2:NGS-L3:NGS-L4) to form a DNA sample that could be sequenced in a single lane. After quality control and quantification on an Agilent Bioanalyzer 2100, the pooled DNA sample was submitted to Nanjing and Gene biotechnology, Inc. for sequencing on the Illumina HiSeq X Ten platform.
Data analysis: raw data were demultiplexed by index and barcode with a Perl script. ME (19 bp) and the barcode (6 bp) were removed from the 5' end of read 2 of the paired-end data. All reads were truncated to 30 bp and aligned to the human genome (version hg19) with Bowtie (version 1.1.2) using the parameter -X 2000 to ensure that long fragments could be aligned to the genome. Peak calling was performed with macs2 using the parameters -f BEDPE --keep-dup=2. Peaks were annotated with Homer. All peak tracks were displayed in the UCSC genome browser, and the related statistical analysis was performed with R and Perl scripts. ATAC-seq data for GM12878 cells were downloaded from the GEO database (accession number: GSE47753) and analyzed with the same procedure as the SALP-seq data for comparison.
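The read 2 preprocessing described above (barcode assignment, removal of the 6-bp barcode and 19-bp ME, truncation to 30 bp) can be sketched in Python as follows. The original analysis used a Perl script that is not reproduced in the text, so this is only an illustrative equivalent. It assumes that read 2 begins with the barcode followed by ME, and that the barcodes are read in the orientation listed in Table 2; both are assumptions, and the function name is hypothetical.

```python
# Barcode-to-sample mapping for the four cell lines, taken from Table 2.
BARCODES = {
    "TAGCTT": "GM12878",
    "CTTGTA": "HepG2",
    "GCCAAT": "HeLa",
    "TGACCA": "293T",
}
BARCODE_LEN = 6   # barcode length stated in the text
ME_LEN = 19       # ME length stated in the text
KEEP_LEN = 30     # reads are truncated to 30 bp before alignment


def preprocess_read2(seq):
    """Return (sample, trimmed insert) for one read 2 sequence, or None.

    Assumes the layout barcode + ME + insert at the 5' end of read 2.
    """
    sample = BARCODES.get(seq[:BARCODE_LEN])
    if sample is None:
        return None  # unknown barcode: read is discarded
    insert = seq[BARCODE_LEN + ME_LEN:]
    return sample, insert[:KEEP_LEN]


# Toy read: barcode, a 19-nt placeholder standing in for ME, then 40 nt of insert.
toy_read = "TAGCTT" + "N" * 19 + "ACGT" * 10
print(preprocess_read2(toy_read))
```

Exact barcode matching is used here for simplicity; a production script would typically also handle sequencing errors in the barcode and keep read 1 paired with read 2.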
The experimental results are as follows:
The chromatin opening state of the GM12878 cell line has been studied by methods such as DNase-seq, FAIRE-seq and ATAC-seq. The chromatin opening state of this cell line was therefore identified by SALP-seq for comparison with the results of these methods. First, the read distributions of SALP-seq and ATAC-seq were compared at the whole-genome level; the distributions obtained by the two methods were similar (FIG. 5A), although the two methods differed somewhat in some read-rich regions. Comparing the numbers of peaks obtained by the two methods showed that SALP-seq identified more peaks than ATAC-seq when sequencing depth was normalized (FIG. 5B). Comparing read density within the overlapping peaks identified by both methods showed that SALP-seq peaks contained more reads than ATAC-seq peaks (FIG. 5C). Comparing the enrichment factor (FE) of the peaks showed that peaks with low FE were more readily enriched by SALP-seq (FIG. 5D). Comparison of the read distributions showed that SALP-seq achieves the same read density distribution as ATAC-seq, indicating that SALP-seq is highly reliable (FIG. 5E). To further demonstrate the reliability of SALP-seq in identifying chromatin opening states, its peaks in a region studied intensively by other methods were compared (FIG. 5F). The results indicate that SALP-seq, ATAC-seq, FAIRE-seq and DNase-seq all identified the same chromatin opening state. Moreover, the peaks obtained by SALP-seq overlapped extensively with the H3K27Ac track and DNase Cluster, further showing that the chromatin open regions identified by SALP-seq are highly reliable.
Using the BTAs designed in this example, NGS libraries of the four cell lines GM12878, HepG2, HeLa and 293T were prepared by SALP in the high-throughput format (FIG. 2C). Calculation of read density around transcription start sites (TSS) showed that TSS regions had a higher read density (FIG. 6A), indicating a higher degree of chromatin opening around TSS. To compare the chromatin opening states of the four cell lines at the whole-genome level, the read densities of the different cell lines were calculated separately (FIG. 6B). The results indicate that some regions, such as a region located on chromosome 5, were identified as open chromatin in the different cell lines. A region on chromosome 19 was selected and its chromatin opening state in the different cells was displayed (FIG. 6C); the SALP-seq peaks in all four cell lines coincided closely with the H3K27Ac sites and DNase clusters obtained in the ENCODE project (FIG. 6C). At the genome level, many peaks overlapped between different cells, indicating that different cells share many open chromatin regions. However, the degree of opening of these shared regions differed. In addition, there were cell-specific peaks in the different cell lines, indicating that chromatin opening states are cell-specific. These data indicate that SALP-seq is an efficient, simple and easy-to-use method that enables a comprehensive comparison of the chromatin opening states of different cell lines.
TABLE 2 Tn5 transposome tags and sample correspondences
Sample     Number of cells   Barcode    Barcode sequence
GM12878    10^5              Barcode1   TAGCTT
HepG2      10^5              Barcode2   CTTGTA
HeLa       10^5              Barcode3   GCCAAT
293T       10^5              Barcode4   TGACCA
HepG2      5×10^4            Barcode5   ATCACG
HepG2      1×10^4            Barcode6   ACTTGA
HepG2      5×10^3            Barcode7   CGATGT
HepG2      2.5×10^3          Barcode8   ACAGTG
HepG2      5×10^2            Barcode9   CAGATC
TABLE 3 sequencing Reads number statistics
[Table 3 appears as an image in the original document.]
Example 3 SALP-seq identification of chromatin opening status by different cell numbers
The experimental method comprises the following steps:
Cell culture: HepG2 cells were cultured in DMEM medium containing 10% (v/v) fetal bovine serum, 100 units/mL penicillin and 100 μg/mL streptomycin. Cells were cultured in an incubator at 37 °C with 5% (v/v) CO2.
Adapter preparation: all oligonucleotides were synthesized in Shanghai (Table 1). To prepare Tn5 tag adapters (BTAs), the tag and ME oligonucleotides were dissolved in ddH2O at 20 μM and mixed in equimolar amounts in a PCR tube. To prepare single-strand adapters (SSA), SSA-PN and SSA-PNrev were dissolved in ddH2O at 100 μM and mixed in equimolar amounts in a PCR tube. The oligonucleotide mixtures were denatured at 95 °C for 5 min and allowed to cool naturally to 25 °C.
Preparation of Tn5 transposomes: following the instructions for Robust Tn5 Transposase (Robustnique Corporation Ltd.), BTA (10 μM) was mixed with 2 μL of 10× TPS, 1 μL of Tn5 transposase and 13 μL of H2O. Tn5 transposomes were formed by incubation at 25 °C for 30 minutes. The transposomes can be stored at -20 °C until use.
Chromatin shearing (tagmentation): 50,000, 10,000, 5,000, 2,500 and 500 HepG2 cells were counted, respectively. Cells were harvested by centrifugation at 500 g for 5 min at 4 °C and washed with 50 μL of pre-cooled PBS. The cells were resuspended in pre-cooled cell membrane lysis buffer (10 mM Tris-HCl, pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1% IGEPAL CA-630). Nuclei were collected by centrifugation at 500 g for 10 minutes at 4 °C. For the tagmentation reaction, 50,000 and 10,000 cells were processed in a 30 μL reaction containing 4 μL of Tn5 transposomes, 3 μL of DMF and 1× LM buffer; 5,000, 2,500 and 500 cells were processed in a 5 μL reaction containing 1 μL of Tn5 transposomes, 0.5 μL of DMF and 1× LM buffer. After mixing, each reaction was incubated at 37 °C for 30 minutes and mixed every 10 minutes during incubation to improve tagmentation efficiency. A different Tn5 tag adapter (BTA) was used for each cell sample in the tagmentation reaction (see Table 2).
Library preparation: after the Tn5 transposome reactions, the chromatin from the five different numbers of HepG2 cells was pooled into one chromatin mixture. 1% SDS and 20 mg/mL Proteinase K (Sigma) were added to the mixture to final concentrations of 0.1% and 400 μg/mL, respectively. The mixture was incubated at 65 °C for 1 hour and brought to 200 μL with 1× TE buffer. The DNA was then extracted by phenol-chloroform extraction. The resulting DNA was subjected to SSA ligation, rTaq extension and Illumina-compatible index PCR amplification (same procedure as in Example 1). The amplification products were run on an agarose gel, and fragments of 150 to 1000 bp were recovered in order to capture more nucleosome-free sequences. Based on the optimization results, the SSA with a 3-base overhang was used for library preparation.
NGS sequencing: library NGS-L2 (tagged chromatin of five cell numbers) was prepared by Illumina-compatible index PCR amplification (Table 3). This library was mixed with 3 other NGS libraries constructed with SALP, namely NGS-L1 (tagged chromatin of four cell lines; Example 2), NGS-L3 (HindIII-digested gDNA; Example 4) and NGS-L4 (sonicated gDNA; Example 4), at a DNA mass ratio of 4:1:1:1 (NGS-L1:NGS-L2:NGS-L3:NGS-L4) to form a DNA sample that could be sequenced in a single lane. After quality control and quantification on an Agilent Bioanalyzer 2100, the pooled DNA sample was submitted to Nanjing and Gene biotechnology, Inc. for sequencing on the Illumina HiSeq X Ten platform.
Data analysis: raw data were demultiplexed by index and barcode with a Perl script. ME (19 bp) and the barcode (6 bp) were removed from the 5' end of read 2 of the paired-end data. All reads were truncated to 30 bp and aligned to the human genome (version hg19) with Bowtie (version 1.1.2) using the parameter -X 2000 to ensure that long fragments could be aligned to the genome. Peak calling was performed with macs2 using the parameters -f BEDPE --keep-dup=2. Peaks were annotated with Homer. Gene ontology analysis was performed by uploading a gene list to the PANTHER website (http://pantherdb.org/). Overlapping peaks were detected with the BEDTools intersect program. All peak tracks were displayed in the UCSC genome browser, and the related statistical analysis was performed with R and Perl scripts.
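The overlap detection step can be illustrated with a small Python sketch. It is not the BEDTools implementation; it only mimics the basic idea of reporting each peak of one set that overlaps at least one peak of another set, using half-open (chrom, start, end) intervals as in BED files. The function name and the toy coordinates are hypothetical.

```python
def overlapping_peaks(peaks_a, peaks_b):
    """Return the peaks in peaks_a that overlap at least one peak in peaks_b.

    Peaks are (chrom, start, end) tuples with half-open coordinates.
    """
    by_chrom = {}
    for chrom, start, end in peaks_b:
        by_chrom.setdefault(chrom, []).append((start, end))

    hits = []
    for chrom, start, end in peaks_a:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(start < b_end and b_start < end
               for b_start, b_end in by_chrom.get(chrom, [])):
            hits.append((chrom, start, end))
    return hits


# Toy peak sets standing in for two cell-number samples.
peaks_50000_cells = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
peaks_500_cells = [("chr1", 150, 250), ("chr2", 200, 300)]
print(overlapping_peaks(peaks_50000_cells, peaks_500_cells))
```

The scan over all peaks of a chromosome is quadratic and adequate only for small examples; genome-scale peak sets call for sorted sweeps or interval trees, which is what dedicated tools provide.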
The experimental results are as follows:
As methods for identifying the chromatin opening state, FAIRE-seq and DNase-seq generally require 1 × 10^6 to 5 × 10^6 cells, whereas ATAC-seq requires only 500 to 50,000 cells. To demonstrate that SALP-seq can identify the chromatin opening state from different numbers of cells, HepG2 cells at 6 different cell numbers (100,000, 50,000, 10,000, 5,000, 2,500 and 500) were subjected to SALP-seq. Genome-wide read densities were calculated for the different cell numbers, and the results showed that the regions of high chromatin openness identified with different cell numbers coincided (FIG. 7A, 7B). The chromatin open regions identified by SALP-seq matched H3K27Ac and DNase Cluster in ENCODE (FIG. 7B). However, sensitivity decreased when the starting cell number was small (FIG. 7C): with few cells, only regions with a high enrichment factor (FE) were identified (FIG. 7D). This indicates that SALP-seq can capture the more open regions of chromatin even with 500 cells.
Example 4 SALP-seq construction of an NGS library by enzymatic or ultrasonic fragmentation of genomic DNA
The experimental method comprises the following steps:
Cell culture: HepG2 cells were cultured in DMEM medium containing 10% (v/v) fetal bovine serum, 100 units/mL penicillin and 100 μg/mL streptomycin. Cells were cultured in an incubator at 37 °C with 5% (v/v) CO2.
Preparation of genomic DNA: genomic DNA (gDNA) of HepG2 cells was extracted by phenol-chloroform extraction.
Fragmentation of gDNA: for Tn5 tagmentation, 50 ng of gDNA was incubated at 55 °C for 15 minutes in a 30 μL tagmentation reaction containing 1× LM buffer, 3 μL of dimethylformamide (DMF) and 4 μL of Tn5 transposomes. The fragmented gDNA was purified with the MinElute PCR Purification Kit (QIAGEN, 28004). For HindIII digestion, 1 μg of genomic DNA was digested overnight at 37 °C in a 50 μL reaction containing 1× FastDigest buffer and 5 μL of FastDigest HindIII (Thermo Fisher, ER0501). For sonication, 1 μg of genomic DNA was processed on a BRANSON sonicator at 70% power, 20 s on and 20 s off, for 20 cycles. All fragmented DNA was denatured at 95 °C for 5 minutes, rapidly chilled on ice for 5 minutes and run on a 1.5% agarose gel, and fragments of 200 to 1000 bp were recovered with the QIAquick Gel Extraction Kit (QIAGEN, 28704).
Adaptor preparation: All oligonucleotides were synthesized in Shanghai (Table 1). To prepare the single-stranded adaptor (SSA), SSA-PN and SSA-PNre were dissolved in ddH2O at 100 μM and mixed in equimolar amounts in a PCR tube. To prepare the T-linker, the TOA and TOArev oligonucleotides were dissolved in ddH2O at 100 μM and mixed in equimolar amounts in a PCR tube. Each oligonucleotide mixture was denatured at 95 °C for 5 min and cooled naturally to 25 °C.
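The annealing of two adaptor strands can be checked in silico. The sketch below is only an illustration: it assumes that SEQ ID NO 18 and SEQ ID NO 19 of the sequence listing are the two strands of the T-linker (this mapping is an assumption, since Table 1 is not reproduced here), and the helper names `reverse_complement` and `three_prime_overhang` are hypothetical. It tests whether one oligonucleotide pairs with the other from its 5' end and reports the unpaired 3' bases.

```python
# Complement table for lowercase DNA letters, including the ambiguity code n.
COMPLEMENT = str.maketrans("acgtn", "tgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (lowercase)."""
    return seq.lower().translate(COMPLEMENT)[::-1]


def three_prime_overhang(top, bottom):
    """Return the unpaired 3' bases of `top` when annealed to `bottom`.

    Returns None if `top` does not start with the reverse complement
    of `bottom`, i.e. if the two strands do not pair from the 5' end of `top`.
    """
    paired = reverse_complement(bottom)
    top = top.lower()
    if top.startswith(paired):
        return top[len(paired):]
    return None


# Sequences copied from the sequence listing (assumed to be the T-linker strands).
seq18 = "gactggagttcagacgtgtgctcttccgatctt"
seq19 = "agatcggaagagcacacgtctgaactccagtc"

overhang = three_prime_overhang(seq18, seq19)
print(overhang)  # a single 3' t is left unpaired
```

Under this assumption the duplex carries a single 3' T overhang, which is consistent with the description of the T-linker in the results of this example.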
Library preparation: Hind III-digested and sonicated gDNA were ligated to the SSA carrying a 3-base random overhang and extended, following the same procedure as for Tn5-treated DNA. The extension product was ligated to the T-linker in a 10 μL reaction containing 1 μL T-linker (5 μM), 1× T4 DNA ligase reaction buffer and 1 μL T4 DNA ligase, with incubation at 16 °C for 2 h. The ligation products were purified with 2× Ampure XP beads and amplified with different NEB index primers (Table 1). The PCR was performed as described above. The amplification products were subjected to agarose gel electrophoresis, and DNA fragments of 300 to 1000 bp were recovered.
NGS sequencing: The libraries NGS-L3 (Hind III-digested gDNA) and NGS-L4 (sonicated gDNA) constructed using SALP were amplified by Illumina-compatible index PCR (Table 1). These two libraries were mixed with two Tn5-based libraries, namely NGS-L1 (Tn5-tagged chromatin of four cell lines) and NGS-L2 (Tn5-tagged chromatin of five cell numbers), at a ratio of 4:1:1:1 (NGS-L1:NGS-L2:NGS-L3:NGS-L4) by DNA mass, to form a DNA sample that could be sequenced in a single lane. After quality control and quantification with the Agilent Bioanalyzer 2100, the mixed DNA sample was submitted to Nanjing and Gene biotechnology Co., Ltd and sequenced on the Illumina HiSeq X Ten platform.
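The 4:1:1:1 pooling by mass can be worked out as follows. This is a minimal sketch: the total pooled mass of 700 ng is a hypothetical value chosen only to make the arithmetic easy to follow, and the function name `pool_masses` is not from the source.

```python
from fractions import Fraction


def pool_masses(total_ng, ratio):
    """Split a total DNA mass (ng) among libraries according to a mass ratio."""
    parts = sum(ratio.values())
    return {
        name: float(Fraction(total_ng) * Fraction(share, parts))
        for name, share in ratio.items()
    }


# Mass ratio stated in the text (NGS-L1:NGS-L2:NGS-L3:NGS-L4 = 4:1:1:1).
ratio = {"NGS-L1": 4, "NGS-L2": 1, "NGS-L3": 1, "NGS-L4": 1}

# Hypothetical total mass of the pooled sample, in ng.
masses = pool_masses(700, ratio)
for name, mass in masses.items():
    print(f"{name}: {mass:.1f} ng")
```

With these assumed numbers NGS-L1 contributes 400 ng and each of the other three libraries contributes 100 ng.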
Data analysis: Raw data were sorted by index with a Perl script. All reads were truncated to 30 bp and aligned to the human genome (version hg19) with Bowtie (version 1.1.2), using the parameter -X 2000 to ensure that long fragments could be aligned to the genome. Reads density calculation: the number of reads in each 1 Mb window was counted, and the whole-genome reads density was plotted chromosome by chromosome. Hind III site density: the Hind III recognition sequence (5'-AAGCTT-3') was searched in the human whole-genome sequence, the number of Hind III sites in each 1 Mb window was counted, and the whole-genome Hind III site density was plotted chromosome by chromosome.
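The two window-based statistics described above can be sketched in a few lines. The code below is an illustration, not the Perl script used in the experiment: the function names, the toy chromosome and the toy read positions are all hypothetical, and a 20 bp window replaces the 1 Mb window in the demonstration so that the counts can be followed by eye.

```python
from collections import Counter

WINDOW = 1_000_000      # 1 Mb window, as stated in the text
HIND3_SITE = "AAGCTT"   # Hind III recognition sequence, 5'-AAGCTT-3'


def site_positions(sequence, site=HIND3_SITE):
    """Return the 0-based start position of every occurrence of `site`."""
    seq = sequence.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def window_counts(positions, window=WINDOW):
    """Count how many positions fall into each window (window index -> count)."""
    return Counter(pos // window for pos in positions)


# Toy data (hypothetical), binned with a 20 bp window.
toy_chromosome = "AAGCTT" + "C" * 10 + "aagctt" + "G" * 4 + "AAGCTT"
toy_sites = site_positions(toy_chromosome)
toy_site_density = window_counts(toy_sites, window=20)

# Hypothetical alignment start positions of reads on the same chromosome.
toy_read_starts = [5, 12, 25, 39]
toy_read_density = window_counts(toy_read_starts, window=20)

print(toy_sites)
print(dict(toy_site_density))
print(dict(toy_read_density))
```

Because AAGCTT is its own reverse complement, scanning one strand of the reference is enough to find every site. The same `window_counts` function serves for both the site density and the reads density, which keeps the two tracks directly comparable.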
The experimental results are as follows:
To demonstrate that SALP can construct NGS libraries from DNA fragmented in different ways, Hind III-digested and sonicated HepG2 genomic DNA (gDNA) were used to construct NGS libraries by a modified SALP method, in which a T-linker (T adaptor) with a 3' overhang base T was ligated to the end of the DNA fragment after SSA ligation and extension. The DNA was then amplified with two primers annealing to the SSA and the T-linker, respectively, to generate a library compatible with the Illumina sequencing platform (FIG. 1A; FIG. 4A). The results show that, with the modified SALP method, HepG2 gDNA fragmented by restriction enzyme digestion (e.g., Hind III) or by sonication can be used to construct an NGS library (FIG. 8). NGS libraries were successfully constructed from enzyme-digested gDNA (FIG. 9) and sonicated gDNA (FIG. 10), and these libraries were sequenced on the Illumina HiSeq platform together with the libraries built from Tn5-fragmented DNA.
To demonstrate the coverage of the two libraries on the genome, the reads densities of the two libraries were counted separately at the whole genome level. The results showed that the reads obtained from the two libraries were evenly distributed over the different chromosomes (FIG. 9, FIG. 10), indicating that SALP can successfully construct NGS libraries from gDNA fragmented by enzymatic treatment or by physical methods. In addition, the predicted density distribution of Hind III cleavage sites on the genome substantially agreed with the density distribution of reads measured by NGS (FIG. 9). On some chromosomes (chromosomes 5, 9 and 11) (FIG. 9), the differences between the two were more pronounced, owing to sequence differences between the reference genome (hg19) and the HepG2 cell genome.
Example 5 preparation of cDNA based on the SALP method
The experimental method comprises the following steps:
Cell culture: HepG2 cells were cultured in DMEM medium containing 10% (v/v) fetal bovine serum, 100 units/mL penicillin and 100 μg/mL streptomycin. Cells were cultured at 37 °C in an incubator with 5% (v/v) CO2. Preparation of cDNA: total RNA of HepG2 cells was extracted by the Trizol method and reverse transcribed with a reverse transcription kit (RR036A) from Takara to obtain cDNA.
Adaptor preparation: All oligonucleotides were synthesized in Shanghai (Table 1). To prepare the single-stranded adaptor (SSA), SSA-PN and SSA-PNre were dissolved in ddH2O at 100 μM and mixed in equimolar amounts in a PCR tube. The oligonucleotide mixture was denatured at 95 °C for 5 min and cooled naturally to 25 °C.
Preparation of cDNA: the cDNA sample obtained by the reverse transcription was subjected to SSA ligation, rTaq enzyme extension, and PCR amplification was performed using oligo dT (reverse transcription kit of Takara Co.; RR037A) and NEBNext Universal PCR Primer. The PCR reaction system is as follows: 20 μ L of ligation product, 1 μ L of NEBNext Universal PCR Primer, 1 μ L of oligo dT, 25 μ L
Figure BDA0001438689190000201
HS(Premix)(R040Q),ddH2O make up to 50. mu.L. The PCR reaction procedure was as follows: (i) 5 minutes at 98 ℃; (ii) 30s at 98 ℃; 30s at 50 ℃; 90s at 72 ℃; 35 cycles; (iii) 5 minutes at 72 ℃. The resulting product was electrophoresed on a 1.5% agarose gel.
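The reaction setup and cycling program above imply a water volume and a minimum run time that are easy to compute. The following sketch uses only the volumes and step durations given in the text; the variable names are illustrative, "HS premix" is a shorthand label, and ramping time between temperatures is ignored (an assumption, since it depends on the thermal cycler).

```python
# Component volumes in microlitres, as listed in the PCR reaction system.
components_ul = {
    "ligation product": 20,
    "NEBNext Universal PCR Primer": 1,
    "oligo dT": 1,
    "HS premix": 25,
}
FINAL_VOLUME_UL = 50

# Volume of ddH2O needed to reach the final volume.
water_ul = FINAL_VOLUME_UL - sum(components_ul.values())

# Cycling program: durations in seconds.
initial_denaturation_s = 5 * 60      # (i) 98 C for 5 minutes
cycle_steps_s = [30, 30, 90]         # (ii) 98 C, 50 C, 72 C
n_cycles = 35
final_extension_s = 5 * 60           # (iii) 72 C for 5 minutes

# Total hold time, excluding temperature ramping.
total_hold_s = (
    initial_denaturation_s + n_cycles * sum(cycle_steps_s) + final_extension_s
)

print(f"ddH2O: {water_ul} uL")
print(f"hold time: {total_hold_s} s ({total_hold_s / 60:.1f} min)")
```

Under these figures the reaction needs 3 μL of ddH2O, and the program holds for 5850 s (97.5 min) before ramping is taken into account.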
The experimental results are as follows:
To demonstrate that the SALP method can be used to prepare cDNA, we started from cDNA obtained by reverse transcription, ligated it to the SSA and extended it, and then amplified the product with two primers annealing to the SSA and the polyA sequence, respectively, to obtain cDNA (FIG. 11). The results showed that the experimental group (T4+) gave clear amplification products compared with the SSA ligation control group without T4 DNA ligase (T4-) and the PCR negative control group without template (Blank), indicating that cDNA was successfully prepared by the SALP method.
Because random primers are commonly used in cDNA amplification at present, it is difficult to obtain the complete 3' terminal sequence of a cDNA, and full-length cDNA fragments cannot be obtained. In the present invention, the SSA is ligated to the 3' end of the cDNA and extended to generate double-stranded DNA, so that the cDNA obtained includes full-length cDNA.
Sequence listing
<110> Southeast University
<120> construction method and application of next generation sequencing library based on single-chain linker
<160>63
<170>SIPOSequenceListing 1.0
<210>1
<211>19
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>1
agatgtgtat aagagacag 19
<210>2
<211>19
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>2
ctgtctctta tacacatct 19
<210>3
<211>57
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>3
gactggagtt cagacgtgtg ctcttccgat cttagcttag atgtgtataa gagacag 57
<210>4
<211>57
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>4
gactggagtt cagacgtgtg ctcttccgat ctcttgtaag atgtgtataa gagacag 57
<210>5
<211>57
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>5
gactggagtt cagacgtgtg ctcttccgat ctgccaatag atgtgtataa gagacag 57
<210>6
<211>57
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>6
gactggagtt cagacgtgtg ctcttccgat cttgaccaag atgtgtataa gagacag 57
<210>7
<211>57
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>7
gactggagtt cagacgtgtg ctcttccgat ctatcacgag atgtgtataa gagacag 57
<210>8
<211>57
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>8
gactggagtt cagacgtgtg ctcttccgat ctacttgaag atgtgtataa gagacag 57
<210>9
<211>57
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>9
gactggagtt cagacgtgtg ctcttccgat ctcgatgtag atgtgtataa gagacag 57
<210>10
<211>57
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>10
gactggagtt cagacgtgtg ctcttccgat ctacagtgag atgtgtataa gagacag 57
<210>11
<211>57
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>11
gactggagtt cagacgtgtg ctcttccgat ctcagatcag atgtgtataa gagacag 57
<210>12
<211>19
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>12
ctgtctctta tacacatct 19
<210>13
<211>33
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>13
agatcggaag agcgtcgtgt agggaaagag tgt 33
<210>14
<211>34
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>14
acactctttc cctacacgac gctcttccga tctn 34
<210>15
<211>35
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>15
acactctttc cctacacgac gctcttccga tctnn 35
<210>16
<211>36
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>16
acactctttc cctacacgac gctcttccga tctnnn 36
<210>17
<211>37
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>17
acactctttc cctacacgac gctcttccga tctnnnn 37
<210>18
<211>33
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>18
gactggagtt cagacgtgtg ctcttccgat ctt 33
<210>19
<211>32
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>19
agatcggaag agcacacgtc tgaactccag tc 32
<210>20
<211>58
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>20
aatgatacgg cgaccaccga gatctacact ctttccctac acgacgctct tccgatct 58
<210>21
<211>66
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>21
caagcagaag acggcatacg agattgttga ctgtgactgg agttcagacg tgtgctcttc 60
cgatct 66
<210>22
<211>66
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>22
caagcagaag acggcatacg agatacggaa ctgtgactgg agttcagacg tgtgctcttc 60
cgatct 66
<210>23
<211>66
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>23
caagcagaag acggcatacg agattctgac atgtgactgg agttcagacg tgtgctcttc 60
cgatct 66
<210>24
<211>449
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>24
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca gtagttagat gctgtcattg gatgacattg 120
ggcaagcttg tcatgtgtct tctgatgtct cccttgtcct ttatcaactc accttcttgc 180
tgaacacttt tggagtttct tgtgtgttta ttggctactg aatctccttc caactaaatt 240
atgtagagtc taggaaacac agttctgaaa tttaatcctg gttcatttgc tagaactctg 300
gatttttttc cccaaatagt ttggtttctt atacactaat caggaccatt ttcctagttg 360
gaaaaaagca ggcacaaggt gtggtggcag aagatcggaa gagcgtcgtg tagggaaaga 420
gtgtagatct cggtggtcgc cgtatcatt 449
<210>25
<211>501
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>25
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca ggaagtgttc ttgataaaga agaaagatga 120
cttgattgca ttaaggccag tgagttccac tctcatcctg gaaacaaaag aatatacttc 180
tagtagagca gatctggcaa atgatagatg gagaaggcaa aacaacacta ctcatgcctt 240
aagcctgctg ctttcttaaa ttgaacacac aagaaaaaaa agatgaaaac aagtattttg 300
tttttacata attttatttc aaaattttaa gtttcagaaa agagagttgc atgatgtatt 360
gttataataa gaaatgctac ttgaaaggac ttttgaataa attgagaaaa acaagaaagt 420
gataccaagg agcactgaga cagagatcgg aagagcgtcg tgtagggaaa gagtgtagat 480
ctcggtggtc gccgtatcat t 501
<210>26
<211>391
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>26
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctggcc aatagatgtg tataagagac aggattataa ttcaatacat taaaaataaa 120
attaaaatgc agagtaacat tcttctatag tgaagaatgc cagctattaa acactgaaca 180
aagatcgaat tagaaaagca caatttaaaa aatgcacagt ttattagata aggataattg 240
atgaaatcaa tggatattgg aaaccatggg tgaaagattt tatgggaata agatatttac 300
atagtccaaa ataattcagc caaaattcat cccagatcgg aagagcgtcg tgtagggaaa 360
gagtgtagat ctcggtggtc gccgtatcat t 391
<210>27
<211>457
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>27
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca gagaaaacac actggccaaa tcttagctat 120
ttgaggaatg tagggagaaa agccaccttc tctctctatg tctgaaggtt cccatggctg 180
tctctttgcc caaggggcaa actttccatc agggcatctt ctgtgcctct gaggatcatt 240
ttccaattat aggcaatggt agtacgtgtt tcagtgcaga atgagataga gttgtttaat 300
ttgacaataa agcgatgcgt caaaaacctc agtcaacaca gtaagtgttt tcttgttttc 360
ctgctgacca acctaattct ggtttcatac agggcagcca gatcggaaga gcgtcgtgta 420
gggaaagagt gtagatctcg gtggtcgccg tatcatt 457
<210>28
<211>343
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>28
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca ggggtggttt gctttccaag gggcatggag 120
atggagatgc tacagaacat gccacgcttg atacacttcg catcgcccag gtgcccctac 180
tgcgtgtcat gtgacggtgg gtgtgcctgg ttgaggacac cttctagtct catgtgtgaa 240
acacaagctt gtttgtttga catagtctgt tgtgtagtta atgttagatc ggaagagcgt 300
cgtgtaggga aagagtgtag atctcggtgg tcgccgtatc att 343
<210>29
<211>545
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>29
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca ggtctcacct tcaaccactg tgtgctaatc 120
ccagcaggct gggtgaggtg tgtagatggt atctcacgtg gagatcttgc agggtaaacc 180
ttaagatcta ctgacaaata ctgattccag ttggaagcat tagtacattt tgaaatattt 240
aataatttta acttttctta gatatgcccc acttggggac tatctttaag ggccatgaaa 300
ccggtatgat agtaattctt aagattttta aatgaagaaa agcaggagaa atgttggtaa 360
taggatcagt caaatatctg ctagttgaaa ccaccagatg caaatgtttt aagtttcttc 420
ccactgctac tttccactct aatatagctt gttgaaagaa aataaaattt gatcatgcgg 480
gctagtcaga tcggaagagc gtcgtgtagg gaaagagtgt agatctcggt ggtcgccgta 540
tcatt 545
<210>30
<211>495
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>30
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca ggcctgtagg tttttttctt ttaacttgtg 120
atttttaaaa tgaagtaatt taaaaattgg gaaatttcac ataaaaaccc agatttttgg 180
aaaaatcaga tgatctagca caactatgct tggattcaac gtgatgatga tcctggccac 240
gcgagggggc tgcctgtttc cactgagata tctgctctct gcctgacagc tgttcccatc 300
aggccccaca gtcttgcatc tgcctgcctt ccacagtggc ctcacctgtg ggcttgcata 360
catccctgag tttggaactc atgttctgtt gatcatttct cacttaacta atacccatgg 420
gcttcaaaga cttcagcaga tcggaagagc gtcgtgtagg gaaagagtgt agatctcggt 480
ggtcgccgta tcatt 495
<210>31
<211>455
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>31
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca gtatttggat tctgggctac cattttattt 120
catgcatttt gttttgctgg tttaatttcc tttttctctt ctttggagtt gacttctatc 180
attcaatttt tcttctcctc tacttgtttg gtttctattt ttaatgtata cattgtatgc 240
actatgtata catgtatata gtgtatataa gcatatgtat atgtatcatg tatgtgtata 300
tgtatatata tgcctacaaa tgaagattac ttaaatctta gcaactagtc taaaataatg 360
aaggcttaga aactggaagg gaagagagct tagtggtaga tcggaagagc gtcgtgtagg 420
gaaagagtgt agatctcggt ggtcgccgta tcatt 455
<210>32
<211>531
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>32
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca ggtgcattca actcaccgag tgcaacattc 120
ctcttgatag agcagtttgg aaacattgtt tctgtagaat ctgcaagtgg atatatggac 180
cgctttgagg ccttcgttgg aaacgggatt tcttcctata aacccagaca gaagaattct 240
cagagatttc tttgtgatgt gtgaattcaa ctcacagtgt ggatccttcc ttttgataga 300
gcagttttga aacaccgttt ttgtagtatt tccaagcgga tatttggaac gccttgaagc 360
gtatggtaga aaaggaaata tcttcccata aaacctagac agaaccaatc tcagaaacga 420
ctttgtgatg tctgcattca actcacagag ttgaacattt ctcttgatag agcagatcgg 480
aagagcgtcg tgtagggaaa gagtgtagat ctcggtggtc gccgtatcat t 531
<210>33
<211>568
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>33
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca ggagaacagt attgaaatat gggcatccag 120
gccaattgca agaggacaga tcttgagtcc taatccattt gagtttatta caaggacaga 180
agtgaagtga aagggaatta ctcatttttg tttttcaaca tttgcctctt gaatcaagag 240
agagggtgga gcctctcttg cttgtgaggc agggtgtttc cacatactcg taacttgaac 300
tctaggaaga aaaaggtagc aggataaatt ttacagaaaa gggaagtaga gcagcatgct 360
ttgcccaagc actcatctcc tttgatacag ttccttcaga tactttgaaa tgactaatgc 420
attatattta aggccactag tactagtcat tgtgttttca aggaaatcag aggtattccc 480
tgcttcacta aatgtatttg tcatgaccaa agatcggaag agcgtcgtgt agggaaagag 540
tgtagatctc ggtggtcgcc gtatcatt 568
<210>34
<211>716
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>34
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca ggctgtcaac ctgagtcatg ggggtgggat 120
gagggtaggg ggcagagtaa tgttttctct aggtcacata ctttgtattc aacttatagc 180
ttgaatcttc agattggcaa caagtgcaac attggcaaat cttacaattc ccttgcaatt 240
cacaagttac aaagcacttt aaccagaaca tcctcagaac aacatactgt aacattggta 300
gagttggtat tatcatcact ttttaaggaa aagatatagg aagcttagta aagctaagca 360
aactattcaa attcacacag agagtaatta gaagaaaagg attaaaaaca ggtctctaga 420
gttctctcca aagtaccatg gtactccaaa aataaattat tgcagcgttc tttgaatata 480
tcatcacact tcatttttat aaaacatttg ggctatttat atgtatgcat atacaactga 540
tactttcaaa caattatacc atcctttcat aaacacttgt actccaactt tttaataaat 600
gaagtcaggt ctagaaaaat atacccttaa gttccaccaa aatatagact gctgcacaag 660
atcggaagag cgtcgtgtag ggaaagagtg tagatctcgg tggtcgccgt atcatt 716
<210>35
<211>484
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>35
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca gggtcatttt accttctcag ttaaaattaa 120
acatttattc tttgtgtcaa tagcacttga atgtatagtt agaatactta tcaaattatt 180
gtgcttgttt atatatttat ttccctgatt aagagaggaa aaaaagaata actttctatt 240
tcatttcctc agaatctgcc ctgatgttta gctcaaatag atgttaatga gtatttattg 300
aattaagaat gaaaaaattt aagccaacaa atgtataact gtgttctctg tcttgttcaa 360
gttgaggaat acataaacta ggttacttta gagaataaat gagcaaagaa aatgagcttt 420
tagtgcagat cggaagagcg tcgtgtaggg aaagagtgta gatctcggtg gtcgccgtat 480
catt 484
<210>36
<211>402
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>36
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca gcttccgtct cccccactgg actctgaatt 120
ccttgagggt agggattgtg cccttcttcc cagtgcctgc cacagaaacg gtgcccagta 180
aacacgtatt tgtggaattg atgaattgga gttggtctct gccctgggtg tttcccatca 240
gtctcgctgt cccgcccttc tgcccttctg aagcccataa aacagagtct gctccccaag 300
ctggcctggc tcgggtcggg gctcgcagcg tcccctcccc agcaagatcg gaagagcgtc 360
gtgtagggaa agagtgtaga tctcggtggt cgccgtatca tt 402
<210>37
<211>566
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>37
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca ggtgtgggaa ggtgggtgga aaatgagttt 120
ttgttaatat tcaaaggcat gaaaacattt ttaccagttt atgttttcct ggtgcattta 180
gaaatctgtg gatccttggg gatggtgtat gcaggcaaat agagaatcca gtacttgtga 240
atctgcctga atccacaggt ttgggaataa gggcagggac ttgagggttc acagatgtga 300
aggttgtaca cagaactcat gcagagagat acaagatctt tttgttcccc ctttgattag 360
aaagaatagg acatgaaagt acttaattgt caacttcgtc ttcaccataa gcccagtatt 420
gatgcaaaaa tgataataat aatgagaaca agcatttatt gagtattggg tattctaatt 480
gcttaaatca actcatgtaa ttctcacaag atcggaagag cgtcgtgtag ggaaagagtg 540
tagatctcgg tggtcgccgt atcatt 566
<210>38
<211>566
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>38
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca ggtgtgggaa ggtgggtgga aaatgagttt 120
ttgttaatat tcaaaggcat gaaaacattt ttaccagttt atgttttcct ggtgcattta 180
gaaatctgtg gatccttggg gatggtgtat gcaggcaaat agagaatcca gtacttgtga 240
atctgcctga atccacaggt ttgggaataa gggcagggac ttgagggttc acagatgtga 300
aggttgtaca cagaactcat gcagagagat acaagatctt tttgttcccc ctttgattag 360
aaagaatagg acatgaaagt acttaattgt caacttcgtc ttcaccataa gcccagtatt 420
gatgcaaaaa tgataataat aatgagaaca agcatttatt gagtattggg tattctaatt 480
gcttaaatca actcatgtaa ttctcacaag atcggaagag cgtcgtgtag ggaaagagtg 540
tagatctcgg tggtcgccgt atcatt 566
<210>39
<211>409
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>39
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca gatacattag aaatggaaag ctatggaaga 120
ttccacaaag agaaatagat aatattttga aaccttactc taaggaatat gacaatgtgg 180
gatatcctcc ctgccctcaa ccctccccct tgttcccatt ccatttcttc tcctttagag 240
ctttgaagaa aacgcatttg gtatttagta atcaggatta aacaatataa gcacatcaca 300
cctcttagct cactttttct gataactgca cagaaacaag actctgtcat aagatcggaa 360
gagcgtcgtg tagggaaaga gtgtagatct cggtggtcgc cgtatcatt 409
<210>40
<211>499
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>40
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca ggaacactga actatgggag ggtacaccca 120
acattgctgg agacaccatg cccttcacag ggtcctccca aagcgctcca gacaccagca 180
ccctggataa agaacctgcc actttgtccc aggggctgag gcttctctcc agtctcctgc 240
gtctacccca ttttcaagcc ctcttgcttt ggtctcatgt gcccacactt tcaacccaaa 300
ctgtgccttt ctggccagtc tctatggatg aatacctcag ctgaactgtc tacctggctt 360
tccataagat catctttggt tccaggatct acaataaaag caccagacct gatctatccc 420
agtctgctct gacctcatca agatccggaa gagcgtcgtg tagggaaaga gtgtagatct 480
cggtggtcgc cgtatcatt 499
<210>41
<211>509
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>41
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca ggagtgtaca tcttaaataa ccaatttgta 120
ataaatttaa tcagctagaa aacaagtgta acttttgcaa cttttgaaaa acacacatct 180
ctgggcatca atgaaaactc ttccctctac agtaagccta atgaagtgca actaaaaata 240
acagtcatca actgtgtttt aaaggcagta tttcaacata atcaaatgtg tcaaatattc 300
atccttacag cttcttatgc tgtgggttat aagtaagttt catttcttgg gaatgactga 360
acataaccca cctggggctc tgccatctgt gaattactta tatgtgaaca ctctttaaga 420
gatggaaatt ttgattgttt tttcttcctg tagatcggaa gagcgtcgtg tagggaaaga 480
gtgtagatct cggtggtcgc cgtatcaat 509
<210>42
<211>458
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>42
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca gcttttggat ttttttaaat attttatttt 120
ttataattca agcaaagatc aaaaaaatat taaaaaaata attgctcgac actatcaata 180
ctttagtgtt aaagaaactc taaaaaaatt tgataaaaat tcaaataaag ttggaattgt 240
atgacatgtt caaggttcag gtaaatccct tacaatggta atgttgacta agttgttaag 300
aacaattgaa aagaatttaa cagttattgt ggtaactgat agaattgatc ttcaagatca 360
attgaacaac acttttaata actttcataa atatattggt agatcggaag agcgtcgtgt 420
agggaaagag tgtagatctc ggtggtcgcc gtatcatt 458
<210>43
<211>644
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>43
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca gttacaatcc tgctcttgtg gctgggatca 120
ggaatatgag caaaggaggc cacgagaagg aatcacatgt gtaaaaggtg gcttgaatta 180
ttatttttta aaatatcatg gaggcttatt atgagacaaa tcatcaagat agctgacaat 240
agatataatg ttcagccact tcaggtcttt gccttttctg tccacactat atttatttgc 300
acacaaatac caccaatgcc actaccactg tcactagtcc cagttagcct tattgttctc 360
catagcattg agcacaacta gtcattccac gtattttacc tttttatttt ctttatctta 420
tgcctcattc taccagcatg gaaactcaaa taaatcattg atttgtttat ttttcatttg 480
tgcaccctca atacctaaaa cagtacgtgg cacagggcag ggattcaaaa agtgtttgtt 540
gactgaatgg tcagatctat tattttttga gacattctct ctagaaagat cggaagagcg 600
tcgtgtaggg aaagagtgta gatctcggtg gtcgccgtat catt 644
<210>44
<211>416
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>44
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca gggcatatac attctaaata ttctaataaa 120
aacttttaga gattaccaaa caagtacttt tatttttcca tttaaaatag gatagaatgg 180
atagtcaaga tctatccagt cttctgtttc actttgggaa aatccccatt tgcctcatat 240
tagtttgtaa acatctcacg tttttcccaa gtctcagtag ttttaagtgc aaatgttacc 300
accaacaatc acatttttaa ctatatctat ttcgtcccta aaaaaactgg tgtttctcag 360
atcggaagag cgtcgtgtag ggaaagagtg tagatctcgg tggtcgccgt atcatt 416
<210>45
<211>508
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>45
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca ggggagagga tcaggaaaaa taactaatgg 120
gtactaggct taatacctgg gtgataaaat aatctgtaca acaaaccccc atgacacaag 180
tttacctatg aaacaaacct gcacttgtac ccctgaactt aaaagttaaa aaaagtgcat 240
atatacaatg aacaactatt cagccaaaaa aaatgaatga gatcctgtca tttccaatag 300
catgaaagga actgaaagac attaagtgaa gtaagtcagg cacagaaaga caaactttgc 360
atgttctcac atattcgtga gagctaaaaa attaaaacaa ttgaaatcat gcagatagag 420
agtagaatta tggttaccag aggctgggaa agatcggaag agcgtcgtgt agggaaagag 480
tgtagatctc ggtggtcgcc gtatcatt 508
<210>46
<211>418
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>46
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca gctatataaa catctgactt ctcaacataa 120
atagtggaaa ttaacagaca ctggaatgat aattcaaagt gctgaagatg aaaaatcaag 180
aattctatat ttaatgaaat tatcttttta aaatggaggc caaaaataca tttttcagat 240
caacaaaatc taagataatt tgattgtaac aaatttatac ttcaggatgg acaagaagtt 300
ctgtcagctg atgagaaatg atgccagatg gtaactcaga tatacaagaa atactgaaat 360
agatcggaag agcgtcgtgt agggaaagag tgtagatctc ggtggtcgcc gtatcatt 418
<210>47
<211>446
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>47
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca gaccatatgg aatgagatga taaaattatt 120
ggatttaaca gaacagtgtg ggaggtaaaa aaaaaaatca agaattttat tggcacaaat 180
tactggtttc tatcccctat tttctcaact ataattcttt tacattcctt cattctttcc 240
tctggaccca atcataatgt aattcctaaa tctagtggtt tttgtcagca ttcatcctac 300
ttgaactttc ttacagtgtt tgacaaacta cattctaatt ctggagctct gtcttttcac 360
atcactctat ctcagcttcc agaatactag atcggaagag cgtcgtgtag ggaaagagtg 420
tagatctcgg tggtcgccgt atcatt 446
<210>48
<211>593
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>48
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca gatataatgc agtgcctcag acataattaa 120
aaaccttata gaactgaatt gaatgtccat gccatttatg gctggatgat ggcaagaaaa 180
cagattgtag gaaggaaaaa tcttgccatc atgtccagtt gggatgccga aatgcttcag 240
actttttttt tttttttttt aagaaaaaga atttgtgtct actggacagg aaattaattc 300
atttccagaa caagtttttt cttaaaacac gctaaggtca aacttcccat aatgcctact 360
gtcatggtgg ttgtctatga ttggtatagg cacatcccaa agcaataaat tcatctccta 420
aaggaccact gtgctaatgc ttgcctgaca acctgcttca agaaaatgtg tctaactcca 480
ttactaacat tgagtcatca ctgtccaatt ctttctcttt aatgtttaag agtaaagatc 540
ggaagagcgt cgtgtaggga aagagtgtag atctcggtgg tcgccgtatc att 593
<210>49
<211>569
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>49
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca ggaattgaat catcaccaaa ttgagtcgaa 120
tggaatcatc aaatggagtg aaatggaatc atcatcgaag ggaatggaat agaatcatcg 180
aattgactcg aaagaataat catcgaaggg aacggaaagg aattatccaa tggaatacaa 240
gagaaacatc atcaaatgga atcgaatgga atcatcatcg aaaggaatcc aatggaataa 300
tcatcaaatg gattcatacg gaatgataat cgaatggaat tcaaaggaat catcatcgaa 360
gggaatcgaa tgcaacaatc gaatggaatc taatggaatc atcatcgaat ggaatcgacc 420
ggaatcatcg aatggaagag aagagaatca tcattgaatg gaattgaatg gaatcgtcaa 480
tgaatggaat cgaatggaat aatcagagaa tagatcggaa gagcgtcgtg tagggaaaga 540
gtgtagatct cggtggtcgc cgtatcatt 569
<210>50
<211>495
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>50
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca gatacaggag aactccaaac caagaaaggt 120
tttttttttt tttttggcca gactctgaaa aagtaggctg taatatatat ttttaaaaag 180
tctataccca tgaaggaccg tgtgaggaga tgctatctta tatagaatag ggctgaggct 240
tattgaggct ttgccaagat ttcagagtaa atcttattca ctttgaataa gaaatttgtc 300
ttatgagaaa actattggct tgaaatgtgg tgaatacaag ggctgaggga gactccagtg 360
ggtttgtacc tattctcagc cttacccagg agctggctga aatgggttag ttgatggaaa 420
aatctctttg tgtgtgtaga tcggaagagc gtcgtgtagg gaaagagtgt agatctcggt 480
ggtcgccgta tcatt 495
<210>51
<211>399
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>51
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca gatcatcaag ttcaaactgc ttatcctagc 120
aatgcaaact gacagcatat gcacatacat aatcaaatgg gaaataataa tagtatgtct 180
cggcggactg cctaacacaa gaaagtcaac ggcccaaggg atatgtcaag aataatttct 240
ttgcatgcat tgctctgatg ctatgctttg acacaggatt atttcatcag tggagagtgt 300
atgaaactct taggcaacaa actggatttt ccttttcagc cagatcggaa gagcgtcgtg 360
tagggaaaga gtgtagatct cggtggtcgc cgtatcatt 399
<210>52
<211>687
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>52
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca ggcagtagat agtgaagttt ccttttttca 120
gtagaactga aacaatcaaa gaaatgttat tttagtaaat gttgatctct tttcattctt 180
tctccctgga cattctctga aatctctgtc aatggaattt gtttactcca aatcacatct 240
ttgttgcctt tgagtattac catgtttgaa tgtttaccac tcaaatccag cataaaagtg 300
tcttcttttt taggtaagat caggcaaaga ggtactgaat gaataacact tgattgggaa 360
tggtaaataa ccatgcaatt aaactgtaaa cactgtgtgt ggtgatttta atgtaatttg 420
aggacttgta aattatatgg tcataaaatg gcacttgggc ttatgcttta caaaaatatc 480
catgtttgta tgagattaat tagcgcagtt tgcataaaga cattgagtaa agcactttct 540
gaacattctc atttgtaagg ttttcttatt tataaggctt tctttttatt tctttctgtg 600
gtcttgaaga aattattatc attgtccaca gatcggaaga gcgtcgtgta gggaaagagt 660
gtagatctcg gtggtcgccg tatcatt 687
<210>53
<211>537
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>53
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca ggtgcaccct cacagttcaa acctgtgttg 120
ttcaagggtc aactgtatat ccaagttcat acatatcgta aatggtagaa ctaagatgca 180
atttcagatc caaattcaga ttttcaaatt cagtttccaa gtcatatgat gacactactt 240
agaaaatcaa aattagtttc cagcttttac aaatcaagct gctagtagta attctaatac 300
cattatatga ttattaataa tgccaccaca ttgatggctc agctgaggac tagaaaataa 360
gtctttaaca aaatttccta tttgtatttt atttttcttg caatgatgca cagctgagaa 420
cagaaaataa gtcttaacac tctcccaaag atggcaagat gcacagtcac catgtctaaa 480
gatcggaaga gcgtcgtgta gggaaagagt gtagatctcg gtggtcgccg tatcatt 537
<210>54
<211>506
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>54
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca gctttcaatg ttcacttcaa cacttccggg 120
tatcaatctt tgtgtatagg aagatcctta acacacttcc tatacacaaa gctctggctc 180
agagtcaact tccccagaaa cagagaacct gacttcaaac aatcccttat taaaacataa 240
aaggtatcgt aagtttagaa atcagaaagg ccatgagtat aaatagctaa aatatgaatg 300
cagcagaaaa taccttcctt agaacattgt tttagaagtg gcaaactagg aaactttgaa 360
aagaggtcag tatgaaactg tgattttttt aaaaaagatt tcattttgac ttagttttaa 420
gggtgtttca gcctgcagtt atttcagaag atcggaagag cgtcgtgtag ggaaagagtg 480
tagatctcgg tggtcgccgt atcatt 506
<210>55
<211>451
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>55
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca ggttcagttt ccaatcctgt ggttatcttc 120
tttgcaatgt tgcagcagtt ttcagaagtt taaaagcttt aataattgaa atctctgata 180
tgctttatag caatgtacac gcatatgctt ttattctgta ataatttact gacaaaattt 240
accacctgac tgctcgaagt tctttcagac ttaggagatg tttttccagc agctcaagaa 300
atgctttctt gggaggactt cccatgctcc agggacttta cacgcctcat ttctcttaat 360
tctcacaagc agccaaagag atgggtgata ccaagatcgg aagagcgtcg tgtagggaaa 420
gagtgtagat ctcggtggtc gccgtatcat t 451
<210>56
<211>364
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>56
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgtgccaata gatgtgtata agagacaggc caatagatgt gtataagaga cagtgtgtac 120
atatgtaaca aacctgcacg ttgtgcacat gtaccctaga acttaaagta tagtaaataa 180
aaaaaaggaa aaaaattgct cacaagactg tggagaaaaa agaatgctta tatcatgttg 240
gtaggactgt aaattagttc agccattgtg gaaaggagtt tgatgatttc ttaaagaact 300
taaaacagat cggaagagcg tcgtgtaggg aaagagtgta gatctcggtg gtcgccgtat 360
catt 364
<210>57
<211>434
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>57
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca gcttagataa atatctacaa aacataattt 120
attaaaactg acttaagcaa aaaaccaaaa ggtaaaactt gaataatgct ataattatta 180
aagaaaaatt attctcacac acacacaaaa agtaccaagc ccccatgggt ttacaggtga 240
ggtgaaattt tcaaggacca gatcatctag acaaaagaaa ttcttccgga caaaagaaaa 300
attcttccag acaaaagaaa atgagggatt actccctaac tcctcttata aggggagatg 360
ttaaaggaga atggacagat cggaagagcg tcgtgtaggg aaagagtgta gatctcggtg 420
gtcgccgtat catt 434
<210>58
<211>501
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>58
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca gggccaccag agatatttaa aaatcagaaa 120
atcagaaaaa atcagagcca tcattttcaa aaattagtct tctagcaggg aatgaactga 180
agaagaaagg ggcaaaaggt atggaaagta attaggacag agcttttagg gctccatatg 240
tcatgctttt cacaataaca tgcagaatcc acatcctcat ttaggtgtga taatatcatt 300
actctcttct ctctccaatg tctacttaga agtagcgcaa cagtaagtct tttttctgga 360
gatggtcatg gcttatacct gtagtcctgg tgaaattaat aatagtgccc ccattcactc 420
tcgaaagtat tctggtttgg acaagatcgg aagagcgtcg tgtagggaaa gagtgtagat 480
ctcggtggtc gccgtatcat t 501
<210>59
<211>306
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>59
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca ggaaccagtg agcttgaaga taggacattt 120
gaaattatcc agtcaaataa taaaaaaaga atgaagaaag cctccaggaa ttatgggata 180
ccatcaagag acccaacatt cacataataa aagttgctga aggagaaaag agagaaaaag 240
agccagaaag atcggaagag cgtcgtgtag ggaaagagtg tagatctcgg tggtcgccgt 300
atcatt 306
<210>60
<211>482
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>60
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca gacatgatac agtggaagga atactagagt 120
aggggtcagg aaaatcaaat actgtgaagt gtggaagtgc ctggcacaga gtagaaactt 180
aatcaatgta aatccctcat cccctcccat tgtgcatcat gagcttccca acacagccca 240
taaaatctcc aagttgtaat gctgaaagaa gggccacaac cttgtcaatg acgcaaaaga 300
gctttcatca gactgtgcat aatttagaat gtgaatctct gagaaatgag agctgatgag 360
agcagacttt agtaatcccc taaactctca atcatctgtg ttttggtaaa acaggagcac 420
tagcagatcg gaagagcgtc gtgtagggaa agagtgtaga tctcggtggt cgccgtatca 480
tt 482
<210>61
<211>742
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>61
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca ggggtaagga atggtgctga aagctttttc 120
tcagtgttcc agctccacca tgagctttat tatgtacctg tcccagagga agtctcatgc 180
ctccttctgt cccttcctgg tggtaggctg ctgttgcttg tttctgaggg ttgtatttat 240
ggcatgggta gagaggagga ggttttctta gggaataata cttgaattaa ctcatatttc 300
cacatatttt cttgcttaaa aaggtatttt aatgatccaa ataagtgttt tgacaagttt 360
tcatttatag ctacctcatt gaattattgg actagtaact ttaagaaagc aaaaataagt 420
agtgatttta gacataattt ttttttggaa tgaagtactg gctcctggta attgttgttt 480
actctacaga gcctatgaaa tcacacataa ttgattcaat aatattttat ggaaacttgc 540
cagaagtcga tgtcaaaaga acccatctct agattacaga atcaaacgcc cttttttttt 600
taacctggaa taataatttc tcttctataa ttttctatat cttcctcaca ttctctgggg 660
tttaaagtgg tttacataat cacaagatcg gaagagcgtc gtgtagggaa agagtgtaga 720
tctcggtggt cgccgtatca tt 742
<210>62
<211>451
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>62
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca ggttcagttt ccaatcctgt ggttatcttc 120
tttgcaatgt tgcagcagtt ttcagaagtt taaaagcttt aataattgaa atctctgata 180
tgctttatag caatgtacac gcatatgctt ttattctgta ataatttact gacaaaattt 240
accacctgac tgctcgaagt tctttcagac ttaggagatg tttttccagc agctcaagaa 300
atgctttctt gggaggactt cccatgctcc agggacttta cacgcctcat ttctcttaat 360
tctcacaagc agccaaagag atgggtgata ccaagatcgg aagagcgtcg tgtagggaaa 420
gagtgtagat ctcggtggtc gccgtatcat t 451
<210>63
<211>455
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>63
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca ggtagccaat tgatttttga ccagggaacc 120
acatttattc agtggggaaa gatagttcac caaatggtgc tagatttctg catacaaaag 180
aacaaagtta gacccctacc ttacaccata tacaaaaatc aactcaaaat tgaaaaacaa 240
cctaaatata agagttaaaa taccaagact cttagaagaa aacacagggg taaatcttta 300
tgaccttgga tttaacagtg gattcttaga tgtgtcacca aaagcacaag caacaaaaga 360
aaaaatagat aaatttgact tcatcagact ttaaactaga tcggaagagc gtcgtgtagg 420
gaaagagtgt agatctcggt ggtcgccgta tcatt 455
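
The records above share a layout that can be checked mechanically. The following is a minimal Python sketch, not part of the patent, that parses one record in the `<210>`/`<211>`/`<400>` format used here, compares the declared length with the actual sequence, and splits out the insert. It takes the ME sequence from claim 3 and assumes that the 58-nt stretch shared by the 3' ends of the sequences listed here is the 3' adaptor; the names `parse_record`, `split_insert`, `ME` and `ADAPTOR_3P` are illustrative only.

```python
import re

# ME sequence as given in claim 3, lower-cased to match the listing.
ME = "agatgtgtataagagacag"
# Assumption: the stretch shared by the 3' ends of the listed sequences.
ADAPTOR_3P = "agatcggaagagcgtcgtgtagggaaagagtgtagatctcggtggtcgccgtatcatt"


def parse_record(text):
    """Return (seq_id, declared_length, sequence) for one listing record."""
    seq_id = int(re.search(r"<210>\s*(\d+)", text).group(1))
    declared = int(re.search(r"<211>\s*(\d+)", text).group(1))
    # Everything after <400> is sequence data plus position counters.
    body = text.split("<400>", 1)[1]
    # Keep only runs of nucleotide letters; digits and spaces are dropped.
    sequence = "".join(re.findall(r"[acgtn]+", body.lower()))
    return seq_id, declared, sequence


def split_insert(sequence):
    """Split into (5' part ending with ME, insert, 3' adaptor), or None."""
    me_start = sequence.find(ME)
    adaptor_start = sequence.rfind(ADAPTOR_3P)
    if me_start == -1 or adaptor_start == -1:
        return None
    me_end = me_start + len(ME)
    if adaptor_start < me_end:
        return None
    return sequence[:me_end], sequence[me_end:adaptor_start], sequence[adaptor_start:]


if __name__ == "__main__":
    # Record 59 copied from the listing above.
    record = """<210>59
<211>306
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>59
caagcagaag acggcatacg agataaggcc acgtgactgg agttcagacg tgtgctcttc 60
cgatctgcca atagatgtgt ataagagaca ggaaccagtg agcttgaaga taggacattt 120
gaaattatcc agtcaaataa taaaaaaaga atgaagaaag cctccaggaa ttatgggata 180
ccatcaagag acccaacatt cacataataa aagttgctga aggagaaaag agagaaaaag 240
agccagaaag atcggaagag cgtcgtgtag ggaaagagtg tagatctcgg tggtcgccgt 300
atcatt 306
"""
    seq_id, declared, sequence = parse_record(record)
    print("SEQ ID", seq_id, "declared", declared, "actual", len(sequence))
    parts = split_insert(sequence)
    if parts is not None:
        print("insert length:", len(parts[1]))
```

Because the parser keeps only nucleotide letters, it also tolerates a position counter fused to the sequence text.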

Claims (11)

1. A method for constructing a next-generation sequencing library based on a single-chain linker, characterized by comprising the following steps:
(1) denaturing a double-stranded DNA fragment or an RNA/DNA hybrid fragment into single-stranded DNA;
(2) connecting a single-chain linker to the 3' end of the single-stranded DNA;
(3) extending the single-stranded DNA connected to the single-chain linker with DNA polymerase to form double-stranded DNA;
(4) connecting a T adaptor to the end of the double-stranded DNA that carries no single-chain linker;
(5) amplifying by PCR the double-stranded DNA with adaptors connected at both ends to form a DNA library that can be sequenced by next-generation sequencing;
or: (a) assembling a transposome from Tn5 transposase and a Tn5 adaptor;
(b) treating a DNA sample or a cell sample with the transposome to fragment the DNA, and denaturing the resulting double-stranded DNA fragments into single-stranded DNA;
(c) connecting a single-chain linker to the 3' end of the single-stranded DNA;
(d) extending the single-stranded DNA connected to the single-chain linker with DNA polymerase to form double-stranded DNA;
(e) amplifying by PCR the double-stranded DNA with adaptors connected at both ends to form a DNA library that can be sequenced by next-generation sequencing;
the single-chain linker is a double-stranded oligonucleotide with one sticky end; the sticky end of the single-chain linker consists of random nucleotides protruding from the 3' end; the other 3' end of the single-chain linker is a blunt end carrying a blocking group; the random nucleotides protruding from the 3' end are 1-4 nucleotides long; the 5' end at the sticky end carries a phosphate group, and the 5' end at the blunt end carries a hydroxyl group; the sticky end of the single-chain linker anneals with the 3' end of the single-stranded DNA generated in step (1) or (b), and a nucleic acid ligase catalyzes the formation of a 3'-5' phosphodiester bond between the 5' phosphate group at the sticky end of the single-chain linker and the 3' hydroxyl group of the single-stranded DNA generated in step (1) or (b); the single-chain linker is a single-chain tag linker, that is, a tag sequence is added to the double-stranded region of the single-chain linker, adjacent to the random nucleotides protruding from the 3' end.
2. The method for constructing a next-generation sequencing library according to claim 1, wherein the double-stranded DNA fragments of step (1) comprise DNA fragments sheared by sonication, DNA fragments produced by enzymatic cleavage, DNA fragments produced by transposase-based fragmentation, or naturally degraded DNA fragments.
3. The method for constructing a next-generation sequencing library according to claim 1, wherein the Tn5 tag linker has the sequence structure: 5'-primer annealing site sequence-tag sequence-ME sequence-3'; the ME sequence is double-stranded, and the primer annealing site sequence and the tag sequence can be single-stranded or double-stranded; the 3' end of the double-stranded ME sequence is a hydroxyl group, and the 5' end of the double-stranded ME sequence is a phosphate group; one strand of the ME sequence is 5'-AGATGTGTATAAGAGACAG-3', and the complementary strand is 5'-P-CTGTCTCTTATACACATCT-3', wherein P represents phosphate.
4. The method of claim 1, wherein the RNA/DNA hybrid of step (1) is an RNA/DNA hybrid generated by a reverse transcription reaction; the single-stranded DNA produced by the denaturation is complementary DNA.
5. The method for constructing a next-generation sequencing library according to claim 1, wherein the DNA polymerase of step (3) may be any of various DNA polymerases; if the DNA polymerase is ordinary Taq DNA polymerase, a protruding A base is naturally generated at the 3' end of the double-stranded DNA produced by extension in step (3), and the extension product can be used directly for connecting the T adaptor in step (4); if the DNA polymerase is a high-fidelity DNA polymerase, no protruding A base is generated at the 3' end of the double-stranded DNA produced by extension in step (3), and the extension product must first be treated with ordinary Taq DNA polymerase or another enzyme with a similar function so that a protruding A base is generated at its 3' end, after which it is used for connecting the T adaptor in step (4).
6. The method for constructing a next-generation sequencing library according to claim 1, wherein the T adaptor of step (4) is a double-stranded oligonucleotide with a sticky end; the sticky end of the T adaptor is a T base protruding from the 3' end; the T base can anneal with the A base protruding from the 3' end of the double-stranded DNA generated in step (3).
7. The method of claim 6, wherein the ligation in step (4) is performed by annealing the protruding T base of the T adaptor to the protruding A base at the 3' end of the double-stranded DNA produced by extension in step (3), and then using a ligase to catalyze the formation of a 3'-5' phosphodiester bond between the T adaptor and that double-stranded DNA.
8. The method for constructing a next-generation sequencing library according to claim 1, wherein the template of the PCR amplification in step (5) is a double-stranded DNA fragment carrying, at its two ends, a single-chain linker and a T adaptor or a single-chain linker and a Tn5 tag adaptor; the annealing sites of the PCR primers in step (5) are provided by the single-chain linker and the T adaptor or by the single-chain linker and the Tn5 tag adaptor; PCR amplification with a pair of primers that anneal to the sequences of the single-chain linker and the T adaptor, or of the single-chain linker and the Tn5 tag adaptor, yields PCR amplification products of the DNA fragments; the PCR amplification product is a DNA library that can be sequenced by next-generation sequencing technology and used for next-generation sequencing analysis.
9. The method for constructing a next-generation sequencing library according to claim 1, wherein the method is applied to genomic DNA sequencing, analysis of cell chromatin openness, gene expression detection, and amplification of trace nucleic acids; the DNA sequencing analysis is based on "single-chain linker + T adaptor" or on "Tn5 adaptor + single-chain linker"; the chromatin openness analysis is based on "Tn5 adaptor + single-chain linker"; the gene expression detection and the trace nucleic acid amplification are based on "single-chain linker + T adaptor".
10. The use of claim 9, wherein the specific steps for analysis of cell chromatin openness are: (1) assembling a transposome from Tn5 transposase and a Tn5 adaptor; (2) collecting different cells and lysing the cell membrane by a mild lysis method while keeping the nuclei intact; centrifuging to collect the nuclei and remove cell membrane debris and cytoplasmic components; (3) treating the nuclei with the transposome to fragment the chromatin; (4) isolating and purifying the genomic double-stranded DNA fragments from the chromatin fragments; (5) denaturing the double-stranded DNA fragments into single-stranded DNA; (6) connecting a single-chain linker to the 3' end of the single-stranded DNA; (7) extending the single-stranded DNA connected to the single-chain linker with DNA polymerase to form double-stranded DNA; (8) amplifying by PCR the double-stranded DNA with adaptors connected at both ends to obtain a DNA library that can be sequenced by next-generation sequencing.
11. The use of claim 9, wherein the sequence of the single-chain linker, the T adaptor or the Tn5 tag adaptor contains a T7 promoter sequence, and the double-stranded DNA fragments with adaptors connected at both ends are amplified by in vitro transcription and then converted back into DNA fragments by reverse transcription; the reverse-transcribed DNA can be used for high-throughput sequencing analysis or low-throughput detection analysis.
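
The strand relationships recited in claims 1 and 3 can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the patented procedure: strands are plain 5'->3' strings, the blocking group, phosphate chemistry and enzyme behaviour are ignored, and the function names (`reverse_complement`, `overhang_anneals`, `ligate_and_extend`) are hypothetical. It checks that the two ME strands of claim 3 are reverse complements and models how a 1-4 nt 3' overhang anneals to the 3' end of single-stranded DNA before ligation and extension.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# The two ME strands as written in claim 3 (both 5'->3').
ME_TOP = "AGATGTGTATAAGAGACAG"
ME_BOTTOM = "CTGTCTCTTATACACATCT"


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (5'->3')."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def overhang_anneals(ssdna, overhang):
    """True if the linker's 3' overhang can pair with the 3' end of ssdna.

    Both arguments are written 5'->3'. Pairing is antiparallel, so the
    last bases of ssdna must equal the reverse complement of the overhang.
    """
    if not 1 <= len(overhang) <= 4:
        raise ValueError("claim 1 limits the overhang to 1-4 nucleotides")
    return ssdna.upper().endswith(reverse_complement(overhang))


def ligate_and_extend(ssdna, ligated_strand, overhang):
    """Model steps (2)-(3): return (ligated strand, extended strand) or None.

    ligated_strand is the linker strand whose 5' phosphate is joined to
    the 3' hydroxyl of ssdna. The overhang strand is then extended along
    ssdna, so the finished second strand is the reverse complement of the
    ligated product.
    """
    if not overhang_anneals(ssdna, overhang):
        return None
    top = ssdna.upper() + ligated_strand.upper()
    bottom = reverse_complement(top)
    return top, bottom


if __name__ == "__main__":
    print("ME strands complementary:", reverse_complement(ME_TOP) == ME_BOTTOM)
    print(ligate_and_extend("ACGTTGCA", "GATC", "TG"))
```

In a real reaction the overhang is a random mixture, so some linker molecule in the pool is expected to match any given 3' end; the sketch tests one overhang at a time.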
CN201710978737.4A 2017-10-19 2017-10-19 Single-chain-linker-based construction method and application of next-generation sequencing library Active CN107586835B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710978737.4A CN107586835B (en) 2017-10-19 2017-10-19 Single-chain-linker-based construction method and application of next-generation sequencing library

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710978737.4A CN107586835B (en) 2017-10-19 2017-10-19 Single-chain-linker-based construction method and application of next-generation sequencing library

Publications (2)

Publication Number Publication Date
CN107586835A CN107586835A (en) 2018-01-16
CN107586835B true CN107586835B (en) 2020-11-03

Family

ID=61052676

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710978737.4A Active CN107586835B (en) 2017-10-19 2017-10-19 Single-chain-linker-based construction method and application of next-generation sequencing library

Country Status (1)

Country Link
CN (1) CN107586835B (en)

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108280326B (en) * 2018-01-22 2021-06-11 哈尔滨工程大学 Method for eliminating DNA base tendency deviation in DNase high-throughput sequencing data based on deep recurrent neural network
CN110551794B (en) * 2018-06-04 2023-05-30 完整基因有限公司 Method and kit for processing RNA molecules and complex
CN109055486A (en) * 2018-08-02 2018-12-21 东南大学 A kind of construction method of high degradation of dna sequencing library and its application
CN110886021B (en) * 2018-09-07 2023-08-15 深圳华大生命科学研究院 Construction method of single-cell DNA library
CN109321568A (en) * 2018-10-25 2019-02-12 翌圣生物科技(上海)有限公司 The method of new-generation sequencing connector of the preparation with molecular label
CN109371109A (en) * 2018-10-25 2019-02-22 翌圣生物科技(上海)有限公司 A method of preparing the two generation sequence measuring joints with multiple molecular labels
CN109295164A (en) * 2018-10-25 2019-02-01 翌圣生物科技(上海)有限公司 A method of preparing the two generation sequence measuring joints with molecular label
CN109371108A (en) * 2018-10-25 2019-02-22 翌圣生物科技(上海)有限公司 The method of new-generation sequencing connector of the preparation with multiple molecular labels
CN109486924B (en) * 2018-11-23 2022-03-01 上海海洋大学 Tandem barcode based on Illumina sequencing, labeled DNA library thereof and construction method thereof
CN109680049A (en) * 2018-12-03 2019-04-26 东南大学 A kind of method and its application based on the dissociative DNA in blood high-flux sequence analysis affiliated individual physiological state of cfDNA
CN110257479A (en) * 2019-06-25 2019-09-20 北京全式金生物技术有限公司 A kind of method that rapid build RNA 3 ' holds gene expression library
CN110484532A (en) * 2019-08-09 2019-11-22 北京诺禾致源科技股份有限公司 Bis- generation of DNA sequencing library and its construction method, building kit
CN112391442B (en) * 2019-08-12 2023-03-10 深圳市真迈生物科技有限公司 Nucleic acid sample processing method, sequencing method and kit
CN112824534A (en) * 2019-11-20 2021-05-21 武汉华大医学检验所有限公司 Method for amplifying target region of nucleic acid, library construction and sequencing method and kit
CN111041563B (en) * 2019-12-31 2023-07-21 广州精科医学检验所有限公司 Target sequence capturing and PCR library building method
CN111575348B (en) * 2020-05-19 2024-01-09 广州微远医疗器械有限公司 Metagenomic library, library building method and application
CN111893576B (en) * 2020-07-08 2021-08-17 生物岛实验室 Construction method of trace cell genome sequencing library
CN113462759B (en) * 2021-08-02 2024-06-25 元码基因科技(北京)股份有限公司 Method for enrichment sequencing of single-stranded DNA sequence based on combination of multiplex amplification and probe capture and application of method in mutation detection
CN113862344A (en) * 2021-09-09 2021-12-31 成都齐碳科技有限公司 Method and apparatus for detecting gene fusion
CN118076734A (en) * 2022-01-26 2024-05-24 深圳华大智造科技股份有限公司 Method for simultaneously carrying out whole genome DNA sequencing and whole genome DNA methylation or/and hydroxymethylation sequencing
CN116694730A (en) * 2022-02-28 2023-09-05 南方科技大学 Construction method of single cell open chromatin and transcriptome co-sequencing library
CN115537408A (en) * 2022-10-08 2022-12-30 厦门大学 Single cell multi-omics library and construction method thereof
CN116287124A (en) * 2023-05-24 2023-06-23 中国农业科学院农业基因组研究所 Single-stranded joint pre-connection method, library construction method of high-throughput sequencing library and kit

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102264914A (en) * 2008-10-24 2011-11-30 阿霹震中科技公司 Transposon end compositions and methods for modifying nucleic acids
WO2016033251A2 (en) * 2014-08-26 2016-03-03 Nugen Technologies, Inc. Compositions and methods for targeted nucleic acid sequence enrichment and high efficiency library generation
CN105463089A (en) * 2015-12-21 2016-04-06 同济大学 Assay for transposase accessible chromatin using sequencing (ATAC-seq) method applied to zebrafish embryos
WO2016169431A1 (en) * 2015-04-20 2016-10-27 深圳华大基因研究院 Method for constructing long fragment dna library
CN106754811A (en) * 2016-12-21 2017-05-31 南京诺唯赞生物科技有限公司 A kind of saltant type Tn5 transposases and its preparation method and application
CN106867995A (en) * 2017-03-01 2017-06-20 安徽安科生物工程(集团)股份有限公司 CfDNA builds joint, primer sets, kit and the banking process in storehouse

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102264914A (en) * 2008-10-24 2011-11-30 阿霹震中科技公司 Transposon end compositions and methods for modifying nucleic acids
WO2016033251A2 (en) * 2014-08-26 2016-03-03 Nugen Technologies, Inc. Compositions and methods for targeted nucleic acid sequence enrichment and high efficiency library generation
WO2016169431A1 (en) * 2015-04-20 2016-10-27 深圳华大基因研究院 Method for constructing long fragment dna library
CN107250447A (en) * 2015-04-20 2017-10-13 深圳华大基因研究院 A kind of DNA long fragment library constructing method
CN105463089A (en) * 2015-12-21 2016-04-06 同济大学 Assay for transposase accessible chromatin using sequencing (ATAC-seq) method applied to zebrafish embryos
CN106754811A (en) * 2016-12-21 2017-05-31 南京诺唯赞生物科技有限公司 A kind of saltant type Tn5 transposases and its preparation method and application
CN106867995A (en) * 2017-03-01 2017-06-20 安徽安科生物工程(集团)股份有限公司 CfDNA builds joint, primer sets, kit and the banking process in storehouse

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A Simple and Novel Method for RNA-seq Library Preparation of Single Cell cDNA Analysis by Hyperactive Tn5 Transposase;Scott Brouilette等;《DEVELOPMENTAL DYNAMICS》;20121231;第241卷;1584-1590 *
SALP, a new single-stranded DNA library preparation method especially useful for the high-throughput characterization of chromatin openness states;Jian Wu等;《BMC Genomics》;20180213;第19卷(第1期);1-12 *

Also Published As

Publication number Publication date
CN107586835A (en) 2018-01-16

Similar Documents

Publication Publication Date Title
CN107586835B (en) Single-chain-linker-based construction method and application of next-generation sequencing library
JP7229923B2 (en) Methods for assessing nuclease cleavage
CN110036117B (en) Method for increasing throughput of single molecule sequencing by multiple short DNA fragments
KR102423682B1 (en) Methods for generating double stranded dna libraries and sequencing methods for the identification of methylated cytosines
US7932029B1 (en) Methods for nucleic acid mapping and identification of fine-structural-variations in nucleic acids and utilities
JP7426370B2 (en) Preparative electrophoresis method for targeted purification of genomic DNA fragments
EP3208336B1 (en) Linker element and method of using same to construct sequencing library
WO2007076726A1 (en) Methods for nucleic acid mapping and identification of fine-structural-variations in nucleic acids and utilities
CN109266680B (en) Method for preparing CKO/KI animal model by using Cas9 technology
EP3098324A1 (en) Compositions and methods for preparing sequencing libraries
CN109563543B (en) Method for generating nucleic acid libraries
US20220333186A1 (en) Method and system for targeted nucleic acid sequencing
EP4159853A1 (en) Genome editing system and method
US11339427B2 (en) Method for target specific RNA transcription of DNA sequences
JP2023513606A (en) Methods and Materials for Assessing Nucleic Acids
JP7096812B2 (en) Nucleic Acid Sequence Determination How to remove the adapter dimer from the preparation
EP3812472B1 (en) A truly unbiased in vitro assay to profile off-target activity of one or more target-specific programmable nucleases in cells (abnoba-seq)
US20210388427A1 (en) Liquid sample workflow for nanopore sequencing
WO2023060539A1 (en) Compositions and methods for detecting target cleavage sites of crispr/cas nucleases and dna translocation
AU2017217868B2 (en) Method for target specific RNA transcription of DNA sequence
WO2024119461A1 (en) Compositions and methods for detecting target cleavage sites of crispr/cas nucleases and dna translocation
CN112301103B (en) Method and kit for non-specifically amplifying natural short-fragment nucleic acid
US20240182951A1 (en) Methods for targeted nucleic acid sequencing
CN110387362B (en) High-temperature-resistant restriction endonuclease capable of recognizing and cutting AGCT (accelerated glucose detection computed tomography) site
Granner et al. Molecular Genetics, Recombinant DNA, & Genomic Technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant