CN115197998A - Method for constructing single-stranded nucleic acid molecule sequencing library - Google Patents

Publication number
CN115197998A
Authority
CN
China
Prior art keywords
loop
region
stranded
stem
sequencing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210377099.1A
Other languages
Chinese (zh)
Inventor
关媛妹
李洪洲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Fapon Biotech Co Ltd
Original Assignee
Guangdong Fapon Biotech Co Ltd

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1093 General methods of preparing gene libraries, not provided for in other subgroups
    • C CHEMISTRY; METALLURGY
    • C40 COMBINATORIAL TECHNOLOGY
    • C40B COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B50/00 Methods of creating libraries, e.g. combinatorial synthesis
    • C40B50/06 Biochemical methods, e.g. using enzymes or whole viable microorganisms

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Organic Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Engineering & Computer Science (AREA)
  • Biochemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Microbiology (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Analytical Chemistry (AREA)
  • Biomedical Technology (AREA)
  • Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Plant Pathology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Immunology (AREA)
  • General Chemical & Material Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to a method for constructing a single-stranded nucleic acid molecule sequencing library. The method effectively constructs a sequencing library from single-stranded nucleic acid molecules through a simple and rapid process that avoids complicated purification procedures. Using stem-loop linkers effectively reduces the formation of linker dimers. The resulting library has a higher yield, a low cost and rich information content, so that more comprehensive sequencing information can be obtained, providing better support for subsequent analysis.

Description

Method for constructing single-stranded nucleic acid molecule sequencing library
Technical Field
The invention relates to the field of biotechnology, specifically to the field of nucleic acid sequencing, and more specifically to a single-stranded nucleic acid molecule sequencing library construction method, a sequencing library, a sequencing method, a kit and a free nucleic acid sequencing method.
Background
Compared with first-generation Sanger sequencing, high-throughput sequencing technology has clear advantages in processing large numbers of samples and is a core technology in current genomics research. Before high-throughput sequencing is performed, a sequencing library must be prepared, that is, a library of nucleic acids in which a linker sequence is attached to at least one end of each nucleic acid of unknown sequence to be tested. Sequencing platforms provide various library construction methods for double-stranded nucleic acid molecules, but the library construction process for double-stranded nucleic acid molecules is not suitable for single-stranded nucleic acid molecules. The invention therefore provides a novel sequencing library construction method for single-stranded nucleic acid molecules.
Disclosure of Invention
The inventors have found that the DNA subjected to sequencing may contain not only double-stranded DNA but also single-stranded DNA molecules. For example, in the detection of circulating tumor DNA (ctDNA), the ctDNA in plasma contains not only damaged double-stranded DNA but also a number of single-stranded DNAs. As another example, formalin-fixed, paraffin-embedded (FFPE) samples also contain large amounts of fragmented single-stranded DNA. In addition, during methylation library construction, double-stranded DNA is degraded and denatured by bisulfite conversion, generating single-stranded DNA. Single-stranded nucleic acid molecules cannot be sequenced using existing library construction methods designed for double-stranded nucleic acid molecules.
The present invention is directed to solving, at least in part, one of the technical problems in the related art. To this end, the present invention proposes a method for constructing a sequencing library for single-stranded nucleic acid molecules by using a stem-loop linker.
In a first aspect of the invention, the invention provides a method of single-stranded nucleic acid molecule sequencing library construction, which in some embodiments comprises the steps of: (1) Ligating both ends of the single-stranded nucleic acid molecule to a first stem-loop linker and a second stem-loop linker, respectively, to obtain ligation products; and (2) subjecting the ligation products of step (1) to an amplification treatment to obtain amplification products carrying sequencing primer regions, the amplification products constituting the sequencing library, wherein the first stem-loop linker has: a first single-stranded degenerate sequence region located 5' to said first stem-loop linker; a first double-stranded region attached to the side of the first single-stranded degenerate sequence region distal to the 5' end of the first stem-loop linker; and a first single-stranded loop region, both ends of the first single-stranded loop region being connected to both ends of the first double-stranded region on a side remote from the first single-stranded degenerate sequence region, respectively, the second stem-loop linker having: a second single-stranded degenerate sequence region located 3' to said second stem-loop linker; a second double-stranded region linked to the side of the second single-stranded degenerate sequence region distal to the 3' end of the second stem-loop linker; and a second single-stranded loop region, wherein two ends of the second single-stranded loop region are respectively connected with two ends of the second double-stranded region far away from one side of the second single-stranded degenerate sequence region.
In some embodiments, since the first stem-loop linker and the second stem-loop linker each have a single-stranded degenerate sequence region, a double-stranded structure can be formed at both ends of the single-stranded nucleic acid molecule. A common double-stranded DNA ligase, such as T4 DNA ligase, can therefore be used to ligate the stem-loop linkers to both ends of the single-stranded nucleic acid molecule, which effectively reduces the cost of the ligase and improves the efficiency of linker ligation. Adopting the stem-loop structure also reduces ligation between linkers and self-ligation of the stem-loop linkers. In addition, after the ligation product is obtained, other nucleic acid sequences, such as sequencing primer regions, can be introduced at both ends of the ligation product by amplification, thereby obtaining a sequencing library that can be used for sequencing single-stranded nucleic acid molecules and efficiently realizing the construction of a sequencing library based on single-stranded nucleic acid molecules.
In a second aspect of the invention, the invention proposes a sequencing library obtained by the method described above.
In a third aspect of the invention, a sequencing method is provided. In some embodiments, the method comprises: constructing a sequencing library according to the method described above; and sequencing the sequencing library.
In a fourth aspect of the invention, the invention provides a kit for single-stranded nucleic acid molecule sequencing library construction, which in some embodiments comprises: a first stem-loop linker and a second stem-loop linker, wherein the first stem-loop linker has: a first single-stranded degenerate sequence region located 5' to said first stem-loop linker; a first double-stranded region linked to the side of the first single-stranded degenerate sequence region distal to the 5' end of the first stem-loop linker; and a first single-stranded loop region, both ends of the first single-stranded loop region being respectively linked to both ends of the first double-stranded region on a side away from the first single-stranded degenerate sequence region, the second stem-loop linker having: a second single-stranded degenerate sequence region located 3' to said second stem-loop linker; a second double-stranded region attached to the side of the second single-stranded degenerate sequence region distal to the 3' end of the second stem-loop linker; and a second single-stranded loop region, wherein two ends of the second single-stranded loop region are respectively connected with two ends of the second double-stranded region far away from one side of the second single-stranded degenerate sequence region.
In a fifth aspect of the invention, a method of sequencing a free nucleic acid is provided. In some embodiments, the method comprises: obtaining free nucleic acid; subjecting the free nucleic acid to a denaturation treatment so as to obtain single-stranded nucleic acid molecules; constructing a sequencing library based on said single-stranded nucleic acid molecules according to the method described above; and sequencing the sequencing library to obtain sequencing data.
Drawings
FIG. 1 shows a schematic flow chart of a method for constructing a sequencing library of single-stranded nucleic acid molecules according to an embodiment of the present invention.
FIG. 2 shows a schematic structural view of a stem-loop linker according to an embodiment of the present invention.
FIG. 3 is a schematic diagram showing the comparison and analysis of the library construction results according to an embodiment of the present invention, wherein single-stranded nucleic acid libraries 1 and 2 in Examples 1 and 10-12 are two parallel samples; conventional double-stranded nucleic acid libraries 1 and 2 in Examples 1 and 10-12 are two parallel samples; in Examples 2 to 9, 1 and 2 are two parallel samples of each example; and the dimer content is indicated as "+" for <3%, "++" for <10% and "+++" for <20%.
FIG. 4 shows a schematic diagram of the relative fluorescence units versus analysis time according to an embodiment of the present invention, wherein "a" refers to Example 1, single-stranded nucleic acid library 1; "b" refers to Example 9-1; and "c" refers to Example 1, conventional double-stranded nucleic acid library 1.
FIG. 5 shows a schematic diagram of the relative fluorescence units versus analysis time according to an embodiment of the present invention, wherein "d" refers to Example 10, single-stranded nucleic acid library 1; and "e" refers to Example 10, conventional double-stranded nucleic acid library 1.
FIG. 6 is a schematic diagram showing alignment analysis of sequencing results according to an embodiment of the present invention, wherein single-stranded nucleic acid libraries 1 and 2 in Examples 1 and 10-12 are two parallel samples; conventional double-stranded nucleic acid libraries 1 and 2 in Examples 1 and 10-12 are two parallel samples; and in Example 9, 1 and 2 are two parallel samples.
Detailed Description
Embodiments of the present invention are described in detail below. The following examples are illustrative only and are not to be construed as limiting the invention. Where specific techniques or conditions are not indicated in the examples, they follow the techniques or conditions described in the literature in the art or the product specifications. Reagents or instruments for which no manufacturer is indicated are conventional, commercially available products.
In a first aspect of the invention, the invention provides a method of single-stranded nucleic acid molecule sequencing library construction, which in some embodiments, with reference to figure 1, comprises the steps of:
(1) Ligating both ends of the single-stranded nucleic acid molecule to a first stem-loop linker and a second stem-loop linker, respectively, to obtain ligation products; and
(2) Subjecting the ligation products of step (1) to an amplification treatment to obtain amplification products carrying a sequencing primer region, the amplification products constituting the sequencing library.
In some embodiments, "respectively" in step (1) means either that the two ends of the single-stranded nucleic acid molecule are ligated to the first and second stem-loop linkers simultaneously, or that one end of the single-stranded nucleic acid molecule is ligated to one of the first and second stem-loop linkers before the other end is ligated to the other linker.
In some embodiments, because the linker sequence itself contains a reverse-complementary region, the single strand anneals into the stem-loop linker structure with high efficiency. This reduces the probability of ligation between linkers and thereby increases the efficiency of the ligation reaction.
The term "linker" as used herein refers to a nucleic acid sequence attached to at least one end of a nucleic acid molecule to be sequenced (sometimes also referred to as an "insert"), typically a nucleic acid sequence that facilitates subsequent manipulation, such as a sequence that can be used for amplification, a sequence that can be used for identity recognition, or a sequence that can be used in a sequencing reaction. As used herein, a "stem-loop linker" is a linker having a stem-loop structure. "Stem-loop structure" refers to an intramolecular base-pairing pattern. Referring to FIG. 2, a stem-loop structure generally includes a complementary double-stranded region (stem region) and a single-stranded loop (loop region), and may also have a single-stranded overhang at the end of the stem region. In some embodiments, the two strands of the stem region need not be completely complementary.
Referring to FIG. 2, in some embodiments, the first and second stem-loop linkers have the following structure. As will be understood by those skilled in the art, since the first stem-loop linker and the second stem-loop linker are ligated to the two ends (the 5' end and the 3' end, respectively) of a single-stranded nucleic acid (in some embodiments, a single-stranded DNA, also referred to as "ssDNA"), the 3' end of the first stem-loop linker is a 3'-hydroxyl end and the 5' end of the second stem-loop linker is a 5'-phosphate end. The phosphate group can be added either directly during synthesis of the nucleic acid sequence or by adding polynucleotide kinase (PNK) to the linker ligation system. Accordingly, the first stem-loop linker is also referred to herein as the 5'-end linker (5' loop) and the second stem-loop linker as the 3'-end linker (3' loop). For convenience, "first" and "second" are not distinguished in FIG. 2.
Specifically, referring to FIG. 2, the first and second stem-loop linkers each have a single-stranded degenerate sequence region, a double-stranded region, and a single-stranded loop region. The sequence of the single-stranded degenerate sequence region is randomly generated. For example, the first single-stranded degenerate sequence region and the second single-stranded degenerate sequence region each independently comprise 3 to 10 bp of consecutive N, such as 3bp, 4bp, 5bp, 6bp, 7bp, 8bp, 9bp or 10bp; alternatively, 5 to 8 bp of consecutive N; alternatively, 5 to 6 bp of consecutive N. N may be any of A, T, G or C. The single-stranded degenerate sequence regions of the individual stem-loop linkers may be the same or different. Because the single-stranded degenerate sequence region covers a large number of possible sequences, for a single-stranded nucleic acid molecule of unknown sequence there is a high probability that at least one linker matches the sequence at an end of that molecule, so that the stem-loop linker and the single-stranded nucleic acid molecule form a double-stranded structure with a nick between the end of the single-stranded nucleic acid molecule and the end of the linker double-stranded region. Ligation of the stem-loop linker to the single-stranded nucleic acid molecule can therefore be achieved using a commonly used double-stranded DNA ligase (e.g., T4 DNA ligase), which significantly reduces the cost and improves the efficiency of the ligation reaction. Previous approaches to single-stranded DNA library construction ligated a single-stranded template to a single-stranded linker, but the ligation efficiency of CircLigase is low and the enzyme is expensive. Taking CircLigase™ II ssDNA Ligase as an example, 5000 U of CircLigase™ II ssDNA Ligase costs several hundred times more than 5000 U of T4 DNA Ligase. The present invention successfully solves this problem.
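To give a sense of how many sequences a degenerate region of consecutive N covers, the following is a minimal Python sketch (not part of the patent). It assumes only what the text states, namely that each N may be any of A, T, G or C; the function names are hypothetical.

```python
from itertools import product

BASES = "ATGC"  # each N position may be any of these four bases


def degenerate_space(n: int) -> int:
    """Number of distinct sequences covered by n consecutive N positions."""
    return len(BASES) ** n


def enumerate_degenerate(n: int) -> list[str]:
    """Explicitly list every sequence an n-base degenerate region can take."""
    return ["".join(p) for p in product(BASES, repeat=n)]


# Sequence space for the 3 to 10 base range given in the text.
for n in range(3, 11):
    print(f"{n} consecutive N -> {degenerate_space(n)} possible sequences")
```

A longer degenerate region covers more target ends but dilutes each individual sequence in the linker pool, which is one way to read the narrower 5 to 8 and 5 to 6 ranges offered as alternatives.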
In some embodiments, T4 DNA ligase is used in the step of ligating the 5' end and the 3' end of the single-stranded nucleic acid molecule to the stem-loop linkers. The concentration of the T4 DNA ligase is 400 U/μL to 800 U/μL.
Specifically, the first stem-loop joint has: a first single-stranded degenerate sequence region located 5' to the first stem-loop linker; a first double-stranded region attached to the side of the first single-stranded degenerate sequence region distal to the 5' end of the first stem-loop linker; and a first single-stranded loop region, both ends of the first single-stranded loop region being linked to both ends of the first double-stranded region on a side remote from the first single-stranded degenerate sequence region, respectively, the second stem-loop linker having: a second single-stranded degenerate sequence region located at the 3' end of the second stem-loop linker; a second double-stranded region joined to the second single-stranded degenerate sequence region on a side distal to the 3' end of the second stem-loop linker; and a second single-stranded loop region, wherein two ends of the second single-stranded loop region are respectively connected with two ends of the second double-stranded region at the side far away from the second single-stranded degenerate sequence region.
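The region order described above can be illustrated with a short Python sketch that assembles linker strands written 5' to 3'. The stem and loop sequences below are hypothetical placeholders, not the SEQ ID sequences of the patent, and the sketch assumes a fully complementary stem and a 6-base degenerate region.

```python
COMPLEMENT = str.maketrans("ATGCN", "TACGN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def first_linker(stem: str, loop: str, n_degenerate: int = 6) -> str:
    """5'-[degenerate N]-[stem]-[loop]-[stem reverse complement]-3'."""
    return "N" * n_degenerate + stem + loop + reverse_complement(stem)


def second_linker(stem: str, loop: str, n_degenerate: int = 6) -> str:
    """5'-[stem]-[loop]-[stem reverse complement]-[degenerate N]-3'."""
    return stem + loop + reverse_complement(stem) + "N" * n_degenerate


# Hypothetical placeholder sequences, chosen only to show the layout.
EXAMPLE_STEM = "GATCGGAAGAGCACA"   # 15 bases
EXAMPLE_LOOP = "TTTTTTTTTT"        # 10 bases

print(first_linker(EXAMPLE_STEM, EXAMPLE_LOOP))
print(second_linker(EXAMPLE_STEM, EXAMPLE_LOOP))
```

In this layout the degenerate overhang of the first linker sits at its 5' end and that of the second linker at its 3' end, so each overhang can pair with one end of the single-stranded insert.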
In some embodiments, the sequences of the first duplex region and the second duplex region may be the same or different. In some embodiments, the first duplex region is not identical in sequence to the second duplex region. This effectively reduces dimer formation between the first stem-loop linker and the second stem-loop linker, further improving the efficiency and reducing the cost of sequencing library construction. Moreover, because few dimers form between the linkers, ligation to single-stranded nucleic acids and library construction can be carried out in a single reaction system.
In some embodiments, the first duplex region and the second duplex region may be the same or different in length. In some embodiments, the first duplex region is the same length as the second duplex region. With stem-loop linkers of the same design, the efficiency of sequencing library construction is thus further improved and the cost reduced. Moreover, because few dimers form between the linkers, ligation to single-stranded nucleic acids and library construction can be carried out in a single reaction system.
In some embodiments, the first duplex region and the second duplex region may or may not be fully complementary. In some embodiments, the first duplex region is not fully complementary to the second duplex region. In some embodiments, there are at least 2 mismatched base sites between the sequences of the first and second duplex regions. Alternatively, there are at least 3, 4, 5, 6, 7, 8, 9 or 10 mismatched base sites.
In some embodiments, the 3' end of the first duplex region has no run of 3 or more consecutive bases that matches the sequence of the second duplex region. This means that, when 3 bases are counted from the first base at the 3' end of the first duplex region toward its 5' end, these 3 bases do not completely match either of the two nucleotide sequences constituting the second duplex region.
In some embodiments, the 3' end of the first duplex region has no run of 3 or more consecutive bases that matches the sequence of the second stem-loop linker. This means that, when 3 bases are counted from the first base at the 3' end of the first duplex region toward its 5' end, these 3 bases do not completely match the second stem-loop linker sequence.
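The 3'-end rule above can be sketched as a simple sequence check. This is an illustrative Python sketch under a stated assumption: "match" is interpreted here as Watson-Crick pairing, so the 3'-terminal 3 bases are considered matched if their reverse complement occurs in either strand of the second duplex region. The text does not spell out this interpretation, and the function name and example sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ATGC", "TACG")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def three_prime_tail_can_pair(first_strand: str, second_stem: str, k: int = 3) -> bool:
    """Return True if the last k bases (3' end) of first_strand could pair
    with either strand of the second duplex region.

    first_strand: the strand of the first duplex region that carries its 3' end (5'->3').
    second_stem:  one strand of the second duplex region (5'->3'); the other
                  strand is taken to be its reverse complement (assumption).
    """
    tail = first_strand[-k:]
    pairing_site = reverse_complement(tail)
    strands = (second_stem, reverse_complement(second_stem))
    return any(pairing_site in strand for strand in strands)


# Hypothetical sequences: the first design violates the rule, the second satisfies it.
print(three_prime_tail_can_pair("AAAAACGT", "TTACGTT"))   # tail CGT pairs with ACG
print(three_prime_tail_can_pair("GGGGGAAA", "GCGCGCGC"))  # no TTT site on either strand
```

A design would pass this check only when the function returns False for the chosen pair of duplex regions; the same function can be applied to the full second linker sequence for the broader rule.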
In some embodiments, the first stem-loop linker and the second stem-loop linker are each obtained by annealing their respective sequences. Specifically, once the sequence of a stem-loop linker has been determined, the corresponding stem-loop structure can be obtained by a conventional annealing treatment (e.g., physical annealing such as temperature-reducing annealing). In some embodiments, the temperature-reducing annealing condition includes a temperature reduction from 95 ℃ to 25 ℃; for example, the annealing condition may be: 95 ℃ for 5 min; 95 ℃ → 25 ℃ (-0.1 ℃/s); hold at 4 ℃. The sequences of the first duplex region and the second duplex region are configured such that no detectable dimer forms between the base sequence of the first duplex region and the base sequence of the second duplex region under the temperature-reducing annealing conditions. As used herein, "no detectable dimer" means that any dimer formed has no significant visible effect on the sequencing results; e.g., dimers account for less than 20%, preferably less than 10%, or even less than 5% of the sequencing library.
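As a quick arithmetic check on the example annealing program, the sketch below computes how long the 95 ℃ to 25 ℃ ramp takes at -0.1 ℃/s and the total time before the 4 ℃ hold. It is a minimal Python sketch using only the values given in the text; the constant and function names are hypothetical.

```python
# Example annealing program from the text:
# 95 C for 5 min; 95 C -> 25 C at -0.1 C/s; hold at 4 C.
DENATURE_SECONDS = 5 * 60
RAMP_START_C = 95.0
RAMP_END_C = 25.0
RAMP_RATE_C_PER_S = -0.1


def ramp_seconds(start_c: float, end_c: float, rate_c_per_s: float) -> float:
    """Time needed to move from start_c to end_c at a constant ramp rate."""
    if rate_c_per_s == 0:
        raise ValueError("ramp rate must be non-zero")
    duration = (end_c - start_c) / rate_c_per_s
    if duration < 0:
        raise ValueError("ramp rate has the wrong sign for this temperature change")
    return duration


def total_seconds_before_hold() -> float:
    """Denaturation time plus ramp time; the final 4 C hold is open-ended."""
    return DENATURE_SECONDS + ramp_seconds(RAMP_START_C, RAMP_END_C, RAMP_RATE_C_PER_S)


print(f"ramp: {ramp_seconds(RAMP_START_C, RAMP_END_C, RAMP_RATE_C_PER_S):.0f} s")
print(f"total before hold: {total_seconds_before_hold() / 60:.1f} min")
```

The ramp covers 70 ℃ at 0.1 ℃ per second, so it takes about 700 s, and the program reaches the hold step roughly 1000 s after it starts.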
In some embodiments, the first and second duplex regions may be 13 to 20bp in length, e.g., 13bp, 14bp, 15bp, 16bp, 17bp, 18bp, 19bp, or 20bp. Alternatively, it may be 15 to 18bp. Thus, the corresponding ligation reaction can be efficiently accomplished.
In some embodiments, the lengths of the first single-stranded loop region and the second single-stranded loop region are each independently 7 to 30bp, e.g., 7bp, 8bp, 9bp, 10bp, 11bp, 12bp, 13bp, 14bp, 15bp, 16bp, 17bp, 18bp, 19bp, 20bp, 21bp, 22bp, 23bp, 24bp, 25bp, 26bp, 27bp, 28bp, 29bp, 30bp. Alternatively, each independently is 7 to 28bp. Alternatively, each independently is 10 to 28bp. Alternatively, each independently is 10 to 21bp. Alternatively, each independently is 10 to 17bp. Alternatively, each independently is 10 to 15bp.
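The length ranges stated for the three linker regions can be collected into a small design check. The following Python sketch is illustrative only: it uses the broadest ranges given in the text (degenerate region 3 to 10, duplex region 13 to 20, loop region 7 to 30), and the class and field names are hypothetical.

```python
from dataclasses import dataclass

# Broadest length ranges stated in the text (inclusive).
LENGTH_RANGES = {
    "degenerate": (3, 10),
    "duplex": (13, 20),
    "loop": (7, 30),
}


@dataclass
class LinkerDesign:
    degenerate_len: int
    duplex_len: int
    loop_len: int

    def problems(self) -> list[str]:
        """Return a message for every region whose length is out of range."""
        lengths = {
            "degenerate": self.degenerate_len,
            "duplex": self.duplex_len,
            "loop": self.loop_len,
        }
        issues = []
        for region, value in lengths.items():
            low, high = LENGTH_RANGES[region]
            if not low <= value <= high:
                issues.append(f"{region} region length {value} outside {low}-{high}")
        return issues


print(LinkerDesign(6, 15, 10).problems())   # a design inside all ranges
print(LinkerDesign(2, 21, 31).problems())   # every region out of range
```

The narrower alternative ranges in the text (for example, 15 to 18 for the duplex region) could be checked the same way by swapping in a different range table.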
In addition, as will be understood by those skilled in the art, the 5' end of the single-stranded nucleic acid molecule may be phosphorylated in order to improve the efficiency of ligation between the single-stranded nucleic acid molecule and the linker. In some embodiments, a reagent suitable for 5'-end phosphorylation of the single-stranded nucleic acid molecule is added to the ligation reaction system. Owing to the structural features of the first and second stem-loop linkers of the present invention, the phosphorylation and ligation reactions can be performed in the same reaction system without affecting reaction efficiency. This eliminates separate phosphorylation and purification steps, saves processing cost, and in particular avoids loss of the nucleic acid sample, which is crucial for analyses with small sample amounts, such as free nucleic acids, e.g., tumor free nucleic acids or fetal free nucleic acids. In some embodiments, the phosphorylation treatment comprises a PNK treatment. In some embodiments, the PNK concentration is between 50 U/μL and 250 U/μL.
In some embodiments, the 5 'end of the first stem-loop linker and the 3' end of the second stem-loop linker may or may not be blocked. In some embodiments, the 5 'end of the first stem-loop linker and the 3' end of the second stem-loop linker are blocked.
In some embodiments, the 5' end of the second stem-loop linker may or may not be blocked. In some embodiments, the 5' end of the second stem-loop linker is not blocked.
In some embodiments, the 5' end of the second stem-loop linker may or may not be phosphorylated. In some embodiments, the 5' end of the second stem-loop linker is phosphorylated. In some embodiments, the 3' end of the second stem-loop linker may be blocked with any one of an amino modification, a phosphorylation modification, an inverted dT modification, a spacer modification, or a dideoxycytidine modification, to reduce the loss of ligation efficiency caused by self-ligation of the linker. Thus, in some embodiments, by using a stem-loop linker with self-complementary pairing, the single strand anneals efficiently into a stem-loop structure. This avoids the situation with conventional linkers formed from separate upstream and downstream strands, where incomplete annealing leaves dissociated linker strands that compete with the substrate template. The end-blocking modification further reduces the production of by-products such as linker dimers.
In some embodiments, the type of single-stranded nucleic acid molecule to which the technical solution of the present invention is applicable is not particularly limited, and may be a DNA molecule. In some embodiments, the single-stranded nucleic acid molecule can be derived from heat-denatured DNA, fragmented DNA, circulating DNA, paraffin-embedded (FFPE) sample DNA, or bisulfite-treated DNA. The fragmentation method is not subject to any limitation.
These samples usually contain both double-stranded DNA and single-stranded DNA, in particular single-stranded DNA produced by denaturation or degradation during storage. Generally, when such samples are subjected to library construction, only the double-stranded DNA they contain is considered, and conventional library construction methods cannot construct a sequencing library from the single-stranded DNA molecules. Important nucleic acid information is therefore lost during sequencing library construction, and the results of subsequent bioinformatic analysis are distorted. With the method of the invention, the corresponding single-stranded DNA molecules can be recovered and the corresponding nucleic acid information obtained.
In some embodiments, before performing step (1), the method comprises: heat denaturation to obtain the single-stranded nucleic acid molecule. Therefore, all the nucleic acid information can be analyzed by converting all the corresponding DNA into single-stranded DNA, and the accuracy and the authenticity of the data analysis result are improved.
In some embodiments, the amplification treatment is performed using a first primer that recognizes at least a portion of at least one of the first double-stranded region and the first single-stranded loop region and a second primer that recognizes at least a portion of at least one of the second double-stranded region and the second single-stranded loop region. In some embodiments, at least one of the first primer and the second primer carries a sequencing primer sequence. The sequencing primer refers to a primer added when the sequencing library is sequenced on the instrument, for example a primer from which labeled nucleotides are added by extension to carry out sequencing.
As will be appreciated by those skilled in the art, after the ligation products are obtained, they may be purified, for example using magnetic beads such as Fapon DNA Clean Beads (NMK 012) or AMPure XP, to improve the efficiency of the subsequent amplification reaction. In some embodiments, the amplification enzyme may be Hotstart HiTaq DNA Polymerase (MD 026), KAPA HiFi HotStart Uracil+ (KK 2800), or the like. These enzymes do not stall on encountering U bases during the PCR reaction (U is relatively common in, e.g., FFPE samples or bisulfite-treated samples), which ensures that the reaction proceeds normally.
Of course, in some embodiments, due to the structural characteristics of the stem-loop linker of the present invention, the ligation reaction has fewer byproducts, so that the amplification reaction can be performed directly without purification, and especially in the case of a small amount of nucleic acid sample such as circulating DNA, waste of the nucleic acid sample can be avoided.
Of course, it will be understood by those skilled in the art that all components required for sequencing, such as sequencing primer regions (for initiating sequencing extension reactions), sequencing tags (for distinguishing nucleic acid samples), and/or UMI regions (specific molecular tags), etc., can be introduced on both sides of a single-stranded nucleic acid molecule (or called "insert") by one amplification, multiple amplifications, or multiplex amplifications by adjusting the primer sequences. In some embodiments, since the double-stranded region is 13 to 20bp in length, the matching length of the primer to the double-stranded region can be set to be 13 to 20bp, so that the introduction of all the components can be completed by one PCR. Of course, in some embodiments, the corresponding double-stranded region can be designed with reference to the sequencing adapter amplification primers of each sequencing platform to increase amplification efficiency.
Thus, in some embodiments, the invention provides stem-loop linkers and a method for constructing a single-stranded nucleic acid molecule sequencing library. The library is built from single-stranded DNA using the stem-loop linkers: once the sample DNA has been prepared, phosphorylation of the sample DNA and ligation of its 5' and 3' ends to the stem-loop linkers are completed in a one-step reaction, directly giving products with linker sequences at both ends, and PCR amplification then yields the sequencing library. The method provided by the invention is therefore simple, rapid and efficient, avoids complicated purification procedures, and, by using stem-loop linkers, effectively reduces the formation of linker dimers, giving a high-quality sequencing library that meets sequencing requirements, particularly those of second-generation sequencing.
In some embodiments, the first stem-loop linker comprises any one of SEQ ID NO: 1, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO: 11, SEQ ID NO: 13, SEQ ID NO: 15, SEQ ID NO: 17 or SEQ ID NO: 21; the second stem-loop linker comprises any one of SEQ ID NO: 2, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 12, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 18 or SEQ ID NO: 22, each linker itself containing complementary pairing bases; the first primer sequence is SEQ ID NO: 3, SEQ ID NO: 19 or SEQ ID NO: 23; and the second primer sequence is SEQ ID NO: 4, SEQ ID NO: 20 or SEQ ID NO: 24.
When the lengths of the first and second single-stranded degenerate sequence regions are each independently 6bp, the lengths of the first and second double-stranded regions are each independently 13bp, and the lengths of the first and second single-stranded loop regions are each independently 17bp, the sequences of the first stem-loop linker, the second stem-loop linker, the first primer and the second primer are as shown in SEQ ID NO: 1 to SEQ ID NO: 4, respectively.
5'Loop(SEQ ID NO:1):
5’-NNNNNNGATCGTCGGACTGACACGTTCAGAGTTCTACAGTCCGACGATC-3’
3'Loop(SEQ ID NO:2):
5’PO4-AGATCGGAAGAGCACACGTCTGAACTCCAGGCTCTTCCGATCTNNNNNN-3’NH2 blocked
First primer (SEQ ID NO: 3):
5’-AATGATACGGCGACCACCGAGATCTACACGTTCAGAGTTCTACAGTCCGACGATC-3’
Second primer (SEQ ID NO: 4):
5’-CAAGCAGAAGACGGCATACGAGATXXXXXXGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-3’
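The region lengths stated for SEQ ID NO: 1 and SEQ ID NO: 2 can be checked directly from the listed sequences. The following is only a minimal sketch and not part of the original disclosure: the helper names `revcomp` and `split_stem_loop` are hypothetical, and it assumes that the degenerate region lies at the 5' end of the 5'Loop and at the 3' end of the 3'Loop, as the sequences are written.

```python
# Complement table for DNA; N (any base) is kept as N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def split_stem_loop(linker: str, n_degenerate: int, stem_len: int,
                    degenerate_at_5prime: bool):
    """Remove the degenerate region, then split the rest into
    (5' stem arm, loop, 3' stem arm) and report whether the arms pair."""
    if degenerate_at_5prime:
        core = linker[n_degenerate:]
    else:
        core = linker[:len(linker) - n_degenerate]
    arm_5p = core[:stem_len]
    loop = core[stem_len:len(core) - stem_len]
    arm_3p = core[len(core) - stem_len:]
    return arm_5p, loop, arm_3p, revcomp(arm_5p) == arm_3p


SEQ1 = "NNNNNNGATCGTCGGACTGACACGTTCAGAGTTCTACAGTCCGACGATC"  # 5'Loop
SEQ2 = "AGATCGGAAGAGCACACGTCTGAACTCCAGGCTCTTCCGATCTNNNNNN"  # 3'Loop

for name, seq, at_5p in (("5'Loop", SEQ1, True), ("3'Loop", SEQ2, False)):
    arm_a, loop, arm_b, pairs = split_stem_loop(seq, 6, 13, at_5p)
    print(name, arm_a, loop, arm_b, len(loop), pairs)
```

For both linkers the split gives a 17-base loop flanked by two 13-base arms that are reverse complements of each other, in line with the lengths stated above.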
When the lengths of the first and second single-stranded degenerate sequence regions are each independently 6bp, the lengths of the first and second double-stranded regions are each independently 13bp, and the lengths of the first and second single-stranded loop regions are each independently 12bp, the sequences of the first stem-loop linker, the second stem-loop linker, the first primer and the second primer are as shown in SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 3 and SEQ ID NO: 4, respectively.
5'Loop(SEQ ID NO:5):
5’-NNNNNNGATCGTCGGACTGTTCAGAGTTCTACAGTCCGACGATC-3’
3'Loop(SEQ ID NO:6):
5’PO4-AGATCGGAAGAGCACACGTCTGAACGCTCTTCCGATCTNNNNNN-3’NH2 blocked
First primer: SEQ ID NO: 3
Second primer: SEQ ID NO: 4
When the lengths of the first and second single-stranded degenerate sequence regions are each independently 6bp, the lengths of the first and second double-stranded regions are each independently 13bp, and the lengths of the first and second single-stranded loop regions are each independently 7bp, the sequences of the first stem-loop linker, the second stem-loop linker, the first primer and the second primer are as shown in SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 3 and SEQ ID NO: 4, respectively.
5'Loop(SEQ ID NO:7):
5’-NNNNNNGATCGTCGGACTGAGTTCTACAGTCCGACGATC-3’
3'Loop(SEQ ID NO:8):
5’PO4-AGATCGGAAGAGCACACGTCGCTCTTCCGATCTNNNNNN-3’NH2 blocked
First primer: SEQ ID NO: 3
Second primer: SEQ ID NO: 4
When the lengths of the first and second single-stranded degenerate sequence regions are each independently 6bp, the lengths of the first and second double-stranded regions are each independently 20bp, and the lengths of the first and second single-stranded loop regions are each independently 10bp, the sequences of the first stem-loop linker, the second stem-loop linker, the first primer and the second primer are as shown in SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 3 and SEQ ID NO: 4, respectively.
5'Loop(SEQ ID NO:9):
5’-NNNNNNGATCGTCGGACTGTAGAACTACACGTTCAGAGTTCTACAGTCCGACGATC-3’
3'Loop(SEQ ID NO:10):
5’PO4-AGATCGGAAGAGCACACGTCTGAACTCCAGGACGTGTGCTCTTCCGATCTNNNNNN-3’NH2 blocked
First primer: SEQ ID NO: 3
Second primer: SEQ ID NO: 4
When the lengths of the first and second single-stranded degenerate sequence regions are each independently 3bp, the lengths of the first and second double-stranded regions are each independently 13bp, and the lengths of the first and second single-stranded loop regions are each independently 17bp, the sequences of the first stem-loop linker, the second stem-loop linker, the first primer and the second primer are as shown in SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 3 and SEQ ID NO: 4, respectively.
5'Loop(SEQ ID NO:11):
5’-NNNGATCGTCGGACTGACACGTTCAGAGTTCTACAGTCCGACGATC-3’
3'Loop(SEQ ID NO:12):
5’PO4-AGATCGGAAGAGCACACGTCTGAACTCCAGGCTCTTCCGATCTNNN-3’NH2 blocked
First primer: SEQ ID NO: 3
Second primer: SEQ ID NO: 4
When the lengths of the first and second single-stranded degenerate sequence regions are each independently 10bp, the lengths of the first and second double-stranded regions are each independently 13bp, and the lengths of the first and second single-stranded loop regions are each independently 17bp, the sequences of the first stem-loop linker, the second stem-loop linker, the first primer and the second primer are as shown in SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 3 and SEQ ID NO: 4, respectively.
5'Loop(SEQ ID NO:13):
5’-NNNNNNNNNNGATCGTCGGACTGACACGTTCAGAGTTCTACAGTCCGACGATC-3’
3'Loop(SEQ ID NO:14):
5’PO4-AGATCGGAAGAGCACACGTCTGAACTCCAGGCTCTTCCGATCTNNNNNNNNNN-3’NH2 blocked
First primer: SEQ ID NO: 3
Second primer: SEQ ID NO: 4
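Since each N stands for A, T, G or C, a degenerate region of n positions represents 4^n different sequences. The short sketch below (the function name `degenerate_variants` is hypothetical and not from the original disclosure) simply evaluates this for the three degenerate-region lengths used in the embodiments above, 3, 6 and 10.

```python
def degenerate_variants(n_positions: int, alphabet_size: int = 4) -> int:
    """Number of distinct sequences covered by n fully degenerate positions."""
    return alphabet_size ** n_positions


for n in (3, 6, 10):
    print(f"N{n}: {degenerate_variants(n)} possible sequences")
```

This gives 64, 4096 and 1048576 possible sequences for 3, 6 and 10 degenerate positions, respectively.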
When the lengths of the first and second single-stranded degenerate sequence regions are each independently 6bp, the length of the first double-stranded region is 15bp, the length of the second double-stranded region is 18bp, the length of the first single-stranded loop region is 15bp, and the length of the second single-stranded loop region is 12bp, the sequences of the first stem-loop linker, the second stem-loop linker, the first primer and the second primer are as shown in SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 3 and SEQ ID NO: 4, respectively.
5'Loop(SEQ ID NO:15):
5’-NNNNNNGATCGTCGGACTGTAACACGTTCAGAGTTCTACAGTCCGACGATC-3’
3'Loop(SEQ ID NO:16):
5’PO4-AGATCGGAAGAGCACACGTCTGAACTCCAGCGTGTGCTCTTCCGATCTNNNNNN-3’NH2 blocked
First primer: SEQ ID NO: 3
Second primer: SEQ ID NO: 4
When the lengths of the first and second single-stranded degenerate sequence regions are each independently 6bp, the lengths of the first and second double-stranded regions are each independently 13bp, the length of the first single-stranded loop region is 28bp, and the length of the second single-stranded loop region is 30bp, the sequences of the first stem-loop linker, the second stem-loop linker, the first primer and the second primer are as shown in SEQ ID NO: 17 to SEQ ID NO: 20, respectively.
5'Loop(SEQ ID NO:17):
5’-NNNNNNATCACCGACTGCCCCACTACGCCTCCGCTTTCCTCTCTATGGGCAGTCGGTGAT-3’
3'Loop(SEQ ID NO:18):
5’PO4-ATCXXXXXXXXXXCTGAGTCGGAGACACGCAGGGATGAGATGGXXXXXXXXXXGATNNNNNN-3’NH2 blocked
First primer (SEQ ID NO: 19): 5' CCACTACGCCTCCGCTTTCCTCTATG-3
Second primer (SEQ ID NO: 20): 5' CCATCTCATCCTGCGTTC-3
When the lengths of the first and second single-stranded degenerate sequence regions are each independently 6bp, the lengths of the first and second double-stranded regions are each independently 13bp, the length of the first single-stranded loop region is 20bp, and the length of the second single-stranded loop region is 21bp, the sequences of the first stem-loop linker, the second stem-loop linker, the first primer and the second primer are as shown in SEQ ID NO: 21 to SEQ ID NO: 24, respectively.
5'Loop(SEQ ID NO:21):
5’-NNNNNNAGATCGGAAGAGCACACTCTTTCCCTACACGACGCTCTTCCGATCT-3’
3'Loop(SEQ ID NO:22):
5’PO4-AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGCTCTTCCGATCTNNNNNN-3’NH2 blocked
First primer (SEQ ID NO: 23):
5’-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-3’
Second primer (SEQ ID NO: 24):
5’-CAAGCAGAAGACGGCATACGAGATXXXXXXGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-3’
wherein N = A, T, G or C; 5'Loop denotes the first stem-loop linker and 3'Loop denotes the second stem-loop linker; "XXXXXX" is the index portion of a primer; "XXXXXXXXXX" is the 10-base barcode sequence contained in the 3'Loop; and "3'NH2 blocked" means that the 3' end is blocked with an amino group.
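How the primers relate to the linkers can be read off the listed sequences. The sketch below is an illustration only, with hypothetical helper names (`revcomp`, `common_suffix_len`). It measures how many bases at the 3' end of the first primer (SEQ ID NO: 3) are identical to the 3' end of the 5'Loop (SEQ ID NO: 1), and how many bases at the 3' end of the second primer (SEQ ID NO: 4) are identical to the reverse complement of the 3'Loop (SEQ ID NO: 2).

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def common_suffix_len(a: str, b: str) -> int:
    """Length of the longest identical 3'-terminal stretch of a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


SEQ1 = "NNNNNNGATCGTCGGACTGACACGTTCAGAGTTCTACAGTCCGACGATC"  # 5'Loop
SEQ2 = "AGATCGGAAGAGCACACGTCTGAACTCCAGGCTCTTCCGATCTNNNNNN"  # 3'Loop
SEQ3 = "AATGATACGGCGACCACCGAGATCTACACGTTCAGAGTTCTACAGTCCGACGATC"
SEQ4 = ("CAAGCAGAAGACGGCATACGAGATXXXXXX"
        "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT")  # XXXXXX = index placeholder

first_overlap = common_suffix_len(SEQ3, SEQ1)
second_overlap = common_suffix_len(SEQ4, revcomp(SEQ2))
print(first_overlap, second_overlap)
```

For this linker and primer set both overlaps are 30 bases, which corresponds to the 17-base loop plus one 13-base stem arm, so each primer covers the loop region and part of the double-stranded region of its linker.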
In a second aspect, the invention provides a sequencing library obtained by the method of any one of the above. As mentioned above, the sequencing library obtained by the method can meet the requirements of sequencing platforms such as the Illumina Solexa Genome Analyzer, Ion Torrent and the Applied Biosystems SOLiD system. In addition, the sequencing library is low in cost and rich in information, so that sequencing data more faithful to the nucleic acid sample can be obtained.
In a third aspect of the invention, a sequencing method is provided. In some embodiments, the method comprises: constructing a sequencing library according to the method of any one of the above; and sequencing the sequencing library.
In a fourth aspect, the invention provides a kit for constructing a single-stranded nucleic acid molecule sequencing library, which in some embodiments comprises the first stem-loop linker and the second stem-loop linker described above.
In some embodiments, the kit may further comprise at least one of: a ligase; and the first primer and the second primer described above.
In some embodiments, since the first stem-loop linker and the second stem-loop linker each have a single-stranded degenerate sequence region, double-stranded structures can be formed at both ends of the single-stranded nucleic acid molecule, so that the stem-loop linkers can be ligated to the two sides of the single-stranded nucleic acid molecule with a common DNA ligase such as T4 DNA ligase. This effectively reduces the cost of linker ligation and improves the efficiency with which the stem-loop linkers are ligated. The stem-loop structure also reduces ligation between stem-loop linkers and self-ligation. In addition, after the ligation product is obtained, other nucleic acid sequences, such as sequencing primer regions, can be introduced at both ends of the ligation product by amplification, giving a sequencing library that can be used for nucleic acid sequencing and thus efficiently achieving construction of a sequencing library based on single-stranded nucleic acid molecules.
In a fifth aspect of the invention, a method of sequencing cell-free nucleic acid is provided. In some embodiments, the method comprises: obtaining cell-free nucleic acid; denaturing the cell-free nucleic acid to obtain single-stranded nucleic acid molecules; constructing a sequencing library based on the single-stranded nucleic acid molecules according to the method of any one of the above; and sequencing the sequencing library to obtain sequencing data. In this way, sequencing data more faithful to the cell-free nucleic acid can be obtained effectively, providing better support for subsequent analysis.
The term "denaturation" as used herein refers to the process in which the hydrogen bonds between the two strands of DNA are broken to give single strands, or local hydrogen bonds within RNA are broken to give a linear single-stranded structure; it is also known as melting. The means of denaturation is not strictly limited. Any means may be used that disrupts factors that help maintain the double-helix conformation of DNA (such as hydrogen bonds and base stacking forces) or strengthens factors that disfavor it (such as electrostatic repulsion between phosphate groups and the internal energy of base molecules), for example heating, extreme pH, low ionic strength, or organic reagents such as methanol, ethanol, urea and formamide, and it can be selected flexibly according to actual needs.
It should be noted that the advantages and features described in connection with the different aspects of the invention are equally applicable in other aspects and will not be described in further detail here.
DETAILED DESCRIPTION OF EMBODIMENT(S) OF THE INVENTION
Example 1 cfDNA sequencing library construction
This example 1 employed a first stem-loop linker (SEQ ID NO: 1), a second stem-loop linker (SEQ ID NO: 2), a first primer (SEQ ID NO: 3) and a second primer (SEQ ID NO: 4).
This example provides a method of constructing a sequencing library, comprising the steps of:
1. Extraction of cfDNA: cell-free DNA in plasma was extracted using a serum/plasma cell-free DNA extraction kit (magnetic bead method) according to the kit's instructions for use.
2. Constructing a cfDNA sequencing library:
1) SEQ ID NO: 1 and SEQ ID NO: 2 were each annealed to obtain the first stem-loop linker and the second stem-loop linker, respectively.
2) cfDNA heat denaturation treatment
10 ng of cfDNA was heated at 95 ℃ for 5 min (heated lid at 105 ℃) and then immediately placed on ice to cool.
3) 5' end phosphorylation and linker ligation reaction
Component                          Volume
Denatured cfDNA                    10 ng
10 μM first stem-loop linker       3 μl
10 μM second stem-loop linker      3 μl
T4 PNK                             1 μl
T4 DNA ligase                      3 μl
10× T4 DNA ligation buffer         3 μl
ddH2O                              to 30 μl
Total                              30 μl
Reaction conditions: 25 ℃ for 10 min; 37 ℃ for 20 min.
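The make-up volume of ddH2O and the molar amount of each linker follow from the table by simple arithmetic. The sketch below is illustrative only: the volume of the denatured cfDNA sample is not given in the source, so `sample_ul` is a hypothetical value, and the function names are also hypothetical.

```python
# Fixed component volumes (in microlitres) taken from the reaction table.
COMPONENTS_UL = {
    "10 uM first stem-loop linker": 3.0,
    "10 uM second stem-loop linker": 3.0,
    "T4 PNK": 1.0,
    "T4 DNA ligase": 3.0,
    "10x T4 DNA ligation buffer": 3.0,
}
TOTAL_UL = 30.0


def water_volume_ul(sample_ul: float) -> float:
    """ddH2O needed to bring the reaction to the total volume."""
    water = TOTAL_UL - sum(COMPONENTS_UL.values()) - sample_ul
    if water < 0:
        raise ValueError("sample volume too large for a 30 ul reaction")
    return water


def linker_pmol(volume_ul: float, conc_um: float) -> float:
    """Amount in pmol: microlitres multiplied by micromolar gives picomoles."""
    return volume_ul * conc_um


sample_ul = 10.0  # hypothetical volume containing the 10 ng of denatured cfDNA
print(water_volume_ul(sample_ul))  # ddH2O to add, in ul
print(linker_pmol(3.0, 10.0))      # pmol of each linker per reaction
```

With these numbers each linker is present at 30 pmol per reaction, and a 10 μl sample would leave 7 μl of ddH2O to add.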
4) Ligation product purification
The ligation product was purified with magnetic beads by adding 1.8× Fapon DNA Clean Beads (NMK 012), giving 20 μl of eluted product.
5) PCR amplification
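The PCR reaction was set up with the purified product according to the following reaction system:
[Reaction system table: provided as images in the original publication (Figures BDA0003590705180000131 and BDA0003590705180000141); contents not reproduced here.]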
Reaction conditions: 95 ℃ for 10 min; (94 ℃ for 30 s, 65 ℃ for 30 s, 72 ℃ for 30 s) × 10 cycles; 72 ℃ for 10 min; hold at 4 ℃.
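The programmed hold time of this cycling protocol can be added up as follows. This is a small illustrative calculation, not part of the original disclosure; the constant names are hypothetical, and ramp times and the final 4 ℃ hold are ignored.

```python
INITIAL_DENATURATION_S = 10 * 60   # 95 C for 10 min
CYCLE_STEPS_S = (30, 30, 30)       # 94 C, 65 C and 72 C, 30 s each
N_CYCLES = 10
FINAL_EXTENSION_S = 10 * 60        # 72 C for 10 min

total_seconds = (INITIAL_DENATURATION_S
                 + N_CYCLES * sum(CYCLE_STEPS_S)
                 + FINAL_EXTENSION_S)
print(total_seconds, "s =", total_seconds / 60, "min")
```

The hold times sum to 2100 s, that is 35 min, before instrument ramping is taken into account.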
6) PCR product purification
The PCR products were purified with magnetic beads by adding 1.0× Fapon DNA Clean Beads (NMK 012), giving 20 μl of eluted library product (i.e., the sequencing library).
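The two bead ratios used in the purification steps can be converted into bead volumes as sketched below. The sketch assumes that "1.8×" and "1.0×" mean bead volume relative to sample volume, which is the usual convention but is not spelled out in the source. The PCR reaction volume is not recoverable from the source, so `PCR_VOLUME_UL` is a hypothetical value, and the function name is hypothetical as well.

```python
def bead_volume_ul(sample_volume_ul: float, ratio: float) -> float:
    """Bead volume for a given bead-to-sample volume ratio."""
    return round(sample_volume_ul * ratio, 2)


LIGATION_VOLUME_UL = 30.0  # total volume of the ligation reaction in step 3)
PCR_VOLUME_UL = 50.0       # hypothetical; not stated in the source

print(bead_volume_ul(LIGATION_VOLUME_UL, 1.8))  # beads for ligation clean-up
print(bead_volume_ul(PCR_VOLUME_UL, 1.0))       # beads for PCR clean-up
```

Under this assumption the 30 μl ligation reaction would take 54 μl of beads.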
This example uses the Illumina platform for sequencing.
Example 2 cfDNA sequencing library construction
Example 2 employed a first stem-loop linker (SEQ ID NO: 5), a second stem-loop linker (SEQ ID NO: 6), a first primer (SEQ ID NO: 3) and a second primer (SEQ ID NO: 4).
Example 2 was sequenced on the Illumina platform, and the library construction method was substantially the same as in Example 1.
Example 3 cfDNA sequencing library construction
Example 3 employed a first stem-loop linker (SEQ ID NO: 7), a second stem-loop linker (SEQ ID NO: 8), a first primer (SEQ ID NO: 3) and a second primer (SEQ ID NO: 4).
Example 3 was sequenced on the Illumina platform, and the library construction method was substantially the same as in Example 1.
Example 4 cfDNA sequencing library construction
Example 4 employed a first stem-loop linker (SEQ ID NO: 9), a second stem-loop linker (SEQ ID NO: 10), a first primer (SEQ ID NO: 3) and a second primer (SEQ ID NO: 4).
Example 4 was sequenced on the Illumina platform, and the library construction method was substantially the same as in Example 1.
Example 5 cfDNA sequencing library construction
Example 5 employed a first stem-loop linker (SEQ ID NO: 11), a second stem-loop linker (SEQ ID NO: 12), a first primer (SEQ ID NO: 3) and a second primer (SEQ ID NO: 4).
Example 5 was sequenced on the Illumina platform, and the library construction method was substantially the same as in Example 1.
Example 6 cfDNA sequencing library construction
Example 6 employed a first stem-loop linker (SEQ ID NO: 13), a second stem-loop linker (SEQ ID NO: 14), a first primer (SEQ ID NO: 3) and a second primer (SEQ ID NO: 4).
Example 6 was sequenced on the Illumina platform, and the library construction method was substantially the same as in Example 1.
Example 7 cfDNA sequencing library construction
Example 7 employed a first stem-loop linker (SEQ ID NO: 15), a second stem-loop linker (SEQ ID NO: 16), a first primer (SEQ ID NO: 3) and a second primer (SEQ ID NO: 4).
Example 7 was sequenced on the Illumina platform, and the library construction method was substantially the same as in Example 1.
Example 8 cfDNA sequencing library construction
Example 8 employed a first stem-loop linker (SEQ ID NO: 17), a second stem-loop linker (SEQ ID NO: 18), a first primer (SEQ ID NO: 19) and a second primer (SEQ ID NO: 20).
Example 8 was sequenced on the Ion Torrent platform, and the library construction method was substantially the same as in Example 1.
The sequencing library products obtained in Examples 1 to 8 above were subjected to fragment analysis on a Qsep100 according to its instructions and to bioinformatics analysis.
Results: in the sequencing libraries obtained in Examples 1 to 8, target product fragments accounted for not less than 80% and dimer fragments for less than 20%. Under the same template starting conditions, the sequencing libraries of Examples 1 to 8 had higher yields than those obtained by a conventional sequencing library construction method. In addition, bioinformatics analysis of the sequencing results showed that, for the same amount of sequencing data, the methods and linkers of Examples 1 to 8 gave a higher unique alignment rate, higher whole-genome coverage and more comprehensive sequencing information than the conventional sequencing library construction method.
Example 9 cfDNA sequencing library construction
Example 9 employed a first stem-loop linker (SEQ ID NO: 21), a second stem-loop linker (SEQ ID NO: 22), a first primer (SEQ ID NO: 23) and a second primer (SEQ ID NO: 24).
Example 9 was sequenced on the Illumina platform, and the library construction method was substantially the same as in Example 1.
The sequencing library product obtained in Example 9 was subjected to fragment analysis on a Qsep100 according to its instructions and to bioinformatics analysis.
Results: in the sequencing library obtained in Example 9, target product fragments accounted for not less than 75% and dimer fragments for less than 25%. Under the same template starting conditions, the sequencing library of Example 9 had a higher yield than that obtained by a conventional sequencing library construction method. In addition, bioinformatics analysis of the sequencing results showed that, for the same amount of sequencing data, the method and linkers of Example 9 gave a higher unique alignment rate, higher whole-genome coverage and more comprehensive sequencing information than the conventional sequencing library construction method.
Example 10 Fragmented DNA library construction
Human genome extraction: genomic DNA was extracted from human cells using a genome extraction kit and fragmented by sonication to obtain fragmented DNA.
The library construction method used in Example 10 was substantially the same as in Example 1, and the stem-loop linkers and primers used were the same as in Example 1.
Results: for the same amount of sequencing data, the method and linkers of Example 10 gave a higher unique alignment rate, higher whole-genome coverage and more comprehensive sequencing information than a conventional sequencing library construction method.
In addition, the inventors used the stem-loop linkers, primers and methods of Examples 2 to 9 to construct libraries from the fragmented genome and carry out sequencing analysis. Compared with the conventional method of constructing a sequencing library based on double-stranded nucleic acid molecules, the methods and linkers of the invention gave a higher unique alignment rate, higher whole-genome coverage and more comprehensive sequencing information.
Example 11 Construction of a DNA sequencing library from paraffin-embedded samples
Extraction of DNA from paraffin-embedded samples: DNA in paraffin-embedded samples was extracted using an FFPE (formalin-fixed, paraffin-embedded) DNA extraction kit.
The library construction method used in Example 11 was substantially the same as in Example 1, and the stem-loop linkers and primers used were the same as in Example 1.
Results: for the same amount of sequencing data, the method and linkers of Example 11 gave a higher unique alignment rate, higher whole-genome coverage and more comprehensive sequencing information than a conventional sequencing library construction method.
In addition, the inventors used the stem-loop linkers, primers and methods of Examples 2 to 9 to construct libraries from DNA in paraffin-embedded samples and carry out sequencing analysis. Compared with the conventional method of constructing a sequencing library based on double-stranded nucleic acid molecules, the methods and linkers of the invention gave a higher unique alignment rate, higher whole-genome coverage and more comprehensive sequencing information.
Example 12 Bisulfite-treated DNA sequencing library construction
Genomic DNA was extracted from human cells using a genome extraction kit, fragmented by sonication, and treated with the EZ DNA Methylation-Direct™ Kit (Catalog No. D5020) to obtain bisulfite-treated DNA.
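Bisulfite treatment deaminates unmethylated cytosine to uracil while leaving 5-methylcytosine unchanged, which is why U bases are common in such samples. The sketch below illustrates this conversion in silico; it is not part of the original disclosure, and the function name, the example sequence and the methylated position are all hypothetical.

```python
def bisulfite_convert(seq: str, methylated_positions=frozenset()) -> str:
    """Convert every unmethylated C to U; methylated C (given by
    0-based positions) and all other bases are left unchanged."""
    converted = []
    for i, base in enumerate(seq):
        if base == "C" and i not in methylated_positions:
            converted.append("U")
        else:
            converted.append(base)
    return "".join(converted)


example = "ACGTCC"  # hypothetical template strand
print(bisulfite_convert(example, methylated_positions={1}))
```

In this toy example the C at position 1 is treated as methylated and is retained, while the two unmethylated C bases at the end become U.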
The library construction method used in Example 12 was substantially the same as in Example 1, and the stem-loop linkers and primers used were the same as in Example 1.
Results: for the same amount of sequencing data, the method and linkers of Example 12 gave a higher unique alignment rate, higher whole-genome coverage and more comprehensive sequencing information than a conventional sequencing library construction method.
In addition, the inventors used the stem-loop linkers, primers and methods of Examples 2 to 9 to construct libraries from bisulfite-treated DNA and carry out sequencing analysis. Compared with the conventional method of constructing a sequencing library based on double-stranded nucleic acid molecules, the methods and linkers of the invention gave a higher unique alignment rate, higher whole-genome coverage and more comprehensive sequencing information.
Verification example
1. The cfDNA sample extracted in Example 1 was used for library construction with a commercial double-stranded library construction kit (Fapon, NK 007), following the official instructions of the kit for the specific procedure. The comparisons of product concentration and of library fragment analysis are shown in FIG. 3 and FIG. 4, respectively. Sequencing was performed on the Illumina platform, and the comparison of sequencing results is shown in FIG. 6.
Results: under the same template starting conditions, the single-stranded nucleic acid molecule library construction of Example 1 gave a higher product yield and a lower dimer proportion (+, < 3%) than the conventional method of constructing a sequencing library based on double-stranded nucleic acid molecules. In addition, bioinformatics analysis of the sequencing results showed that, for a similar amount of sequencing data, the single-stranded library construction method and linkers of Example 1 gave a higher unique alignment rate, higher whole-genome coverage and better capture uniformity than the conventional method based on double-stranded nucleic acid molecules. The invention can therefore effectively perform single-stranded library construction for cfDNA.
2. The concentrations of the sequencing library products obtained in Examples 2 to 8 above were determined with Qubit and fragment analysis was performed with Qsep100; the results are shown in FIG. 3.
Results: under the same template starting conditions, the sequencing libraries of Examples 2 to 8 did not differ significantly from the single-stranded library construction product of Example 1 in yield or dimer proportion. In addition, bioinformatics analysis of the sequencing results showed that, for a similar amount of sequencing data, the data obtained from the sequencing libraries of Examples 2 to 8 still had significant advantages over those from a conventional sequencing library based on double-stranded nucleic acid molecules.
3. The concentration of the sequencing library product obtained in Example 9 was determined with Qubit and fragment analysis was performed with Qsep100; the comparisons of product concentration and of library fragment analysis are shown in FIG. 3 and FIG. 4, respectively. The sequencing results are shown in FIG. 6.
Results: under the same template starting conditions, the sequencing library obtained in Example 9 had a yield similar to that of the single-stranded nucleic acid library of Example 1 but a higher dimer proportion (++, < 10%). Nevertheless, compared with the conventional method of constructing a sequencing library based on double-stranded nucleic acid molecules, the sequencing library of Example 9 had a higher yield and a lower dimer proportion. In addition, bioinformatics analysis of the sequencing results showed that, for a similar amount of sequencing data, the data obtained from the sequencing library of Example 9 still had significant advantages over those from a conventional sequencing library based on double-stranded nucleic acid molecules.
4. The fragmented DNA sample of Example 10 was used for library construction with a commercial double-stranded library construction kit (Fapon, NK 007), following the official instructions of the kit for the specific procedure. The comparisons of product concentration and of library fragment analysis are shown in FIG. 3 and FIG. 5, respectively. Sequencing was performed on the Illumina platform, and the comparison of sequencing results is shown in FIG. 6.
Results: under the same template starting conditions, the single-stranded nucleic acid library construction product of Example 10 had a higher yield and a lower dimer proportion (+, < 3%) than the product of the conventional method based on double-stranded nucleic acid molecules. In particular, the single-stranded library construction product of Example 10 was more widely distributed and had a smaller main peak, indicating that the single-stranded library construction method makes fuller use of the template nucleic acid fragments. In addition, for a similar amount of sequencing data, the single-stranded library construction method and linkers of Example 10 gave a higher unique alignment rate, higher whole-genome coverage and better capture uniformity than the conventional method based on double-stranded nucleic acid molecules. The invention can therefore efficiently perform single-stranded library construction for fragmented DNA.
In addition, the inventors carried out library construction and analysis on the fragmented DNA using the stem-loop linkers, primers and methods of Examples 2 to 9, respectively. Compared with the single-stranded nucleic acid molecule library construction method of Example 10, the yield of the single-stranded library construction product and the dimer proportion showed no significant difference.
5. The paraffin-embedded sample DNA of Example 11 was used for library construction with a commercial double-stranded library construction kit (Fapon, NK 007), following the official instructions of the kit for the specific procedure. The comparison of product concentrations is shown in FIG. 3. Sequencing was performed on the Illumina platform, and the comparison of sequencing results is shown in FIG. 6.
Results: under the same template starting conditions, the single-stranded nucleic acid library construction product of Example 11 had a higher yield and a lower dimer proportion (+, < 3%) than the product of the conventional method based on double-stranded nucleic acid molecules. For a similar amount of sequencing data, the single-stranded library construction method and linkers of Example 11 gave a higher unique alignment rate, higher whole-genome coverage and better capture uniformity than the conventional method based on double-stranded nucleic acid molecules, showing that the invention can effectively perform single-stranded library construction for paraffin-embedded sample DNA.
In addition, the inventors carried out library construction and analysis on DNA from paraffin-embedded samples using the stem-loop linkers, primers and methods of Examples 2 to 9, respectively. Compared with the single-stranded nucleic acid molecule library construction method of Example 11, the yield of the single-stranded library construction product and the dimer proportion showed no significant difference.
6. The DNA treated with bisulfite in example 12 was subjected to library construction using a commercial double-strand library construction kit (Fapon, NK 007), and the detailed procedures were described in the official manual of the kit. The results of the above product concentration alignment are shown in FIG. 3. Sequencing was performed using the Illumina platform and the sequencing alignment is shown in figure 6.
As a result: compared with the conventional method for constructing a sequencing library based on a double-stranded nucleic acid molecule under the same template starting condition, the single-stranded nucleic acid library construction product of the example 12 has higher yield and lower dimer proportion (+, < 3%). Based on similar sequencing data quantity, compared with the conventional method for constructing a sequencing library based on double-stranded nucleic acid molecules, the single-stranded nucleic acid library constructing method and the adaptor of the embodiment 12 have the advantages of high unique comparison rate of data, high overall genome coverage rate and good capture uniformity.
In addition, the inventors carried out library construction analysis on bisulfite treated DNA by using the stem-loop linkers, primers and methods of examples 2 to 9, respectively, and compared with the method for constructing sequencing library by using single nucleic acid molecules of example 12, there was no significant difference in yield of single-stranded nucleic acid library construction products and dimer proportion.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.
SEQUENCE LISTING
<110> Guangdong Fapon Biotech Co., Ltd.
<120> method for constructing single-stranded nucleic acid molecule sequencing library
<130> SI4220016
<160> 24
<170> PatentIn version 3.5
<210> 1
<211> 49
<212> DNA
<213> Artificial Sequence
<220>
<223> first stem-loop linker
<220>
<221> misc_feature
<222> (1)..(6)
<223> N is A, T, G or C
<400> 1
nnnnnngatc gtcggactga cacgttcaga gttctacagt ccgacgatc 49
<210> 2
<211> 49
<212> DNA
<213> Artificial Sequence
<220>
<223> second stem-loop linker
<220>
<221> misc_feature
<222> (44)..(49)
<223> N is A, T, G or C
<400> 2
agatcggaag agcacacgtc tgaactccag gctcttccga tctnnnnnn 49
<210> 3
<211> 55
<212> DNA
<213> Artificial Sequence
<220>
<223> first primer
<400> 3
aatgatacgg cgaccaccga gatctacacg ttcagagttc tacagtccga cgatc 55
<210> 4
<211> 64
<212> DNA
<213> Artificial Sequence
<220>
<223> second primer
<220>
<221> misc_feature
<222> (25)..(30)
<223> "mmmmmm" is the primer index portion
<400> 4
caagcagaag acggcatacg agatmmmmmm gtgactggag ttcagacgtg tgctcttccg 60
atct 64
<210> 5
<211> 44
<212> DNA
<213> Artificial Sequence
<220>
<223> first stem-loop linker
<220>
<221> misc_feature
<222> (1)..(6)
<223> N is A, T, G or C
<400> 5
nnnnnngatc gtcggactgt tcagagttct acagtccgac gatc 44
<210> 6
<211> 44
<212> DNA
<213> Artificial Sequence
<220>
<223> second stem-loop linker
<220>
<221> misc_feature
<222> (39)..(44)
<223> N is A, T, G or C
<400> 6
agatcggaag agcacacgtc tgaacgctct tccgatctnn nnnn 44
<210> 7
<211> 39
<212> DNA
<213> Artificial Sequence
<220>
<223> first stem-loop linker
<220>
<221> misc_feature
<222> (1)..(6)
<223> N is A, T, G or C
<400> 7
nnnnnngatc gtcggactga gttctacagt ccgacgatc 39
<210> 8
<211> 39
<212> DNA
<213> Artificial Sequence
<220>
<223> second stem-loop linker
<220>
<221> misc_feature
<222> (34)..(39)
<223> N is A, T, G or C
<400> 8
agatcggaag agcacacgtc gctcttccga tctnnnnnn 39
<210> 9
<211> 56
<212> DNA
<213> Artificial Sequence
<220>
<223> first stem-loop linker
<220>
<221> misc_feature
<222> (1)..(6)
<223> N is A, T, G or C
<400> 9
nnnnnngatc gtcggactgt agaactacac gttcagagtt ctacagtccg acgatc 56
<210> 10
<211> 56
<212> DNA
<213> Artificial Sequence
<220>
<223> second stem-loop linker
<220>
<221> misc_feature
<222> (51)..(56)
<223> N is A, T, G or C
<400> 10
agatcggaag agcacacgtc tgaactccag gacgtgtgct cttccgatct nnnnnn 56
<210> 11
<211> 46
<212> DNA
<213> Artificial Sequence
<220>
<223> first stem-loop linker
<220>
<221> misc_feature
<222> (1)..(3)
<223> N is A, T, G or C
<400> 11
nnngatcgtc ggactgacac gttcagagtt ctacagtccg acgatc 46
<210> 12
<211> 46
<212> DNA
<213> Artificial Sequence
<220>
<223> second stem-loop linker
<220>
<221> misc_feature
<222> (44)..(46)
<223> N is A, T, G or C
<400> 12
agatcggaag agcacacgtc tgaactccag gctcttccga tctnnn 46
<210> 13
<211> 53
<212> DNA
<213> Artificial Sequence
<220>
<223> first stem-loop linker
<220>
<221> misc_feature
<222> (1)..(10)
<223> N is A, T, G or C
<400> 13
nnnnnnnnnn gatcgtcgga ctgacacgtt cagagttcta cagtccgacg atc 53
<210> 14
<211> 53
<212> DNA
<213> Artificial Sequence
<220>
<223> second stem-loop linker
<220>
<221> misc_feature
<222> (44)..(53)
<223> N is A, T, G or C
<400> 14
agatcggaag agcacacgtc tgaactccag gctcttccga tctnnnnnnn nnn 53
<210> 15
<211> 51
<212> DNA
<213> Artificial Sequence
<220>
<223> first stem-loop linker
<220>
<221> misc_feature
<222> (1)..(6)
<223> N is A, T, G or C
<400> 15
nnnnnngatc gtcggactgt aacacgttca gagttctaca gtccgacgat c 51
<210> 16
<211> 54
<212> DNA
<213> Artificial Sequence
<220>
<223> second stem-loop linker
<220>
<221> misc_feature
<222> (49)..(54)
<223> N is A, T, G or C
<400> 16
agatcggaag agcacacgtc tgaactccag cgtgtgctct tccgatctnn nnnn 54
<210> 17
<211> 60
<212> DNA
<213> Artificial Sequence
<220>
<223> first stem-loop linker
<220>
<221> misc_feature
<222> (1)..(6)
<223> N is A, T, G or C
<400> 17
nnnnnnatca ccgactgccc cactacgcct ccgctttcct ctctatgggc agtcggtgat 60
<210> 18
<211> 61
<212> DNA
<213> Artificial Sequence
<220>
<223> second stem-loop linker
<220>
<221> misc_feature
<222> (4)..(13)
<223> "mmmmmmmmmm" is a 10-base barcode sequence
<220>
<221> misc_feature
<222> (44)..(53)
<223> "mmmmmmmmmm" is a 10-base barcode sequence
<220>
<221> misc_feature
<222> (56)..(61)
<223> n is a, c, g, or t
<400> 18
atcmmmmmmm mmctgagtcg gagacacgca gggatgagat ggmmmmmmmm mmgatnnnnn 60
n 61
<210> 19
<211> 28
<212> DNA
<213> Artificial Sequence
<220>
<223> first primer
<400> 19
ccactacgcc tccgctttcc tctctatg 28
<210> 20
<211> 20
<212> DNA
<213> Artificial Sequence
<220>
<223> second primer
<400> 20
ccatctcatc cctgcgtgtc 20
<210> 21
<211> 52
<212> DNA
<213> Artificial Sequence
<220>
<223> first stem-loop linker
<220>
<221> misc_feature
<222> (1)..(6)
<223> N is A, T, G or C
<400> 21
nnnnnnagat cggaagagca cactctttcc ctacacgacg ctcttccgat ct 52
<210> 22
<211> 53
<212> DNA
<213> Artificial Sequence
<220>
<223> second stem-loop linker
<220>
<221> misc_feature
<222> (48)..(53)
<223> N is A, T, G or C
<400> 22
agatcggaag agcacacgtc tgaactccag tcacgctctt ccgatctnnn nnn 53
<210> 23
<211> 58
<212> DNA
<213> Artificial Sequence
<220>
<223> first primer
<400> 23
aatgatacgg cgaccaccga gatctacact ctttccctac acgacgctct tccgatct 58
<210> 24
<211> 64
<212> DNA
<213> Artificial Sequence
<220>
<223> second primer
<220>
<221> misc_feature
<222> (25)..(30)
<223> "mmmmmm" is the primer index portion
<400> 24
caagcagaag acggcatacg agatmmmmmm gtgactggag ttcagacgtg tgctcttccg 60
atct 64
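
The stem-loop layout described for the linkers can be checked directly against the listed sequences. The following Python sketch is an illustration only and is not part of the patent: the function name dissect_stem_loop and the greedy pairing rule are assumptions. It strips the degenerate "n" run from the stated end, then pairs bases from the two ends of the remainder inward until the first non-complementary pair; whatever is left in the middle is treated as the loop. If the ends of a loop happened to be complementary, this rule would overestimate the stem length.

```python
# Watson-Crick pairs used to decide whether two bases can pair in the stem.
PAIR = {"a": "t", "t": "a", "g": "c", "c": "g"}


def dissect_stem_loop(seq, degenerate_end):
    """Split a linker into degenerate length, stem arm and loop (illustrative).

    degenerate_end is "5" or "3": the end that carries the run of 'n' bases.
    """
    seq = seq.replace(" ", "").lower()
    if degenerate_end == "5":
        body = seq.lstrip("n")
    elif degenerate_end == "3":
        body = seq.rstrip("n")
    else:
        raise ValueError("degenerate_end must be '5' or '3'")
    degenerate_len = len(seq) - len(body)

    # Pair the outermost bases first and walk inward until a pair fails.
    stem_len = 0
    while (2 * (stem_len + 1) <= len(body)
           and PAIR.get(body[stem_len]) == body[-1 - stem_len]):
        stem_len += 1

    return {
        "degenerate": degenerate_len,
        "stem": body[:stem_len],                      # 5' arm of the stem
        "loop": body[stem_len:len(body) - stem_len],  # unpaired middle part
    }


# SEQ ID NO: 1 and SEQ ID NO: 2, copied from the sequence listing.
SEQ_1 = "nnnnnngatc gtcggactga cacgttcaga gttctacagt ccgacgatc"
SEQ_2 = "agatcggaag agcacacgtc tgaactccag gctcttccga tctnnnnnn"

first = dissect_stem_loop(SEQ_1, "5")
second = dissect_stem_loop(SEQ_2, "3")
for name, parts in (("SEQ ID NO: 1", first), ("SEQ ID NO: 2", second)):
    print(name, parts["degenerate"], len(parts["stem"]), len(parts["loop"]))
```

Under this pairing rule, SEQ ID NO: 1 and SEQ ID NO: 2 each resolve to a 6-base degenerate region, a 13-bp stem and a 17-base loop, which fall within the length ranges recited in claims 8 to 10.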

Claims (22)

1. A method for constructing a single-stranded nucleic acid molecule sequencing library, characterized by comprising the following steps:
(1) Ligating both ends of the single-stranded nucleic acid molecule to a first stem-loop linker and a second stem-loop linker, respectively, to obtain ligation products; and
(2) Subjecting the ligation products of step (1) to an amplification treatment to obtain amplification products carrying a sequencing primer region, the amplification products constituting the sequencing library,
wherein the first stem-loop linker has:
a first single-stranded degenerate sequence region located 5' to the first stem-loop linker;
a first double-stranded region attached to the side of the first single-stranded degenerate sequence region distal to the 5' end of the first stem-loop linker; and
a first single-stranded loop region, both ends of said first single-stranded loop region being linked to both ends of said first double-stranded region distal to said first single-stranded degenerate sequence region,
the second stem-loop linker has:
a second single-stranded degenerate sequence region located 3' to said second stem-loop linker;
a second double-stranded region linked to the side of the second single-stranded degenerate sequence region distal to the 3' end of the second stem-loop linker; and
a second single-stranded loop region, both ends of said second single-stranded loop region being linked to both ends of said second double-stranded region distal to said second single-stranded degenerate sequence region.
2. The method of claim 1, wherein the first double-stranded region and the second double-stranded region differ in sequence.
3. The method of claim 1, wherein the first double-stranded region and the second double-stranded region are the same length.
4. The method of claim 1, wherein the first double-stranded region is not fully complementary to the second double-stranded region.
5. The method of claim 1, wherein there are at least 2 mismatched base sites between the sequences of the first double-stranded region and the second double-stranded region.
6. The method of claim 1, wherein no more than 3 consecutive bases at the 3' end of the first double-stranded region match the sequence of the second double-stranded region.
7. The method of claim 1, wherein no more than 3 consecutive bases at the 3' end of the first double-stranded region match the sequence of the second stem-loop linker.
8. The method of claim 1, wherein the first single-stranded degenerate sequence region and the second single-stranded degenerate sequence region are each independently 3 to 10 bp in length; optionally, each independently 5 to 8 bp.
9. The method of claim 1, wherein the first double-stranded region and the second double-stranded region are each independently 13 to 20 bp in length; optionally, each independently 15 to 18 bp.
10. The method of claim 1, wherein the first single-stranded loop region and the second single-stranded loop region are each independently 7 to 30 bp in length; optionally, each independently 7 to 28 bp; optionally, each independently 10 to 28 bp; optionally, each independently 10 to 21 bp; optionally, each independently 10 to 17 bp; optionally, each independently 10 to 15 bp.
11. The method of claim 1, wherein the single-stranded nucleic acid molecule is a DNA molecule.
12. The method of claim 1, wherein the single-stranded nucleic acid molecule is derived from heat-denatured DNA, cleaved DNA, recycled DNA, paraffin-embedded sample DNA, or bisulfite-treated DNA.
13. The method of claim 1, wherein the amplification treatment is performed using a first primer and a second primer, wherein,
the first primer recognizes at least a portion of at least one of the first double-stranded region and the first single-stranded loop region,
the second primer recognizes at least a portion of at least one of the second double-stranded region and the second single-stranded loop region.
14. The method of claim 1, wherein at least one of the first primer and the second primer carries a sequencing primer sequence.
15. The method of claim 1, wherein the 5' end of the first stem-loop linker and the 3' end of the second stem-loop linker are blocked.
16. The method of claim 1, wherein the 5' end of the second stem-loop linker is not blocked.
17. The method of claim 1, wherein the 5' end of the second stem-loop linker is phosphorylated.
18. A sequencing library obtainable by the method of any one of claims 1 to 17.
19. A method of sequencing, comprising:
constructing a sequencing library according to the method of any one of claims 1 to 17; and
sequencing the sequencing library.
20. A kit for constructing a sequencing library from single-stranded nucleic acid molecules, comprising: a first stem-loop linker and a second stem-loop linker as defined in any one of claims 1 to 17.
21. The kit of claim 20, wherein the kit further comprises at least one of:
a ligase; and
the first primer and the second primer of claim 13.
22. A method for sequencing cell-free nucleic acid, comprising:
obtaining cell-free nucleic acid;
subjecting the cell-free nucleic acid to a denaturation treatment to obtain single-stranded nucleic acid molecules;
constructing a sequencing library according to the method of any one of claims 1 to 17 based on said single stranded nucleic acid molecule;
sequencing the sequencing library to obtain sequencing data.
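
Claim 13 states that each primer recognizes at least part of the double-stranded region and the single-stranded loop region of its linker. A minimal Python sketch of how this can be checked on the listed sequences follows. It is an illustration under stated assumptions, not the patent's procedure: the helper names are hypothetical, the comparison is purely at the string level, and it makes no statement about which strand each primer anneals to during amplification. It compares the 3' end of the first primer (SEQ ID NO: 3) with the 3' end of the first linker (SEQ ID NO: 1), and the 3' end of the second primer (SEQ ID NO: 4) with the reverse complement of the second linker (SEQ ID NO: 2) after removing its degenerate bases.

```python
# Translation table for complementing lowercase DNA; other letters pass through.
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    """Return the reverse complement of a lowercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def shared_suffix_length(a, b):
    """Count how many bases, read from the 3' end, are identical in a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


# Sequences copied from the sequence listing (spaces removed).
LINKER_1 = "nnnnnngatcgtcggactgacacgttcagagttctacagtccgacgatc"        # SEQ ID NO: 1
LINKER_2 = "agatcggaagagcacacgtctgaactccaggctcttccgatctnnnnnn"        # SEQ ID NO: 2
PRIMER_1 = "aatgatacggcgaccaccgagatctacacgttcagagttctacagtccgacgatc"  # SEQ ID NO: 3
PRIMER_2 = ("caagcagaagacggcatacgagatmmmmmm"
            "gtgactggagttcagacgtgtgctcttccgatct")                     # SEQ ID NO: 4

# First primer: its 3' end is compared with the 3' end of the first linker.
overlap_1 = shared_suffix_length(PRIMER_1, LINKER_1)

# Second primer: its 3' end is compared with the reverse complement of the
# second linker, with the 3' degenerate bases stripped first.
overlap_2 = shared_suffix_length(PRIMER_2, reverse_complement(LINKER_2.rstrip("n")))

print(overlap_1, overlap_2)
```

For these four sequences both overlaps come out at 30 bases, which corresponds to one 13-base stem arm plus the 17-base loop of the respective linker. The remaining 5' portions of the primers do not match the linkers and carry the additional sequences, including the index positions marked "m" in SEQ ID NO: 4.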
CN202210377099.1A 2021-04-13 2022-04-11 Method for constructing single-stranded nucleic acid molecule sequencing library Pending CN115197998A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2021103932830 2021-04-13
CN202110393283 2021-04-13

Publications (1)

Publication Number Publication Date
CN115197998A true CN115197998A (en) 2022-10-18

Family

ID=83575079

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210377099.1A Pending CN115197998A (en) 2021-04-13 2022-04-11 Method for constructing single-stranded nucleic acid molecule sequencing library

Country Status (1)

Country Link
CN (1) CN115197998A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117701679A (en) * 2024-02-06 2024-03-15 中国医学科学院基础医学研究所 Single-stranded DNA specific high-throughput sequencing method based on 5' connection

Similar Documents

Publication Publication Date Title
US10961529B2 (en) Barcoding nucleic acids
CN105531375B (en) Method for targeted genomic analysis
CN110248675B (en) Construction of Next Generation Sequencing (NGS) libraries using competitive strand displacement
CN105400776B (en) Oligonucleotide linker and application thereof in constructing nucleic acid sequencing single-stranded circular library
WO2018024082A1 (en) Method for constructing serially-connected rad tag sequencing libraries
CN114829623A (en) Methods and compositions for high throughput sample preparation using dual unique dual indices
CN110904512A (en) High-throughput sequencing library construction method suitable for single-stranded DNA
JP7046097B2 (en) How to attach an adapter to sample nucleic acid
CN114540472B (en) Three-generation sequencing method
WO2018148289A2 (en) Duplex adapters and duplex sequencing
US20150284716A1 (en) Method for single cell sequencing of mirnas and other cellular rnas
CN115197998A (en) Method for constructing single-stranded nucleic acid molecule sequencing library
WO2018057779A1 (en) Compositions of synthetic transposons and methods of use thereof
WO2001079553A1 (en) Method and compositions for ordering restriction fragments
CN112941147A (en) High-fidelity target gene library building method and kit thereof
US20180100180A1 (en) Methods of single dna/rna molecule counting
CN107083427B (en) DNA ligase mediated DNA amplification technology
JP2003518953A (en) Methods for nucleic acid analysis
EP4048812B1 (en) Methods for 3' overhang repair
CN115125624A (en) Barcode adaptor and medium-throughput multiple single-cell representative DNA methylation library construction and sequencing method
CN117887809A (en) Hi-C library construction method
CN117802205A (en) Single-cell Hi-C library construction method
CN117845338A (en) Hi-C library construction method of PCR free
EP1295942A2 (en) Method for processing a library using ligation inhibition

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination