CN117321194A

CN117321194A - Preparation method of nucleic acid sequencing library

Info

Publication number: CN117321194A
Application number: CN202280035174.4A
Authority: CN
Inventors: Alvaro Godinez
Original assignee: Becton Dickinson and Co
Current assignee: Becton Dickinson and Co
Priority date: 2021-05-14
Filing date: 2022-05-12
Publication date: 2023-12-29

Abstract

The disclosure herein includes methods, compositions, and kits suitable for generating libraries for nucleic acid sequencing. In some embodiments, more than one protein complex is provided. Each protein complex may comprise a transposome and a programmable DNA binding unit capable of specifically binding to a user-selected binding site on target double-stranded DNA (dsDNA). The binding sites of each of the more than one protein complexes may be different from each other. The transposomes may comprise a transposase, a first adaptor and a second adaptor. The first adapter, the second adapter, or both may be sequencing adapters.

Description

Preparation method of nucleic acid sequencing library

RELATED APPLICATIONS

The present application claims the benefit under 35 U.S.C. § 119(e) of U.S. patent application Ser. No. 63/189,032, filed on May 14, 2021, and U.S. patent application Ser. No. 63/243,443, filed on September 13, 2021. The contents of these related applications are incorporated herein by reference in their entirety for all purposes.

Reference to Sequence Listing

The present application is filed with a sequence listing in electronic format. The sequence listing is provided as a file titled 68eb_317326_wo_sequence_listing, which was created on May 12, 2022, and is 56.0 kilobytes in size. The information of the sequence listing in electronic format is incorporated herein by reference in its entirety.

Background

FIELD

The present disclosure relates generally to the field of molecular biology, such as tagging nucleic acids to generate customized locus-specific sequencing libraries.

Description of the Related Art

Conventional library preparation methods for nucleic acid sequencing may take several hours, and the process produces a random library. These libraries are random because the methods used to fragment the nucleic acids (including physical, enzymatic and chemical fragmentation methods) act in a random manner. Thus, the output of DNA sequencing cannot be controlled. Currently, two methods are used for targeted sequencing. The first is amplicon sequencing, which relies on primers to amplify a region of interest by DNA amplification. This additional amplification step further increases the cost, time and resources of standard library preparation methods. The second is target capture, which relies on probes or pools of probes that hybridize to a particular nucleic acid target. Hybridizing probes to their targets and separating these targets is a time-consuming process that may take days. In addition, the probes used in this method are expensive to synthesize. There is a need for compositions, methods, systems, and kits for custom locus-specific library preparation. There is also a need for methods, compositions, kits and systems that enable rapid targeted sequencing (and thus rapid sequencing-based diagnostics, e.g., in less than 2 hours), as well as theranostics that can provide simultaneous diagnosis and determine appropriate therapy.

SUMMARY

The disclosure herein includes compositions. In some embodiments, the composition comprises: more than one protein complex. In some embodiments, each of the more than one protein complexes comprises a transposome and a programmable DNA binding unit capable of specifically binding to a binding site on target double-stranded DNA (dsDNA). In some embodiments, the transposomes comprise a transposase, a first adaptor and a second adaptor. In some embodiments, the binding sites of each of the more than one protein complexes are different from each other.

In some embodiments, at least two of the more than one protein complexes comprise the same transposomes. In some embodiments, more than one protein complex all comprise the same transposomes. In some embodiments, more than one protein complex all comprise the same transposase. In some embodiments, the first adaptor and the second adaptor in the same transposome are the same. In some embodiments, the first adapter, the second adapter, or both in different transposomes are different. In some embodiments, the first adapter, the second adapter, or both are dsDNA or RNA/DNA duplex. In some embodiments, the length of the adapter is about 3-200 base pairs. In some embodiments, the first adapter, the second adapter, or both are sequencing adapters. In some embodiments, the sequencing adapter comprises a P5 or P7 primer sequence.

In some embodiments, the binding sites of at least two of the more than one protein complexes are located on the same target dsDNA. In some embodiments, the binding sites of at least two of the more than one protein complexes are about 1-50000 nucleotides apart on the same target dsDNA. In some embodiments, the distance between the binding sites of one pair of more than one protein complex is substantially the same as the distance between the binding sites of another pair of more than one protein complex. In some embodiments, the distance between the binding sites of one pair of more than one protein complex is different from the distance between the binding sites of another pair of more than one protein complex. In some embodiments, the binding sites of at least two of the more than one protein complexes are located on different strands of the target dsDNA. In some embodiments, at least two of the more than one protein complex are capable of specifically binding to different target dsDNA.

In some embodiments, more than one protein complex is capable of specifically binding to about 2-5000 target dsDNAs. In some embodiments, the transposase is a Tn5 transposase, a Tn7 transposase, a mariner Tc1-like transposase, a Himar1C9 transposase, or a Sleeping Beauty transposase. In some embodiments, the transposase is a hyperactive transposase. In some embodiments, the programmable DNA binding unit includes a nuclease-deficient CRISPR-associated protein (dCas protein) and a guide RNA (gRNA) capable of specifically binding to a binding site on the target dsDNA. In some embodiments, the transposome is associated with the programmable DNA binding unit by a linker linking the transposase and the dCas protein. In some embodiments, the linker comprises a peptide linker, a chemical linker, or both. In some embodiments, the transposase is present as a fusion protein comprising a dCas protein. In some embodiments, the dCas protein is dCas9, dCas12, dCas13, dCas14, or SpRY dCas. In some embodiments, the dCas13 protein is dCas13a, dCas13b, dCas13c, or dCas13d.

In some embodiments, the programmable DNA binding unit comprises a protein component capable of specifically binding to a binding site on the target dsDNA. In some embodiments, the protein component comprises an endonuclease-deficient Zinc Finger Nuclease (ZFN), an endonuclease-deficient transcription activator-like effector nuclease (TALEN), an Argonaute protein, an endonuclease-deficient meganuclease, a recombinase, or a combination thereof. In some embodiments, the transposomes are associated with the programmable DNA binding unit through a linker linking the transposase and the protein component. In some embodiments, the linker comprises a peptide linker, a chemical linker, or both. In some embodiments, the peptide linker comprises more than one glycine, serine, threonine, alanine, lysine, glutamine, or a combination thereof. In some embodiments, the peptide linker comprises a GS linker. In some embodiments, the peptide linker is an XTEN linker. In some embodiments, the protein component is present as a fusion protein comprising a transposase.

The disclosure herein includes reaction mixtures. In some embodiments, the reaction mixture comprises: the compositions disclosed herein and sample nucleic acids suspected of comprising one or more target dsDNA. The reaction mixture may comprise: DNA polymerase, dNTPs, or a combination thereof. In some embodiments, the adapter is covalently attached to the target dsDNA or fragment thereof. The reaction mixture may comprise: more than one dsDNA fragment, each fragment comprising a first adaptor and a second adaptor of one of more than one protein complex at each end, respectively. In some embodiments, the sample nucleic acid comprises eukaryotic DNA, bacterial DNA, viral DNA, fungal DNA, protozoan DNA, or a combination thereof. In some embodiments, the target dsDNA is genomic DNA, mitochondrial DNA, plasmid DNA, or a combination thereof. In some embodiments, the sample nucleic acid is from a biological sample, a clinical sample, an environmental sample, or a combination thereof. In some embodiments, the biological sample comprises stool, sputum, peripheral blood, plasma, serum, lymph nodes, respiratory tissue, exudates, bodily fluids, or combinations thereof.

The disclosure herein includes methods for tagging nucleic acids. In some embodiments, the method comprises: contacting a composition disclosed herein with a sample suspected of containing more than one target dsDNA to form a reaction mixture; and incubating the reaction mixture to generate more than one dsDNA fragment, each fragment comprising a first adaptor and a second adaptor of one of the more than one protein complex at each end, respectively.
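As an illustration only, the tagging step can be sketched in Python as a toy model. The sketch assumes that each complex inserts its adaptors exactly at its binding-site coordinate and that a fragment arises between each pair of neighbouring sites, with the left end carrying the left complex's first adaptor and the right end carrying the right complex's second adaptor. The class `Complex`, the function `tagged_fragments`, the coordinates and the adaptor strings are all hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Complex:
    """Toy protein complex: one programmed binding site plus two adaptors."""
    name: str
    site: int             # hypothetical 0-based coordinate on the target
    first_adaptor: str    # placeholder sequence, not a real sequencing adaptor
    second_adaptor: str   # placeholder sequence, not a real sequencing adaptor


def tagged_fragments(target: str, complexes: list[Complex]) -> list[str]:
    """Return adaptor-flanked fragments between neighbouring binding sites.

    Simplifying assumption: insertion happens exactly at each binding site,
    and only the top strand is modelled.
    """
    ordered = sorted(complexes, key=lambda c: c.site)
    fragments = []
    for left, right in zip(ordered, ordered[1:]):
        insert = target[left.site:right.site]
        fragments.append(left.first_adaptor + insert + right.second_adaptor)
    return fragments


# Hypothetical 20-nt target with two complexes programmed 8 nt apart.
target = "ACGTACGTACGTACGTACGT"
pair = [
    Complex("complex_1", site=4, first_adaptor="GGGG", second_adaptor="CCCC"),
    Complex("complex_2", site=12, first_adaptor="GGGG", second_adaptor="CCCC"),
]
fragments = tagged_fragments(target, pair)
```

In this model the spacing of the programmed binding sites sets the insert length, which is why choosing the sites also chooses the fragment sizes in the resulting library.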

The disclosure herein includes methods for generating sequencing libraries. In some embodiments, the method comprises: contacting a composition disclosed herein with a sample suspected of containing more than one target dsDNA to form a reaction mixture. The method may include: incubating the reaction mixture to generate more than one dsDNA fragment, each fragment comprising a first adaptor and a second adaptor of one of the more than one protein complex at each end, respectively. The method may include: amplifying the more than one dsDNA fragment with primers capable of binding to the adaptors at the ends of the dsDNA fragments to generate a sequencing library.

In some embodiments, each primer is about 5-80 nucleotides in length. In some embodiments, amplification of more than one dsDNA fragment with primers is performed using polymerase chain reaction (PCR). In some embodiments, PCR is loop-mediated isothermal amplification (LAMP), helicase-dependent amplification (HDA), recombinase polymerase amplification (RPA), strand displacement amplification (SDA), nucleic acid sequence-based amplification (NASBA), transcription-mediated amplification (TMA), nicking enzyme amplification reaction (NEAR), rolling circle amplification (RCA), multiple displacement amplification (MDA), ramification amplification (RAM), circular helicase-dependent amplification (cHDA), single primer isothermal amplification (SPIA), signal-mediated amplification of RNA technology (SMART), self-sustained sequence replication (3SR), genome exponential amplification reaction (GEAR), or isothermal multiple displacement amplification (IMDA). In some embodiments, the PCR is real-time PCR or quantitative real-time PCR (qRT-PCR). In some embodiments, the sample comprises eukaryotic DNA, bacterial DNA, viral DNA, fungal DNA, protozoan DNA, or a combination thereof.

In some embodiments, the more than one target dsDNA comprises genomic DNA, mitochondrial DNA, plasmid DNA, or a combination thereof. In some embodiments, the sample is, or is derived from, a biological sample, a clinical sample, an environmental sample, or a combination thereof. In some embodiments, more than one target dsDNA comprises DNA from at least 2 different organisms. In some embodiments, more than one target dsDNA comprises DNA from at least 2 different genes. The method may include: producing the more than one target dsDNA from more than one target RNA using a reverse transcriptase. In some embodiments, the more than one target dsDNA comprises a target dsDNA produced from a target RNA with a reverse transcriptase.

In some embodiments, more than one target dsDNA comprises a genetic signature of interest. In some embodiments, the genetic signature of interest comprises one or more mutations of interest. In some embodiments, the one or more mutations of interest include point mutations, inversions, deletions, insertions, translocations, duplications, copy number variations, or combinations thereof. In some embodiments, the one or more mutations of interest include nucleotide substitutions, deletions, insertions, or combinations thereof. In some embodiments, the genetic signature of interest is indicative of antibiotic resistance or antibiotic susceptibility of the organism from which the target dsDNA is derived. In some embodiments, the genetic signature of interest is indicative of the cancer status of the organism from which the target dsDNA is derived. In some embodiments, the genetic signature of interest is indicative of the status of a genetic disease of the organism from which the target dsDNA is derived. In some embodiments, the genetic disease is a monogenic disorder. In some embodiments, the genetic disease is cystic fibrosis, Huntington's disease, sickle cell anemia, hemophilia, Duchenne muscular dystrophy, thalassemia, fragile X syndrome, familial hypercholesterolemia, polycystic kidney disease, neurofibromatosis type I, hereditary spherocytosis, Marfan syndrome, Tay-Sachs disease, phenylketonuria, mucopolysaccharidosis, lysosomal acid lipase deficiency, glycogen storage disease, galactosemia, or hemochromatosis.

In some embodiments, contacting more than one target dsDNA with more than one protein complex pair is performed at about 25 ℃ to about 80 ℃. In some embodiments, incubating the reaction mixture comprises incubating the reaction mixture at about 37 ℃ to about 55 ℃. In some embodiments, more than one protein complex pair and more than one target dsDNA are present in the reaction mixture at a molecular ratio of about 2:1 to about 2000:1. In some embodiments, more than one protein complex pair and more than one target dsDNA are present in the reaction mixture at a molecular ratio of about 2:1 to about 200:1.
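The molecular ratios above can be checked with simple arithmetic. The following Python sketch is illustrative only: it assumes an average mass of about 650 g/mol per base pair of dsDNA (a common approximation, not a value from the disclosure), one target site per DNA copy by default, and hypothetical function names, concentrations, volumes and genome length.

```python
AVOGADRO = 6.02214076e23        # molecules per mole
GRAMS_PER_MOL_PER_BP = 650.0    # assumed average mass of one dsDNA base pair


def dsdna_copies(mass_ng: float, length_bp: int) -> float:
    """Approximate number of dsDNA molecules in a given mass of DNA."""
    moles = (mass_ng * 1e-9) / (length_bp * GRAMS_PER_MOL_PER_BP)
    return moles * AVOGADRO


def complex_molecules(conc_nM: float, volume_uL: float) -> float:
    """Number of protein complexes at a given concentration and volume."""
    moles = (conc_nM * 1e-9) * (volume_uL * 1e-6)
    return moles * AVOGADRO


def complex_to_target_ratio(conc_nM: float, volume_uL: float,
                            mass_ng: float, length_bp: int,
                            targets_per_copy: int = 1) -> float:
    """Molecular ratio of protein complexes to target dsDNA sites."""
    targets = dsdna_copies(mass_ng, length_bp) * targets_per_copy
    return complex_molecules(conc_nM, volume_uL) / targets


def within(ratio: float, low: float, high: float) -> bool:
    """True if the ratio lies in the closed range low:1 to high:1."""
    return low <= ratio <= high


# Hypothetical reaction: 0.1 nM (100 pM) complex in 10 uL, 1 ng of a 1 Mb DNA.
ratio = complex_to_target_ratio(conc_nM=0.1, volume_uL=10.0,
                                mass_ng=1.0, length_bp=1_000_000)
in_broad_range = within(ratio, 2, 2000)
in_narrow_range = within(ratio, 2, 200)
```

Under these assumed inputs the ratio comes out at roughly 650:1, which lies inside the 2:1 to 2000:1 range but outside the 2:1 to 200:1 range.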

The method may include: labeling one or both ends of one or more of the more than one dsDNA fragments. The method may include: labeling the two ends of one or more of the more than one dsDNA fragments differently. In some embodiments, labeling includes labeling with an anionic label, a cationic label, a neutral label, an electrochemical label, a protein label, a fluorescent label, a magnetic label, or a combination thereof. The method may include: enriching for the labeled dsDNA fragments, capturing the labeled dsDNA fragments, isolating the labeled dsDNA fragments, and/or visualizing the labeled dsDNA fragments.

Brief Description of Drawings

FIG. 1 depicts a non-limiting exemplary conventional library preparation method for next generation sequencing. The ligation-based library preparation shown is reproduced from www.idtdna.com/pages/technology/next-generation-sequencing/library-preparation/ligation-based-library-prep.

FIG. 2 depicts a non-limiting exemplary conventional sequencing library prepared by tagmentation, reproduced from Nextera XT Library Prep: Tips and Troubleshooting (2015) from Illumina.

FIG. 3 depicts a non-limiting exemplary schematic of the custom locus specific library preparation (CLLP) disclosed herein.

Fig. 4 depicts a non-limiting exemplary embodiment of targeted sequencing using a genome editing tool (Cas9 and guide RNA).

Fig. 5A-5F depict non-limiting exemplary embodiments of the custom locus specific library preparation (CLLP) disclosed herein.

Fig. 6 depicts a non-limiting exemplary embodiment showing the tagmentation-based ONT rapid sequencing kit. The workflow shown is reproduced from the Nanopore rapid sequencing kit.

Fig. 7A-7H depict non-limiting exemplary embodiments of Genome Editing Tagmentation (GET) for generating sequencing libraries for existing sequencing platforms (e.g., sequencing platforms from Oxford Nanopore).

FIG. 8 depicts a non-limiting exemplary schematic of a plasmid construct (3XFlag-Cas9-Fl26-Tn5; SEQ ID NO: 1) for use in generating the protein complexes provided herein.

FIG. 9 depicts a non-limiting exemplary schematic of a plasmid construct (3XFlag-Cas9-xTen-Tn5; SEQ ID NO: 2) for use in generating the protein complexes provided herein.

FIG. 10 depicts a non-limiting exemplary schematic of a plasmid construct (pET-Tn5-xTen-dCas9; SEQ ID NO: 3) for use in generating the protein complexes provided herein.

FIG. 11 depicts the relative binding sites of exemplary sgRNAs for the Salmonella enterica (S. enterica) InvA gene.

FIG. 12 depicts the relative binding sites of exemplary sgRNAs for the Salmonella enterica fliC gene.

Fig. 13 shows a graph of exemplary Bioanalyzer data showing that cleavage of genomic DNA yields fragments of the expected size, demonstrating that the guide RNAs for Salmonella enterica are functional. See also Table 2.

FIG. 14 depicts a graph showing a TapeStation analysis of the amplification of fragments generated by Tn5, using adapter A as the primer pair. This suggests that adaptors are added to the 5' and 3' ends of the cleaved molecules.

FIG. 15 depicts a graph showing a TapeStation analysis of the amplification of fragments generated by Tn5, using adapter B as the primer pair. This suggests that adaptors are added to the 5' and 3' ends of the cleaved molecules.

FIG. 16 depicts an exemplary SDS-PAGE gel analysis of recombinantly expressed and purified dCas9-Fl26-Tn5 fusion protein. Arrows point to the fusion protein bands.

FIG. 17 depicts Bioanalyzer analysis of an exemplary electrophoresis gel of recombinantly expressed and purified dCas9-Fl26-Tn5 fusion protein.

FIG. 18 depicts an exemplary SDS-PAGE gel analysis of recombinantly expressed and purified dCas9-xTen-Tn5 fusion protein. Arrows point to the fusion protein bands.

FIG. 19 depicts Bioanalyzer data from an exemplary electrophoretic analysis of recombinantly expressed and purified dCas9-xTen-Tn5 fusion protein.

FIG. 20 depicts an exemplary SDS-PAGE gel analysis of recombinantly expressed and purified Tn5-Fl26-dCas9 fusion protein. Arrows point to the fusion protein bands.

FIG. 21 depicts an exemplary SDS-PAGE gel analysis of recombinantly expressed and purified Tn5-xTen-dCas9 fusion protein. Arrows point to the fusion protein bands.

Fig. 22 depicts a TapeStation analysis of an amplification reaction using only catalytically active Cas9 (no fusion protein). No amplification was observed, indicating that Cas9 by itself cannot add adaptors to the 5' and 3' ends of the digested fragments. The visible signal was from samples incubated with Cas9 but not subjected to PCR. The lower peak is the 100 bp size marker, and the upper peak is genomic DNA.

FIG. 23 depicts a TapeStation analysis of the amplification reaction after a tagmentation reaction with dCas9-Fl26-Tn5. Arrows indicate signals of reactions subjected to PCR conditions after incubation with the Cas9-Tn5 fusion protein. The inclusion of gRNA in this reaction resulted in broad peaks indicating random tagmentation. The lower peak is the 100 bp size marker and the upper peak is genomic DNA.

FIG. 24 depicts an exemplary TapeStation analysis of an amplification reaction after a tagmentation reaction with dCas9-xTen-Tn5. Arrows indicate signals of reactions subjected to PCR conditions after incubation with the Cas9-Tn5 fusion protein. The inclusion of gRNA in this reaction resulted in broad peaks indicating random tagmentation. The lower peak is the 100 bp size marker and the upper peak is genomic DNA.

FIG. 25 depicts a TapeStation analysis of an amplification reaction after a tagmentation reaction with 100 nM dCas9-Fl26-Tn5 fusion protein. Arrows indicate signals of reactions subjected to PCR conditions after incubation with the Cas9-Tn5 fusion protein. The lower peak is the 100 bp size marker and the upper peak is genomic DNA.

FIG. 26 depicts a TapeStation analysis of the amplification reaction after a tagmentation reaction with 1 nM dCas9-Fl26-Tn5 fusion protein. Arrows indicate signals of reactions subjected to PCR conditions after incubation with the Cas9-Tn5 fusion protein. The lower peak is the 100 bp size marker and the upper peak is genomic DNA.

FIG. 27 depicts a TapeStation analysis of the amplification reaction after a tagmentation reaction with 100 pM dCas9-Fl26-Tn5 fusion protein. Arrows indicate signals of reactions subjected to PCR conditions after incubation with the Cas9-Tn5 fusion protein. The lower peak is the 100 bp size marker and the upper peak is genomic DNA.

FIG. 28 depicts a TapeStation analysis of the amplification reaction after a tagmentation reaction with 100 pM dCas9-Fl26-Tn5 fusion protein, enlarged from FIG. 27.

FIG. 29 depicts a TapeStation analysis of the amplification reaction after a tagmentation reaction with 100 pM dCas9-xTen-Tn5 fusion protein. Lower: 100 bp lower marker.

FIG. 30 depicts a TapeStation analysis of the amplification reaction after a tagmentation reaction with 10 pM dCas9-xTen-Tn5 fusion protein. Lower: 100 bp lower marker.

FIG. 31 depicts a TapeStation analysis of the amplification reaction after a tagmentation reaction with 1 pM dCas9-xTen-Tn5 fusion protein.

FIG. 32 depicts Bioanalyzer analysis of the amplification of a library prepared by Tn5-only tagmentation, with Tn5 loaded with only one adapter (adapter B).

FIG. 33 depicts Bioanalyzer analysis of the amplification of a library prepared by dCas9-Fl26-Tn5-directed tagmentation, with the fusion protein loaded with only one adapter (adapter B). In this experiment, a shorter incubation protocol was used.

FIG. 34 depicts Bioanalyzer analysis of the amplification of a library prepared by dCas9-Fl26-Tn5-directed tagmentation, with the fusion protein loaded with only one adapter (adapter B). In this experiment, a longer incubation protocol was used.

FIG. 35 depicts an exemplary Bioanalyzer analysis of the amplification of a library prepared by dCas9-Fl26-Tn5-directed tagmentation, with the fusion protein loaded with both adaptors A and B. In this experiment, a longer incubation protocol was used.

FIG. 36 depicts an exemplary Bioanalyzer analysis of the amplification of a library prepared by dCas9-Fl26-Tn5-directed tagmentation, with the fusion protein loaded with both adaptors A and B. In this experiment, a shorter incubation protocol was used.

FIG. 37 depicts an exemplary embodiment of DNA fragments labeled with NGS sequence adaptors using the library methods disclosed herein based on CasTn-NEBNext ligation.

FIG. 38 depicts an exemplary TapeStation analysis of PCR amplification of Salmonella enterica genomic DNA samples incubated with dCas9-xTen-Tn5 loaded with Salmonella enterica sgRNA. Lower: 100 bp lower marker.

FIG. 39 depicts an exemplary TapeStation analysis of PCR amplification of Salmonella enterica samples incubated with dCas9-xTen-Tn5 without sgRNA. Lower: 100 bp lower marker.

FIG. 40 shows a graphical representation of fragments produced by dCas9-Tn5 using a single adapter (e.g., adapter B).

FIG. 41 depicts a graphical representation of dCas9-Tn5 fragments resulting from a reaction in which Tn5 is loaded with two different adaptors (e.g., adaptor A and adaptor B).

Fig. 42A-42B depict illustrations of the preparation of NEBNext ligation-based libraries for next generation sequencing. The symbols shown in the legend mark the portions of the adaptor and primer sequences. The preparation of NEBNext libraries from fragments generated by dCas9-Tn5 tagmentation is shown in FIG. 37.

FIG. 43 depicts a graphical representation of the preparation of a tagmentation-based Nextera library for next generation sequencing.

FIG. 44 depicts the preparation of a tagmentation-based library using dCas9-Tn5-directed tagmentation.

Detailed Description of the Preferred Embodiments

The following detailed description references the accompanying drawings, which form a part hereof. In the drawings, like reference numerals generally identify like elements unless context dictates otherwise. The illustrative embodiments described in the detailed description, drawings, and claims are not meant to be limiting. Other embodiments may be utilized, and other changes may be made, without departing from the spirit or scope of the subject matter presented herein. It will be readily understood that the aspects of the present disclosure, as generally described herein, and illustrated in the figures, can be arranged, substituted, combined, separated, and designed in a wide variety of different configurations, all of which are explicitly contemplated and are made part of the disclosure herein.

Regarding the related art, all patents, published patent applications, other publications, and sequences from GenBank and other databases mentioned herein are incorporated by reference in their entirety.

The disclosure herein includes reaction mixtures. In some embodiments, the reaction mixture comprises: the compositions disclosed herein and sample nucleic acids suspected of comprising one or more target dsDNA. The reaction mixture may comprise: DNA polymerase, dNTPs, or a combination thereof. In some embodiments, the adapter is covalently attached to the target dsDNA or fragment thereof. The reaction mixture may comprise: more than one dsDNA fragment, each fragment comprising a first adaptor and a second adaptor of one of more than one protein complex at each end, respectively. In some embodiments, the sample nucleic acid comprises eukaryotic DNA, bacterial DNA, viral DNA, fungal DNA, protozoan DNA, or a combination thereof. In some embodiments, the target dsDNA is genomic DNA, mitochondrial DNA, plasmid DNA, or a combination thereof. In some embodiments, the sample nucleic acid is from a biological sample, a clinical sample, an environmental sample, or a combination thereof. In some embodiments, the biological sample comprises stool, sputum, peripheral blood, plasma, serum, lymph nodes, respiratory tissue, exudates, body fluids, or combinations thereof.

The disclosure herein includes methods for generating sequencing libraries. In some embodiments, the method comprises: contacting a composition disclosed herein with a sample suspected of containing more than one target double-stranded DNA (dsDNA) to form a reaction mixture. The method may include: incubating the reaction mixture to generate more than one dsDNA fragment, each fragment comprising a first adaptor and a second adaptor of one of the more than one protein complex at each end, respectively. The method may include: amplifying the more than one dsDNA fragment with primers capable of binding to the adaptors at the ends of the dsDNA fragments to generate a sequencing library.

Definitions

Unless defined otherwise, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. See, e.g., Singleton et al., Dictionary of Microbiology and Molecular Biology, 2nd ed., J. Wiley & Sons (New York, NY 1994); Sambrook et al., Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press (Cold Spring Harbor, NY 1989). For the purposes of this disclosure, the following terms are defined below.

As used herein, the term "adapter" may mean a sequence capable of facilitating amplification or sequencing of an associated nucleic acid. The associated nucleic acid may include a target nucleic acid. The associated nucleic acids may include one or more of a spatial label, a target label, a sample label, an index label, or a barcode sequence (e.g., a molecular label). The adaptors may be linear. The adaptor may be a pre-adenylated adaptor. The adaptors may be double-stranded or single-stranded. One or more adaptors may be located at the 5' end or the 3' end of the nucleic acid. When the adaptor comprises a known sequence at the 5' end and the 3' end, the known sequences may be the same or different sequences. Adaptors located at the 5' end and/or 3' end of the polynucleotide may be capable of hybridizing to one or more oligonucleotides immobilized on a surface. In some embodiments, the adapter may comprise a universal sequence. A universal sequence may be a region of nucleotide sequence that is common to two or more nucleic acid molecules. Two or more nucleic acid molecules may also have regions of different sequences. Thus, for example, a 5' adapter may comprise the same and/or a universal nucleic acid sequence, and a 3' adapter may comprise the same and/or a universal sequence. A universal sequence that may be present in different members of more than one nucleic acid molecule may allow replication or amplification of more than one different sequence using a single universal primer that is complementary to the universal sequence. Similarly, at least one, two (e.g., a pair), or more universal sequences that may be present in different members of a collection of nucleic acid molecules may allow replication or amplification of more than one different sequence using at least one, two (e.g., a pair), or more single universal primers that are complementary to the universal sequences. Thus, the universal primers comprise sequences that can hybridize to such universal sequences. Molecules having target nucleic acid sequences can be modified to attach adaptors (e.g., non-target nucleic acid sequences) to one or both ends of different target nucleic acid sequences. The one or more universal primers attached to the target nucleic acid may provide sites for hybridization of the universal primers. The one or more universal primers attached to the target nucleic acid may be the same or different from each other.
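The role of universal sequences can be illustrated with a minimal Python sketch. It assumes perfect-match priming and models each fragment as its top strand written 5' to 3'. The adaptor sequences, insert sequences and the function `amplifiable` are hypothetical placeholders, not sequences or methods from the disclosure.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplifiable(fragment: str, forward_primer: str, reverse_primer: str) -> bool:
    """True if both primers have a perfect priming site on the fragment.

    The forward primer anneals to the bottom strand, so it matches the 5' end
    of the top strand. The reverse primer anneals to the top strand, so its
    reverse complement matches the 3' end of the top strand.
    """
    return (fragment.startswith(forward_primer)
            and fragment.endswith(reverse_complement(reverse_primer)))


# Hypothetical universal sequences carried by the 5' and 3' adaptors.
ADAPTOR_5 = "ACACGACG"
ADAPTOR_3 = "GGATCTCG"

# Two library members with different inserts but the same adaptor ends.
library = [ADAPTOR_5 + insert + ADAPTOR_3
           for insert in ("TTGACCA", "CCGTAGGATTA")]

forward_primer = ADAPTOR_5
reverse_primer = reverse_complement(ADAPTOR_3)

results = [amplifiable(f, forward_primer, reverse_primer) for f in library]
untagged = amplifiable("TTGACCA", forward_primer, reverse_primer)
```

Because every tagged fragment shares the same adaptor ends, one primer pair is enough for the whole library, while a fragment without adaptors offers no priming site.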

As used herein, the term "associated" or "associated with" may mean that two or more substances may be identified as co-located at a point in time. Association may mean that two or more substances are or were in similar containers. The association may be an informatics association. For example, digital information about two or more substances may be stored and may be used to determine that one or more substances are co-located at a point in time. The association may also be a physical association. In some embodiments, two or more associated substances are "tethered", "attached" or "immobilized" to each other or to a common solid or semi-solid surface. Association may refer to covalent or non-covalent means for attaching the label to a solid or semi-solid support, such as a bead. The association may be a covalent bond between the target and the label. Association may include hybridization between two molecules, such as a target molecule and a label.

As used herein, the term "complementary" may refer to the ability to precisely pair between two nucleotides. For example, if a nucleotide at a given position of a nucleic acid is capable of forming hydrogen bonds with a nucleotide of another nucleic acid, the two nucleic acids are considered to be complementary to one another at that position. Complementarity between two single-stranded nucleic acid molecules may be "partial" in that only some nucleotides bind, or it may be complete when there is complete complementarity between the single-stranded molecules. A first nucleotide sequence may be referred to as a "complement" of a second sequence if the first nucleotide sequence is complementary to the second nucleotide sequence. A first nucleotide sequence may be referred to as a "reverse complement" of a second sequence if the first nucleotide sequence is complementary to a sequence that is the reverse (i.e., reversed in nucleotide order) of the second sequence. As used herein, a "complement" sequence may refer to the "complement" or "reverse complement" of a sequence. It is understood from this disclosure that if one molecule can hybridize to another molecule, it can be complementary or partially complementary to the molecule to which it hybridizes.
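The distinction between a complement, a reverse complement, and partial complementarity can be made concrete with a short Python sketch. It handles only uppercase A, C, G and T and considers only Watson-Crick pairs; the function names are chosen here for illustration.

```python
_PAIR = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Base-by-base complement, keeping the nucleotide order."""
    return seq.translate(_PAIR)


def reverse_complement(seq: str) -> str:
    """Complement of the sequence read in reverse nucleotide order."""
    return complement(seq)[::-1]


def paired_fraction(strand_a: str, strand_b: str) -> float:
    """Fraction of positions that pair when two strands align antiparallel.

    Both strands are written 5' to 3' and must have the same non-zero length.
    A value of 1.0 means complete complementarity; lower values are partial.
    """
    if not strand_a or len(strand_a) != len(strand_b):
        raise ValueError("strands must be non-empty and of equal length")
    partner = reverse_complement(strand_b)
    matches = sum(a == b for a, b in zip(strand_a, partner))
    return matches / len(strand_a)


comp = complement("ATGC")
revcomp = reverse_complement("ATGC")
full = paired_fraction("ATGC", "GCAT")
partial = paired_fraction("ATGC", "GCAA")
```

Here `"GCAT"` is the reverse complement of `"ATGC"` and pairs at every position, while `"GCAA"` pairs at only three of four positions and is therefore partially complementary.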

As used herein, the term "label" or "labels" may refer to a nucleic acid code associated with a target in a sample. The label may be, for example, a nucleic acid label. The label may be a fully or partially amplifiable label. The label may be a fully or partially sequenceable label. The label may be part of a natural nucleic acid that can be identified as distinct. The label may be a known sequence. The label may include a junction of nucleic acid sequences, such as a junction of natural and non-natural sequences. As used herein, the term "label" may be used interchangeably with the terms "index," "tag," or "label-tag." A label may convey information. For example, in various embodiments, a label may be used to determine the identity of the sample, the source of the sample, the identity of the cell, and/or the target.

As used herein, the term "nucleic acid" refers to a polynucleotide sequence or fragment thereof. The nucleic acid may comprise nucleotides. The nucleic acid may be exogenous or endogenous to the cell. The nucleic acid may be present in a cell-free environment. The nucleic acid may be a gene or a fragment thereof. The nucleic acid may be DNA. The nucleic acid may be RNA. The nucleic acid may include one or more analogs (e.g., altered backbones, sugars, or nucleobases). Some non-limiting examples of analogs include: 5-bromouracil, peptide nucleic acids, xeno nucleic acids, morpholino nucleic acids, locked nucleic acids, glycol nucleic acids, threose nucleic acids, dideoxynucleotides, cordycepin, 7-deaza-GTP, fluorophores (e.g., rhodamine or fluorescein linked to the sugar), thiol-containing nucleotides, biotin-linked nucleotides, fluorescent base analogs, CpG islands, methyl-7-guanosine, methylated nucleotides, inosine, thiouridine, pseudouridine, dihydrouridine, queuosine, and wyosine. "Nucleic acid", "polynucleotide", "target polynucleotide" and "target nucleic acid" are used interchangeably.

The nucleic acid may include one or more modifications (e.g., base modifications, backbone modifications) to provide the nucleic acid with new or enhanced features (e.g., improved stability). The nucleic acid may comprise a nucleic acid affinity tag. A nucleoside may be a base-sugar combination. The base portion of a nucleoside may be a heterocyclic base. The two most common classes of such heterocyclic bases are the purines and the pyrimidines. A nucleotide may be a nucleoside that also includes a phosphate group covalently linked to the sugar portion of the nucleoside. For those nucleosides that include a pentofuranosyl sugar, the phosphate group can be attached to the 2', 3', or 5' hydroxyl moiety of the sugar. In forming nucleic acids, phosphate groups can covalently link adjacent nucleosides to one another to form a linear polymeric compound. In turn, the ends of this linear polymeric compound may be further joined to form a cyclic compound; however, linear compounds are generally suitable. Furthermore, linear compounds may have internal nucleotide base complementarity and thus may fold in a manner that results in a fully or partially double-stranded compound. In nucleic acids, the phosphate groups can generally be referred to as forming the internucleoside backbone of the nucleic acid. The linkage or backbone may be a 3' to 5' phosphodiester linkage.

The nucleic acid may include a modified backbone and/or modified internucleoside linkages. Modified backbones may include those that retain a phosphorus atom in the backbone and those that do not have a phosphorus atom in the backbone. Suitable modified nucleic acid backbones in which a phosphorus atom is present may include, for example, phosphorothioates, chiral phosphorothioates, phosphorodithioates, phosphotriesters, aminoalkyl phosphotriesters, methyl and other alkyl phosphonates such as 3'-alkylene phosphonates, 5'-alkylene phosphonates, chiral phosphonates, phosphinates, phosphoramidates (including 3'-amino phosphoramidates and aminoalkyl phosphoramidates), phosphorodiamidates, thionophosphoramidates, thionoalkylphosphonates, thionoalkylphosphotriesters, selenophosphates, and boranophosphates having normal 3'-5' linkages, 2'-5' linked analogs, and analogs with inverted polarity (where one or more internucleotide linkages are 3' to 3', 5' to 5' or 2' to 2' linkages).

The nucleic acid may comprise a polynucleotide backbone formed by short chain alkyl or cycloalkyl internucleoside linkages, mixed heteroatom and alkyl or cycloalkyl internucleoside linkages, or one or more short chain heteroatomic or heterocyclic internucleoside linkages. These may include those having morpholino linkages (formed in part from the sugar portion of the nucleoside); siloxane backbones; sulfide, sulfoxide, and sulfone backbones; formacetyl and thioformacetyl backbones; methylene formacetyl and thioformacetyl backbones; riboacetyl backbones; alkene-containing backbones; sulfamate backbones; methyleneimino and methylenehydrazino backbones; sulfonate and sulfonamide backbones; amide backbones; and others having mixed N, O, S and CH₂ component parts.

The nucleic acid may comprise a nucleic acid mimetic. The term "mimetic" may be intended to include polynucleotides in which only the furanose ring or both the furanose ring and the internucleotide linkage are replaced with non-furanose groups. Replacement of only the furanose ring may also be referred to as a sugar surrogate. The heterocyclic base moiety or a modified heterocyclic base moiety can be maintained for hybridization with an appropriate target nucleic acid. One such nucleic acid may be a peptide nucleic acid (PNA). In PNA, the sugar backbone of the polynucleotide may be replaced by an amide-containing backbone, in particular by an aminoethylglycine backbone. The nucleobases may be retained and bound directly or indirectly to aza nitrogen atoms of the amide portion of the backbone. The backbone in a PNA compound may comprise two or more linked aminoethylglycine units, which gives PNA its amide-containing backbone. The heterocyclic base moieties may be bound directly or indirectly to aza nitrogen atoms of the amide portion of the backbone.

The nucleic acid may include a morpholino backbone structure. For example, the nucleic acid may comprise a 6-membered morpholino ring in place of the ribose ring. In some of these embodiments, a phosphorodiamidate or other non-phosphodiester internucleoside linkage may replace a phosphodiester linkage.

The nucleic acid can include linked morpholino units having a heterocyclic base attached to the morpholino ring (e.g., morpholino nucleic acid). Linking groups can link the morpholino monomer units in the morpholino nucleic acid. Nonionic morpholino-based oligomeric compounds can have fewer undesired interactions with cellular proteins. Morpholino-based polynucleotides may be nonionic mimics of nucleic acids. Various compounds within the morpholino class may be joined using different linking groups. An additional class of polynucleotide mimetics may be referred to as cyclohexenyl nucleic acids (CeNA). The furanose ring normally present in a nucleic acid molecule may be replaced by a cyclohexenyl ring. CeNA DMT-protected phosphoramidite monomers can be prepared and used for oligomeric compound synthesis using phosphoramidite chemistry. Incorporation of CeNA monomers into a nucleic acid strand can increase the stability of a DNA/RNA hybrid. CeNA oligoadenylates can form complexes with nucleic acid complements with stability similar to that of the natural complexes. Additional modifications may include locked nucleic acids (LNA), in which the 2'-hydroxyl group is linked to the 4' carbon atom of the sugar ring, thereby forming a 2'-C,4'-C-oxymethylene linkage and thus a bicyclic sugar moiety. The linkage may be a methylene (-CH₂-)ₙ group bridging the 2' oxygen atom and the 4' carbon atom, wherein n is 1 or 2. LNA and LNA analogs can exhibit very high duplex thermal stability with complementary nucleic acids (Tm = +3 ℃ to +10 ℃), stability toward 3'-exonuclease degradation, and good solubility.

Nucleic acids may also include nucleobase (often simply referred to as "base") modifications or substitutions. As used herein, "unmodified" or "natural" nucleobases can include purine bases (e.g., adenine (A) and guanine (G)), as well as pyrimidine bases (e.g., thymine (T), cytosine (C), and uracil (U)). Modified nucleobases may include other synthetic as well as natural nucleobases such as 5-methylcytosine (5-me-C), 5-hydroxymethylcytosine, xanthine, hypoxanthine, 2-aminoadenine, 6-methyl and other alkyl derivatives of adenine and guanine, 2-propyl and other alkyl derivatives of adenine and guanine, 2-thiouracil, 2-thiothymine and 2-thiocytosine, 5-halouracil and cytosine, 5-propynyl (-C≡C-CH₃) uracil and cytosine and other alkynyl derivatives of pyrimidine bases, 6-azo uracil, cytosine and thymine, 5-uracil (pseudouracil), 4-thiouracil, 8-halo, 8-amino, 8-thio, 8-thioalkyl, 8-hydroxy and other 8-substituted adenines and guanines, 5-halo (in particular 5-bromo), 5-trifluoromethyl and other 5-substituted uracils and cytosines, 7-methylguanine and 7-methyladenine, 2-F-adenine, 2-aminoadenine, 8-azaguanine and 8-azaadenine, 7-deazaguanine and 3-deazaadenine. Modified nucleobases may include tricyclic pyrimidines such as phenoxazine cytidine (1H-pyrimido(5,4-b)(1,4)benzoxazin-2(3H)-one), phenothiazine cytidine (1H-pyrimido(5,4-b)(1,4)benzothiazin-2(3H)-one), G-clamps such as a substituted phenoxazine cytidine (e.g., 9-(2-aminoethoxy)-H-pyrimido(5,4-(b)(1,4)benzoxazin-2(3H)-one), carbazole cytidine (2H-pyrimido(4,5-b)indol-2-one), and pyridoindole cytidine (H-pyrido(3',2':4,5)pyrrolo[2,3-d]pyrimidin-2-one).

As used herein, the term "target" may refer to a nucleic acid of interest (e.g., target dsDNA). In some embodiments, the target may be associated with an adapter and/or a barcode. Exemplary suitable targets for analysis by the disclosed methods, devices, and systems include oligonucleotides, DNA, RNA, mRNA, microRNAs, tRNAs, and the like. The target may be single-stranded or double-stranded. In some embodiments, the target may be a protein, peptide, or polypeptide. In some embodiments, the target is a lipid. As used herein, "target" may be used interchangeably with "species".

As used herein, the term "reverse transcriptase" may refer to a group of enzymes having reverse transcriptase activity (i.e., catalyzing the synthesis of DNA from an RNA template). Typically, such enzymes include, but are not limited to, retroviral reverse transcriptases, retrotransposon reverse transcriptases, retroplasmid reverse transcriptases, retron reverse transcriptases, bacterial reverse transcriptases, group II intron-derived reverse transcriptases, and mutants, variants or derivatives thereof. Non-retroviral reverse transcriptases include non-LTR retrotransposon reverse transcriptases, retroplasmid reverse transcriptases, retron reverse transcriptases, and group II intron reverse transcriptases. Examples of group II intron reverse transcriptases include the Lactococcus lactis Ll.LtrB intron reverse transcriptase, the Thermosynechococcus elongatus TeI4c intron reverse transcriptase, and the Geobacillus stearothermophilus GsI-IIC intron reverse transcriptase. Other classes of reverse transcriptase may include many types of non-retroviral reverse transcriptase (e.g., those of retrons, group II introns, and diversity-generating retroelements, among others).

As used herein, the term "isolate nucleic acids" may refer to the purification of nucleic acids from one or more cellular components. Those skilled in the art will appreciate that a sample that is treated to "isolate nucleic acids" therefrom may include components and impurities other than nucleic acids. A sample comprising isolated nucleic acids may be prepared from a specimen using any acceptable method known in the art. For example, the cells may be lysed using known lysing agents, and the nucleic acids may be purified or partially purified from other cellular components. Suitable reagents and protocols for DNA and RNA extraction can be found, for example, in U.S. patent application publication nos. US2010-0009351 and US 2009-013650, respectively (each of which is incorporated herein by reference in its entirety).

As used herein, a "template" may refer to all or a portion of a polynucleotide comprising at least one target nucleotide sequence.

As used herein, a "primer" may refer to a polynucleotide that may be used to initiate a nucleic acid chain extension reaction. The length of the primer may vary, for example, from about 5 to about 100 nucleotides, from about 10 to about 50 nucleotides, from about 15 to about 40 nucleotides, or from about 20 to about 30 nucleotides. The length of the primer may be about 10 nucleotides, about 20 nucleotides, about 25 nucleotides, about 30 nucleotides, about 35 nucleotides, about 40 nucleotides, about 50 nucleotides, about 75 nucleotides, about 100 nucleotides, or a range between any two of these values. In some embodiments, the primer has a length of 10 to about 50 nucleotides, i.e., 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50 or more nucleotides. In some embodiments, the primer has a length of 18 to 32 nucleotides.

As used herein, a "probe" may refer to a polynucleotide that is capable of hybridizing (e.g., specifically) to a target sequence in a nucleic acid under conditions that allow hybridization, thereby allowing detection of the target sequence or amplified nucleic acid. "target" of a probe generally refers to a sequence within an amplified nucleic acid sequence or a subset of amplified nucleic acid sequences that specifically hybridizes to at least a portion of a probe oligomer by standard hydrogen bonding (i.e., base pairing). Probes may comprise target-specific sequences and other sequences that contribute to the three-dimensional conformation of the probe. Sequences are "substantially complementary" if they allow stable hybridization of the probe oligomer under appropriate hybridization conditions to a target sequence that is not fully complementary to the target-specific sequence of the probe. The length of the probe may vary, for example, from about 5 to about 100 nucleotides, from about 10 to about 50 nucleotides, from about 15 to about 40 nucleotides, or from about 20 to about 30 nucleotides. The length of the probe may be about 10 nucleotides, about 20 nucleotides, about 25 nucleotides, about 30 nucleotides, about 35 nucleotides, about 40 nucleotides, about 50 nucleotides, about 100 nucleotides, or a range between any two of these values. In some embodiments, the probe has a length of 10 to about 50 nucleotides. For example, the primer and/or probe may be at least 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50 or more nucleotides. In some embodiments, the probe may be non-sequence specific.

Preferably, the primers and/or probes may be between 8 and 45 nucleotides in length. For example, the primer and/or probe may be at least 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45 or more nucleotides in length. Primers and probes may be modified to contain additional nucleotides at the 5 'end or the 3' end or both. Those skilled in the art will appreciate that the additional bases at the 3' end of the amplification primer (not necessarily the probe) are typically complementary to the template sequence. Primer and probe sequences may also be modified to remove nucleotides at the 5 'end or the 3' end. Those skilled in the art will appreciate that in order to function for amplification, the primer or probe will have a minimum length and annealing temperature as disclosed herein.

Primers and probes can bind to their targets at a temperature below the melting temperature (Tm). As used herein, "Tm" and "melting temperature" are interchangeable terms referring to the temperature at which 50% of a population of double-stranded polynucleotide molecules dissociates into single strands. Formulas for calculating the Tm of polynucleotides are well known in the art. For example, the Tm may be calculated by the following equation: Tm = 69.3 + 0.41 × (G+C)% − 650/L, where L is the length of the probe in nucleotides. The Tm of a hybrid polynucleotide can also be estimated using a formula adopted from hybridization assays in 1 M salt, which is commonly used to calculate the Tm of PCR primers: [(number of A+T) × 2 ℃ + (number of G+C) × 4 ℃]. See, e.g., C. R. Newton et al., PCR, 2nd edition, Springer-Verlag (New York: 1997), page 24 (incorporated herein by reference in its entirety). Other, more sophisticated calculations exist in the art, which take structural as well as sequence characteristics into account when calculating the Tm. The melting temperature of an oligonucleotide may depend on the complementarity between the oligonucleotide primer or probe and the binding sequence, as well as on salt conditions. In some embodiments, the oligonucleotide primers or probes provided herein have a Tm of less than about 90 ℃ in 50 mM KCl, 10 mM Tris-HCl buffer, for example, about 89 ℃, 88 ℃, 87 ℃, 86 ℃, 85 ℃, 84 ℃, 83 ℃, 82 ℃, 81 ℃, 80 ℃, 79 ℃, 78 ℃, 77 ℃, 76 ℃, 75 ℃, 74 ℃, 73 ℃, 72 ℃, 71 ℃, 70 ℃, 69 ℃, 68 ℃, 67 ℃, 66 ℃, 65 ℃, 64 ℃, 63 ℃, 62 ℃, 61 ℃, 60 ℃, 59 ℃, 58 ℃, 57 ℃, 56 ℃, 55 ℃, 54 ℃, 53 ℃, 52 ℃, 50 ℃, 49 ℃, 48 ℃, 47 ℃, 46 ℃, 45 ℃, 44 ℃, 43 ℃, 42 ℃, 41 ℃, 40 ℃, 39 ℃ or less, including ranges between any two of the listed values.
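The two melting temperature estimates described above can be sketched in Python as follows. This is a minimal illustration rather than a validated calculator: the function names are hypothetical, the length term of the first equation is assumed to be 650/L (the commonly published form of that formula), and the sequences are assumed to contain only A, C, G and T.

```python
# Illustrative only: hypothetical function names; no salt or structure correction.

def tm_gc_length(seq: str) -> float:
    """Tm = 69.3 + 0.41 * (G+C)% - 650 / L, with L the length in nucleotides."""
    seq = seq.upper()
    length = len(seq)
    if length == 0:
        raise ValueError("sequence must not be empty")
    gc_percent = 100.0 * (seq.count("G") + seq.count("C")) / length
    return 69.3 + 0.41 * gc_percent - 650.0 / length


def tm_two_four_rule(seq: str) -> float:
    """Tm = (number of A+T) * 2 + (number of G+C) * 4, in degrees Celsius."""
    seq = seq.upper()
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    return 2.0 * at_count + 4.0 * gc_count


if __name__ == "__main__":
    primer = "ATGCATGCATGCATGCATGC"  # hypothetical 20-mer, 50% G+C
    print(round(tm_gc_length(primer), 1))
    print(tm_two_four_rule(primer))
```

The two estimates generally differ by a few degrees for the same oligonucleotide, which is one reason a primer pair is usually compared using a single formula applied to both primers.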

In some embodiments, the primers disclosed herein, e.g., amplification primers, can be provided as an amplification primer pair, e.g., comprising a forward primer and a reverse primer (a first amplification primer and a second amplification primer). Preferably, the forward and reverse primers have Tm values that differ by no more than 10 ℃, e.g., by less than 10 ℃, less than 9 ℃, less than 8 ℃, less than 7 ℃, less than 6 ℃, less than 5 ℃, less than 4 ℃, less than 3 ℃, less than 2 ℃, or less than 1 ℃.

The primer sequence and the probe sequence can be modified by nucleotide substitutions (relative to the target sequence) within the oligonucleotide sequence, provided that the oligonucleotide retains sufficient complementarity to specifically hybridize to the target nucleic acid sequence. In this way, at least 1, 2, 3, 4 or up to about 5 nucleotides may be substituted. As used herein, the term "complementary" may refer to sequence complementarity between regions of two polynucleotide strands or between two regions of the same polynucleotide strand. A first region of a polynucleotide is complementary to a second region of the same or a different polynucleotide if, when the two regions are aligned in an antiparallel manner, at least one nucleotide of the first region is capable of base pairing with a base of the second region. Thus, two complementary polynucleotides are not required to base pair at each nucleotide position. "Fully complementary" may refer to a first polynucleotide being 100% or "fully" complementary to a second polynucleotide and thus forming base pairs at each nucleotide position. "Partially complementary" may refer to a first polynucleotide that is not 100% complementary (e.g., 90%, 80%, or 70% complementary) and contains mismatched nucleotides at one or more nucleotide positions. In some embodiments, the oligonucleotide comprises a universal base.

As used herein, the term "substantially complementary" may refer to a contiguous nucleic acid base sequence capable of hybridizing to another base sequence through hydrogen bonding between a series of complementary bases. The complementary base sequences may be complementary at each position in the oligomer sequence using standard base pairing (e.g., G:C, A:T or A:U), or may contain one or more non-complementary residues (including abasic positions), provided that the entire complementary base sequence is capable of specifically hybridizing to another base sequence under appropriate hybridization conditions. The contiguous bases may be at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 99%, or 100% complementary to the sequence to which the oligomer is intended to hybridize. A substantially complementary sequence may refer to a sequence having a percent identity of 100, 99, 98, 97, 96, 95, 94, 93, 92, 91, 90, 89, 88, 87, 86, 85, 84, 83, 82, 81, 80, 75, 70 or less, or any number therebetween, as compared to a reference sequence. One skilled in the art can readily select appropriate hybridization conditions, which can be predicted based on base sequence composition, or determined by using routine testing (see, e.g., Green and Sambrook, Molecular Cloning: A Laboratory Manual, 4th edition (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 2012)).
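The percentages mentioned above can be made concrete with a short Python sketch that counts standard base pairs between an oligomer and an equal-length target region aligned in an antiparallel manner. The function name is hypothetical, gaps and abasic positions are not modeled, and both sequences are assumed to be written 5' to 3'.

```python
# Illustrative only: hypothetical helper; ungapped, equal-length comparison.
_STANDARD_PAIRS = {
    ("G", "C"), ("C", "G"),
    ("A", "T"), ("T", "A"),
    ("A", "U"), ("U", "A"),
}


def percent_complementarity(oligo: str, target: str) -> float:
    """Percent of positions forming G:C, A:T or A:U pairs in antiparallel alignment."""
    if not oligo or len(oligo) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    # Reverse the target so position i of the oligo faces its pairing partner.
    facing = target.upper()[::-1]
    paired = sum((o, t) in _STANDARD_PAIRS for o, t in zip(oligo.upper(), facing))
    return 100.0 * paired / len(oligo)


if __name__ == "__main__":
    print(percent_complementarity("GCAT", "ATGC"))  # every position pairs
    print(percent_complementarity("GCAA", "ATGC"))  # one mismatched position
```

A value computed this way is only a sequence-level measure; whether a given oligomer actually hybridizes specifically still depends on the hybridization conditions.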

As used herein, the term "multiplex PCR" refers to a type of PCR in which more than one set of primers are contained in a reaction, allowing for amplification of a single target or two or more different targets in a single reaction vessel (e.g., tube). Multiplex PCR can be, for example, real-time PCR.

The disclosure herein includes methods, compositions, kits, and systems that enable rapid targeted sequencing (and thus rapid sequencing-based diagnostics, e.g., in less than 2 hours) and theranostics, which require diagnosing a condition and identifying a suitable therapy at the same time. In some embodiments, applications of the rapid targeted sequencing method may include rapid pathogen diagnosis, rapid cancer diagnosis, and rare disease diagnosis (e.g., for cystic fibrosis).

The disclosure herein includes methods of using genome editing tools (e.g., Cas proteins, zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and Argonaute proteins) to direct enzymes (e.g., transposases) to cleave nucleic acids at user-defined loci, thereby preparing custom locus-specific libraries for DNA and RNA sequencing. The enzymes (e.g., transposases) can add adaptors at these sites for sequencing, such as by next-generation or third-generation sequencing techniques (including, but not limited to, Illumina, PacBio, Roche, Thermo Fisher, and Oxford Nanopore sequencing techniques).

Conventional library preparation methods for nucleic acid sequencing may take several hours, and the process produces a randomly generated library (FIG. 1). These libraries are random because the methods used for nucleic acid fragmentation (including physical, enzymatic and chemical fragmentation methods) fragment the nucleic acid in a random manner, so the output of DNA sequencing cannot be controlled.

When there is interest in studying specific loci in the genome, millions of bases must be sequenced in the hope that sufficient sequence information will be obtained at these loci. After all of these data are obtained, bioinformatics methods must be used to extract information about the loci of interest. This process can be bioinformatically and computationally intensive, as most of the DNA that has been prepared and sequenced is unrelated to the regions of interest. Furthermore, there is a risk that the information (coverage) obtained for these regions is insufficient because of the randomness of the library preparation process. In that case, another library must be prepared and sequenced in order to obtain adequate coverage of these regions, which wastes time and resources.

The rapid targeted library preparation method disclosed herein, the custom locus specific library preparation (CLLP) method, is a rapid process that takes only a few minutes, rather than the several hours required to prepare a library using conventional library preparation methods. In addition, libraries made by the CLLP methods disclosed herein are not random. In some embodiments, only selected loci are sequenced and everything else is ignored, which provides cost-effectiveness, time and resource savings, and accuracy. Furthermore, because only the regions of interest are sequenced, the bioinformatics resources and analysis required are minimal compared to standard methods. The CLLP methods disclosed herein enable DNA sequencing to be used as a rapid and affordable method for diagnostics and/or theranostics.

In some embodiments of CLLP, genome editing tools and transposases (e.g., hyperactive transposases) are used to achieve targeted fragmentation. Any genome editing tool that can make a user-defined double-strand break in DNA or single-strand break in RNA can be used. These tools include, but are not limited to, Cas proteins, ZFNs, TALENs, Argonaute proteins, or any combination thereof. In some embodiments, genome editing tools are used to control and direct fragmentation of nucleic acids to specific regions of the genome that can be precisely selected. The cleavage site produced by the genome editing tool can be used as the start site for a sequencing adapter. This in turn strongly biases sequencing toward the selected genomic regions. The programmable fragmentation process disclosed herein can result in targeted sequencing. In addition, the method may be used with any sequencing technique, including but not limited to Illumina, PacBio, Oxford Nanopore, Roche and Thermo Fisher sequencing techniques. FIGS. 5A-5F depict non-limiting exemplary embodiments of the custom locus specific library preparation (CLLP) disclosed herein.

In some embodiments, enzymatic fragmentation (tagmentation) is used to prepare a library for DNA sequencing with a hyperactive transposase. In tagmentation, the transposase cleaves double-stranded DNA and attaches DNA adaptors at the cleavage sites (FIG. 2). Tagmentation is a very rapid process that prepares the library in a relatively short period of time compared to standard library preparation methods. However, the transposase cleaves the genome in a random, unbiased manner.

In some embodiments, to increase the speed of library preparation, the methods disclosed herein use a transposase that is linked to a genome editing tool. For example, the dCas9 protein can be used as the genome editing tool. The dCas9 protein can bind a guide RNA that is programmable to target specific regions of the genome. dCas9 is a Cas9 protein that has been mutated such that its nuclease activity is lost while its target specificity is retained. After dCas9 binds its target, the transposase attached to the dCas9 protein cleaves the DNA and attaches sequencing adaptors at the cleavage site. The end result is a targeted DNA fragment ready for sequencing, which shortens the targeted library preparation process to a few minutes instead of a few hours (FIG. 3).

Non-limiting advantages that can be achieved by the methods, compositions, kits, and systems of the present disclosure include: a faster time to result; use of fewer laboratory, bioinformatics and computational resources than the prior art; rapid detection and quantification of rare and low-frequency variants; the ability to analyze more samples than whole genome sequencing; use as a rapid diagnostic tool capable of detecting a customizable number of targets simultaneously; simpler and clearer data analysis; and any combination thereof.

Currently, two methods have been used for targeted sequencing. The first is amplicon sequencing. This method relies on the use of primers to amplify a region of interest by DNA amplification. This additional amplification step further increases the cost, time and resources of standard library preparation methods. The second targeted sequencing method is target capture. Such methods rely on the use of probes or pools of probes so that they can hybridize to a particular nucleic acid target. Hybridization of probes to their targets and separation of these targets is a time consuming process, which may take days. In addition, the probes used in this method are expensive to synthesize.

In some embodiments, a hyperactive Tn5 transposase linked to a dCas9 protein can be used. dCas9 is a catalytically dead form of the Cas9 protein, which is mutated such that the nuclease activity of the Cas9 protein is lost, but which retains programmable DNA binding activity. The N-terminus of the dCas9 protein is attached to the C-terminus of the Tn5 transposase by a linker (e.g., X-TEN), a SNAP-tag or a CLIP-tag, although many different methods may be employed to attach the two proteins. The Tn5 transposase is loaded with sequencing adaptors specific to the sequencing technology platform. The dCas9 protein is bound to a single guide RNA (sgRNA) specific for a user-defined locus. To select more than one locus, more than one sgRNA can be used, each bound to a separate dCas9 protein and each targeting a different locus.

After the dCas9 bound to the sgRNA finds a DNA molecule complementary to the sgRNA sequence, the attached Tn5 transposase can cleave the DNA at the designated site and attach sequencing adaptors at the cleavage site. The end result is a targeted DNA fragment ready for sequencing, which shortens the targeted library preparation process to a few minutes instead of a few hours (FIG. 3).

In some embodiments, a hyperactive transposase other than a Tn5 transposase is used, such as a mariner/Tc1-like transposase, a Himar1C9 transposase, a Sleeping Beauty transposase, a Tn7 transposase, or a combination thereof. In some embodiments, alternatives to the dCas9 protein can be used to provide programmable DNA binding activity. For example, zinc fingers that are not attached to the FokI nuclease can be used. Similarly, TALEN molecules without the FokI nuclease can be used. In some embodiments, a recombinase in combination with sequence-specific primers can be used as the programmable DNA binding molecule. In some embodiments, a locus-specific library can alternatively be prepared using only genome editing tools (e.g., Cas proteins, zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), Argonaute proteins) without the aid of a transposase. This results in a programmable nucleic acid fragmentation method (FIG. 4) that can be further used to prepare a locus-specific sequencing library.

Disclosed herein are the use of genome editing tools as programmable tools to target specific regions of the genome, and the use of transposases to cut and paste the adaptors required to create a sequencing library.

Some embodiments provide a disease panel (e.g., a sepsis panel) configured to identify a pathogen or the cause of a disease (e.g., a genetic mutation) and simultaneously identify susceptibility to antibiotics. In some embodiments, a cancer panel may include the identification of more than one mutation in a cancer cell. In some embodiments, a rare disease panel may include sequencing of particular loci associated with mutations that may lead to a genetic disease (e.g., cystic fibrosis).

Each of the following patent application publications is hereby incorporated by reference in its entirety: WO2016028843A2 and WO2018175872A1, US20190144920A1 and CA3026206A1.

The disclosure herein includes compositions. In some embodiments, the composition comprises: more than one protein complex. In some embodiments, each of the more than one protein complexes comprises a transposome and a programmable DNA binding unit capable of specifically binding to a binding site on target dsDNA. In some embodiments, the transposomes comprise a transposase, a first adaptor and a second adaptor. In some embodiments, the binding sites of each of the more than one protein complexes are different from each other.

The disclosure herein includes reaction mixtures. In some embodiments, the reaction mixture comprises: the compositions disclosed herein and sample nucleic acids suspected of comprising one or more target dsDNA. The reaction mixture may comprise: DNA polymerase, dNTPs, or a combination thereof. The adapter may be covalently attached to the target dsDNA or a fragment thereof. The reaction mixture may comprise: more than one dsDNA fragment, each fragment comprising a first adaptor and a second adaptor of one of the more than one protein complexes at each end, respectively.

The disclosure herein includes methods for generating sequencing libraries. In some embodiments, the method comprises: contacting the compositions disclosed herein with a sample suspected of containing more than one target dsDNA to form a reaction mixture. The method may include: incubating the reaction mixture to generate more than one dsDNA fragment, each fragment comprising a first adaptor and a second adaptor of one of the more than one protein complexes at each end, respectively. The contacting of more than one target dsDNA with more than one protein complex pair may be performed at about 25 ℃ to about 85 ℃ (e.g., about 25 ℃, 26 ℃, 27 ℃, 28 ℃, 29 ℃, 30 ℃, 31 ℃, 32 ℃, 33 ℃, 34 ℃, 35 ℃, 36 ℃, 37 ℃, 38 ℃, 39 ℃, 40 ℃, 41 ℃, 42 ℃, 45 ℃, 50 ℃, 55 ℃, 60 ℃, 65 ℃, 70 ℃, 75 ℃, 80 ℃, 85 ℃, or a number or range between any two of these values). Incubating the reaction mixture may include incubating the reaction mixture at about 37 ℃ to about 55 ℃ (e.g., about 37 ℃, 38 ℃, 39 ℃, 40 ℃, 41 ℃, 42 ℃, 43 ℃, 44 ℃, 45 ℃, 46 ℃, 47 ℃, 48 ℃, 49 ℃, 50 ℃, 51 ℃, 52 ℃, 53 ℃, 54 ℃, 55 ℃, or a number or range between any two of these values).

More than one protein complex pair and more than one target dsDNA may be present in the reaction mixture in a molecular ratio of about 2:1 to about 2000:1 (e.g., 2:1, 2.5:1, 3:1, 4:1, 5:1, 6:1, 7:1, 8:1, 9:1, 10:1, 11:1, 12:1, 13:1, 14:1, 15:1, 16:1, 17:1, 18:1, 19:1, 20:1, 21:1, 22:1, 23:1, 24:1, 25:1, 26:1, 27:1, 28:1, 29:1, 30:1, 31:1, 32:1, 33:1, 34:1, 35:1, 36:1, 37:1, 38:1, 39:1, 40:1, 41:1, 42:1, 43:1, 44:1, 45:1, 46:1, 47:1, 48:1, 49:1, 50:1, 51:1, 52:1, 53:1, 54:1, 55:1, 56:1, 57:1, 58:1, 59:1, 60:1, 61:1, 62:1, 63:1, 64:1, 65:1, 66:1, 67:1, 68:1, 69:1, 70:1, 71:1, 72:1, 73:1, 74:1, 75:1, 76:1, 77:1, 78:1, 79:1, 80:1, 81:1, 82:1, 83:1, 84:1, 85:1, 86:1, 87:1, 88:1, 89:1, 90:1, 91:1, 92:1, 93:1, 94:1, 95:1, 96:1, 97:1, 98:1, 99:1, 100:1, 200:1, 300:1, 400:1, 500:1, 600:1, 700:1, 800:1, 900:1, 1000:1, 2000:1, or a number or range between any two of these values). In some embodiments, more than one protein complex pair and more than one target dsDNA are present in the reaction mixture in a molecular ratio of about 2:1 to about 200:1 (e.g., 2:1, 2.5:1, 3:1, 4:1, 5:1, 6:1, 7:1, 8:1, 9:1, 10:1, 11:1, 12:1, 13:1, 14:1, 15:1, 16:1, 17:1, 18:1, 19:1, 20:1, 21:1, 22:1, 23:1, 24:1, 25:1, 26:1, 27:1, 28:1, 29:1, 30:1, 31:1, 32:1, 33:1, 34:1, 35:1, 36:1, 37:1, 38:1, 39:1, 40:1, 41:1, 42:1, 43:1, 44:1, 45:1, 46:1, 47:1, 48:1, 49:1, 50:1, 51:1, 52:1, 53:1, 54:1, 55:1, 56:1, 57:1, 58:1, 59:1, 60:1, 61:1, 62:1, 63:1, 64:1, 65:1, 66:1, 67:1, 68:1, 69:1, 70:1, 71:1, 72:1, 73:1, 74:1, 75:1, 76:1, 77:1, 78:1, 79:1, 80:1, 81:1, 82:1, 83:1, 84:1, 85:1, 86:1, 87:1, 88:1, 89:1, 90:1, 91:1, 92:1, 93:1, 94:1, 95:1, 96:1, 97:1, 98:1, 99:1, 100:1, 200:1, or a number or range between any two of these values).
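
As an illustration of the ratio arithmetic only, the following minimal sketch converts a chosen complex-to-target ratio into a number of protein complex pairs and a molar amount. The function names and the example input of one million target molecules are hypothetical and are not taken from the disclosure.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def complex_pairs_needed(target_molecules: float, ratio: float) -> float:
    """Number of protein complex pairs for a given complex:target ratio.

    The bounds mirror the "about 2:1 to about 2000:1" range in the text.
    """
    if not 2 <= ratio <= 2000:
        raise ValueError("ratio is outside the about 2:1 to about 2000:1 range")
    return target_molecules * ratio


def to_femtomoles(molecules: float) -> float:
    """Convert a molecule count to femtomoles."""
    return molecules / AVOGADRO * 1e15


# Hypothetical example: 1e6 target dsDNA molecules at a 20:1 ratio.
pairs = complex_pairs_needed(1e6, 20)
print(f"{pairs:.3g} complex pairs, {to_femtomoles(pairs):.3g} fmol")
```

A ratio outside the stated range raises an error in this sketch; an actual protocol may simply treat the bounds as approximate.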

The binding sites of at least two of the more than one protein complexes may be on the same target dsDNA. Binding sites for at least two of the more than one protein complexes may be about 1-50000 nucleotides apart on the same target dsDNA. In some embodiments, the binding sites of at least two of the more than one protein complexes may be, or may be about, the following number of nucleotides apart on the same target dsDNA: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, or a number or range between any two of these values. In some embodiments, the binding sites of at least two of the more than one protein complexes may be at least, or at most, the following number of nucleotides apart on the same target dsDNA: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, or 100000. The distance between the binding sites of one pair of the more than one protein complexes may be substantially the same as the distance between the binding sites of another pair of the more than one protein complexes. The distance between the binding sites of one pair of the more than one protein complexes may be different from the distance between the binding sites of another pair of the more than one protein complexes. The binding sites of at least two of the more than one protein complexes may be located on different strands of the target dsDNA. At least two of the more than one protein complexes are capable of specifically binding to different target dsDNA. More than one protein complex is capable of specifically binding between about 2 and about 5000 target dsDNA. In some embodiments, more than one protein complex is capable of specifically binding about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 110, 120, 128, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450, 460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570, 580, 590, 600, 610, 620, 630, 640, 650, 660, 670, 680, 690, 700, 710, 720, 730, 740, 750, 760, 770, 780, 790, 800, 810, 820, 830, 840, 850, 860, 870, 880, 890, 900, 910, 920, 930, 940, 950, 960, 970, 980, 990, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700, 2800, 2900, 3000, 3250, 3500, 3750, 4000, 4250, 4500, 4750, 5000, 5500, 6000, 6500, 7000, 7500, 8000, 8500, 9000, 9500, 10000, or a number or range between any two of these values, of target dsDNA.
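
The spacing constraints above can be checked with simple arithmetic. The sketch below is only an illustration: the coordinates are invented, and the 5% tolerance used to decide whether two spacings are "substantially the same" is an assumption, since the text does not define a threshold.

```python
def pair_spacings(pairs):
    """Distance in nucleotides between the two binding sites of each pair.

    `pairs` holds (first_site, second_site) coordinates on one target dsDNA.
    """
    return [abs(second - first) for first, second in pairs]


def spacing_in_range(spacing, low=1, high=50_000):
    """True if a spacing falls in the about 1-50000 nucleotide range."""
    return low <= spacing <= high


def substantially_same(a, b, tolerance=0.05):
    """Assumed criterion: the spacings differ by at most 5% of the larger."""
    return abs(a - b) <= tolerance * max(a, b)


# Hypothetical binding-site coordinates for two protein complex pairs.
pairs = [(1_000, 1_300), (42_000, 42_310)]
spacings = pair_spacings(pairs)
print(spacings, all(spacing_in_range(s) for s in spacings))
print(substantially_same(*spacings))
```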

Transposome

In some embodiments, the transposomes comprise a transposase, a first adaptor and a second adaptor. At least two of the more than one protein complexes may comprise the same transposomes. All of the more than one protein complexes may comprise the same transposomes. All of the more than one protein complexes may comprise the same transposase. The transposase may be a Tn5 transposase, a Tn7 transposase, a mariner Tc1-like transposase, a Himar1C9 transposase or a Sleeping Beauty transposase. The transposase may be a hyperactive transposase.

The transposase may be a Tn5, Tn7, MuA or Vibrio harveyi transposase, or an active mutant thereof. In some embodiments, the transposase is a Tn5 transposase or a mutant thereof. In some embodiments, the Tn5 transposase is a hyperactive Tn5 transposase, or an active mutant thereof. In some embodiments, the Tn5 transposase is a Tn5 transposase as described in WO2015/160895, which is incorporated herein by reference. In some embodiments, the Tn5 transposase is a hyperactive Tn5 having mutations at positions 54, 56, 372, 212, 214, 251 and 338 relative to the wild-type Tn5 transposase. In some embodiments, the Tn5 transposase is a hyperactive Tn5 having the following mutations relative to the wild-type Tn5 transposase: E54K, M56A, L372P, K212R, P214R, G251R and A338V. In some embodiments, the Tn5 transposase is a fusion protein. In some embodiments, the Tn5 transposase fusion protein comprises a fused elongation factor Ts (Tsf) tag. In some embodiments, the Tn5 transposase is a mutant hyperactive Tn5 transposase comprising mutations at amino acids 54, 56 and 372 relative to the wild-type sequence. In some embodiments, the hyperactive Tn5 transposase is a fusion protein. In some embodiments, the recognition site is a Tn5 transposase recognition site.

The transposase may comprise a single protein or comprise more than one protein subunit. The transposase may be an enzyme capable of forming a functional complex with a transposon end or a transposon end sequence. In some embodiments, the transposome complex comprises a transposase (e.g., Tn5 transposase) dimer comprising first and second monomers. In some embodiments, the transposome complex comprises a dimer of two molecules of a transposase.

Transposases and/or transposomes may vary depending on the embodiment. The transposase may comprise a Tn5 transposase. The transposase may be a Tn transposase (e.g., Tn3, Tn5, Tn7, Tn10, Tn552, Tn903), a MuA transposase, a Vibhar transposase (e.g., from Vibrio harveyi), Ac-Ds, Ascot-1, Bs1, Cin4, Copia, En/Spm, F elements, hobo, Hsmar1, Hsmar2, IN (HIV), IS1, IS2, IS3, IS4, IS5, IS6, IS10, IS21, IS30, IS50, IS51, IS150, IS256, IS407, IS427, IS630, IS903, IS911, IS982, IS1031, ISL2, L1, Mariner, P elements, Tam3, Tc1, Tc3, Tel, THE-1, Tn/O, TnA, Tn3, Tn5, Tn7, Tn10, Tn552, Tol1, Tn2, Tn1, Ty1, or any of the other transposases listed or a transposase derived from any of the organisms thereof. In some embodiments, a transposase associated with and/or derived from a parent transposase may comprise a peptide fragment having at least about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% amino acid sequence homology to a corresponding peptide fragment of the parent transposase. The peptide fragment may be at least about 10, about 15, about 20, about 25, about 30, about 35, about 40, about 45, about 50, about 60, about 70, about 80, about 90, about 100, about 150, about 200, about 250, about 300, about 400, or about 500 amino acids in length. For example, a Tn5-derived transposase may comprise a peptide fragment 50 amino acids in length and about 80% homologous to the corresponding fragment in the parent Tn5 transposase. In some cases, insertion may be facilitated and/or triggered by the addition of one or more cations. The cation may be a divalent cation, such as Ca²⁺, Mg²⁺ and Mn²⁺.
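
The 50-amino-acid, about 80% example above can be reproduced with a short calculation. This sketch uses ungapped, position-by-position percent identity as a simple stand-in for sequence homology, which is an assumption (the text does not specify an alignment method), and both peptide sequences are invented placeholders rather than real Tn5 fragments.

```python
def percent_identity(fragment: str, parent_fragment: str) -> float:
    """Ungapped percent identity between two equal-length peptide fragments."""
    if len(fragment) != len(parent_fragment):
        raise ValueError("fragments must have the same length")
    matches = sum(a == b for a, b in zip(fragment, parent_fragment))
    return 100.0 * matches / len(fragment)


# Placeholder 50-residue "parent" fragment (not a real Tn5 sequence).
parent = "ACDEFGHIKL" * 5
# Derived fragment with the first 10 residues substituted (W is absent from parent).
derived = "W" * 10 + parent[10:]

print(len(derived), percent_identity(derived, parent))  # 40 of 50 positions match
```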

Adapter

The first adaptor and the second adaptor in the same transposome may be the same. The first adapter, the second adapter, or both in different transposomes may be different. The first adaptor, the second adaptor, or both may be dsDNA or an RNA/DNA duplex. The length of the adapter can be about 3-200 base pairs (e.g., about 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 150, 200 base pairs, or a number or range between any two of these values). In some embodiments, the length of the adapter can be 3-500 base pairs (e.g., about 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 150, 200, 300, 400, 500 base pairs, or a number or range between any two of these values). The first adapter, the second adapter, or both may be sequencing adapters. The sequencing adaptors may comprise one or more components employed in a given sequencing scheme, such as sequencing platform adaptor constructs, indexing domains, clustering domains, and the like. The sequencing adapter may comprise a P5 or P7 primer sequence. In some embodiments, the first adapter and/or the second adapter comprises a barcode (e.g., a random barcode). In some embodiments, the first adapter and/or the second adapter comprises a universal sequence. In some embodiments, the first adaptor and/or the second adaptor comprises a single-stranded portion and/or a double-stranded portion. In some embodiments, the adapter comprises a transposon end sequence that binds to a transposase. The transposon end sequences may be double-stranded. In some embodiments, the transposon end sequence is a mosaic end (ME) sequence. In particular embodiments, the transposon end is a mosaic end, or a hyperactive form of a transposon end. The adaptor sequence may be attached to one of the two transposon end sequences. Thus, in some embodiments, the transposon end sequence of the first adaptor is an ME sequence and the transposon end sequence of the second adaptor is an ME' sequence.

The first adapter and/or the second adapter may comprise one or more nucleotides (or analogs thereof) that are modified or otherwise non-naturally occurring. For example, the first adapter and/or the second adapter can include one or more nucleotide analogs (e.g., LNA, FANA, 2'-O-Me RNA, 2'-fluoro RNA, etc.), linkage modifications (e.g., phosphorothioate, 3'-3' and 5'-5' inverted linkages), 5' and/or 3' terminal modifications (e.g., 5' and/or 3' amino, biotin, DIG, phosphate, thiol, dye, quencher, etc.), one or more fluorescently labeled nucleotides, or any other feature that provides a desired function.

The first adapter and/or the second adapter may comprise all or a component of a sequencing platform adapter construct. A "sequencing platform adapter construct" refers to a nucleic acid construct that includes at least a portion of a nucleic acid domain used by a sequencing platform of interest (e.g., a sequencing platform adapter nucleic acid sequence), e.g., a sequencing platform provided by Illumina (e.g., the HiSeq™, MiSeq™ and/or Genome Analyzer™ sequencing systems); Ion Torrent™ (e.g., the Ion PGM™ and/or Ion Proton™ sequencing systems); Pacific Biosciences (e.g., the PACBIO RS II sequencing system); Life Technologies™ (e.g., the SOLiD sequencing system); Roche (e.g., the 454 GS FLX+ and/or GS Junior sequencing systems); or any other sequencing platform of interest. The first adaptor and/or the second adaptor may comprise one or more nucleic acid domains selected from the group consisting of: a domain (e.g., a "capture site" or "capture sequence") that specifically binds a surface-attached sequencing platform oligonucleotide (e.g., a P5 or P7 oligonucleotide attached to the surface of a flow cell in an Illumina sequencing system); a sequencing primer binding domain (e.g., a domain to which a Read 1 or Read 2 primer of the Illumina platform can bind); a barcode domain (e.g., a domain that uniquely identifies the sample source of a nucleic acid being sequenced by labeling each molecule from a given sample with a specific barcode or "tag" to enable sample multiplexing); a barcode sequencing primer binding domain (a domain to which a primer used for sequencing the barcode binds); a molecular identification domain (e.g., a molecular index tag, such as a randomized tag of 4, 6, or another number of nucleotides) for uniquely labeling a molecule of interest to determine expression levels based on the number of instances in which the unique tag is sequenced; or any combination of such domains. In some embodiments, the barcode domain (e.g., sample index tag) and the molecular identification domain (e.g., molecular index tag) may be contained in the same nucleic acid.
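
To make the domain list above concrete, the following minimal sketch models one adapter strand as a concatenation of named domains. The class name, the order of the domains, and every sequence shown are hypothetical placeholders chosen for illustration; real platform sequences should be taken from the manufacturer's documentation.

```python
import random
from dataclasses import dataclass


@dataclass
class AdapterDomains:
    """Hypothetical layout of one adapter strand, written 5' to 3'."""
    capture_site: str        # binds a surface-attached platform oligonucleotide
    sample_barcode: str      # identifies the sample source
    molecular_index: str     # randomized tag that labels individual molecules
    primer_site: str         # sequencing primer binding domain
    transposon_end: str      # transposon end sequence bound by the transposase

    def sequence(self) -> str:
        # The domain order used here is an assumption for illustration only.
        return "".join([
            self.capture_site,
            self.sample_barcode,
            self.molecular_index,
            self.primer_site,
            self.transposon_end,
        ])


def random_index(length: int, rng: random.Random) -> str:
    """Randomized molecular index tag of the requested length."""
    return "".join(rng.choice("ACGT") for _ in range(length))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
adapter = AdapterDomains("AAAA", "CCCC", random_index(6, rng), "GGGG", "TTTT")
print(adapter.sequence(), len(adapter.sequence()))
```

Keeping the sample barcode and the molecular index as separate fields of one object mirrors the statement that both domains may be contained in the same nucleic acid.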

When present in the first adapter and/or the second adapter, the sequencing platform adapter domain may include one or more nucleic acid domains of any length and sequence suitable for the sequencing platform of interest. The nucleic acid domains may have a length and sequence that allow a polynucleotide (e.g., an oligonucleotide) employed by the sequencing platform of interest to specifically bind to the nucleic acid domain, for example for solid phase amplification and/or sequencing by synthesis of a cDNA insert flanked by the nucleic acid domains. Exemplary nucleic acid domains include the P5, P7, Read 1 primer and Read 2 primer domains used on Illumina-based sequencing platforms. Other example nucleic acid domains include the A adaptor and P1 adaptor domains employed on Ion Torrent™-based sequencing platforms.

The nucleotide sequences of the nucleic acid domains that can be used for sequencing on the sequencing platform of interest can change and/or vary over time. The adaptor sequences are typically provided by the manufacturer of the sequencing platform (e.g., in technical documentation provided with the sequencing system and/or available on the manufacturer's website). Based on this information, the sequences of the adaptors provided herein can be designed to include all or a portion of one or more nucleic acid domains configured to enable sequencing of the target dsDNA on the platform of interest.

The first adaptor and/or the second adaptor may comprise adaptor sequences for Ion PGM™ sequencing platforms (e.g., the Ion PGM™ and/or Ion Proton™ sequencing systems). The first adapter and/or the second adapter may comprise a P1 adapter, an A adapter, an Ion Xpress™ barcode adapter, an Ion P1 adapter, and/or an Ion Xpress™ Barcode X adapter.

The first adapter and/or the second adapter may comprise a hairpin. The first adapter and/or the second adapter may be configured to generate SMRTbell™ technology libraries. The methods provided herein can result in ligation of hairpin adaptors to the ends of the double-stranded fragments to produce a circular template molecule having a central double-stranded portion and single-stranded hairpin loops at the ends (see SMRTbell™ from Pacific Biosciences). Methods for preparing and using circular templates, such as SMRTbell™ templates, are described, for example, in U.S. Pat. No. 8,003,330 entitled "Error-free amplification of DNA for clonal sequencing" and in US2009/0280538 entitled "Methods and compositions for nucleic acid sample preparation", the entire disclosures of which are hereby incorporated by reference for all purposes.

The first adapter and/or the second adapter may be configured for downstream use of the tagged nucleic acid on an Oxford Nanopore Technologies (ONT) instrument (e.g., SmidgION, MinION, GridION, PromethION). FIG. 6 depicts a non-limiting exemplary embodiment of a tagmentation-based rapid sequencing kit, showing the ONT rapid sequencing kit. The first adapter and/or the second adapter may comprise (i) a spacer; (ii) a motor protein that is stalled on the spacer, wherein an active site of the motor protein is occupied by the spacer; and/or (iii) a blocking moiety bound to the adapter, wherein the blocking moiety prevents the motor protein from exiting the spacer. The first adapter and/or the second adapter may comprise hairpin loop adapters. Hairpin loop adaptors may be adaptors comprising a single polynucleotide strand, wherein the ends of the polynucleotide strand are capable of hybridizing to each other, or are hybridized to each other, and wherein the middle portion of the polynucleotide forms a loop. Suitable hairpin loop adaptors can be designed using methods known in the art. The first adapter and/or the second adapter may comprise a linear adapter. The first adapter and/or the second adapter may be a Y adapter. Y adaptors are typically polynucleotide adaptors. The Y adapter is typically double-stranded and includes (a) a region at one end where the two strands hybridize together, and (b) a region at the other end where the two strands are not complementary. The non-complementary portions of the strands typically form overhangs. The presence of the non-complementary region gives the adaptor its Y shape because, unlike in the double-stranded portion, the two strands typically do not hybridize to each other. The two single-stranded portions of the Y adapter may be of the same length or of different lengths. The motor protein may bind to an overhang of an adapter, such as a Y adapter. In some embodiments, the motor protein may bind to a double-stranded region. In some embodiments, the motor protein may bind to single- and/or double-stranded regions of the adapter. In some embodiments, a first motor protein may bind to a single-stranded region of such an adapter, and a second motor protein may bind to a double-stranded region of the adapter. The first adapter and/or the second adapter may comprise additional bound components that facilitate the nanopore sequencing reaction, such as bound enzymes (e.g., helicases, polymerases, or other motor proteins), membrane-binding moieties (e.g., cholesterol), and the like. Typically, the motor protein is a helicase, a polymerase, an exonuclease, a topoisomerase, or a variant thereof. In some embodiments, the motor protein on the spacer of the polynucleotide adapter is modified to prevent the motor protein from detaching from the spacer (except by passing off the end of the spacer). The motor protein may be modified in any suitable manner. FIGS. 7A-7H depict non-limiting exemplary embodiments of genome editing tagmentation (GET) for generating sequencing libraries for existing sequencing platforms (e.g., sequencing platforms from Oxford Nanopore).

The adaptors provided herein (e.g., first adaptors and/or second adaptors) can include barcodes, such as random barcodes, and can include one or more labels. Barcoding, such as random barcoding, has been described in, for example, Fu et al., Proc Natl Acad Sci U.S.A., 2011 May 31, 108(22):9026-31; US2011/0160078; Fan et al., Science, 2015, 347(6222):1258367; US2015/0299784 and WO 2015/031691; the content of each of these, including any supporting or supplemental information or material, is incorporated herein by reference in its entirety. In some embodiments, the barcodes disclosed herein may be random barcodes, which may be polynucleotide sequences that may be used to randomly label (e.g., barcode, tag) a target. A barcode may be referred to as a random barcode if the ratio of the number of different barcode sequences of the random barcodes to the number of occurrences of any target to be labeled is, or is about, 1:1, 2:1, 3:1, 4:1, 5:1, 6:1, 7:1, 8:1, 9:1, 10:1, 11:1, 12:1, 13:1, 14:1, 15:1, 16:1, 17:1, 18:1, 19:1, 20:1, 30:1, 40:1, 50:1, 60:1, 70:1, 80:1, 90:1, 100:1, or a number or range between any two of these values. The target may be an mRNA species comprising mRNA molecules having the same or nearly the same sequence. A barcode may also be referred to as a random barcode if the ratio of the number of different barcode sequences of the random barcodes to the number of occurrences of any target to be labeled is at least, or at most, 1:1, 2:1, 3:1, 4:1, 5:1, 6:1, 7:1, 8:1, 9:1, 10:1, 11:1, 12:1, 13:1, 14:1, 15:1, 16:1, 17:1, 18:1, 19:1, 20:1, 30:1, 40:1, 50:1, 60:1, 70:1, 80:1, 90:1, or 100:1. The barcode sequence of a random barcode may be referred to as a molecular label.
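
The ratio that defines a random barcode can be illustrated numerically. This sketch assumes that every possible sequence of a given length over the four bases is available as a barcode, which is a simplification not stated in the text; the function names and the example of 1000 occurrences of a target are hypothetical.

```python
def distinct_barcodes(length: int, alphabet_size: int = 4) -> int:
    """Number of different barcode sequences of a given length (assumes all are used)."""
    return alphabet_size ** length


def barcode_to_target_ratio(length: int, target_occurrences: int) -> float:
    """Ratio of different barcode sequences to occurrences of one target."""
    return distinct_barcodes(length) / target_occurrences


def in_listed_range(ratio: float, low: float = 1.0, high: float = 100.0) -> bool:
    """True if the ratio lies between the 1:1 and 100:1 values listed above."""
    return low <= ratio <= high


# Hypothetical example: 8-nucleotide barcodes and 1000 copies of one target.
ratio = barcode_to_target_ratio(8, 1000)
print(distinct_barcodes(8), ratio, in_listed_range(ratio))
```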

The adapter and/or barcode may include one or more universal labels. In some embodiments, one or more universal labels may be the same for all barcodes and/or adaptors. In some embodiments, the universal label may include a nucleic acid sequence capable of hybridizing to a sequencing primer. Sequencing primers can be used to sequence barcodes comprising universal labels. Sequencing primers (e.g., universal sequencing primers) can include sequencing primers associated with a high throughput sequencing platform. In some embodiments, the universal label may comprise a nucleic acid sequence capable of hybridizing to a PCR primer. In some embodiments, the universal label may include a nucleic acid sequence capable of hybridizing to a sequencing primer and a PCR primer. The nucleic acid sequence of the universal label that is capable of hybridizing to a sequencing primer or PCR primer may be referred to as a primer binding site. A universal label may include a sequence that can be used to initiate transcription of a barcode. The universal label may include a sequence that may be used to extend the barcode or a region within the barcode. The length of the universal label may be, or may be about, 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50 nucleotides, or a number or range between any two of these values. For example, a universal label may comprise at least about 10 nucleotides. The length of the universal label may be at least, or may be at most, 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 100, 200 or 300 nucleotides.

The barcode (e.g., a random barcode) may include one or more labels. Exemplary labels may include universal labels, cell labels, barcode sequences (e.g., molecular labels), sample labels, plate labels, spatial labels, and/or pre-spatial labels. The barcode may comprise a universal label, a dimension label, a spatial label, a cell label, and/or a molecular label. The order of the different labels in the barcode (including but not limited to universal labels, dimension labels, spatial labels, cell labels, and molecular labels) may vary. For example, the universal label may be the 5'-most label and the molecular label may be the 3'-most label. The spatial label, the dimension label and the cell label may be in any order. In some embodiments, the universal label, the spatial label, the dimension label, the cell label, and the molecular label are in any order. In some embodiments, the labels (e.g., universal labels, dimension labels, spatial labels, cell labels, and barcode sequences) of the barcode may be separated by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 or more nucleotides.

A label (e.g., a cell label) may comprise a unique set of nucleic acid subsequences of defined length, e.g., seven nucleotides each (corresponding to the number of bits used in some Hamming error correction codes), which may be designed to provide error correction capability. A set of error correction sequences comprising seven-nucleotide sequences may be designed such that any pairwise combination of sequences in the set exhibits a defined "genetic distance" (or number of mismatched bases); e.g., a set of error correction sequences may be designed to exhibit a genetic distance of three nucleotides. In this case, review of the error correction sequences in the sequence data set for labeled target nucleic acid molecules (described in more detail below) may allow one to detect or correct amplification errors or sequencing errors. In some embodiments, the nucleic acid subsequences used to generate the error correction codes may vary in length, e.g., they may be, or may be about, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 31, 40, 50 nucleotides in length, or a number or range between any two of these values. In some embodiments, nucleic acid subsequences of other lengths may be used to generate error correction codes.
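
The error correction idea above can be sketched as follows. The greedy selection used here is one simple way to obtain seven-nucleotide sequences with a pairwise distance of at least three; it is an illustrative assumption, not the design procedure of the disclosure, and the function names are hypothetical. A minimum distance of three means any single-base error leaves a read closer to its original sequence than to any other sequence in the set.

```python
from itertools import combinations, product


def hamming(a: str, b: str) -> int:
    """Number of mismatched bases between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def build_code_set(length: int = 7, min_distance: int = 3, size: int = 8):
    """Greedily pick sequences whose pairwise distance is at least min_distance."""
    chosen = []
    for bases in product("ACGT", repeat=length):
        candidate = "".join(bases)
        if all(hamming(candidate, c) >= min_distance for c in chosen):
            chosen.append(candidate)
            if len(chosen) == size:
                break
    return chosen


def correct(read: str, codes, max_errors: int = 1):
    """Return the closest code if it is within max_errors mismatches, else None."""
    best = min(codes, key=lambda c: hamming(read, c))
    return best if hamming(read, best) <= max_errors else None


codes = build_code_set()
original = codes[3]
# Introduce a single-base error at the first position.
wrong_base = "A" if original[0] != "A" else "C"
read = wrong_base + original[1:]
print(codes)
print(read, "->", correct(read, codes))
```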

CRISPR-associated proteins

The programmable DNA binding unit can comprise a nuclease-deficient CRISPR-associated protein (dCas protein) and a guide RNA (gRNA) capable of specifically binding to a binding site of a target dsDNA. The dCas protein may be dCas9, dCas12, dCas13, dCas14 or SpRY dCas. The dCas13 protein may be dCas13a, dCas13b, dCas13c or dCas13d.

In some embodiments, the Cas9 protein has an inactive (e.g., inactivated) DNA cleavage domain. A nuclease-inactivated Cas9 protein may be interchangeably referred to as a "dCas9" protein (for nuclease-"dead" Cas9). Methods for producing Cas9 proteins (or fragments thereof) with inactive DNA cleavage domains are known (see, e.g., Jinek et al., Science. 337:816-821 (2012); Qi et al., (2013) Cell. 28;152(5):1173-83, each of which is incorporated herein by reference in its entirety). For example, the DNA cleavage domain of Cas9 is known to include two subdomains, namely an HNH nuclease subdomain and a RuvC1 subdomain. The HNH subdomain cleaves the strand complementary to the gRNA, while the RuvC1 subdomain cleaves the non-complementary strand. Mutations within these subdomains can silence the nuclease activity of Cas9. For example, mutations D10A and H840A completely inactivate the nuclease activity of Streptococcus pyogenes (S. pyogenes) Cas9 (Jinek et al. and Qi et al.).

The programmable DNA-binding unit can comprise a suitable nuclease-deficient Cas protein, which can still bind the guide RNA. The programmable DNA-binding unit can comprise a class 2 type II Cas protein. The class 2 type II Cas protein may be a mutated Cas protein compared to the wild-type counterpart. The mutated Cas protein may be nuclease-deficient. The mutated Cas protein may be a mutated Cas9. The mutated Cas9 may be Cas9 D10A. Other examples of mutations in Cas9 include H820A, D839A, H840A, N863A or any combination thereof, e.g., D10A/H820A, D10A, D10A/D839A/H840A and D10A/D839A/H840A/N863A. The mutations described herein refer to SpCas9 and also include similar mutations in CRISPR proteins other than SpCas9. The programmable DNA-binding units can include Streptococcus pyogenes Cas9 (SpCas9), Staphylococcus aureus Cas9 (SaCas9), Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9, Cas100, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, Cpf1, C2c1, C2c3, Cas12a, Cas12b, Cas12c, Cas12d, Cas13b, Cas13c, or any combination thereof. Cas9 molecules of a variety of species may be used in the methods and compositions described herein. Although Streptococcus pyogenes and Staphylococcus aureus Cas9 molecules are the subject of much of the disclosure herein, Cas9 molecules derived from or based on Cas9 proteins of other species listed herein may also be used.
These include, for example, Cas9 molecules from the following: Acidovorax avenae, Actinobacillus pleuropneumoniae, Actinobacillus succinogenes, Actinobacillus suis, Actinomyces sp., Cycliphilus denitrificans, Aminomonas paucivorans, Bacillus cereus, Bacillus smithii, Bacillus thuringiensis, Bacteroides sp., Blastopirellula marina, Bradyrhizobium sp., Brevibacillus laterosporus, Campylobacter coli, Campylobacter jejuni, Campylobacter lari, Clostridium cellulolyticum, Clostridium perfringens, Corynebacterium accolens, Corynebacterium diphtheriae, Corynebacterium matruchotii, Eubacterium dolichum, gamma proteobacterium, Haemophilus sp., Lactobacillus acidophilus, Lactobacillus sp., Listeria monocytogenes, Listeriaceae bacterium, Methylocystis sp., Methylosinus trichosporium, Mobiluncus mulieris, Neisseria bacilliformis, Neisseria cinerea, Neisseria flavescens, Neisseria lactamica, Neisseria meningitidis, Neisseria sp., Neisseria wadsworthii, Nitrosomonas sp., Parvibaculum lavamentivorans, Pasteurella multocida, Phascolarctobacterium succinatutens, Ralstonia syzygii, Rhodopseudomonas palustris, Rhodovulum sp., Simonsiella muelleri, Sphingomonas sp., Sporolactobacillus vineae, Staphylococcus lugdunensis, Streptococcus sp., micrococcus species (strococcus sp.), micrococcus species (spirococcus sp.), or Verminephrobacter eiseniae. Methods for making catalytically inactivating mutations and for assessing the nuclease activity of the mutants are known to those skilled in the art.

The programmable DNA binding unit may comprise a guide molecule. The guide RNA (gRNA) molecule may consist of two separate molecules: a target-specific crRNA and a tracrRNA that binds to the Cas molecule. In some embodiments, the crRNA and tracrRNA are provided as separate molecules and must be annealed to form a functional gRNA. As used herein, the terms "guide sequence" and "guide molecule" in the context of a CRISPR-Cas system include any polynucleotide sequence that has sufficient complementarity to a selected binding site to hybridize to the selected binding site and to direct the specific binding of a programmable DNA binding unit to the sequence of the selected binding site. A gRNA molecule can refer to a nucleic acid that promotes specific targeting or homing of the gRNA molecule/Cas9 molecule complex to a target binding site. The gRNA molecules can be unimolecular (having a single RNA molecule), e.g., chimeric, or modular (comprising more than one, and typically two, separate RNA molecules). The guide sequences prepared using the methods disclosed herein can be full-length guide sequences, truncated guide sequences, full-length sgRNA sequences, truncated sgRNA sequences, or E+F sgRNA sequences. In some embodiments, the degree of complementarity of the guide sequence to a given binding site is about or greater than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99% or more when optimally aligned using a suitable alignment algorithm. In certain exemplary embodiments, the guide molecule comprises a guide sequence that can be designed to have at least one mismatch with the binding site such that an RNA duplex is formed between the guide sequence and the binding site. Thus, the degree of complementarity is preferably less than 99%. For example, when the guide sequence consists of 24 nucleotides, the degree of complementarity is more specifically about 96% or less. 
In certain embodiments, the guide sequence is designed with segments of two or more adjacent mismatched nucleotides such that the degree of complementarity of the entire guide sequence is further reduced. For example, when the guide sequence consists of 24 nucleotides, the degree of complementarity is more specifically about 96% or less, more specifically about 92% or less, more specifically about 88% or less, more specifically about 84% or less, more specifically about 80% or less, more specifically about 76% or less, more specifically about 72% or less, depending on whether a segment of two or more mismatched nucleotides comprises 2, 3, 4, 5, 6, or 7 nucleotides, and the like. In some embodiments, the degree of complementarity, apart from the one or more segments of mismatched nucleotides, is about or greater than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99% or more when optimally aligned using a suitable alignment algorithm. The optimal alignment may be determined using any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler transform (e.g., the Burrows-Wheeler Aligner), Clustal W, Clustal X, Clustal Omega, BLAT, Novoalign (Novocraft Technologies; available at www.novocraft.com), ELAND (Illumina, San Diego, Calif.), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net). The ability of the guide sequence (within the nucleic acid-targeted guide RNA) to direct sequence-specific binding of the programmable DNA binding unit to the selected binding site can be assessed by any suitable assay. In some embodiments, the guide sequence is an RNA sequence between 10 and 50 nt in length, more specifically about 20-30 nt, advantageously about 20 nt, 23-25 nt, or 24 nt. The guide sequence may be selected to ensure hybridization with the selected binding site.
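The relationship between mismatch count and degree of complementarity for a guide of a given length can be illustrated with a short calculation. The following is a minimal sketch only; the function name `percent_complementarity` is hypothetical and not part of the disclosure, and it assumes complementarity is simply the fraction of guide positions that pair with the binding site.

```python
def percent_complementarity(guide_length: int, mismatches: int) -> float:
    """Percentage of guide positions that pair with the binding site.

    Assumption (not from the disclosure): every non-mismatched position
    counts as complementary, with no weighting by position.
    """
    if guide_length <= 0:
        raise ValueError("guide_length must be positive")
    if not 0 <= mismatches <= guide_length:
        raise ValueError("mismatches must be between 0 and guide_length")
    return 100.0 * (guide_length - mismatches) / guide_length


if __name__ == "__main__":
    # Tabulate a 24-nt guide carrying 1 to 7 mismatched nucleotides.
    for n_mismatch in range(1, 8):
        value = percent_complementarity(24, n_mismatch)
        print(f"{n_mismatch} mismatch(es): {value:.1f}%")
```

For a 24-nucleotide guide, one mismatch gives 23/24, or about 95.8%, which corresponds to the "about 96% or less" figure above; the remaining percentages quoted in the text appear to be the same ratios rounded up.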

Dead guide sequences

The programmable DNA binding unit can comprise a CRISPR-associated protein (Cas protein) and a guide RNA (gRNA) capable of specifically binding to a binding site of a target dsDNA. In some embodiments, the guide sequence is modified in a manner that allows formation of a CRISPR-Cas complex and successful binding to the binding site, while not allowing successful nuclease activity. Such modified guide sequences are referred to as "dead guides" or "dead guide sequences". In terms of nuclease activity, these dead guides or dead guide sequences may be considered catalytically inactive or conformationally inactive. The programmable DNA-binding unit can comprise a functional Cas protein and a guide RNA (gRNA) or crRNA, wherein the gRNA or crRNA comprises a dead guide sequence, whereby the gRNA is capable of hybridizing to the selected binding site such that the Cas protein is directed to the selected binding site without detectable cleavage activity of the non-mutant Cas protein. The ability of the dead guide sequence to direct sequence-specific binding of the CRISPR complex to the binding site can be assessed by any suitable assay. The dead guide sequence may generally be shorter than the corresponding guide sequence that results in active cleavage. In particular embodiments, the dead guide is 5%, 10%, 20%, 30%, 40%, or 50% shorter than the corresponding guide directed to the same sequence.

Protein component

The programmable DNA binding unit may comprise a protein component capable of specifically binding to a binding site on the target dsDNA. The protein component may include an endonuclease-deficient zinc finger nuclease (ZFN), an endonuclease-deficient transcription activator-like effector nuclease (TALEN), an Argonaute protein, an endonuclease-deficient meganuclease, a recombinase, or a combination thereof. In some embodiments, the programmable DNA binding unit does not have a nuclease domain. In some embodiments, the programmable DNA binding unit has a nuclease domain that has been rendered catalytically inactive by one or more mutations. Methods for introducing catalytically inactivating mutations and for assessing the nuclease activity of the resulting mutants are known to those skilled in the art.

Transcription activator-like effector (TALE)

The programmable DNA binding unit may comprise an endonuclease-deficient transcription activator-like effector nuclease (TALEN), a functional fragment thereof, or a variant thereof. Transcription activator-like effectors (TALEs) can be engineered to bind to virtually any desired DNA sequence. Exemplary methods of targeting using the TALEN system can be found, for example, in Cermak T, Doyle EL, Christian M, Wang L, Zhang Y, Schmidt C, et al. Efficient design and assembly of custom TALEN and other TAL effector-based constructs for DNA targeting. Nucleic Acids Res. 2011;39:e82; Zhang F, Cong L, Lodato S, Kosuri S, Church GM, Arlotta P. Efficient construction of sequence-specific TAL effectors for modulating mammalian transcription. Nat Biotechnol. 2011;29:149-153; and U.S. Pat. Nos. 8,450,471, 8,440,431 and 8,440,432, all of which are specifically incorporated by reference.

The programmable DNA binding unit may comprise a TALE polypeptide. TALEs are transcription factors from the plant pathogen Xanthomonas that can be easily engineered to bind new DNA targets. In some embodiments provided herein, TALEs are not linked to the catalytic domain of an endonuclease (e.g., FokI). In some embodiments provided herein, the programmable DNA binding unit may comprise a TALEN, wherein the endonuclease domain is catalytically inactive. TALE polypeptides comprise a nucleic acid binding domain consisting of tandem repeats of highly conserved monomeric polypeptides, which are predominantly 33, 34 or 35 amino acids in length and differ from each other predominantly at amino acid positions 12 and 13. As used herein, the term "polypeptide monomer" or "TALE monomer" will be used to refer to a highly conserved repeat polypeptide sequence within the TALE nucleic acid binding domain, and the term "repeat variable di-residues" or "RVDs" will be used to refer to the highly variable amino acids at positions 12 and 13 of the polypeptide monomer. TALE monomers have nucleotide binding affinities that are determined by the identity of the amino acids in their RVDs. For example, a polypeptide monomer with RVD NI preferentially binds adenine (A), a polypeptide monomer with RVD NG preferentially binds thymine (T), a polypeptide monomer with RVD HD preferentially binds cytosine (C), and a polypeptide monomer with RVD NN preferentially binds adenine (A) and guanine (G). In another embodiment provided herein, a polypeptide monomer with RVD IG preferentially binds T. Thus, the number and order of polypeptide monomer repeats in the nucleic acid binding domain of a TALE determine its nucleic acid target specificity. In some embodiments, a polypeptide monomer with RVD NS recognizes all four base pairs and can bind A, T, G or C. 
TALEs have the structure and function described, for example, in Moscou et al., Science 326:1501 (2009); Boch et al., Science 326:1509-1512 (2009); and Zhang et al., Nature Biotechnology 29:149-153 (2011), each of which is incorporated herein by reference in its entirety. The programmable DNA binding unit may comprise a polypeptide monomer repeat designed to target a particular nucleic acid sequence.
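The RVD-to-nucleotide preferences listed above can be expressed as a lookup table. The following is a minimal illustrative sketch, not a design tool from the disclosure: the function names and the choice of one default RVD per base (in particular NN for guanine) are assumptions made for the example.

```python
from typing import Dict, List, Set

# Base preferences for each RVD, as stated in the text above.
RVD_SPECIFICITY: Dict[str, Set[str]] = {
    "NI": {"A"},
    "NG": {"T"},
    "IG": {"T"},
    "HD": {"C"},
    "NN": {"A", "G"},
    "NS": {"A", "C", "G", "T"},
}

# Hypothetical default choice of one RVD per target base (an assumption).
DEFAULT_RVD_FOR_BASE: Dict[str, str] = {"A": "NI", "T": "NG", "C": "HD", "G": "NN"}


def design_rvd_array(target: str) -> List[str]:
    """Return one RVD per base of the target, in order."""
    return [DEFAULT_RVD_FOR_BASE[base] for base in target.upper()]


def rvd_array_recognizes(rvds: List[str], dna: str) -> bool:
    """True if every base of dna is within the preference set of its RVD."""
    dna = dna.upper()
    if len(rvds) != len(dna):
        return False
    return all(base in RVD_SPECIFICITY[rvd] for rvd, base in zip(rvds, dna))


if __name__ == "__main__":
    array = design_rvd_array("GATC")
    print(array)
    print(rvd_array_recognizes(array, "GATC"))
```

Because NN is listed as binding both A and G, an array designed for a G-containing target under these assumptions would also be predicted to accept A at that position.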

As described in Zhang et al., Nature Biotechnology 29:149-153 (2011), TALE polypeptide binding efficiency can be increased by including, at the N-terminal or C-terminal position of the engineered TALE DNA binding region, an amino acid sequence from the "capping region" that lies directly N-terminal or C-terminal to the DNA binding region of a naturally occurring TALE. Thus, in certain embodiments, a TALE polypeptide described herein further comprises an N-terminal capping region and/or a C-terminal capping region.

As used herein, the predetermined "N-terminal" to "C-terminal" direction of the N-terminal capping region, the DNA binding domain comprising the repeat TALE monomer, and the C-terminal capping region provide a structural basis for the organization of the different domains in a TALE or polypeptide provided herein.

The complete N-terminal and/or C-terminal capping regions are not necessary to enhance the binding activity of the DNA binding region. Thus, in certain embodiments, fragments of the N-terminal and/or C-terminal capping regions are included in the TALE polypeptides described herein.

In certain embodiments, a TALE polypeptide described herein comprises an N-terminal capping region fragment comprising at least 10, 20, 30, 40, 50, 54, 60, 70, 80, 87, 90, 94, 100, 102, 110, 117, 120, 130, 140, 147, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, or 270 amino acids of an N-terminal capping region. In certain embodiments, the amino acids of the N-terminal capping region fragment are from the C-terminus of the N-terminal capping region (the end proximal to the DNA binding region). As described in Zhang et al., Nature Biotechnology 29:149-153 (2011), N-terminal capping region fragments comprising the C-terminal 240 amino acids enhance binding activity to the same extent as the full-length capping region, while fragments comprising the C-terminal 147 amino acids retain greater than 80% of the activity of the full-length capping region and fragments comprising the C-terminal 117 amino acids retain greater than 50% of the activity of the full-length capping region.

In some embodiments, a TALE polypeptide described herein comprises a C-terminal capping region fragment comprising at least 6, 10, 20, 30, 37, 40, 50, 60, 68, 70, 80, 90, 100, 110, 120, 127, 130, 140, 150, 155, 160, 170, or 180 amino acids of the C-terminal capping region. In certain embodiments, the amino acids of the C-terminal capping region fragment are from the N-terminus of the C-terminal capping region (the end proximal to the DNA binding region). As described in Zhang et al., Nature Biotechnology 29:149-153 (2011), C-terminal capping region fragments comprising the C-terminal 68 amino acids enhance binding activity to the same extent as the full-length capping region, while fragments comprising the C-terminal 20 amino acids retain greater than 50% of the activity of the full-length capping region.

Zinc Finger (ZF) proteins

The programmable DNA binding unit may comprise a zinc finger (ZF) nuclease, a functional fragment thereof, or a variant thereof. The programmable DNA binding unit may comprise an endonuclease-deficient ZF nuclease, a functional fragment thereof, or a variant thereof, wherein the endonuclease domain (e.g., FokI) is catalytically inactive or absent. The programmable DNA binding unit may comprise a ZF protein (ZFP). A ZFP may be engineered to bind to a selected target site. See, for example, Beerli et al. (2002) Nature Biotechnol. 20:135-141; Pabo et al. (2001) Ann. Rev. Biochem. 70:313-340; Isalan et al. (2001) Nature Biotechnol. 19:656-660; Segal et al. (2001) Curr. Opin. Biotechnol. 12:632-637; Choo et al. (2000) Curr. Opin. Struct. Biol. 10:411-416; U.S. Pat. Nos. 6,453,242; 6,534,261; 6,599,692; 6,503,717; 6,689,558; 7,030,215; 6,794,136; 7,067,317; 7,262,054; 7,070,934; 7,361,635; 7,253,273; and U.S. Patent Publication Nos. 2005/0064474; 2007/0218528; 2005/0267061. A ZFP may comprise an array of ZF modules that target the desired DNA binding site. Each finger module in the ZF array can target three DNA bases. Custom arrays of individual zinc finger domains can be assembled into ZFPs.
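Because each finger module targets three DNA bases, a binding site can be partitioned into consecutive triplets to see how many modules an array would need. The following is a minimal sketch under that simplifying assumption; the function names are hypothetical, and real zinc finger design also has to account for context effects between neighboring fingers, which this example ignores.

```python
import math
from typing import List


def fingers_needed(site_length: int) -> int:
    """Number of three-base finger modules needed to span a site."""
    if site_length <= 0:
        raise ValueError("site_length must be positive")
    return math.ceil(site_length / 3)


def split_into_triplets(site: str) -> List[str]:
    """Partition a binding site into consecutive three-base subsites.

    Requires a length that is a multiple of three so that every finger
    module is assigned a complete triplet.
    """
    site = site.upper()
    if len(site) == 0 or len(site) % 3 != 0:
        raise ValueError("site length must be a positive multiple of 3")
    return [site[i:i + 3] for i in range(0, len(site), 3)]


if __name__ == "__main__":
    example_site = "GCGTGGGCG"  # arbitrary 9-bp example sequence
    print(fingers_needed(len(example_site)))
    print(split_into_triplets(example_site))
```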

Meganuclease

The programmable DNA binding unit may be an endonuclease-deficient meganuclease, a functional fragment thereof, or a variant thereof. The DNA binding domain of a meganuclease may have a double-stranded DNA target sequence of 12 to 45 bp. In some embodiments, the meganuclease is a dimeric enzyme, wherein each meganuclease domain is located on a monomer, or a monomeric enzyme comprising two domains on a single polypeptide. Protein engineering has produced, in addition to wild-type meganucleases, various meganuclease variants that cover a myriad of unique sequence combinations. In some embodiments, chimeric meganucleases having a recognition site composed of a half-site of meganuclease A and a half-site of protein B can also be used. Specific examples of such chimeric meganucleases include the protein domains of I-DmoI and I-CreI. Examples of meganucleases include homing endonucleases from the LAGLIDADG family. "LAGLIDADG meganuclease" refers to a homing endonuclease from the LAGLIDADG family as defined by Stoddard et al. (Stoddard, 2005) or an engineered variant comprising a polypeptide having at least 80%, 85%, 90%, 95%, 97.5%, 99% or more identity or similarity to said native homing endonuclease. Such engineered LAGLIDADG meganucleases can be derived from monomeric or dimeric meganucleases. When derived from a dimeric meganuclease, such an engineered LAGLIDADG meganuclease may be a single-chain or a dimeric endonuclease. Meganucleases can be targeted to specific sequences by modifying their recognition sequences using techniques well known to those skilled in the art. See, e.g., Epinat et al., 2003, Nucleic Acids Res., 31(11): 2952-62 and Stoddard, 2005, Quarterly Reviews of Biophysics, pp. 1-47.

The LAGLIDADG meganuclease can be I-SceI, I-ChuI, I-CreI, I-CsmI, PI-SceI, PI-TliI, PI-MtuI, I-CeuI, I-SceII, I-SceIII, HO, PI-CivI, PI-CtrI, PI-AaeI, PI-BsiI, PI-DhaI, PI-DraI, PI-MavI, PI-MchI, PI-MfuI, PI-MflI, PI-MgaI, PI-MgoI, PI-MinI, PI-MKAI, PI-MKEI, PI-MKHI, PI-MsmI, PI-MthI, PI-MtuI, PI-MxeI, PI-NpuI, PI-PfuI, PI-RmaI, PI-SpbI, PI-SspI, PI-FacI, PI-MjaI, PI-PhoI, PI-TagI, PI-ThyI, PI-TkoI, PI or I-MsoI; or may be a functional mutant or variant thereof, whether homodimeric, heterodimeric or monomeric. In some embodiments, the LAGLIDADG meganuclease is an I-CreI derivative. In some embodiments, the LAGLIDADG meganuclease has at least 80% similarity to the native I-CreI LAGLIDADG meganuclease. In some embodiments, the LAGLIDADG meganuclease has at least 80% similarity to residues 1-152 of the native I-CreI LAGLIDADG meganuclease. In some embodiments, the LAGLIDADG meganuclease may be composed of two monomers, each having at least 80% similarity to residues 1-152 of the native I-CreI LAGLIDADG meganuclease, linked together with or without a linker peptide.

Argonaute protein

In some embodiments, the programmable DNA binding unit comprises an Argonaute protein without nuclease activity. In some embodiments, the programmable DNA binding unit comprises an Argonaute protein from Natronobacterium gregoryi (NgAgo), a functional fragment thereof, or a variant thereof. NgAgo is an ssDNA-guided endonuclease. NgAgo binds a 5' phosphorylated ssDNA guide (gDNA) of about 24 nucleotides, which directs it to its target site, and creates a DNA double-strand break at the gDNA site. In some embodiments, the programmable DNA binding unit comprises NgAgo without nuclease activity (dNgAgo). Characterization and use of NgAgo have been described in Gao et al., Nat Biotechnol., Epub 2016 May 2, PubMed PMID: 27136078; Swarts et al., Nature 507(7491) (2014): 258-61; and Swarts et al., Nucleic Acids Res. 43(10) (2015): 5120-9, each of which is incorporated herein by reference. The NgAgo-based programmable DNA binding unit may comprise at least one guide DNA element, or a nucleic acid comprising a nucleic acid sequence encoding a guide DNA element, and achieves specific targeting or recognition of the binding site by base pairing directly with the DNA of the binding site. Prokaryotic homologs of the Argonaute protein are known and have been described, for example, in Makarova K. et al., "Prokaryotic homologs of Argonaute proteins are predicted to function as key components of a novel system of defense against mobile genetic elements", Biol. Direct. 2009 Aug 25; 4:29, doi: 10.1186/1745-6150-4-29, which is incorporated herein by reference. In some embodiments, the programmable DNA binding unit is a Marinitoga piezophila Argonaute (MpAgo) protein, a functional fragment thereof, or a variant thereof.

Recombinase

In some embodiments, the programmable DNA binding unit comprises a recombinase configured to bind to a binding site on the target dsDNA. Site-specific recombinases are well known in the art and may be generally referred to as invertases, resolvases, or integrases. Non-limiting examples of site-specific recombinases include lambda integrase, Cre, Int, IHF, Xis, Flp, Fis, Hin, Gin, phiC31, Cin, Tn3 resolvase, TndX, XerC, XerD, TnpX, Hjc, SpCCE1 and ParA.

Linker

The transposomes may be bound to the programmable DNA binding unit by a linker linking the transposase and the dCas protein. The linker may comprise a peptide linker, a chemical linker, or both. The transposase may be present as a fusion protein comprising a dCas protein. The transposomes may be bound to the programmable DNA binding unit by a linker linking the transposase and the protein component. The peptide linker may comprise more than one glycine, serine, threonine, alanine, lysine, glutamine, or a combination thereof. The peptide linker may include a GS linker. The peptide linker can be an XTEN linker. The protein component may be present as a fusion protein comprising a transposase. The term "linker" as used herein refers to a molecule that facilitates interactions between molecules or molecular moieties. In some embodiments, the linker is a polypeptide linker. In some embodiments, the linker is a chemical linker. The term "peptide linker" or "polypeptide linker" as used herein refers to a peptide or polypeptide comprising two or more amino acid residues joined by peptide bonds. Such peptide or polypeptide linkers are well known in the art. The linker may include naturally occurring and/or non-naturally occurring peptides or polypeptides. The linker may be associated with the C-terminus and/or the N-terminus of the transposase and/or the programmable DNA binding unit.

The linker may be a chemical linker or a peptide linker. Thus, embodiments relate to polypeptides conjugated to other molecules via peptide bonds and polypeptides conjugated to other molecules via chemical conjugation.

Peptide linkers with a certain flexibility may be used. The peptide linker may have virtually any amino acid sequence, bearing in mind that a suitable peptide linker will have a sequence that results in a generally flexible peptide. The use of small amino acids such as glycine and alanine can be used to produce flexible peptides. The creation of such sequences is routine to those skilled in the art.

Suitable linkers can be readily selected and can have any suitable length, for example from 1 amino acid (e.g., gly) to 50 amino acids, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50 amino acids, or a number or range between any two of these values (or any derivable range therein).

The preferred peptide linker sequences adopt a flexible extended conformation and do not exhibit a tendency to form ordered secondary structures. In certain embodiments, the linker may be a chemical moiety, which may be a monomer, dimer, multimer, or polymer. Preferably, the linker comprises amino acids. Typical amino acids in flexible linkers include Gly, Asn and Ser. Thus, in particular embodiments, the linker comprises a combination of one or more of Gly, Asn, and Ser amino acids. Other near-neutral amino acids, such as Thr and Ala, may also be used in the linker sequence. Exemplary flexible linkers include glycine polymers (G)n (SEQ ID NO: 32), glycine-serine polymers (including, for example, (GS)n (SEQ ID NO: 33), (GSGGS)n (SEQ ID NO: 34), (G4S)n (SEQ ID NO: 35) and (GGGS)n (SEQ ID NO: 36), where n is an integer of at least 1), glycine-alanine polymers, alanine-serine polymers, and other flexible linkers known in the art. In some embodiments, n is at least, at most, or exactly 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 (or any derivable range therein). Glycine and glycine-serine polymers may be used; Gly and Ser are both relatively unstructured and can therefore serve as neutral tethers between components. Glycine polymers may be used; glycine accesses significantly more phi-psi space than even alanine and is much less restricted than residues with longer side chains. Exemplary spacers may comprise short glycine- and serine-containing amino acid sequences, including but not limited to those set forth in the sequence listing (e.g., SEQ ID NO: 37 and SEQ ID NO: 40). The linker sequence may vary without significantly affecting the function or activity of the fusion protein (see, e.g., U.S. Patent No. 6,087,329). 
In some embodiments, the linker may be at least, up to, or exactly 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100 amino acid residues (or any range derivable therein).
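The repeat-unit notation used for the flexible linkers above, such as (GS)n or (G4S)n, can be expanded into explicit one-letter amino acid sequences. The following is a minimal sketch; the function name `build_linker` and the `max_length` parameter are hypothetical conveniences for illustration, with the 100-residue default taken from the upper bound mentioned in the text.

```python
FLEXIBLE_RESIDUES = set("GSATN")  # Gly, Ser, Ala, Thr, Asn, as named in the text


def build_linker(repeat_unit: str, n: int, max_length: int = 100) -> str:
    """Expand a repeat unit such as 'GGGGS' into the (unit)n sequence."""
    if n < 1:
        raise ValueError("n must be an integer of at least 1")
    unit = repeat_unit.upper()
    if not unit or not set(unit) <= FLEXIBLE_RESIDUES:
        raise ValueError("repeat unit must use only G, S, A, T or N")
    linker = unit * n
    if len(linker) > max_length:
        raise ValueError(f"linker exceeds {max_length} residues")
    return linker


if __name__ == "__main__":
    # (G4S)3 is written out as three copies of GGGGS.
    print(build_linker("GGGGS", 3))
    print(build_linker("GS", 5))
```

Restricting the residues to a small flexible set is a design choice of this example only; the text above allows linkers of virtually any amino acid sequence.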

In some embodiments, the polypeptide linker is an XTEN linker. In some embodiments, the linker is an XTEN linker or a variant of an XTEN linker, e.g., SGSETPGTSESA (SEQ ID NO: 43), SGSETPGTSESATPES (SEQ ID NO: 44) or SGSETPGTSESATPEGGSGGS (SEQ ID NO: 45). XTEN linkers are described, for example, in Schellenberger et al (2009), nature Biotechnology 27:1186-1190, the entire contents of which are incorporated herein by reference.

Suitable linkers for use in the methods provided herein are well known to those skilled in the art and include, but are not limited to, straight or branched chain carbon linkers, heterocyclic carbon linkers, or peptide linkers. However, as used herein, the linker may also be a covalent bond (carbon-carbon bond or carbon-heteroatom bond). In certain embodiments, the linker is used to separate the transposome and the programmable DNA binding unit a sufficient distance to ensure that each protein retains its desired functional properties.

The linker can be used to fuse two protein partners to form a fusion protein. A "linker" may be a chemical group or molecule that connects two molecules or moieties, e.g., two domains of a fusion protein. Typically, a linker is located between, or flanked by, two groups, molecules, domains or other moieties, and is attached to each by a covalent bond, thereby linking the two. In some embodiments, the linker is an amino acid or more than one amino acid (e.g., a peptide or protein). In some embodiments, the linker is an organic molecule, a group, a polymer (e.g., a non-natural polymer, a non-peptide polymer), or a chemical moiety. In some embodiments, the linker includes a direct bond or an atom such as oxygen (O) or sulfur (S), a unit such as -NR- (wherein R is hydrogen or alkyl), -C(O)-, -C(O)O-, -C(O)NH-, SO₂, or -SO₂NH-, or a chain of atoms, e.g., substituted or unsubstituted alkyl, substituted or unsubstituted alkenyl, substituted or unsubstituted alkynyl, arylalkyl, or heteroarylalkyl. In some embodiments, one or more methylene groups in the chain of atoms may be replaced by O, S, S(O), SO₂, -SO₂NH-, -NR-, -NR₂, -C(O)-, -C(O)O-, -C(O)NH-, a cleavable linking group, a substituted or unsubstituted aryl, a substituted or unsubstituted heteroaryl, or a substituted or unsubstituted heterocycle. Examples of linkers may also include chemical moieties and conjugation agents, such as sulfo-succinimidyl derivatives (sulfo-SMCC, sulfo-SMPB), disuccinimidyl suberate (DSS), disuccinimidyl glutarate (DSG), and disuccinimidyl tartrate (DST). Examples of linkers also include linear carbon chains such as Cn (where n = 1-100 carbon atoms). In some embodiments, the linker may be a dipeptide linker, such as a valine-citrulline (val-cit) linker, a phenylalanine-lysine (phe-lys) linker, or a maleimidocaproyl-valine-citrulline-p-aminobenzylcarbonyl (vc) linker. In some embodiments, the linker is sulfosuccinimidyl-4-[N-maleimidomethyl]cyclohexane-1-carboxylate (sulfo-SMCC). 
Sulfo-SMCC conjugation occurs through a maleimide group that reacts with a sulfhydryl group (thiol, -SH), while its sulfo-NHS ester reacts with a primary amine (as found in lysine and at the N-terminus of proteins or peptides). Furthermore, the linker may be maleimidocaproyl (mc). In some embodiments, covalent linkage can be achieved by using Traut's reagent.

FIGS. 8-10 depict non-limiting exemplary schematic diagrams of plasmid constructs 3XFlag-Cas9-Fl26-Tn5 (SEQ ID NO: 1), 3XFlag-Cas9-xTen-Tn5 (SEQ ID NO: 2), and pET-Tn5-xTen-dCAs9 (SEQ ID NO: 3), respectively, for use in generating the protein complexes provided herein. The protein complexes, adaptors, programmable DNA binding units and/or transposases disclosed herein can be encoded by nucleotide sequences that are at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 100% identical to the protein complexes, adaptors, programmable DNA binding units and/or transposases encoded in SEQ ID NOs 1-3 or a range between any two of these values.

Amplification

The method may include: amplifying the more than one dsDNA fragments with primers capable of binding to the adaptors at the ends of the dsDNA fragments. Amplification may yield nucleic acid amplification products. The nucleic acid amplification products may constitute a library (e.g., a sequencing library). Each primer can be about 5-80 nucleotides in length (e.g., about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 60, 70, or 80 nucleotides in length, or a number or range between any two of these values). The more than one dsDNA fragments can be amplified using polymerase chain reaction (PCR). Alternatively, the amplification can be loop-mediated isothermal amplification (LAMP), helicase-dependent amplification (HDA), recombinase polymerase amplification (RPA), strand displacement amplification (SDA), nucleic acid sequence-based amplification (NASBA), transcription-mediated amplification (TMA), nicking enzyme amplification reaction (NEAR), rolling circle amplification (RCA), multiple displacement amplification (MDA), ramification amplification (RAM), circular helicase-dependent amplification (cHDA), single primer isothermal amplification (SPIA), signal-mediated amplification of RNA technology (SMART), self-sustained sequence replication (3SR), genome exponential amplification reaction (GEAR), or isothermal multiple displacement amplification (IMDA). The PCR may be real-time PCR or quantitative real-time PCR (QRT-PCR).

As used herein, nucleic acid amplification can refer to any known procedure that uses sequence-specific methods to obtain more than one copy of a target nucleic acid sequence, its complement, or a fragment thereof. Examples of known amplification methods include, but are not limited to, polymerase chain reaction (PCR), ligase chain reaction (LCR), loop-mediated isothermal amplification (LAMP), strand displacement amplification (SDA) (e.g., multiple displacement amplification (MDA)), replicase-mediated amplification, immuno-amplification, nucleic acid sequence-based amplification (NASBA), self-sustained sequence replication (3SR), rolling circle amplification, and transcription-mediated amplification (TMA). In some embodiments, two or more of the above-described nucleic acid amplification methods may be performed, for example, sequentially.

For example, LCR amplification uses at least four separate oligonucleotides to amplify a target and its complementary strand through more than one cycle of hybridization, ligation, and denaturation. SDA amplifies by using a primer that contains a recognition site for a restriction endonuclease that nicks one strand of a hemimodified DNA duplex that includes the target sequence, followed by amplification in a series of primer extension and strand displacement steps.

PCR is a method well known in the art for nucleic acid amplification. PCR involves amplifying a target sequence using two or more extendable sequence-specific oligonucleotide primers flanking the target sequence. In the presence of the primers, a thermostable DNA polymerase (e.g., Taq polymerase), and the various dNTPs, a nucleic acid comprising the target sequence of interest is subjected to multiple thermal cycles (denaturation, annealing, and extension), resulting in amplification of the target sequence. PCR uses multiple rounds of primer extension reactions in which complementary strands of a defined region of a DNA molecule are simultaneously synthesized by a thermostable DNA polymerase. At the end of each cycle, each newly synthesized DNA molecule acts as a template for the next cycle. During the repeated rounds of these reactions, the number of newly synthesized DNA strands increases exponentially, so that after 20 to 30 reaction cycles, the original template DNA will have been replicated thousands or millions of times.
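The exponential growth described above can be made concrete with a simple model in which each cycle multiplies the copy number by a fixed factor. The following is a minimal sketch; the per-cycle `efficiency` parameter is an assumption introduced for illustration (1.0 means perfect doubling) and is not a quantity defined in this disclosure.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after a given number of PCR cycles.

    Model (an assumption of this sketch): copies = N0 * (1 + E) ** cycles,
    where E is the fraction of templates copied in each cycle.
    """
    if initial_copies < 0:
        raise ValueError("initial_copies must be non-negative")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    for n_cycles in (10, 20, 30):
        ideal = pcr_copies(1, n_cycles)
        reduced = pcr_copies(1, n_cycles, efficiency=0.8)
        print(f"{n_cycles} cycles: ideal {ideal:.3g}, at 80% efficiency {reduced:.3g}")
```

Under perfect doubling, a single template yields 2^20 (about one million) copies after 20 cycles; real reactions typically fall short of this because efficiency drops as reagents are consumed, which this simple model does not capture.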

PCR can produce double stranded amplification products suitable for post amplification processing. If desired, the amplified product may be detected by agarose gel electrophoresis visualization, by an enzyme immunoassay format using probe-based colorimetric detection, by fluorescence emission techniques, or by other detection means known to those skilled in the art.

Examples of PCR methods include, but are not limited to, real-time PCR, end-point PCR, amplified fragment length polymorphism PCR (AFLP-PCR), Alu-PCR, asymmetric PCR, colony PCR, DD-PCR, degenerate PCR, hot-start PCR, in situ PCR, inverse PCR, long PCR (Long-PCR), multiplex PCR, nested PCR, PCR-ELISA, PCR-RFLP, PCR-single strand conformation polymorphism (PCR-SSCP), quantitative competitive PCR (QC-PCR), rapid amplification of cDNA ends PCR (RACE-PCR), random amplification of polymorphic DNA PCR (RAPD-PCR), repetitive extragenic palindromic PCR (Rep-PCR), reverse transcriptase PCR (RT-PCR), TAIL-PCR, touchdown PCR, and Vectorette PCR.

Real-time PCR, also known as real-time quantitative polymerase chain reaction (QRT-PCR), can be used to simultaneously quantify and amplify specific parts of a given nucleic acid molecule. It can be used to determine whether a particular sequence is present in a sample and, if it is present, to determine the copy number of the sequence. The term "real-time" may refer to periodic monitoring during PCR. Certain systems, such as the ABI 7700 and 7900HT Sequence Detection Systems (Applied Biosystems, Foster City, CA), monitor at predetermined or user-defined points during each thermal cycle. Real-time PCR analysis with Fluorescence Resonance Energy Transfer (FRET) probes measures the change in fluorescent dye signal from cycle to cycle, preferably subtracting any internal control signal. Real-time procedures follow the general pattern of PCR, but nucleic acids are quantified after each round of amplification. Two examples of quantification methods are the use of fluorescent dyes (e.g., SYBR Green) that intercalate into double-stranded DNA and the use of modified DNA oligonucleotide probes that fluoresce when hybridized to complementary DNA. Intercalators have relatively low fluorescence when unbound and relatively high fluorescence when bound to double-stranded nucleic acids. Thus, intercalators can be used to monitor the accumulation of double-stranded nucleic acid during a nucleic acid amplification reaction. Examples of such non-specific dyes that may be used in the embodiments disclosed herein include intercalators such as SYBR Green I (Molecular Probes), propidium iodide, ethidium bromide, and the like.

Labeling

The methods described herein may include labeling (e.g., with a detectable label) one or both ends of one or more of the more than one dsDNA fragments. The method may include differentially labeling the two ends of one or more of the more than one dsDNA fragments. The labeling can include labeling with a detectable label (e.g., anionic label, cationic label, neutral label, electrochemical label, protein label, fluorescent label, magnetic label, or a combination thereof). The method may include enriching for the labeled dsDNA fragments, capturing the labeled dsDNA fragments, isolating the labeled dsDNA fragments, and/or visualizing the labeled dsDNA fragments. The method may include monitoring (e.g., chemical monitoring) of the detectable label.

In some embodiments, the detectable moiety (e.g., a detectable label) comprises an optical moiety, a luminescent moiety, an electrochemically active moiety, a nanoparticle, or a combination thereof. In some embodiments, the luminescent moiety comprises a chemiluminescent moiety, an electroluminescent moiety, a photoluminescent moiety, or a combination thereof. In some embodiments, the photoluminescent moiety comprises a fluorescent moiety, a phosphorescent moiety, or a combination thereof. In some embodiments, the fluorescent moiety comprises a fluorescent dye. In some embodiments, the nanoparticle comprises a quantum dot. In some embodiments, the method comprises performing a reaction to convert a precursor of the detectable moiety to the detectable moiety. In some embodiments, performing the reaction to convert the precursor of the detectable moiety to the detectable moiety comprises contacting the precursor of the detectable moiety with a substrate. In some such embodiments, contacting the precursor of the detectable moiety with the substrate produces a detectable by-product of the reaction between the two molecules.

Detection and quantification of amplification products

Some methods provided herein comprise amplifying more than one dsDNA fragment to produce a nucleic acid amplification product. The methods described herein may also include detecting and/or quantifying a nucleic acid amplification product or a product thereof. The amplification product or products thereof may be detected and/or quantified by any suitable detection and/or quantification method, including, for example, any of the detection methods or quantification methods described herein. Non-limiting examples of detection and/or quantification methods include molecular beacons (e.g., real-time, end-point), lateral flow, Fluorescence Resonance Energy Transfer (FRET), Fluorescence Polarization (FP), surface capture, 5' to 3' exonuclease hydrolysis probes (e.g., TAQMAN), intercalating/binding dyes, absorbance methods (e.g., colorimetry, nephelometry), electrophoresis (e.g., gel electrophoresis, capillary electrophoresis), mass spectrometry, nucleic acid sequencing, digital amplification, primer extension methods (e.g., iPLEX™), Molecular Inversion Probe (MIP) technology from Affymetrix, Restriction Fragment Length Polymorphism (RFLP) analysis, Allele Specific Oligonucleotide (ASO) analysis, Methylation Specific PCR (MSPCR), pyrosequencing analysis, AcycloPrime analysis, reverse dot blot, GeneChip microarrays, Dynamic Allele Specific Hybridization (DASH), Peptide Nucleic Acid (PNA) and Locked Nucleic Acid (LNA) probes, AlphaScreen, SNPstream, Genetic Bit Analysis (GBA), multiplex minisequencing, SNaPshot, GOOD assays, microarray miniseq, Array Primer Extension (APEX), microarray primer extension, tag arrays, coded microspheres, Template Directed Incorporation (TDI), colorimetric Oligonucleotide Ligation Assay (OLA), sequence-coded OLA, microarray ligation, ligase chain reaction, padlock probes, the Invader assay, hybridization using at least one probe, hybridization using at least one fluorescently labeled probe, cloning and sequencing, use of hybridization probes and quantitative real-time polymerase chain reaction (QRT-PCR), nanopore sequencing, chips, and combinations thereof. Detecting nucleic acid amplification products may include using real-time detection methods (i.e., detecting and/or continuously monitoring the products during the amplification process), using endpoint detection methods (i.e., detecting the products after completion or cessation of the amplification process), or both. The nucleic acid detection method may also use labeled nucleotides that are directly incorporated into the target sequence or into probes containing the target complementary sequence. Such labels may be radioactive and/or fluorescent in nature and may be resolved in any of the ways discussed herein. In some embodiments, quantification of nucleic acid amplification products is achieved using one or more of the following detection methods. The detection method may be used in combination with measurement of signal intensity and/or generation of a standard curve and/or a look-up table for quantification of nucleic acid amplification products (or a reference).
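
Quantification against a standard curve, as mentioned above, can be sketched as follows. This is a minimal illustration that assumes the measured signal is linear in the base-10 logarithm of the amount of nucleic acid over the calibrated range; the function names and the calibration values in the usage example are hypothetical and do not come from the disclosure.

```python
import math


def fit_standard_curve(known_amounts, signals):
    """Least-squares fit of signal = slope * log10(amount) + intercept."""
    if len(known_amounts) != len(signals) or len(known_amounts) < 2:
        raise ValueError("need at least two paired calibration points")
    xs = [math.log10(a) for a in known_amounts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("calibration amounts must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, signals))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_amount(signal, slope, intercept):
    """Invert the fitted line to estimate the amount in an unknown sample."""
    return 10 ** ((signal - intercept) / slope)


if __name__ == "__main__":
    # Hypothetical calibration standards (copies) and their measured signals.
    standards = [1e2, 1e3, 1e4, 1e5]
    measured = [10.0, 20.0, 30.0, 40.0]
    slope, intercept = fit_standard_curve(standards, measured)
    print(estimate_amount(25.0, slope, intercept))
```

A look-up table would serve the same purpose when the relation between signal and amount is not well described by a straight line.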

Detecting the nucleic acid amplification product may include using molecular beacon techniques. The term molecular beacon generally refers to a detectable molecule, wherein the detectable property of the molecule is detectable only under certain conditions, thereby enabling the molecule to function as a specific and informative signal. Non-limiting examples of detectable properties include optical properties (e.g., fluorescence), electrical properties, magnetic properties, chemical properties, and time or speed of passage through an opening of known size. The molecular beacon for detecting a nucleic acid molecule may be, for example, a hairpin oligonucleotide that contains a fluorophore at one end and a quenching dye at the other end. The loop of the hairpin may comprise a probe sequence complementary to the target sequence, while the stem is formed by annealing of complementary arm sequences located on either side of the probe sequence. The fluorophore and quencher molecule may be covalently linked at opposite ends of each arm. Under conditions that prevent hybridization of the oligonucleotide to its complementary target, or when the molecular beacon is free in solution, the fluorescent molecule and the quencher molecule are in proximity to each other, so that fluorescence is quenched. When a molecular beacon encounters a target molecule (e.g., a nucleic acid amplification product), hybridization can occur and the loop structure is converted to a stable, more rigid conformation, resulting in separation of the fluorophore and quencher molecules, thereby producing fluorescence. Due to the specificity of the probe, fluorescence is usually generated entirely due to the synthesis of the desired amplification product. In some embodiments, the molecular beacon probe sequence hybridizes to a sequence in the amplification product that is identical or complementary to a sequence in the target nucleic acid. 
In some embodiments, the molecular beacon probe sequence hybridizes to a sequence in the amplification product that is not identical or complementary to a sequence in the target nucleic acid (e.g., hybridizes to a sequence added to the amplification product by a tailed amplification primer or ligation). Molecular beacons can be synthesized with different colored fluorophores and different target sequences, enabling the simultaneous detection of several products in the same reaction (e.g., in multiplex reactions). For quantitative amplification processes, molecular beacons can specifically bind to amplified targets after each amplification cycle, and because non-hybridized molecular beacons are dark, it is not necessary to isolate probe-target hybrids to quantitatively determine the amount of amplified product. The signal generated is proportional to the amount of amplified product. Detection using molecular beacons may be done in real time or as an endpoint detection method.

Detecting nucleic acid amplification products can include the use of lateral flow, which generally includes a solid phase fluid-permeable flow path through which fluid flows by capillary forces. Example devices include, but are not limited to, dipstick assays and thin layer chromatography plates with various suitable coatings. Immobilized on the flow path are various binding reagents for the sample, binding partners or conjugates involving binding partners for the sample and the signal generating system. Detection can be achieved in several ways, including, for example, enzymatic detection, nanoparticle detection, colorimetric detection, and fluorescent detection.

Detecting the nucleic acid amplification product may include using FRET, which is an energy transfer mechanism between two chromophores: a donor molecule and an acceptor molecule. Briefly, a donor fluorophore molecule is excited at a specific excitation wavelength. As the donor molecule returns to its ground state, its excitation energy can be transferred to the acceptor molecule through a long-range dipole-dipole interaction. The emission intensity of the acceptor molecule can be monitored and varies with the distance between the donor and acceptor, the overlap of the donor emission spectrum and acceptor absorption spectrum, and the orientation of the donor emission dipole moment and acceptor absorption dipole moment. FRET can be used to quantify molecular dynamics, for example, in the DNA-DNA interactions described for molecular beacons. To monitor the production of a particular product, the probe may be labeled with a donor molecule at one end and an acceptor molecule at the other end. Probe-target hybridization results in a change in the distance or orientation of the donor and acceptor, and a change in FRET is observed.
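
The dependence of FRET on donor-acceptor distance can be illustrated with the standard Förster relation, E = 1 / (1 + (r / R0)^6), where R0 is the distance at which transfer is 50% efficient. The relation is textbook background rather than something stated in the disclosure, and the function name and the 5 nm Förster radius used below are hypothetical illustration values. R0 itself absorbs the spectral overlap and dipole orientation factors mentioned above.

```python
def fret_efficiency(distance_nm: float, forster_radius_nm: float) -> float:
    """Transfer efficiency from the Forster relation E = 1 / (1 + (r / R0)**6)."""
    if distance_nm < 0 or forster_radius_nm <= 0:
        raise ValueError("distance must be >= 0 and Forster radius must be > 0")
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


if __name__ == "__main__":
    r0 = 5.0  # assumed Forster radius in nm, for illustration only
    for r in (2.0, 5.0, 8.0, 10.0):
        print(r, round(fret_efficiency(r, r0), 4))
```

Because of the sixth-power term, efficiency falls steeply around R0, which is why a hybridization event that changes the donor-acceptor separation by a few nanometers produces a readily measurable change in signal.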

Detecting the nucleic acid amplification product may include using FP, which is generally based on the principle that a fluorescently labeled compound, when excited by linearly polarized light, will emit fluorescence with a degree of polarization inversely proportional to its rotation rate. Thus, when a molecule with a fluorescent label, such as a tracer-nucleic acid conjugate, is excited by linearly polarized light, the emitted light remains highly polarized because the fluorophore is restricted from rotating between the time the light is absorbed and the time it is emitted. When a free tracer compound (i.e., not bound to a nucleic acid) is excited by linearly polarized light, it rotates much faster than the corresponding tracer-nucleic acid conjugate and the molecules are more randomly oriented; therefore, the emitted light is depolarized. Thus, fluorescence polarization provides a quantitative method for measuring the amount of tracer-nucleic acid conjugate produced in an amplification reaction.
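
The degree of polarization referred to above is conventionally computed from the emission intensities measured parallel and perpendicular to the excitation plane, P = (I_par - I_perp) / (I_par + I_perp). The following minimal sketch uses that standard definition; the function names and the intensity values are hypothetical and no instrument correction factor is applied.

```python
def polarization(i_parallel: float, i_perpendicular: float) -> float:
    """Fluorescence polarization P = (I_par - I_perp) / (I_par + I_perp)."""
    total = i_parallel + i_perpendicular
    if total <= 0:
        raise ValueError("total emission intensity must be positive")
    return (i_parallel - i_perpendicular) / total


def millipolarization(i_parallel: float, i_perpendicular: float) -> float:
    """Polarization expressed in millipolarization units (mP)."""
    return 1000.0 * polarization(i_parallel, i_perpendicular)


if __name__ == "__main__":
    # Hypothetical readings: a slowly rotating conjugate versus a free tracer.
    print(millipolarization(3.0, 1.0))
    print(millipolarization(1.2, 1.0))
```

A slowly rotating tracer-nucleic acid conjugate gives a larger difference between the two intensities, and therefore a higher P, than the rapidly tumbling free tracer.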

Detecting the nucleic acid amplification product may include using surface capture, which can be achieved by immobilizing specific oligonucleotides on a surface, resulting in a highly sensitive and selective biosensor.

Detecting the nucleic acid amplification product may include using a 5' to 3' exonuclease hydrolysis probe (e.g., TAQMAN). For example, TAQMAN probes are hydrolysis probes that can increase the specificity of a quantitative amplification method (e.g., quantitative PCR). The TAQMAN probe principle relies on 1) the 5' to 3' exonuclease activity of Taq polymerase to cleave dual-labeled probes during hybridization with complementary target sequences and 2) fluorophore-based detection. The fluorescent signal generated allows for quantitative measurement of the accumulation of amplified product during the exponential phase of the amplification.

Detecting the nucleic acid amplification product may include using intercalating and/or binding dyes, e.g., dyes that are capable of specifically staining nucleic acids. For example, intercalating dyes exhibit enhanced fluorescence upon binding to DNA or RNA. Non-limiting examples of dyes include acridine orange, ethidium bromide, Hoechst dyes, propidium iodide, asymmetric cyanine dyes, TOTO (thiazole orange dimer), and YOYO (oxazole yellow dimer).

Detecting the nucleic acid amplification product may include using absorbance methods (e.g., colorimetry, nephelometry). For example, detection and/or quantification of nucleic acids can be accomplished by directly converting absorbance (e.g., UV absorbance measurements at 260 nm) to concentration. Direct measurements of nucleic acids can be converted to concentration using the Beer-Lambert law, which relates absorbance to concentration through the path length of the measurement and an extinction coefficient.
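
The Beer-Lambert conversion described above, A = ε·c·l, can be sketched as follows. The function names are hypothetical. The dsDNA helper relies on the widely used approximation that an absorbance of 1.0 at 260 nm over a 1 cm path corresponds to about 50 µg/ml of double-stranded DNA; that factor is a general laboratory convention and is not taken from the disclosure.

```python
# Conventional approximation: A260 of 1.0 (1 cm path) ~ 50 ug/ml dsDNA.
DSDNA_UG_PER_ML_PER_A260 = 50.0


def concentration_from_absorbance(absorbance: float,
                                  extinction_coefficient: float,
                                  path_length_cm: float = 1.0) -> float:
    """Solve A = epsilon * c * l for c; units of c follow those of epsilon."""
    if extinction_coefficient <= 0 or path_length_cm <= 0:
        raise ValueError("extinction coefficient and path length must be positive")
    return absorbance / (extinction_coefficient * path_length_cm)


def dsdna_ug_per_ml(a260: float,
                    path_length_cm: float = 1.0,
                    dilution_factor: float = 1.0) -> float:
    """Estimate dsDNA concentration (ug/ml) from an A260 reading."""
    if path_length_cm <= 0:
        raise ValueError("path length must be positive")
    return a260 / path_length_cm * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


if __name__ == "__main__":
    print(dsdna_ug_per_ml(0.5))                      # undiluted sample
    print(dsdna_ug_per_ml(0.2, dilution_factor=10))  # sample diluted 1:10
```

The general function accepts any extinction coefficient, so the same calculation applies to oligonucleotides or labeled products whose coefficient is known.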

Detecting the nucleic acid amplification product may include using electrophoresis (e.g., gel electrophoresis, capillary electrophoresis), mass spectrometry, nucleic acid sequencing, digital amplification (e.g., digital PCR), or any combination thereof.

Genetic features of interest

More than one target dsDNA may include a genetic feature of interest (e.g., a biomarker feature). The genetic feature of interest may include one or more mutations (e.g., biomarkers) of interest. The one or more mutations of interest may include point mutations, inversions, deletions, insertions, translocations, duplications, copy number variations, or combinations thereof. The one or more mutations of interest may include nucleotide substitutions, deletions, insertions, or combinations thereof. The genetic feature of interest may be indicative of antibiotic resistance or antibiotic susceptibility of the organism from which the target dsDNA originates. The genetic feature of interest may be indicative of the cancer status of the organism from which the target dsDNA originates. The genetic feature of interest may be indicative of the status of a genetic disease of the organism from which the target dsDNA originates. The genetic disease may be a monogenic disorder. The genetic disease may be cystic fibrosis, Huntington's disease, sickle cell anemia, hemophilia, Duchenne muscular dystrophy, thalassemia, fragile X syndrome, familial hypercholesterolemia, polycystic kidney disease, neurofibromatosis type I, hereditary spherocytosis, Marfan syndrome, Tay-Sachs disease, phenylketonuria, mucopolysaccharidosis, lysosomal acid lipase deficiency, glycogen storage disease, galactosemia, or hemochromatosis. Genetic features of interest (e.g., biomarker features) can be detected using the methods and compositions provided herein. Diagnostic evaluation can be performed using the methods and compositions provided herein.

Diagnostic evaluation is performed based on biomarker characteristics (e.g., genetic characteristics of interest), alone or in combination with other evaluations or factors, as described herein. Provided herein are compositions and methods for assessing the risk of developing a disease or condition, prognosing the disease or condition, monitoring the progression or regression of the disease or condition, assessing the efficacy of a treatment, or identifying compounds capable of ameliorating or treating the disease or condition based on a biomarker signature (e.g., a genetic signature of interest).

Diseases and conditions

The methods provided herein can be applied to a variety of diseases or conditions based on the biomarker signature (e.g., genetic signature of interest) associated therewith. Exemplary diseases or conditions having genetic characteristics of interest according to the disclosed compositions and methods include cardiovascular diseases or conditions, kidney-related diseases or conditions, prenatal or pregnancy-related diseases or conditions, neurological or neuropsychiatric diseases or conditions, autoimmune or immune-related diseases or conditions, cancer, infectious diseases or conditions, pediatric diseases, disorders or conditions, mitochondrial diseases, respiratory-gastrointestinal diseases or conditions, reproductive diseases or conditions, ophthalmic diseases or conditions, musculoskeletal diseases or conditions, or dermatological diseases or conditions.

Sample

The sample may comprise eukaryotic DNA, bacterial DNA, viral DNA, fungal DNA, protozoan DNA, or a combination thereof. The more than one target dsDNA may comprise genomic DNA, mitochondrial DNA, plasmid DNA, or a combination thereof. The sample may be, or be derived from, a biological sample, a clinical sample, an environmental sample, or a combination thereof. More than one target dsDNA may comprise DNA from at least 2 different organisms. More than one target dsDNA may comprise DNA from at least 2 different genes. The method may include generating more than one target dsDNA from more than one target RNA using a reverse transcriptase. More than one target dsDNA may comprise target dsDNA generated from a target RNA with a reverse transcriptase. The sample nucleic acid may include eukaryotic DNA, bacterial DNA, viral DNA, fungal DNA, protozoan DNA, or a combination thereof. The target dsDNA may be genomic DNA, mitochondrial DNA, plasmid DNA, or a combination thereof. The sample nucleic acid may be from a biological sample, a clinical sample, an environmental sample, or a combination thereof. The biological sample may include stool, sputum, peripheral blood, plasma, serum, lymph nodes, respiratory tissue, exudates, body fluids, or combinations thereof.

The nucleic acids used in the methods described herein can be obtained from any suitable biological specimen or sample, and are typically isolated from a sample obtained from a subject. The subject may be any living or non-living organism including, but not limited to, a human, a non-human animal, a plant, a bacterium, a fungus, a virus, or a protozoan. Any human or non-human animal may be selected, including but not limited to mammals, reptiles, birds, amphibians, fish, ungulates, ruminants, bovines (e.g., cattle), equines (e.g., horses), caprines and ovines (e.g., goats, sheep), porcines (e.g., pigs), camelids (e.g., camels, llamas, alpacas), monkeys, apes (e.g., gorillas, chimpanzees), ursids (e.g., bears), poultry, dogs, cats, mice, rats, dolphins, whales and sharks. The subject may be male or female, and the subject may be of any age (e.g., embryo, fetus, infant, child, adult).

The sample or test sample may be any specimen isolated or obtained from a subject or portion thereof. Non-limiting examples of samples include fluids or tissues from a subject, including, but not limited to, blood or blood products (e.g., serum, plasma, etc.), umbilical cord blood, bone marrow, chorionic villi, amniotic fluid, cerebrospinal fluid, spinal fluid, lavage fluid (e.g., bronchoalveolar, gastric, peritoneal, ductal, ear, arthroscopic), biopsy samples, laparoscopy samples, cells (e.g., blood cells) or portions thereof (e.g., mitochondria, nuclei, extracts, etc.), washes of the female genital tract, urine, stool, sputum, saliva, nasal mucus, prostatic fluid, semen, lymph, bile, tears, sweat, breast milk, breast fluid, hard tissue (e.g., liver, spleen, kidney, lung, or ovary), etc., or combinations thereof. The term blood includes whole blood, blood products or any fraction of blood, such as serum, plasma, buffy coat or the like as conventionally defined. Plasma refers to the fraction of whole blood produced by centrifugation of blood treated with an anticoagulant. Serum refers to the aqueous portion of the fluid that remains after the blood sample has coagulated. Fluid or tissue samples are typically collected according to standard protocols commonly followed by hospitals or clinics. For blood, an appropriate amount of peripheral blood is typically collected (e.g., between 3 and 40 milliliters) and may be stored according to standard procedures either before or after preparation.

The sample or test sample may comprise a sample containing spores, viruses, cells, nucleic acids from prokaryotes or eukaryotes, or any free nucleic acid. For example, the methods described herein can be used to detect nucleic acids outside of spores (e.g., without lysis). The sample may be isolated from any material suspected of containing the target sequence, for example from a subject as described above. In some embodiments, the target sequence is present in air, plants, soil, or other material suspected of containing a biological organism.

Nucleic acids may be derived (e.g., isolated, extracted, purified) from one or more sources by methods known in the art. Any suitable method may be used to isolate, extract and/or purify nucleic acids from biological samples, non-limiting examples of which include DNA preparation methods in the art and various commercially available reagents or kits, such as the QIAamp Circulating Nucleic Acid Kit, QIAamp DNA Mini Kit, or QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany), the GenomicPrep™ Blood DNA Isolation Kit (Promega, Madison, Wis.), and the GFX™ Genomic Blood DNA Purification Kit (Amersham, Piscataway, NJ), and the like, or combinations thereof.

In some embodiments, a cell lysis procedure is performed. Cell lysis may be performed prior to the initiation of the reactions provided herein. Cell lysis procedures and reagents are known in the art and can generally be performed by chemical (e.g., detergents, hypotonic solutions, enzymatic procedures, etc., or a combination thereof), physical (e.g., French press, sonication, etc.), or electrolytic lysis methods. Any suitable lysis procedure may be used. For example, chemical methods typically use lysing agents to disrupt cells and extract the nucleic acids from the cells, followed by treatment with chaotropic salts. In some embodiments, cell lysis includes the use of detergents (e.g., ionic, nonionic, anionic, zwitterionic). In some embodiments, cell lysis includes the use of an ionic detergent (e.g., sodium dodecyl sulfate (SDS), sodium lauryl sulfate (SLS), deoxycholate, cholate, sarkosyl). Physical methods such as freezing/thawing followed by grinding, the use of cell presses, and the like may also be useful. High salt lysis procedures may also be used. For example, an alkaline lysis procedure may be used. The latter procedure traditionally involves the use of phenol-chloroform solutions, and an alternative phenol-chloroform-free procedure involving three solutions may be used. In the latter procedure, one solution may contain 15 mM Tris, pH 8.0, 10 mM EDTA, and 100 μg/ml RNase; the second solution may contain 0.2 N NaOH and 1% SDS; and the third solution may contain, for example, 3 M KOAc, pH 5.5. In some embodiments, a cell lysis buffer is used in combination with the methods and components described herein.

Nucleic acids may be provided for performing the methods described herein without processing a sample containing the nucleic acids. For example, in some embodiments, nucleic acids are provided for use in performing the amplification methods described herein without prior nucleic acid purification. In some embodiments, the target sequence is amplified directly from the sample (e.g., without performing any nucleic acid extraction, isolation, purification, and/or partial purification steps). In some embodiments, after processing a sample containing nucleic acids, the nucleic acids are provided for performing the methods described herein. For example, nucleic acids may be extracted, isolated, purified, or partially purified from a sample. The term "isolated" generally refers to a nucleic acid that is removed from its original environment (e.g., the natural environment if it is naturally occurring, or a host cell if it is exogenously expressed), and thus altered from its original environment by human intervention (e.g., "by the hand of man"). The term "isolated nucleic acid" may refer to a nucleic acid that is removed from a subject (e.g., a human subject). The isolated nucleic acid may be provided with fewer non-nucleic acid components (e.g., proteins, lipids, carbohydrates) than are present in the source sample. Compositions comprising isolated nucleic acids may be free of about 50% to greater than 99% of non-nucleic acid components. Compositions comprising isolated nucleic acids may be free of about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or greater than 99% of non-nucleic acid components. The term "purified" generally refers to a nucleic acid provided that contains fewer non-nucleic acid components (e.g., proteins, lipids, carbohydrates) than were present prior to subjecting the nucleic acid to a purification procedure. 
A composition comprising purified nucleic acid may be free of about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or greater than 99% of other non-nucleic acid components.

Nucleic acids may be provided for performing the methods described herein without modifying the nucleic acids. Modifications may include, for example, denaturation, digestion, nicking, melting, incorporation and/or ligation of heterogeneous sequences, addition of epigenetic modifications, addition of labels (e.g., radiolabels such as ³²P, ³³P, ¹²⁵I, or ³⁵S; enzyme labels such as alkaline phosphatase; fluorescent labels such as fluorescein isothiocyanate (FITC); or other labels such as biotin, avidin, digoxigenin, antigens, haptens, fluorescent dyes), and the like. Thus, in some embodiments, unmodified nucleic acids are amplified.

The methods of the present disclosure for detecting a target nucleic acid sequence (single- or double-stranded DNA and/or RNA) in a sample can detect the target nucleic acid sequence (e.g., DNA or RNA) with high sensitivity. In some embodiments, the methods of the present disclosure can be used to detect target RNA/DNA present in a sample comprising more than one RNA/DNA (including target RNA/DNA and more than one non-target RNA/DNA), wherein the target RNA/DNA is present at one or more copies per 10⁷ non-target RNA/DNAs (e.g., one or more copies per 10⁶ non-target RNA/DNAs, one or more copies per 10⁵ non-target RNA/DNAs, one or more copies per 10⁴ non-target RNA/DNAs, one or more copies per 10³ non-target RNA/DNAs, one or more copies per 10² non-target RNA/DNAs, one or more copies per 50 non-target RNA/DNAs, one or more copies per 20 non-target RNA/DNAs, one or more copies per 10 non-target RNA/DNAs, or one or more copies per 5 non-target RNA/DNAs). 
In some embodiments, the methods of the present disclosure can be used to detect target RNA/DNA present in a sample comprising more than one RNA/DNA (including target RNA/DNA and more than one non-target RNA/DNA), wherein the target RNA/DNA is present at one or more copies per 10¹⁸ non-target RNA/DNAs (e.g., one or more copies per 10¹⁵ non-target RNA/DNAs, one or more copies per 10¹² non-target RNA/DNAs, one or more copies per 10⁹ non-target RNA/DNAs, one or more copies per 10⁶ non-target RNA/DNAs, one or more copies per 10⁵ non-target RNA/DNAs, one or more copies per 10⁴ non-target RNA/DNAs, one or more copies per 10³ non-target RNA/DNAs, one or more copies per 10² non-target RNA/DNAs, one or more copies per 50 non-target RNA/DNAs, one or more copies per 20 non-target RNA/DNAs, one or more copies per 10 non-target RNA/DNAs, or one or more copies per 5 non-target RNA/DNAs). As used herein, the terms "RNA/DNA" and "RNAs/DNAs" shall be given their ordinary meaning and shall also refer to DNA, or RNA, or a combination of DNA and RNA.

In some embodiments, the methods of the present disclosure can detect target RNA/DNA present in a sample, wherein the target RNA/DNA is present at from one copy per 10⁷ non-target RNA/DNAs to one copy per 10 non-target RNA/DNAs (e.g., from one copy per 10⁷ to one copy per 10² non-target RNA/DNAs, from one copy per 10⁷ to one copy per 10³ non-target RNA/DNAs, from one copy per 10⁷ to one copy per 10⁴ non-target RNA/DNAs, from one copy per 10⁷ to one copy per 10⁵ non-target RNA/DNAs, from one copy per 10⁷ to one copy per 10⁶ non-target RNA/DNAs, from one copy per 10⁶ to one copy per 10 non-target RNA/DNAs, from one copy per 10⁶ to one copy per 10² non-target RNA/DNAs, from one copy per 10⁶ to one copy per 10³ non-target RNA/DNAs, from one copy per 10⁶ to one copy per 10⁴ non-target RNA/DNAs, from one copy per 10⁶ to one copy per 10⁵ non-target RNA/DNAs, from one copy per 10⁵ to one copy per 10 non-target RNA/DNAs, from one copy per 10⁵ to one copy per 10² non-target RNA/DNAs, from one copy per 10⁵ to one copy per 10³ non-target RNA/DNAs, or from one copy per 10⁵ to one copy per 10⁴ non-target RNA/DNAs).

In some embodiments, the methods of the present disclosure can detect target RNA/DNA present in a sample, wherein the target RNA/DNA is present at from one copy per 10¹⁸ non-target RNA/DNAs to one copy per 10 non-target RNA/DNAs (e.g., from one copy per 10¹⁸ to one copy per 10² non-target RNA/DNAs, from one copy per 10¹⁵ to one copy per 10² non-target RNA/DNAs, from one copy per 10¹² to one copy per 10² non-target RNA/DNAs, from one copy per 10⁹ to one copy per 10² non-target RNA/DNAs, from one copy per 10⁷ to one copy per 10² non-target RNA/DNAs, from one copy per 10⁷ to one copy per 10³ non-target RNA/DNAs, from one copy per 10⁷ to one copy per 10⁴ non-target RNA/DNAs, from one copy per 10⁷ to one copy per 10⁵ non-target RNA/DNAs, from one copy per 10⁷ to one copy per 10⁶ non-target RNA/DNAs, from one copy per 10⁶ to one copy per 10 non-target RNA/DNAs, from one copy per 10⁶ to one copy per 10² non-target RNA/DNAs, from one copy per 10⁶ to one copy per 10³ non-target RNA/DNAs, from one copy per 10⁶ to one copy per 10⁴ non-target RNA/DNAs, from one copy per 10⁶ to one copy per 10⁵ non-target RNA/DNAs, from one copy per 10⁵ to one copy per 10 non-target RNA/DNAs, from one copy per 10⁵ to one copy per 10² non-target RNA/DNAs, from one copy per 10⁵ to one copy per 10³ non-target RNA/DNAs, or from one copy per 10⁵ to one copy per 10⁴ non-target RNA/DNAs).

In some embodiments, the methods of the present disclosure can detect target RNA/DNA present in a sample, wherein the target RNA/DNA is present at from one copy per 10⁷ non-target RNA/DNAs to one copy per 100 non-target RNA/DNAs (e.g., from one copy per 10⁷ non-target RNA/DNAs to one copy per 10² non-target RNA/DNAs, from one copy per 10⁷ to one copy per 10³, from one copy per 10⁷ to one copy per 10⁴, from one copy per 10⁷ to one copy per 10⁵, from one copy per 10⁷ to one copy per 10⁶, from one copy per 10⁶ to one copy per 100, from one copy per 10⁶ to one copy per 10², from one copy per 10⁶ to one copy per 10³, from one copy per 10⁶ to one copy per 10⁴, from one copy per 10⁶ to one copy per 10⁵, from one copy per 10⁵ to one copy per 100, from one copy per 10⁵ to one copy per 10², from one copy per 10⁵ to one copy per 10³, or from one copy per 10⁵ to one copy per 10⁴ non-target RNA/DNAs).

In some embodiments, for the methods of the invention for detecting target RNA/DNA in a sample, the detection threshold is 10nM or less. The term "detection threshold" as used herein describes the minimum amount of target RNA/DNA that must be present in a sample in order for detection to occur. Thus, as an illustrative example, when the detection threshold is 10nM, then a signal can be detected when the target RNA/DNA is present in the sample at a concentration of 10nM or higher. In some embodiments, the methods of the present disclosure have a detection threshold of 5nM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 1nM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 0.5nM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 0.1nM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 0.05nM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 0.01nM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 0.005nM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 0.001nM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 0.0005nM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 0.0001nM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 0.00005nM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 0.00001nM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 10pM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 1pM or less. 
In some embodiments, the methods of the present disclosure have a detection threshold of 500fM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 250fM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 100fM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 50fM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 500aM (attomole/liter) or less. In some embodiments, the methods of the present disclosure have a detection threshold of 250aM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 100aM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 50aM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 10aM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 1aM or less.

In some embodiments, the detection threshold (for detection of target RNA and/or DNA in the methods of the invention) is in the range of 500fM to 1nM (e.g., 500fM to 500pM, 500fM to 200pM, 500fM to 100pM, 500fM to 10pM, 500fM to 1pM, 800fM to 1nM, 800fM to 500pM, 800fM to 200pM, 800fM to 100pM, 800fM to 10pM, 800fM to 1pM, 1pM to 1nM, 1pM to 500pM, 1pM to 200pM, 1pM to 100pM, or 1pM to 10pM) (where the concentration refers to the threshold concentration of target RNA/DNA at which the target RNA/DNA can be detected). In some embodiments, the methods of the present disclosure have a detection threshold ranging from 800fM to 100pM. In some embodiments, the methods of the present disclosure have a detection threshold ranging from 1pM to 10pM. In some embodiments, the methods of the present disclosure have a detection threshold ranging from 10fM to 500fM, for example, 10fM to 50fM, 50fM to 100fM, 100fM to 250fM, or 250fM to 500fM.

In some embodiments, the minimum concentration at which target RNA/DNA can be detected in the sample is in the range of 500fM to 1nM (e.g., 500fM to 500pM, 500fM to 200pM, 500fM to 100pM, 500fM to 10pM, 500fM to 1pM, 800fM to 1nM, 800fM to 500pM, 800fM to 200pM, 800fM to 100pM, 800fM to 10pM, 800fM to 1pM, 1pM to 1nM, 1pM to 500pM, 1pM to 200pM, 1pM to 100pM, or 1pM to 10pM). In some embodiments, the minimum concentration at which target RNA/DNA can be detected in the sample is in the range of 800fM to 100pM. In some embodiments, the minimum concentration at which target RNA/DNA can be detected in the sample is in the range of 1pM to 10pM.

In some embodiments, the detection threshold (for detection of target RNA/DNA in the methods of the invention) is in the range of 1aM to 1nM (e.g., 1aM to 500pM, 1aM to 200pM, 1aM to 100pM, 1aM to 10pM, 1aM to 1pM, 100aM to 1nM, 100aM to 500pM, 100aM to 200pM, 100aM to 100pM, 100aM to 10pM, 100aM to 1pM, 250aM to 1nM, 250aM to 500pM, 250aM to 200pM, 250aM to 100pM, 250aM to 10pM, 250aM to 1pM, 500aM to 1nM, 500aM to 500pM, 500aM to 200pM, 500aM to 100pM, 750aM to 1nM, 750aM to 500pM, 750aM to 200pM, 750aM to 100pM, 750aM to 10pM, 750aM to 1pM, 1fM to 1nM, 1fM to 500pM, 1fM to 200pM, 1fM to 100pM, 1fM to 10pM, 1fM to 1pM, 500fM to 500pM, 500fM to 200pM, 500fM to 100pM, 500fM to 10pM, 500fM to 1pM, 800fM to 1nM, 800fM to 500pM, 800fM to 200pM, 800fM to 100pM, 800fM to 10pM, 800fM to 1pM, 1pM to 1nM, 1pM to 500pM, 1pM to 200pM, 1pM to 100pM, or 1pM to 10pM) (where the concentration refers to the threshold concentration of target RNA/DNA at which the target RNA/DNA can be detected). In some embodiments, the methods of the present disclosure have a detection threshold ranging from 1aM to 800aM. In some embodiments, the methods of the present disclosure have a detection threshold ranging from 50aM to 1pM. In some embodiments, the methods of the present disclosure have a detection threshold ranging from 50aM to 500fM.

In some embodiments, the minimum concentration at which target RNA/DNA can be detected in the sample is in the range of 1aM to 1nM (e.g., 1aM to 500pM, 1aM to 200pM, 1aM to 100pM, 1aM to 10pM, 1aM to 1pM, 100aM to 1nM, 100aM to 500pM, 100aM to 200pM, 100aM to 100pM, 100aM to 10pM, 100aM to 1pM, 250aM to 1nM, 250aM to 500pM, 250aM to 200pM, 250aM to 100pM, 250aM to 10pM, 250aM to 1pM, 500aM to 1nM, 500aM to 500pM, 500aM to 200pM, 500aM to 100pM, 500aM to 10pM, 500aM to 1pM, 750aM to 1nM, 750aM to 500pM, 750aM to 200pM, 750aM to 100pM, 750aM to 10pM, 750aM to 1pM, 1fM to 1nM, 1fM to 500pM, 1fM to 200pM, 1fM to 100pM, 1fM to 10pM, 1fM to 1pM, 500fM to 500pM, 500fM to 200pM, 500fM to 100pM, 500fM to 10pM, 500fM to 1pM, 800fM to 1nM, 800fM to 500pM, 800fM to 200pM, 800fM to 100pM, 800fM to 10pM, 800fM to 1pM, 1pM to 1nM, 1pM to 500pM, 1pM to 200pM, 1pM to 100pM, or 1pM to 10pM). In some embodiments, the minimum concentration at which target RNA/DNA can be detected in the sample is in the range of 1aM to 500pM. In some embodiments, the minimum concentration at which target RNA/DNA can be detected in the sample is in the range of 100aM to 500pM.

In some embodiments, the disclosed compositions or methods exhibit attomolar (aM) detection sensitivity. In some embodiments, the disclosed compositions or methods exhibit femtomolar (fM) detection sensitivity. In some embodiments, the disclosed compositions or methods exhibit picomolar (pM) detection sensitivity. In some embodiments, the disclosed compositions or methods exhibit nanomolar (nM) detection sensitivity.
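
By way of non-limiting illustration, the molar thresholds recited above can be related to absolute molecule counts. The following Python sketch converts a concentration given in nM, pM, fM, or aM into copies per microliter using the Avogadro constant, and applies the definition of detection threshold given above (a signal is expected when the target concentration is at or above the threshold). The function names and the one-microliter reference volume are assumptions made for illustration and are not part of the disclosed methods.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI definition)

# Each unit expressed as a whole number of attomolar (aM) units.
UNIT_IN_AM = {"aM": 1, "fM": 10**3, "pM": 10**6, "nM": 10**9}


def to_attomolar(value: float, unit: str) -> float:
    """Express a concentration such as (500, "fM") in attomolar."""
    return value * UNIT_IN_AM[unit]


def copies_per_microliter(value: float, unit: str) -> float:
    """Average number of target molecules in one microliter (hypothetical helper)."""
    moles_per_liter = to_attomolar(value, unit) * 1e-18
    return moles_per_liter * AVOGADRO * 1e-6  # 1 microliter = 1e-6 liter


def is_detectable(sample: tuple, threshold: tuple) -> bool:
    """True when the sample concentration is at or above the detection threshold."""
    return to_attomolar(*sample) >= to_attomolar(*threshold)
```

For example, `copies_per_microliter(1, "aM")` evaluates to roughly 0.6, so a 1aM threshold corresponds on average to less than one molecule per microliter, whereas 1fM corresponds to roughly 600 molecules per microliter.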

The disclosed samples include sample nucleic acids (e.g., more than one sample nucleic acid). The term "more than one" is used herein to mean two or more. Thus, in some embodiments, the sample comprises two or more (e.g., 3 or more, 5 or more, 10 or more, 20 or more, 50 or more, 100 or more, 500 or more, 1,000 or more, or 5,000 or more) sample nucleic acids (e.g., RNAs). The disclosed methods can be used as very sensitive methods for detecting the presence of a target nucleic acid in a sample (e.g., in a complex mixture of nucleic acids such as RNA). In some embodiments, the sample comprises 5 or more DNAs (e.g., 10 or more, 20 or more, 50 or more, 100 or more, 500 or more, 1,000 or more, or 5,000 or more DNAs) that differ in sequence from one another. In some embodiments, the sample comprises 10 or more, 20 or more, 50 or more, 100 or more, 500 or more, 10³ or more, 5×10³ or more, 10⁴ or more, 5×10⁴ or more, 10⁵ or more, 5×10⁵ or more, 10⁶ or more, 5×10⁶ or more, or 10⁷ or more DNAs. In some embodiments, the sample comprises 10 to 20, 20 to 50, 50 to 100, 100 to 500, 500 to 10³, 10³ to 5×10³, 5×10³ to 10⁴, 10⁴ to 5×10⁴, 5×10⁴ to 10⁵, 10⁵ to 5×10⁵, 5×10⁵ to 10⁶, 10⁶ to 5×10⁶, or 5×10⁶ to 10⁷ DNAs, or more than 10⁷ DNAs. In some embodiments, the sample comprises 5 to 10⁷ RNAs (e.g., RNAs that differ from each other in sequence) (e.g., from 5 to 10⁶, from 5 to 10⁵, from 5 to 50,000, from 5 to 30,000, from 10 to 10⁶, from 10 to 10⁵, from 10 to 50,000, from 10 to 30,000, from 20 to 10⁶, from 20 to 10⁵, from 20 to 50,000, or from 20 to 30,000 RNAs). In some embodiments, the sample comprises 20 or more RNAs that differ in sequence from one another.
In some embodiments, the sample comprises RNA from a cell lysate (e.g., eukaryotic cell lysate, mammalian cell lysate, human cell lysate, prokaryotic cell lysate, plant cell lysate, etc.). For example, in some embodiments, the sample comprises DNA from a cell, such as a eukaryotic cell, e.g., a mammalian cell, e.g., a human cell.

The term "sample" as used herein shall have its ordinary meaning and shall include any sample comprising RNA and/or DNA (e.g., to determine whether a target DNA and/or target RNA is present in a population of RNA and/or DNA). The sample may be derived from any source, e.g., the sample may be a synthetic combination of purified DNA and/or RNA; the sample may be a cell lysate, a DNA/RNA-enriched cell lysate, or DNA/RNA isolated and/or purified from a cell lysate. The sample may be from a patient (e.g., for diagnostic purposes). The sample may be from permeabilized cells. The sample may be from crosslinked cells. The sample may be a tissue section. The sample may be from a tissue prepared by crosslinking followed by delipidation and adjustment to form a uniform refractive index.

Suitable samples include, but are not limited to, saliva, blood, serum, plasma, urine, aspirate, and biopsy samples. The sample may be from a patient, and the term encompasses blood and other liquid samples of biological origin, solid tissue samples such as biopsy samples or tissue cultures or cells derived therefrom and their progeny. The definition also includes samples that are manipulated in any way after they are obtained, such as by treatment with reagents, washing, or enrichment for certain cell populations, such as cancer cells. The definition also includes samples that have been enriched for a particular type of molecule (e.g., RNA). The term "sample" encompasses biological samples, such as clinical samples, such as blood, plasma, serum, aspirate, cerebrospinal fluid (CSF), and also includes tissue obtained by surgical excision, tissue obtained by biopsy, cells in culture, cell supernatants, cell lysates, tissue samples, organs, bone marrow, and the like. A "biological sample" includes biological fluids derived therefrom (e.g., from cancerous cells, infected cells, etc.), such as RNA-containing samples obtained from such cells (e.g., cell lysates or other cell extracts containing RNA).

In some embodiments, the source of the sample is (or is suspected of being) a diseased cell, fluid, tissue or organ. In some embodiments, the source of the sample is normal (non-diseased) cells, fluids, tissues or organs. In some embodiments, the source of the sample is (or is suspected of being) a pathogen-infected cell, tissue or organ. For example, the source of the sample may be an individual that may or may not be infected, and the sample may be any biological sample collected from the individual (e.g., blood, saliva, biopsy, plasma, serum, bronchoalveolar lavage, sputum, fecal specimen, cerebrospinal fluid, fine needle aspirate, swab sample (e.g., oral swab, cervical swab, nasal swab), interstitial fluid, synovial fluid, nasal discharge, tears, buffy coat, mucosal sample, epithelial cell sample (e.g., epithelial scraping), etc.). In some embodiments, the sample is a cell-free liquid sample. In some embodiments, the sample is a liquid sample that may comprise cells. Pathogens include viruses, fungi, helminths, protozoa, malaria parasites, Plasmodium parasites, Toxoplasma parasites, Schistosoma parasites, and the like. "Helminths" include roundworms, heartworms, and phytophagous nematodes (Nematoda), flukes (Trematoda), Acanthocephala, and tapeworms (Cestoda). Protozoal infections include infections by Giardia species (Giardia spp.) and Trichomonas species (Trichomonas spp.), trypanosomiasis, amebic dysentery, babesiosis, balantidial dysentery, Chagas disease, coccidiosis, malaria, and toxoplasmosis. Examples of pathogens such as parasite/protozoan pathogens include, but are not limited to: Plasmodium falciparum, Plasmodium vivax, Trypanosoma cruzi, and Toxoplasma gondii.
Fungal pathogens include, but are not limited to: Cryptococcus neoformans, Histoplasma capsulatum, Coccidioides immitis, Blastomyces dermatitidis, Chlamydia trachomatis, and Candida albicans. Pathogenic viruses include, for example, immunodeficiency viruses (e.g., HIV); influenza virus; dengue virus; West Nile virus; herpes virus; yellow fever virus; hepatitis C virus; hepatitis A virus; hepatitis B virus; papillomavirus; etc. Pathogenic viruses may include DNA viruses, for example: papovaviruses (e.g., human papillomavirus (HPV), polyomavirus); hepadnaviruses (e.g., hepatitis B virus (HBV)); herpesviruses (e.g., herpes simplex virus (HSV), varicella zoster virus (VZV), Epstein-Barr virus (EBV), cytomegalovirus (CMV), herpes lymphotropic virus, Pityriasis Rosea, Kaposi's sarcoma-associated herpesvirus); adenoviruses (e.g., atadenovirus, aviadenovirus, ichtadenovirus, mastadenovirus, siadenovirus); poxviruses (e.g., smallpox, vaccinia virus, monkeypox virus, orf virus, pseudocowpox, bovine papular stomatitis virus, tanapox virus, Yaba monkey tumor virus, molluscum contagiosum virus (MCV)); parvoviruses (e.g., adeno-associated virus (AAV), parvovirus B19, human bocavirus, bufavirus, human parv4 G1); Geminiviridae; Nanoviridae; Phycodnaviridae; etc.
Pathogens may include, for example, DNA viruses [e.g.: papovaviruses (e.g., human papillomavirus (HPV), polyomavirus); hepadnaviruses (e.g., hepatitis B virus (HBV)); herpesviruses (e.g., herpes simplex virus (HSV), varicella zoster virus (VZV), Epstein-Barr virus (EBV), cytomegalovirus (CMV), herpes lymphotropic virus, Pityriasis Rosea, Kaposi's sarcoma-associated herpesvirus); adenoviruses (e.g., atadenovirus, aviadenovirus, ichtadenovirus, mastadenovirus, siadenovirus); poxviruses (e.g., smallpox, vaccinia virus, monkeypox virus, orf virus, pseudocowpox, bovine papular stomatitis virus, tanapox virus, Yaba monkey tumor virus, molluscum contagiosum virus (MCV)); parvoviruses (e.g., adeno-associated virus (AAV), parvovirus B19, human bocavirus, bufavirus, human parv4 G1); Geminiviridae; Nanoviridae; Phycodnaviridae; etc.], Mycobacterium tuberculosis, Streptococcus agalactiae, methicillin-resistant Staphylococcus aureus, Legionella pneumophila, Streptococcus pyogenes, Escherichia coli, Neisseria gonorrhoeae, Neisseria meningitidis, Pneumococcus, Cryptococcus neoformans, Histoplasma capsulatum, Hemophilus influenzae B, Treponema pallidum, Lyme disease spirochetes, Pseudomonas aeruginosa, Mycobacterium leprae, Brucella abortus, rabies virus, influenza virus, cytomegalovirus, herpes simplex virus I, herpes simplex virus II, human serum parvovirus, respiratory syncytial virus, varicella zoster virus, hepatitis B virus, hepatitis C virus, measles virus, adenovirus, T-cell leukemia virus, Epstein-Barr virus, murine leukemia virus, mumps virus, vesicular stomatitis virus, Sindbis virus, lymphocytic choriomeningitis virus, wart virus, bluetongue virus, Sendai virus, feline leukemia virus, reovirus, poliovirus, simian virus 40, mouse mammary tumor virus, dengue virus, rubella virus, West Nile virus, Plasmodium falciparum, Plasmodium vivax, Toxoplasma gondii, Trypanosoma rangeli, Trypanosoma cruzi, Trypanosoma rhodesiense, Trypanosoma brucei, Schistosoma mansoni, Schistosoma japonicum, Babesia bovis, Eimeria tenella, Onchocerca volvulus, Leishmania tropica, Mycobacterium tuberculosis, Trichinella spiralis, Theileria parva, Taenia hydatigena, Taenia ovis, Taenia saginata, Echinococcus granulosus, Mesocestoides corti, Mycoplasma arthritidis, M. hyorhinis, M. orale, M. arginini, Acholeplasma laidlawii, M. salivarium, and M. pneumoniae. Pathogenic viruses may include one or more of SARS-CoV-2, influenza A, influenza B and/or influenza C.

The sample may be a biological sample, such as a clinical sample. In some embodiments, the sample is taken from a biological source, such as vagina, urethra, penis, anus, throat, cervix, fermentation broth (fermentation broths), cell culture, and the like. The sample may include, for example, fluid and cells from a fecal sample. The biological sample may be used (1) as is obtained from a subject or source or (2) after pretreatment to modify the characteristics of the sample. Thus, the test sample may be pre-treated prior to use, for example, by disrupting cells or virus particles, preparing a liquid from a solid material, diluting a viscous fluid, filtering a liquid, concentrating a liquid, inactivating interfering components, adding reagents, purifying nucleic acids, and the like. Thus, a "biological sample" as used herein includes nucleic acids (DNA, RNA, or total nucleic acids) extracted from a clinical or biological sample. Sample preparation may also include the use of solutions containing buffers, salts, detergents, and/or the like for preparing the sample for analysis. In some embodiments, the sample is processed prior to molecular testing. In some embodiments, the sample is directly analyzed and no pretreatment is performed prior to testing. The sample may be, for example, a fecal sample. In some embodiments, the sample is a fecal sample from a patient with clinical symptoms of acute gastroenteritis.

In some embodiments, the sample to be tested is treated prior to performing the methods disclosed herein. For example, in some embodiments, the sample may be isolated, concentrated, or subjected to various other processing steps prior to performing the methods disclosed herein. For example, in some embodiments, the sample may be treated to isolate nucleic acids from the sample prior to contacting the sample with the oligonucleotides, as disclosed herein. In some embodiments, the methods disclosed herein are performed on a sample without in vitro culturing the sample. In some embodiments, the sample is subjected to the methods disclosed herein without isolating nucleic acids from the sample prior to contacting the sample with the oligonucleotides disclosed herein.

The sample may comprise one or more nucleic acids (e.g., more than one nucleic acid). The term "more than one" as used herein may refer to two or more. Thus, in some embodiments, a sample comprises two or more (e.g., 3 or more, 5 or more, 10 or more, 20 or more, 50 or more, 100 or more, 500 or more, 1,000 or more, or 5,000 or more) nucleic acids (e.g., gDNA, mRNA). The disclosed methods can be used as very sensitive methods for detecting the presence of a target nucleic acid in a sample (e.g., in a complex mixture of nucleic acids such as gDNA). In some embodiments, the sample comprises 5 or more nucleic acids (e.g., 10 or more, 20 or more, 50 or more, 100 or more, 500 or more, 1,000 or more, or 5,000 or more nucleic acids) that differ in sequence from one another. In some embodiments, the sample comprises 10 or more, 20 or more, 50 or more, 100 or more, 500 or more, 10³ or more, 5×10³ or more, 10⁴ or more, 5×10⁴ or more, 10⁵ or more, 5×10⁵ or more, 10⁶ or more, 5×10⁶ or more, or 10⁷ or more nucleic acids.

In some embodiments, the sample comprises 10 to 20, 20 to 50, 50 to 100, 100 to 500, 500 to 10³, 10³ to 5×10³, 5×10³ to 10⁴, 10⁴ to 5×10⁴, 5×10⁴ to 10⁵, 10⁵ to 5×10⁵, 5×10⁵ to 10⁶, 10⁶ to 5×10⁶, or 5×10⁶ to 10⁷ nucleic acids, or more than 10⁷ nucleic acids. In some embodiments, the sample comprises 5 to 10⁷ nucleic acids (e.g., nucleic acids that differ from each other in sequence) (e.g., 5 to 10⁶, 5 to 10⁵, 5 to 50,000, 5 to 30,000, 10 to 10⁶, 10 to 10⁵, 10 to 50,000, 10 to 30,000, 20 to 10⁶, 20 to 10⁵, 20 to 50,000, or 20 to 30,000 nucleic acids, or a number or range between any two of these values). In some embodiments, the sample comprises 20 or more nucleic acids that differ in sequence from one another.

The sample may be any sample comprising nucleic acid (e.g., to determine whether target nucleic acid is present in a population of nucleic acids). The sample may be derived from any source, e.g., the sample may be a synthetic combination of purified nucleic acids; the sample may be a cell lysate, a DNA-enriched cell lysate, or nucleic acids isolated and/or purified from a cell lysate. The sample may be from a patient (e.g., for diagnostic purposes). The sample may be from permeabilized cells. The sample may be from crosslinked cells. The sample may be a tissue section. The sample may be from a tissue prepared by crosslinking followed by delipidation and adjustment to form a uniform refractive index.

The sample may comprise a target nucleic acid and more than one non-target nucleic acid. In some embodiments, the target nucleic acid is present in the sample at one copy per 10 non-target nucleic acids, one copy per 20 non-target nucleic acids, one copy per 25 non-target nucleic acids, one copy per 50 non-target nucleic acids, one copy per 100 non-target nucleic acids, one copy per 500 non-target nucleic acids, one copy per 10³ non-target nucleic acids, one copy per 5×10³ non-target nucleic acids, one copy per 10⁴ non-target nucleic acids, one copy per 5×10⁴ non-target nucleic acids, one copy per 10⁵ non-target nucleic acids, one copy per 5×10⁵ non-target nucleic acids, one copy per 10⁶ non-target nucleic acids, fewer than one copy per 10⁶ non-target nucleic acids, or a number or range between any two of these values. In some embodiments, the target nucleic acid is present in the sample at from one copy per 10 non-target nucleic acids to one copy per 20 non-target nucleic acids, from one copy per 20 to one copy per 50 non-target nucleic acids, from one copy per 50 to one copy per 100 non-target nucleic acids, from one copy per 100 to one copy per 500 non-target nucleic acids, from one copy per 500 to one copy per 10³ non-target nucleic acids, from one copy per 10³ to one copy per 5×10³ non-target nucleic acids, from one copy per 5×10³ to one copy per 10⁴ non-target nucleic acids, from one copy per 10⁴ to one copy per 10⁵ non-target nucleic acids, from one copy per 10⁵ to one copy per 10⁶ non-target nucleic acids, from one copy per 10⁶ to one copy per 10⁷ non-target nucleic acids, or a number or range between any two of these values.
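
As a non-limiting illustration of the abundance ratios recited above, the following Python sketch computes the fraction of molecules that are target when the target is present at one copy per N non-target nucleic acids, and estimates how many molecules would need to be sampled to include at least one target copy with a chosen probability. The estimate assumes independent random sampling of molecules, which is a simplifying assumption introduced here and not a statement about any particular embodiment; the function names are likewise hypothetical.

```python
import math


def target_fraction(non_target_per_copy: float) -> float:
    """Fraction of all molecules that are target at one copy per N non-target."""
    return 1.0 / (non_target_per_copy + 1.0)


def molecules_to_sample(non_target_per_copy: float, confidence: float = 0.95) -> int:
    """Smallest n such that P(at least one target among n molecules) >= confidence.

    Assumes each sampled molecule is independently a target with probability f,
    so P(at least one) = 1 - (1 - f)**n.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    f = target_fraction(non_target_per_copy)
    # log1p keeps precision when f is as small as about 1e-7.
    return math.ceil(math.log1p(-confidence) / math.log1p(-f))
```

Under these assumptions, at one copy per 10⁷ non-target nucleic acids, on the order of 3×10⁷ molecules would need to be sampled for a 95% chance of including at least one target copy.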

Suitable samples include, but are not limited to, saliva, blood, serum, plasma, urine, aspirate, and biopsy samples. Thus, the term "sample" in relation to a patient encompasses blood and other liquid samples of biological origin, solid tissue samples such as biopsy samples or tissue cultures or cells derived therefrom and their progeny. The definition also includes samples that are manipulated in any way after they are obtained, such as by treatment with reagents, washing, or enrichment for certain cell populations, such as cancer cells. The definition also includes samples that have been enriched for a particular type of molecule (e.g., nucleic acid). The term "sample" encompasses biological samples, such as clinical samples, such as blood, plasma, serum, aspirate, cerebrospinal fluid (CSF), and also includes tissue obtained by surgical excision, tissue obtained by biopsy, cells in culture, cell supernatants, cell lysates, tissue samples, organs, bone marrow, and the like. A "biological sample" includes biological fluids derived therefrom (e.g., from cancerous cells, infected cells, etc.), such as nucleic acid-containing samples obtained from such cells (e.g., cell lysates or other cell extracts containing nucleic acids).

Suitable samples for use in the methods disclosed herein include any conventional biological sample obtained from an organism or portion thereof (such as plants, animals, bacteria, etc.). In certain embodiments, the biological sample is obtained from an animal subject, such as a human subject. Biological samples are any solid or fluid samples obtained from, excreted or secreted by, any living organism, including but not limited to single cell organisms such as bacteria, yeasts, protozoa, and amoebas, etc., multicellular organisms such as plants or animals, including samples from healthy or seemingly healthy human subjects or human patients affected by a condition or disease to be diagnosed or studied such as infection by a pathogenic microorganism such as a pathogenic bacterium or virus. For example, the biological sample may be a biological fluid obtained from: such as blood, plasma, serum, urine, stool, sputum, mucus, lymph, synovial fluid, bile, ascites, pleural effusion, seroma, saliva, cerebrospinal fluid, aqueous humor, or vitreous humor, or any bodily secretion, leakage, exudate (e.g., fluid obtained from an abscess or any other site of infection or inflammation) or fluid obtained from a joint (e.g., a normal joint, or a joint affected by a disease such as rheumatoid arthritis, osteoarthritis, gout, or septic arthritis), or a swab of a skin or mucosal surface.

The sample may also be a sample obtained from any organ or tissue (including biopsy or autopsy samples, such as tumor biopsies), or may include cells (whether primary or cultured) or media conditioned by any cell, tissue or organ. Exemplary samples include, but are not limited to, cells, cell lysates, blood smears, cell centrifuge preparations, cytological smears, bodily fluids (e.g., blood, plasma, serum, saliva, sputum, urine, bronchoalveolar lavage, semen, etc.), tissue biopsies (e.g., tumor biopsies), fine needle aspirates, and/or tissue sections (e.g., cryostat tissue sections and/or paraffin embedded tissue sections). In other examples, the sample comprises circulating tumor cells (which can be identified by cell surface markers). In particular examples, the sample is used directly (e.g., fresh or frozen), or may be manipulated prior to use, for example, by fixation (e.g., using formalin) and/or embedding in wax (such as Formalin Fixed Paraffin Embedded (FFPE) tissue samples). It will be appreciated that any method of obtaining tissue from a subject may be utilized, and that the choice of method used will depend on a variety of factors, such as the type of tissue, the age of the subject, or the procedures available to the practitioner. Standard techniques for obtaining such samples are available in the art.

The sample may be an environmental sample, such as water, soil, or a surface, such as an industrial or medical surface.

Due to the increased sensitivity of the embodiments disclosed herein, in certain example embodiments, assays and methods may be run on crude samples or samples in which the target molecules to be detected are not further fractionated or purified from the sample.

Cells can be lysed to release target molecules (e.g., target dsDNA). Cell lysis may be accomplished by any of a variety of means, such as by chemical or biochemical means, by osmotic shock, or by thermal, mechanical or optical lysis means. Cells can be lysed by adding a cell lysis buffer comprising a detergent (e.g., SDS, lithium dodecyl sulfate, Triton X-100, Tween-20, or NP-40), an organic solvent (e.g., methanol or acetone), or a digestive enzyme (e.g., proteinase K, pepsin, or trypsin), or any combination thereof. To increase association of the target with the barcode, the diffusion rate of the target molecule may be altered by, for example, reducing the temperature of the lysate and/or increasing the viscosity of the lysate.

In some embodiments, filter paper may be used to lyse the sample. The filter paper may be soaked with lysis buffer on top of the filter paper. The filter paper may be applied to the sample with pressure, which may facilitate lysis of the sample and hybridization of the targets of the sample to the substrate.

In some embodiments, lysis may be performed by mechanical lysis, thermal lysis, optical lysis, and/or chemical lysis. Chemical lysis may include the use of digestive enzymes such as proteinase K, pepsin and trypsin. Lysis may be performed by adding a lysis buffer to the substrate. The lysis buffer may comprise Tris HCl. The lysis buffer may comprise at least about 0.01M, 0.05M, 0.1M, 0.5M, or 1M or more Tris HCl. The lysis buffer may comprise up to about 0.01M, 0.05M, 0.1M, 0.5M, or 1M or more Tris HCl. The lysis buffer may comprise about 0.1M Tris HCl. The pH of the lysis buffer may be at least about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or higher. The pH of the lysis buffer may be up to about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or higher. In some embodiments, the pH of the lysis buffer is about 7.5. The lysis buffer may comprise a salt (e.g., LiCl). The salt concentration in the lysis buffer may be at least about 0.1M, 0.5M, or 1M or higher. The salt concentration in the lysis buffer may be up to about 0.1M, 0.5M, or 1M or higher. In some embodiments, the concentration of salt in the lysis buffer is about 0.5M. The lysis buffer may comprise a detergent (e.g., SDS, lithium dodecyl sulfate, Triton X, Tween, NP-40). The detergent concentration in the lysis buffer may be at least about 0.0001%, 0.0005%, 0.001%, 0.005%, 0.01%, 0.05%, 0.1%, 0.5%, 1%, 2%, 3%, 4%, 5%, 6% or 7% or more. The detergent concentration in the lysis buffer may be up to about 0.0001%, 0.0005%, 0.001%, 0.005%, 0.01%, 0.05%, 0.1%, 0.5%, 1%, 2%, 3%, 4%, 5%, 6% or 7% or more. In some embodiments, the detergent concentration in the lysis buffer is about 1% lithium dodecyl sulfate. The time used in the lysis method may depend on the amount of detergent used. In some embodiments, the more detergent used, the less time is required for lysis. The lysis buffer may comprise a chelating agent (e.g., EDTA, EGTA).
The chelating agent concentration in the lysis buffer may be at least about 1mM, 5mM, 10mM, 15mM, 20mM, 25mM, or 30mM or more. The chelating agent concentration in the lysis buffer may be up to about 1mM, 5mM, 10mM, 15mM, 20mM, 25mM, or 30mM or more. In some embodiments, the concentration of chelating agent in the lysis buffer is about 10mM. The lysis buffer may contain a reducing agent (e.g., beta-mercaptoethanol, DTT). The concentration of reducing agent in the lysis buffer may be at least about 1mM, 5mM, 10mM, 15mM, or 20mM or more. The concentration of reducing agent in the lysis buffer may be up to about 1mM, 5mM, 10mM, 15mM, or 20mM or more. In some embodiments, the concentration of reducing agent in the lysis buffer is about 5mM. In some embodiments, the lysis buffer may comprise about 0.1M Tris HCl, about pH 7.5, about 0.5M LiCl, about 1% lithium dodecyl sulfate, about 10mM EDTA and about 5mM DTT.
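As a worked illustration of the example lysis buffer composition above, the following minimal Python sketch computes the volume of each stock solution needed using C1·V1 = C2·V2. The stock concentrations and the 10 mL final volume are assumptions chosen purely for illustration and are not part of this disclosure.

```python
# Target concentrations taken from the example lysis buffer described above.
TARGETS = {
    "Tris HCl pH 7.5 (M)": 0.1,
    "LiCl (M)": 0.5,
    "lithium dodecyl sulfate (%)": 1.0,
    "EDTA (M)": 0.010,
    "DTT (M)": 0.005,
}

# Hypothetical stock concentrations (assumed for illustration only).
STOCKS = {
    "Tris HCl pH 7.5 (M)": 1.0,
    "LiCl (M)": 5.0,
    "lithium dodecyl sulfate (%)": 10.0,
    "EDTA (M)": 0.5,
    "DTT (M)": 1.0,
}


def stock_volumes_ml(final_volume_ml):
    """Return mL of each stock, plus water to volume, via C1*V1 = C2*V2."""
    volumes = {
        name: TARGETS[name] * final_volume_ml / STOCKS[name] for name in TARGETS
    }
    volumes["water"] = final_volume_ml - sum(volumes.values())
    return volumes


if __name__ == "__main__":
    for name, ml in stock_volumes_ml(10.0).items():
        print(f"{name}: {ml:.3f} mL")
```

With different stock concentrations only the `STOCKS` table needs to change; the target values remain those stated in the text.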

Lysis may be carried out at a temperature of about 4 ℃, 10 ℃, 15 ℃, 20 ℃, 25 ℃ or 30 ℃. The lysis may be performed for about 1 minute, 5 minutes, 10 minutes, 15 minutes, or 20 minutes or more. Lysed cells may include at least about 100000, 200000, 300000, 400000, 500000, 600000, or 700000 or more target nucleic acid molecules. Lysed cells may include up to about 100000, 200000, 300000, 400000, 500000, 600000 or 700000 or more target nucleic acid molecules.

Kits

Kits described herein may comprise: more than one protein complex. In some embodiments, each of the more than one protein complexes comprises a transposome and a programmable DNA binding unit capable of specifically binding to a binding site on target double-stranded DNA (dsDNA). In some embodiments, the transposomes comprise a transposase, a first adaptor and a second adaptor. In some embodiments, the binding sites of each of the more than one protein complexes are different from each other. In some embodiments, the kit comprises: at least one component that provides real-time detection activity for the nucleic acid amplification product. Real-time detection activity may be provided by molecular beacons. The dried composition may comprise reverse transcriptase and/or reverse transcription primers.

The kit may comprise, for example, one or more polymerases and one or more primers, and optionally one or more reverse transcriptases and/or reverse transcription primers, as described herein. When a target is amplified, a pair of primers (forward and reverse) may be included in the kit. In the case of amplifying more than one target sequence, more than one primer pair may be included in the kit. The kit may comprise control polynucleotides and in the case of amplifying more than one target sequence, more than one control polynucleotide may be included in the kit.

The kit may also contain one or more components in any number of separate vessels, chambers, containers, packets, tubes, vials, microtiter plates, etc., or the components may be combined in various combinations in such a container. For example, the components of the kit may be present in one or more containers. In some embodiments, all components are provided in one container. In some embodiments, the enzyme (e.g., polymerase and/or reverse transcriptase) may be provided in a separate container from the primer. The components may be lyophilized, heat dried, freeze dried or in a stabilizing buffer, for example. In some embodiments, the polymerase and/or reverse transcriptase are present in lyophilized or heat-dried form in a single container, and the primer is lyophilized, heat-dried, freeze-dried, or present in a buffer in a different container. In some embodiments, the polymerase and/or reverse transcriptase and the primer are in a single container in lyophilized form or in heat dried form.

The kit may also comprise, for example, dNTPs used in the reaction, or modified nucleotides, vessels, cuvettes or other containers for the reaction, or vials of water or buffer for rehydrating lyophilized or heat-dried components. For example, the buffers used may be suitable for both polymerase and primer annealing activities.

The kit may also comprise instructions for performing one or more methods described herein and/or descriptions of one or more components described herein. The instructions and/or descriptions may be in printed form and may be contained in a kit insert. The kit may also contain a written description of the internet location providing such instructions or descriptions.

The kit may further comprise reagents for detection methods, such as reagents for FRET, lateral flow devices, test strips, fluorescent dyes, colloidal gold particles, latex particles, molecular beacons or polystyrene beads.

FIGS. 1, 3, 4, 5A-5F, and 7A-7H of the present disclosure were created with BioRender.

Examples

Some aspects of the embodiments discussed above are disclosed in more detail in the following examples, which are not intended to limit the scope of the disclosure in any way.

Example 1

Design and validation of fusion proteins and guide RNAs (sgRNAs)

Four constructs were designed for the production of fusion proteins: dCAS9-Fl26-Tn5, dCAS9-xTen-Tn5, Tn5-Fl26-dCAS9, and Tn5-xTen-dCAS9 (see, e.g., FIGS. 8-10). These constructs have either the dCas9 or the Tn5 sequence at the N-terminus of the fusion protein, with the two domains separated by an Fl26 linker or an xTen linker. In some embodiments, the plasmid design is based on the following: Chen, S.P. & Wang, H.H. (2019). An Engineered Cas-Transposon System for Programmable and Site-Directed DNA Transpositions. The CRISPR Journal, Vol. 2, Number 6. DOI: 10.1089/crispr.2019.0030; and Picelli, S., Björklund, Å.K., Reinius, B., Sagasser, S., Winberg, G., & Sandberg, R. (2014). Tn5 transposase and tagmentation procedures for massively scaled sequencing projects. Genome Research 24:2033-2040. ISSN 1088-9051/14.

sgRNA design

sgRNAs targeting the Salmonella enterica InvA and FliC genes were designed. Sequences from Salmonella enterica strain ATCC 13311 were used. The sgRNAs were designed using a tool from Integrated DNA Technologies (IDT) (Table 1). The relative positions of the sgRNAs for the InvA and FliC genes are shown in FIGS. 11 and 12, respectively.

Table 1: Salmonella enterica sgRNA

Fragments of 264bp, 8bp, 148bp, 292bp, 458bp and 195bp were predicted for InvA. Fragments of about 130bp, 82bp and 232bp are expected for FliC.
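Predicted fragment sizes of this kind are simply the distances between neighboring cut sites. The minimal Python sketch below illustrates the arithmetic; the cut coordinates are hypothetical (they are not taken from Table 1) and were chosen only so that their spacings reproduce the InvA sizes listed above, which are assumed here to be in positional order.

```python
def fragment_sizes(cut_positions):
    """Return lengths (bp) of the fragments between consecutive cut positions."""
    ordered = sorted(cut_positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


# Hypothetical cut coordinates (illustration only, not from Table 1).
hypothetical_inva_cuts = [0, 264, 272, 420, 712, 1170, 1365]

if __name__ == "__main__":
    print(fragment_sizes(hypothetical_inva_cuts))
```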

Verification of Salmonella enterica sgRNA

To verify the specificity of the sgRNAs, genomic samples were cleaved with Cas9. Adaptors were ligated to the Cas9-cleaved DNA, and the PCR-amplified fragments were analyzed on a Bioanalyzer.

Table 2: Bioanalyzer analysis of sgRNA activity

FIG. 13 and Table 2 show that cleavage of the gDNA is specific and yields fragments of the expected sizes (compare the "Bioanalyzer expected size [bp]" column and the "actual [bp]" column in Table 2), thus demonstrating that the guide RNAs for Salmonella enterica are functional.

Next, sgRNAs targeting the human genes EXT1, BCL9, HOXA13, HOXD11 and OLIG2 were designed, for a total of 10 sgRNAs (Tables 3A-3C). The sgRNAs were designed using GenScript tools.

Table 3A: human sgRNA targets

Table 3B: human sgRNA targets

Table 3C: human sgRNA targets

sgRNAs were also designed to target the Chlamydia trachomatis polymorphic membrane protein A (pmpA) gene (Table 4). A total of 5 sgRNAs were designed using the IDT tool.

Table 4: Chlamydia trachomatis sgRNA targets

Verification of transposase Tn5

FIGS. 14-15 show that Tn5 can attach the designed adapter A (5'-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-3', SEQ ID NO: 27) and adapter B (5'-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG-3', SEQ ID NO: 28) to DNA fragments for PCR amplification, demonstrating functionality. First, gDNA from Salmonella enterica was tagmented (cut and pasted) using Tn5 loaded with custom adaptors. The tagged fragments were then amplified by PCR. The data in FIGS. 14-15 indicate that the Tn5 transposase was loaded with the custom adaptors.
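The relationship between the two adaptor sequences can be checked with a few lines of code. The following Python sketch (an illustration, not part of the described experiments) finds the sequence shared at the 3' ends of adapter A and adapter B; the shared 19-nucleotide stretch corresponds to the commonly used Tn5 mosaic end sequence, while the 5' portions differ between the two adaptors. The function name `common_suffix` is a hypothetical helper introduced here.

```python
ADAPTER_A = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"   # SEQ ID NO: 27
ADAPTER_B = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"  # SEQ ID NO: 28


def common_suffix(a, b):
    """Return the longest sequence shared at the 3' ends of a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return a[len(a) - n:]


shared = common_suffix(ADAPTER_A, ADAPTER_B)
unique_a = ADAPTER_A[: len(ADAPTER_A) - len(shared)]  # adapter A-specific 5' part
unique_b = ADAPTER_B[: len(ADAPTER_B) - len(shared)]  # adapter B-specific 5' part

if __name__ == "__main__":
    print("shared 3' end:", shared, len(shared), "nt")
    print("adapter A 5' part:", unique_a)
    print("adapter B 5' part:", unique_b)
```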

Verification of fusion proteins

dCAS9-Fl26-Tn5, dCAS9-xTen-Tn5, Tn5-Fl26-dCAS9, and Tn5-xTen-dCAS9 were recombinantly expressed and then purified. In some embodiments, the recombinant protein is separated from a cleavable moiety (intein) on a chitin column. The purified fusion proteins were analyzed for predicted size and purity on SDS-PAGE gels (FIGS. 16-21).

SDS-PAGE analysis of dCAS9-Fl26-Tn5 is shown in FIG. 16. The sample was observed to have a purity of >80%. In some embodiments, the fusion protein may also comprise an intein domain. Bioanalyzer analysis in FIG. 17 showed that a portion of the protein produced (peak at 44.91) was the correct size (without intein).

SDS-PAGE analysis of dCAS9-xTen-Tn5 is shown in FIG. 18. The sample was observed to have a purity of >70%. In some embodiments, the fusion protein may also comprise an intein domain, resulting in a larger size than desired. Bioanalyzer analysis in FIG. 19 showed that a portion of the protein produced (peak at 44.62) was the correct size (without intein).

FIG. 20 depicts SDS-PAGE analysis of recombinantly expressed and purified Tn5-Fl26-dCAS9. FIG. 21 depicts SDS-PAGE analysis of recombinantly expressed and purified Tn5-xTen-dCAS9. The samples were observed to have a purity of >65%.

Testing the functionality of fusion proteins

dCAS9-Fl26-Tn5 and dCAS9-xTen-Tn5 were tested for functionality. The protocol was as follows: (1) loading sgRNAs and adaptors into the fusion proteins (using human sgRNAs unless otherwise indicated), (2) guided tagmentation, (3) purification, (4) PCR amplification, (5) quality control (QC), and (6) analysis of the results.

Loading of sgRNAs and adaptors into fusion proteins

The fusion protein was loaded at a 1:1:2 ratio (1 molecule of dCAS9-Tn5 to 1 sgRNA to 2 adaptors). The mixture was incubated at 24 ℃ for 30 minutes.

Guided tagmentation

100 nM dCAS9-Tn5 (6.02e10 molecules) and 500 ng human gDNA (1.52e5 molecules) were combined, the ratio of gDNA to dCAS9-Tn5 being 1 to 3.95e5. The mixture was incubated at 37 ℃ for 60 minutes and then at 55 ℃ for 60 minutes to produce tagged fragments. Several incubation methods were tried; in some embodiments, dCas9 may function in the range of 25 ℃ to 42 ℃ and Tn5 may function in the range of 37 ℃ to 60 ℃. The PCR amplification procedure is shown in Table 5.
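The molecule counts quoted above follow from simple unit conversions. The Python sketch below reproduces them approximately under assumptions that are not stated in the text: a fusion protein concentration of 100 nM (consistent with the 6.02e10 figure and with the 100 nM condition used in the ratio test), a protein volume of 1 µL, and roughly 3.3 pg of DNA per haploid human genome copy. Small differences from the quoted ratio are due to rounding.

```python
AVOGADRO = 6.022e23  # molecules per mole


def molecules_from_molar(conc_molar, volume_liters):
    """Number of molecules in a solution of given molarity and volume."""
    return conc_molar * volume_liters * AVOGADRO


def genome_copies(mass_grams, grams_per_genome):
    """Number of genome copies in a given mass of genomic DNA."""
    return mass_grams / grams_per_genome


# Assumed values (illustration only): 1 uL of 100 nM protein, 3.3 pg/genome.
protein = molecules_from_molar(100e-9, 1e-6)
genomes = genome_copies(500e-9, 3.3e-12)
ratio = protein / genomes

if __name__ == "__main__":
    print(f"{protein:.3g} protein molecules")
    print(f"{genomes:.3g} genome copies")
    print(f"about {ratio:.3g} protein molecules per genome copy")
```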

Table 5: PCR amplification

FIG. 22 depicts data related to the Cas9-only control reaction. The visible trace shows the TapeStation analysis of Cas9-digested DNA. Analysis of the sample after the PCR amplification reaction showed no signal. These data suggest that Cas9 by itself cannot add an adapter to the 5' or 3' end of a DNA fragment.

FIGS. 23-24 show the PCR amplification results after tagmentation with dCAS9-Fl26-Tn5 or dCAS9-xTen-Tn5, respectively. The arrows in the figures point to the signal of the post-PCR sample. PCR amplification is detected only if the fusion proteins (dCAS9-Fl26-Tn5 and dCAS9-xTen-Tn5) are capable of transposition (e.g., adding adaptors (adaptor B) at the 5' and 3' ends of the DNA molecules).

Results

The results indicate that Tn5 is able to add custom adaptors to human gDNA. The Cas9-only control showed that Tn5 is required for amplification. These results demonstrate the functionality of Tn5 fused to dCas9.

Fusion protein and DNA ratio test

Next, the effect of reducing the ratio of dCas9-Tn5 to gDNA was tested. The DNA concentration was kept constant while the dCas9-Tn5 fusion protein concentration was reduced: 100 nM (194,071 dCAS9-Tn5 molecules to 1 DNA genome copy), 1 nM (1,940:1), 100 pM (194:1), 10 pM (19.4:1), and 1 pM (1.94:1). The results are shown in FIGS. 25-31. FIG. 25 depicts the results of PCR amplification after a guided tagmentation reaction using dCAS9-Tn5 at a ratio of 194,071:1 and shows broad peaks after PCR, indicating non-specific tagmentation. Decreasing the amount of dCas9-Tn5 (FIGS. 26-31) resulted in detectable peaks from the PCR reaction, indicating that decreasing the ratio of fusion protein to DNA increased the specificity of tagmentation.
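Because the DNA amount is held constant, the protein-to-genome ratio scales linearly with the fusion protein concentration. The short Python sketch below illustrates this using the 100 nM condition (194,071:1) as the reference point; the function name `ratio_at` is a hypothetical helper introduced for illustration.

```python
BASE_CONC_M = 100e-9   # 100 nM reference condition
BASE_RATIO = 194_071   # fusion protein molecules per genome copy at 100 nM


def ratio_at(conc_molar):
    """Scale the protein:genome ratio linearly with protein concentration."""
    return BASE_RATIO * conc_molar / BASE_CONC_M


CONDITIONS = [
    ("100 nM", 100e-9),
    ("1 nM", 1e-9),
    ("100 pM", 100e-12),
    ("10 pM", 10e-12),
    ("1 pM", 1e-12),
]

if __name__ == "__main__":
    for label, conc in CONDITIONS:
        print(f"{label}: {ratio_at(conc):,.2f}:1")
```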

Results

The results indicate that Tn5 is able to add custom adaptors to human gDNA. The Cas9-only control showed that Tn5 is required for DNA amplification. Tn5 proved to be functional, and there is evidence of guided transposition. Thus, there is evidence that the fusion protein has both dCas9 and Tn5 activity.

Fusion proteins and sgRNAs for Salmonella enterica

FIGS. 38-39 depict guided tagmentation using dCAS9-xTen-Tn5 with Salmonella enterica sgRNAs. The data indicate that the addition of sgRNAs increases specificity. FIG. 39 shows that tagmentation without sgRNA is random. FIG. 38 shows that the addition of sgRNA confers specificity.

Example 2

Sample library preparation

Guided tagmentation library preparation

Described herein are methods and compositions for generating libraries for sequencing on Illumina NextSeq.

Three libraries were prepared using ligation-based methods (FIGS. 37, 40, 42A-42B), in which a single adaptor (e.g., adaptor B, using either Tn5 alone or a dCas9-Tn5 fusion) was used and NEBNext sequencing adaptors were added after the tagmentation step, and two libraries were prepared using guided tagmentation-based methods (FIGS. 41, 43-44), in which the sequences required for NGS were included on adaptors A and B in the guided tagmentation step. All libraries were prepared using human sgRNAs. The dCAS9-Fl26-Tn5 fusion protein was used for guided tagmentation. In these experiments, the DNA was incubated with dCas9-Tn5 using either a long or a short incubation protocol. For the short protocol, the reaction was incubated at 30 ℃ for 30 minutes, and then at 37 ℃ for 30 minutes. For the long protocol, the reaction was incubated at 30 ℃ for 30 minutes, then at 38 ℃ for 60 minutes, and then at 55 ℃ for 60 minutes.
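For bookkeeping, the two incubation protocols can be written down as step tables. The Python sketch below encodes the temperatures and durations exactly as stated above and totals the incubation time for each; the data structure and names are illustrative assumptions rather than part of the described method.

```python
# Each step is (temperature in degrees C, duration in minutes), as stated above.
PROTOCOLS = {
    "short": [(30, 30), (37, 30)],
    "long": [(30, 30), (38, 60), (55, 60)],
}


def total_minutes(steps):
    """Sum the durations of all incubation steps in a protocol."""
    return sum(minutes for _temp_c, minutes in steps)


if __name__ == "__main__":
    for name, steps in PROTOCOLS.items():
        print(f"{name} protocol: {total_minutes(steps)} min total")
```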

FIG. 32 shows highly multiplexed single primer DNA amplification using Tn5 alone. Analysis by a bioanalyzer showed non-specific DNA amplification by PCR, indicating that DNA could be amplified using only 1 primer (adapter B).

FIG. 33 (short incubation protocol) and FIG. 34 (long incubation protocol) show evidence supporting highly multiplexed single primer DNA amplification using the dCas9-Tn5 fusion protein. Bioanalyzer analysis of the PCR amplification showed that only 1 primer (adapter B) was used to specifically amplify several DNA fragments simultaneously.

FIG. 35 (long incubation protocol) and FIG. 36 (short incubation protocol) show evidence supporting custom locus-specific sequencing library preparation. Bioanalyzer analysis indicated that a sequencing library could be created. The addition of adaptors A and B, which are required for sequencing on the Illumina platform, indicated that sequencing libraries could be created using guided tagmentation.

In at least some of the previously described embodiments, one or more elements used in one embodiment may be used interchangeably in another embodiment unless such substitution is technically not feasible. Those skilled in the art will appreciate that various other omissions, additions and modifications may be made to the methods and structures described above without departing from the scope of the claimed subject matter. All such modifications and changes are intended to fall within the scope of the subject matter defined by the appended claims.

With respect to the use of substantially any plural and/or singular terms herein, those having skill in the art can translate from the plural to the singular and/or from the singular to the plural as is appropriate to the context and/or application. For clarity, various singular/plural permutations may be explicitly set forth herein. As used in this specification and the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. Any reference herein to "or" is intended to encompass "and/or" unless otherwise specified.

Those skilled in the art will understand that, in general, terms used herein, and especially in the appended claims (e.g., bodies of the appended claims), are generally intended as "open" terms (e.g., the term "including" should be interpreted as "including but not limited to," the term "having" should be interpreted as "having at least," the term "includes" should be interpreted as "includes but is not limited to," etc.). Those skilled in the art will further understand that if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation no such intent is present. For example, as an aid to understanding, the following appended claims may contain usage of the introductory phrases "at least one" and "one or more" to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite articles "a" or "an" limits any particular claim containing such introduced claim recitation to embodiments containing only one such recitation, even when the same claim includes the introductory phrases "one or more" or "at least one" and indefinite articles such as "a" or "an" (e.g., "a" and/or "an" should be interpreted to mean "at least one" or "one or more"); the same holds true for the use of definite articles to introduce claim recitations. Furthermore, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should be interpreted to mean at least the recited number (e.g., the bare recitation of "two recitations," without other modifiers, means at least two recitations, or two or more recitations). Further, in those instances where a convention analogous to "at least one of A, B and C, etc."
is used, such a syntactic structure is generally intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include, but not be limited to, a system having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.). In those instances where a convention analogous to "at least one of A, B or C, etc." is used, such a syntactic structure is generally intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B or C" would include, but not be limited to, a system having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.). Those skilled in the art will further appreciate that, in fact, any disjunctive word and/or phrase presenting two or more alternative terms, whether in the specification, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms.

Further, when features or aspects of the present disclosure are described in terms of Markush groups (Markush groups), those skilled in the art will appreciate that the present disclosure is also thereby described in terms of any individual member or subgroup of members of the Markush group.

As will be understood by those of skill in the art, for any and all purposes, such as in providing a written description, all ranges disclosed herein also include any and all possible subranges and combinations of subranges of the range. Any listed range can be readily recognized as sufficiently describing and enabling the same range being broken down into at least equal halves, thirds, quarters, fifths, tenths, etc. As a non-limiting example, each of the ranges discussed herein can be readily broken down into a lower third, a middle third, an upper third, and the like. As will also be understood by those skilled in the art, all language such as "up to", "at least", "greater than", "less than" and the like include the stated numbers and refer to ranges that may be subsequently broken down into subranges as discussed above. Finally, as will be appreciated by those skilled in the art, a range includes each individual member. Thus, for example, a group of 1-3 items refers to a group of 1, 2, or 3 items. Similarly, a group of 1-5 items refers to a group of 1, 2, 3, 4, or 5 items, and so forth.

While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are for purposes of illustration and are not intended to be limiting, with the true scope and spirit being indicated by the following claims.

Sequence listing

<110> Becton, Dickinson and Company

Alvaro Godinez

<120> preparation method of nucleic acid sequencing library

<130> 68EB-317326-WO

<150> US 63/189,032

<151> 2021-05-14

<150> US 63/243,443

<151> 2021-09-13

<160> 45

<170> PatentIn version 3.5

<210> 1

<211> 12339

<212> DNA

<213> Artificial sequence (Artificial Sequence)

<220>

<223> 3XFlag-Cas9-Fl26-Tn5

<400> 1

atacactccg ctatcgctac gtgactgggt catggctgcg ccccgacacc cgccaacacc 60

cgctgacgcg ccctgacggg cttgtctgct cccggcatcc gcttacagac aagctgtgac 120

cgtctccggg agctgcatgt gtcagaggtt ttcaccgtca tcaccgaaac gcgcgaggca 180

gctgcggtaa agctcatcag cgtggtcgtg cagcgattca cagatgtctg cctgttcatc 240

cgcgtccagc tcgttgagtt tctccagaag cgttaatgtc tggcttctga taaagcgggc 300

catgttaagg gcggtttttt cctgtttggt cactgatgcc tccgtgtaag ggggatttct 360

gttcatgggg gtaatgatac cgatgaaacg agagaggatg ctcacgatac gggttactga 420

tgatgaacat gcccggttac tggaacgttg tgagggtaaa caactggcgg tatggatgcg 480

gcgggaccag agaaaaatca ctcagggtca atgccagccg aacgccagca agacgtagcc 540

cagcgcgtcg gccgccatgc cggcgataat ggcctgcttc tcgccgaaac gtttggtggc 600

gggaccagtg acgaaggctt gagcgagggc gtgcaagatt ccgaataccg caagcgacag 660

gccgatcatc gtcgcgctcc agcgaaagcg gtcctcgccg aaaatgaccc agagcgctgc 720

cggcacctgt cctacgagtt gcatgataaa gaagacagtc ataagtgcgg cgacgatagt 780

catgccccgc gcccaccgga aggagctgac tgggttgaag gctctcaagg gcatcggtcg 840

agatcccggt gcctaatgag tgagctaact tacattaatt gcgttgcgct cactgcccgc 900

tttccagtcg ggaaacctgt cgtgccagct gcattaatga atcggccaac gcgcggggag 960

aggcggtttg cgtattgggc gccagggtgg tttttctttt caccagtgag acgggcaaca 1020

gctgattgcc cttcaccgcc tggccctgag agagttgcag caagcggtcc acgctggttt 1080

gccccagcag gcgaaaatcc tgtttgatgg tggttaacgg cgggatataa catgagctgt 1140

cttcggtatc gtcgtatccc actaccgaga tatccgcacc aacgcgcagc ccggactcgg 1200

taatggcgcg cattgcgccc agcgccatct gatcgttggc aaccagcatc gcagtgggaa 1260

cgatgccctc attcagcatt tgcatggttt gttgaaaacc ggacatggca ctccagtcgc 1320

cttcccgttc cgctatcggc tgaatttgat tgcgagtgag atatttatgc cagccagcca 1380

gacgcagacg cgccgagaca gaacttaatg ggcccgctaa cagcgcgatt tgctggtgac 1440

ccaatgcgac cagatgctcc acgcccagtc gcgtaccgtc ttcatgggag aaaataatac 1500

tgttgatggg tgtctggtca gagacatcaa gaaataacgc cggaacatta gtgcaggcag 1560

cttccacagc aatggcatcc tggtcatcca gcggatagtt aatgatcagc ccactgacgc 1620

gttgcgcgag aagattgtgc accgccgctt tacaggcttc gacgccgctt cgttctacca 1680

tcgacaccac cacgctggca cccagttgat cggcgcgaga tttaatcgcc gcgacaattt 1740

gcgacggcgc gtgcagggcc agactggagg tggcaacgcc aatcagcaac gactgtttgc 1800

ccgccagttg ttgtgccacg cggttgggaa tgtaattcag ctccgccatc gccgcttcca 1860

ctttttcccg cgttttcgca gaaacgtggc tggcctggtt caccacgcgg gaaacggtct 1920

gataagagac accggcatac tctgcgacat cgtataacgt tactggtttc acattcacca 1980

ccctgaattg actctcttcc gggcgctatc atgccatacc gcgaaaggtt ttgcgccatt 2040

cgatggtgtc cgggatctcg acgctctccc ttatgcgact cctgcattag gaagcagccc 2100

agtagtaggt tgaggccgtt gagcaccgcc gccgcaagga atggtgcatg ccggcatgcc 2160

gccctttcgt cttcaagaat taattcccaa ttccccaggc atcaaataaa acgaaaggct 2220

cagtcgaaag actgggcctt tcgttttatc tgttgtttgt cggtgaacgc tctcctgagt 2280

aggacaaatc cgccgggagc ggatttgaac gttgcgaagc aacggcccgg agggtggcgg 2340

gcaggacgcc cgccataaac tgccaggaat taattcccca ggcatcaaat aaaacgaaag 2400

gctcagtcga aagactgggc ctttcgtttt atctgttgtt tgtcggtgaa cgctctcctg 2460

agtaggacaa atccgccggg agcggatttg aacgttgcga agcaacggcc cggagggtgg 2520

cgggcaggac gcccgccata aactgccagg aattaattcc ccaggcatca aataaaacga 2580

aaggctcagt cgaaagactg ggcctttcgt tttatctgtt gtttgtcggt gaacgctctc 2640

ctgagtagga caaatccgcc gggagcggat ttgaacgttg cgaagcaacg gcccggaggg 2700

tggcgggcag gacgcccgcc ataaactgcc aggaattaat tccccaggca tcaaataaaa 2760

cgaaaggctc agtcgaaaga ctgggccttt cgttttatct gttgtttgtc ggtgaacgct 2820

ctcctgagta ggacaaatcc gccgggagcg gatttgaacg ttgcgaagca acggcccgga 2880

gggtggcggg caggacgccc gccataaact gccaggaatt aattccccag gcatcaaata 2940

aaacgaaagg ctcagtcgaa agactgggcc tttcgtttta tctgttgttt gtcggtgaac 3000

gctctcctga gtaggacaaa tccgccggga gcggatttga acgttgcgaa gcaacggccc 3060

ggagggtggc gggcaggacg cccgccataa actgccagga attggggatc ggaattaatt 3120

cccggtttaa accggggatc tcgatcccgc gaaattaata cgactcacta taggggaatt 3180

gtgagcggat aacaattccc ctctagaaat aattttgttt aactttaaga aggagatata 3240

ccatgggtga ttacaaggat cacgatggcg attacaagga tcacgatatc gattacaagg 3300

atgatgatga taagatggat aaaaagtatt ctattggttt agctatcggc acaaatagcg 3360

tcggatgggc ggtgatcact gatgaatata aggttccgtc taaaaagttc aaggttctgg 3420

gaaatacaga ccgccacagt atcaaaaaaa atcttatagg ggctctttta tttgacagtg 3480

gagagacagc ggaagcgact cgtctcaaac ggacagctcg tagaaggtat acacgtcgga 3540

agaatcgtat ttgttatcta caggagattt tttcaaatga gatggcgaaa gtagatgata 3600

gtttctttca tcgacttgaa gagtcttttt tggtggaaga agacaagaag catgaacgtc 3660

atcctatttt tggaaatata gtagatgaag ttgcttatca tgagaaatat ccaactatct 3720

atcatctgcg aaaaaaattg gtagattcta ctgataaagc ggatttgcgc ttaatctatt 3780

tggccttagc gcatatgatt aagtttcgtg gtcatttttt gattgaggga gatttaaatc 3840

ctgataatag tgatgtggac aaactattta tccagttggt acaaacctac aatcaattat 3900

ttgaagaaaa ccctattaac gcaagtggag tagatgctaa agcgattctt tctgcacgat 3960

tgagtaaatc aagacgatta gaaaatctca ttgctcagct ccccggtgag aagaaaaatg 4020

gcttatttgg gaatctcatt gctttgtcat tgggtttgac ccctaatttt aaatcaaatt 4080

ttgatttggc agaagatgct aaattacagc tttcaaaaga tacttacgat gatgatttag 4140

ataatttatt ggcgcaaatt ggagatcaat atgctgattt gtttttggca gctaagaatt 4200

tatcagatgc tattttactt tcagatatcc taagagtaaa tactgaaata actaaggctc 4260

ccctatcagc atcaatgatt aaacgctacg atgaacatca tcaagacttg actcttttaa 4320

aagctttagt tcgacaacaa cttccagaaa agtataaaga aatctttttt gatcaatcaa 4380

aaaacggata tgcaggttat attgatgggg gagctagcca agaagaattt tataaattta 4440

tcaaaccaat tttagaaaaa atggatggta ctgaggaatt attggtgaaa ctaaatcgtg 4500

aagatttgct gcgcaagcaa cggacctttg acaacggctc tattccccat caaattcact 4560

tgggtgagct gcatgctatt ttgagaagac aagaagactt ttatccattt ttaaaagaca 4620

atcgtgagaa gattgaaaaa atcttgactt ttcgaattcc ttattatgtt ggtccattgg 4680

cgcgtggcaa tagtcgtttt gcatggatga ctcggaagtc tgaagaaaca attaccccat 4740

ggaattttga agaagttgtc gataaaggtg cttcagctca atcatttatt gaacgcatga 4800

caaactttga taaaaatctt ccaaatgaaa aagtactacc aaaacatagt ttgctttatg 4860

agtattttac ggtttataac gaattgacaa aggtcaaata tgttactgaa ggaatgcgaa 4920

aaccagcatt tctttcaggt gaacagaaga aagccattgt tgatttactc ttcaaaacaa 4980

atcgaaaagt aaccgttaag caattaaaag aagattattt caaaaaaata gaatgttttg 5040

atagtgttga aatttcagga gttgaagata gatttaatgc ttcattaggt acctaccatg 5100

atttgctaaa aattattaaa gataaagatt ttttggataa tgaagaaaat gaagatatct 5160

tagaggatat tgttttaaca ttgaccttat ttgaagatag ggagatgatt gaggaaagac 5220

ttaaaacata tgctcacctc tttgatgata aggtgatgaa acagcttaaa cgtcgccgtt 5280

atactggttg gggacgtttg tctcgaaaat tgattaatgg tattagggat aagcaatctg 5340

gcaaaacaat attagatttt ttgaaatcag atggttttgc caatcgcaat tttatgcagc 5400

tgatccatga tgatagtttg acatttaaag aagacattca aaaagcacaa gtgtctggac 5460

aaggcgatag tttacatgaa catattgcaa atttagctgg tagccctgct attaaaaaag 5520

gtattttaca gactgtaaaa gttgttgatg aattggtcaa agtaatgggg cggcataagc 5580

cagaaaatat cgttattgaa atggcacgtg aaaatcagac aactcaaaag ggccagaaaa 5640

attcgcgaga gcgtatgaaa cgaatcgaag aaggtatcaa agaattagga agtcagattc 5700

ttaaagagca tcctgttgaa aatactcaat tgcaaaatga aaagctctat ctctattatc 5760

tccaaaatgg aagagacatg tatgtggacc aagaattaga tattaatcgt ttaagtgatt 5820

atgatgtcga tgccattgtt ccacaaagtt tccttaaaga cgattcaata gacaataagg 5880

tcttaacgcg ttctgataaa aatcgtggta aatcggataa cgttccaagt gaagaagtag 5940

tcaaaaagat gaaaaactat tggagacaac ttctaaacgc caagttaatc actcaacgta 6000

agtttgataa tttaacgaaa gctgaacgtg gaggtttgag tgaacttgat aaagctggtt 6060

ttatcaaacg ccaattggtt gaaactcgcc aaatcactaa gcatgtggca caaattttgg 6120

atagtcgcat gaatactaaa tacgatgaaa atgataaact tattcgagag gttaaagtga 6180

ttaccttaaa atctaaatta gtttctgact tccgaaaaga tttccaattc tataaagtac 6240

gtgagattaa caattaccat catgcccatg atgcgtatct aaatgccgtc gttggaactg 6300

ctttgattaa gaaatatcca aaacttgaat cggagtttgt ctatggtgat tataaagttt 6360

atgatgttcg taaaatgatt gctaagtctg agcaagaaat aggcaaagca accgcaaaat 6420

atttctttta ctctaatatc atgaacttct tcaaaacaga aattacactt gcaaatggag 6480

agattcgcaa acgccctcta atcgaaacta atggggaaac tggagaaatt gtctgggata 6540

aagggcgaga ttttgccaca gtgcgcaaag tattgtccat gccccaagtc aatattgtca 6600

agaaaacaga agtacagaca ggcggattct ccaaggagtc aattttacca aaaagaaatt 6660

cggacaagct tattgctcgt aaaaaagact gggatccaaa aaaatatggt ggttttgata 6720

gtccaacggt agcttattca gtcctagtgg ttgctaaggt ggaaaaaggg aaatcgaaga 6780

agttaaaatc cgttaaagag ttactaggga tcacaattat ggaaagaagt tcctttgaaa 6840

aaaatccgat tgacttttta gaagctaaag gatataagga agttaaaaaa gacttaatca 6900

ttaaactacc taaatatagt ctttttgagt tagaaaacgg tcgtaaacgg atgctggcta 6960

gtgccggaga attacaaaaa ggaaatgagc tggctctgcc aagcaaatat gtgaattttt 7020

tatatttagc tagtcattat gaaaagttga agggtagtcc agaagataac gaacaaaaac 7080

aattgtttgt ggagcagcat aagcattatt tagatgagat tattgagcaa atcagtgaat 7140

tttctaagcg tgttatttta gcagatgcca atttagataa agttcttagt gcatataaca 7200

aacatagaga caaaccaata cgtgaacaag cagaaaatat tattcattta tttacgttga 7260

cgaatcttgg agctcccgct gcttttaaat attttgatac aacaattgat cgtaaacgat 7320

atacgtctac aaaagaagtt ttagatgcca ctcttatcca tcaatccatc actggtcttt 7380

atgaaacacg cattgatttg agtcagctag gaggtgacga tgacgataaa gaattcggtg 7440

gcggtggctc tggcggtggt gggagtggag gtgggggatc aggaggaggc ggttcccata 7500

tgattaccag tgcactgcat cgtgcggcgg attgggcgaa aagcgtgttt tctagtgctg 7560

cgctgggtga tccgcgtcgt accgcgcgtc tggtgaatgt tgcggcgcaa ctggccaaat 7620

atagcggcaa aagcattacc attagcagcg aaggcagcaa agccatgcag gaaggcgcgt 7680

atcgttttat tcgtaatccg aacgtgagcg cggaagcgat tcgtaaagcg ggtgccatgc 7740

agaccgtgaa actggcccag gaatttccgg aactgctggc aattgaagat accacctctc 7800

tgagctatcg tcatcaggtg gcggaagaac tgggcaaact gggtagcatt caggataaaa 7860

gccgtggttg gtgggtgcat agcgtgctgc tgctggaagc gaccaccttt cgtaccgtgg 7920

gcctgctgca tcaagaatgg tggatgcgtc cggatgatcc ggcggatgcg gatgaaaaag 7980

aaagcggcaa atggctggcc gctgctgcaa cttcgcgtct gagaatgggc agcatgatga 8040

gcaacgtgat tgcggtgtgc gatcgtgaag cggatattca tgcgtatctg caagataaac 8100

tggcccataa cgaacgtttt gtggtgcgta gcaaacatcc gcgtaaagat gtggaaagcg 8160

gcctgtatct gtatgatcac ctgaaaaacc agccggaact gggcggctat cagattagca 8220

ttccgcagaa aggcgtggtg gataaacgtg gcaaacgtaa aaaccgtccg gcgcgtaaag 8280

cgagcctgag cctgcgtagc ggccgtatta ccctgaaaca gggcaacatt accctgaacg 8340

cggtgctggc cgaagaaatt aatccgccga aaggcgaaac cccgctgaaa tggctgctgc 8400

tgaccagcga gccggtggaa agtctggccc aagcgctgcg tgtgattgat atttataccc 8460

atcgttggcg cattgaagaa tttcacaaag cgtggaaaac gggtgcgggt gcggaacgtc 8520

agcgtatgga agaaccggat aacctggaac gtatggtgag cattctgagc tttgtggcgg 8580

tgcgtctgct gcaactgcgt gaatctttta ctccgccgca agcactgcgt gcgcagggcc 8640

tgctgaaaga agcggaacac gttgaaagcc agagcgcgga aaccgtgctg accccggatg 8700

aatgccaact gctgggctat ctggataaag gcaaacgcaa acgcaaagaa aaagcgggca 8760

gcctgcaatg ggcgtatatg gcgattgcgc gtctgggcgg ctttatggat agcaaacgta 8820

ccggcattgc gagctggggt gcgctgtggg aaggttggga agcgctgcaa agcaaactgg 8880

atggctttct ggccgcgaaa gacctgatgg cgcagggcat taaaatctgc atcacgggag 8940

atgcactagt tgccctaccc gagggcgagt cggtacgcat cgccgacatc gtgccgggtg 9000

cgcggcccaa cagtgacaac gccatcgacc tgaaagtcct tgaccggcat ggcaatcccg 9060

tgctcgccga ccggctgttc cactccggcg agcatccggt gtacacggtg cgtacggtcg 9120

aaggtctgcg tgtgacgggc accgcgaacc acccgttgtt gtgtttggtc gacgtcgccg 9180

gggtgccgac cctgctgtgg aagctgatcg acgaaatcaa gccgggcgat tacgcggtga 9240

ttcaacgcag cgcattcagc gtcgactgtg caggttttgc ccgcgggaaa cccgaatttg 9300

cgcccacaac ctacacagtc ggcgtccctg gactggtgcg tttcttggaa gcacaccacc 9360

gagacccgga cgcccaagct atcgccgacg agctgaccga cgggcggttc tactacgcga 9420

aagtcgccag tgtcaccgac gccggcgtgc agccggtgta tagccttcgt gtcgacacgg 9480

cagaccacgc gtttatcacg aacgggttcg tcagccacgc tactggcctc accggtctga 9540

actcaggcct cacgacaaat cctggtgtat ccgcttggca ggtcaacaca gcttatactg 9600

cgggacaatt ggtcacatat aacggcaaga cgtataaatg tttgcagccc cacacctcct 9660

tggcaggatg ggaaccatcc aacgttcctg ccttgtggca gcttcaatga ctgcaggaag 9720

gggatccggc tgctaacaaa gcccgaaagg aagctgagtt ggctgctgcc accgctgagc 9780

aataactagc ataacccctt ggggcctcta aacgggtctt gaggggtttt ttgctgaaag 9840

gaggaactat atccggataa ctacgtcagg tggcactttt cggggaaatg tgcgcggaac 9900

ccctatttgt ttatttttct aaatacattc aaatatgtat ccgctcatga gacaataacc 9960

ctgataaatg cttcaataat attgaaaaag gaagagtatg agtattcaac atttccgtgt 10020

cgcccttatt cccttttttg cggcattttg ccttcctgtt tttgctcacc cagaaacgct 10080

ggtgaaagta aaagatgctg aagatcagtt gggtgcacga gtgggttaca tcgaactgga 10140

tctcaacagc ggtaagatcc ttgagagttt tcgccccgaa gaacgtttcc caatgatgag 10200

cacttttaaa gttctgctat gtggcgcggt attatcccgt gttgacgccg ggcaagagca 10260

actcggtcgc cgcatacact attctcagaa tgacttggtt gagtactcac cagtcacaga 10320

aaagcatctt acggatggca tgacagtaag agaattatgc agtgctgcca taaccatgag 10380

tgataacact gcggccaact tacttctgac aacgatcgga ggaccgaagg agctaaccgc 10440

ttttttgcac aacatggggg atcatgtaac tcgccttgat cgttgggaac cggagctgaa 10500

tgaagccata ccaaacgacg agcgtgacac cacgatgcct gtagcaatgg caacaacgtt 10560

gcgcaaacta ttaactggcg aactacttac tctagcttcc cggcaacaat taatagactg 10620

gatggaggcg gataaagttg caggaccact tctgcgctcg gcccttccgg ctggctggtt 10680

tattgctgat aaatctggag ccggtgagcg tgggtctcgc ggtatcattg cagcactggg 10740

gccagatggt aagccctccc gtatcgtagt tatctacacg acggggagtc aggcaactat 10800

ggatgaacga aatagacaga tcgctgagat aggtgcctca ctgattaagc attggtaact 10860

gtcagaccaa gtttactcat atatacttta gattgattta ccccggttga taatcagaaa 10920

agccccaaaa acaggaagat tgtataagca aatatttaaa ttgtaaacgt taatattttg 10980

ttaaaattcg cgttaaattt ttgttaaatc agctcatttt ttaaccaata ggccgaaatc 11040

ggcaaaatcc cttataaatc aaaagaatag cccgagatag ggttgagtgt tgttccagtt 11100

tggaacaaga gtccactatt aaagaacgtg gactccaacg tcaaagggcg aaaaaccgtc 11160

tatcagggcg atggcccact acgtgaacca tcacccaaat caagtttttt ggggtcgagg 11220

tgccgtaaag cactaaatcg gaaccctaaa gggagccccc gatttagagc ttgacgggga 11280

aagccggcga acgtggcgag aaaggaaggg aagaaagcga aaggagcggg cgctagggcg 11340

ctggcaagtg tagcggtcac gctgcgcgta accaccacac ccgccgcgct taatgcgccg 11400

ctacagggcg cgtaaaagga tctaggtgaa gatccttttt gataatctca tgaccaaaat 11460

cccttaacgt gagttttcgt tccactgagc gtcagacccc gtagaaaaga tcaaaggatc 11520

ttcttgagat cctttttttc tgcgcgtaat ctgctgcttg caaacaaaaa aaccaccgct 11580

accagcggtg gtttgtttgc cggatcaaga gctaccaact ctttttccga aggtaactgg 11640

cttcagcaga gcgcagatac caaatactgt ccttctagtg tagccgtagt taggccacca 11700

cttcaagaac tctgtagcac cgcctacata cctcgctctg ctaatcctgt taccagtggc 11760

tgctgccagt ggcgataagt cgtgtcttac cgggttggac tcaagacgat agttaccgga 11820

taaggcgcag cggtcgggct gaacgggggg ttcgtgcaca cagcccagct tggagcgaac 11880

gacctacacc gaactgagat acctacagcg tgagctatga gaaagcgcca cgcttcccga 11940

agggagaaag gcggacaggt atccggtaag cggcagggtc ggaacaggag agcgcacgag 12000

ggagcttcca gggggaaacg cctggtatct ttatagtcct gtcgggtttc gccacctctg 12060

acttgagcgt cgatttttgt gatgctcgtc aggggggcgg agcctatgga aaaacgccag 12120

caacgcggcc tttttacggt tcctggcctt ttgctggcct tttgctcaca tgttctttcc 12180

tgcgttatcc cctgattctg tggataaccg tattaccgcc tttgagtgag ctgataccgc 12240

tcgccgcagc cgaacgaccg agcgcagcga gtcagtgagc gaggaagcta tggtgcactc 12300

tcagtacaat ctgctctgat gccgcatagt taagccagt 12339

<210> 2

<211> 12306

<212> DNA

<213> Artificial Sequence

<220>

<223> 3XFlag-Cas9-xTen-Tn5

<400> 2

atacactccg ctatcgctac gtgactgggt catggctgcg ccccgacacc cgccaacacc 60

cgctgacgcg ccctgacggg cttgtctgct cccggcatcc gcttacagac aagctgtgac 120

cgtctccggg agctgcatgt gtcagaggtt ttcaccgtca tcaccgaaac gcgcgaggca 180

gctgcggtaa agctcatcag cgtggtcgtg cagcgattca cagatgtctg cctgttcatc 240

cgcgtccagc tcgttgagtt tctccagaag cgttaatgtc tggcttctga taaagcgggc 300

catgttaagg gcggtttttt cctgtttggt cactgatgcc tccgtgtaag ggggatttct 360

gttcatgggg gtaatgatac cgatgaaacg agagaggatg ctcacgatac gggttactga 420

tgatgaacat gcccggttac tggaacgttg tgagggtaaa caactggcgg tatggatgcg 480

gcgggaccag agaaaaatca ctcagggtca atgccagccg aacgccagca agacgtagcc 540

cagcgcgtcg gccgccatgc cggcgataat ggcctgcttc tcgccgaaac gtttggtggc 600

gggaccagtg acgaaggctt gagcgagggc gtgcaagatt ccgaataccg caagcgacag 660

gccgatcatc gtcgcgctcc agcgaaagcg gtcctcgccg aaaatgaccc agagcgctgc 720

cggcacctgt cctacgagtt gcatgataaa gaagacagtc ataagtgcgg cgacgatagt 780

catgccccgc gcccaccgga aggagctgac tgggttgaag gctctcaagg gcatcggtcg 840

agatcccggt gcctaatgag tgagctaact tacattaatt gcgttgcgct cactgcccgc 900

tttccagtcg ggaaacctgt cgtgccagct gcattaatga atcggccaac gcgcggggag 960

aggcggtttg cgtattgggc gccagggtgg tttttctttt caccagtgag acgggcaaca 1020

gctgattgcc cttcaccgcc tggccctgag agagttgcag caagcggtcc acgctggttt 1080

gccccagcag gcgaaaatcc tgtttgatgg tggttaacgg cgggatataa catgagctgt 1140

cttcggtatc gtcgtatccc actaccgaga tatccgcacc aacgcgcagc ccggactcgg 1200

taatggcgcg cattgcgccc agcgccatct gatcgttggc aaccagcatc gcagtgggaa 1260

cgatgccctc attcagcatt tgcatggttt gttgaaaacc ggacatggca ctccagtcgc 1320

cttcccgttc cgctatcggc tgaatttgat tgcgagtgag atatttatgc cagccagcca 1380

gacgcagacg cgccgagaca gaacttaatg ggcccgctaa cagcgcgatt tgctggtgac 1440

ccaatgcgac cagatgctcc acgcccagtc gcgtaccgtc ttcatgggag aaaataatac 1500

tgttgatggg tgtctggtca gagacatcaa gaaataacgc cggaacatta gtgcaggcag 1560

cttccacagc aatggcatcc tggtcatcca gcggatagtt aatgatcagc ccactgacgc 1620

gttgcgcgag aagattgtgc accgccgctt tacaggcttc gacgccgctt cgttctacca 1680

tcgacaccac cacgctggca cccagttgat cggcgcgaga tttaatcgcc gcgacaattt 1740

gcgacggcgc gtgcagggcc agactggagg tggcaacgcc aatcagcaac gactgtttgc 1800

ccgccagttg ttgtgccacg cggttgggaa tgtaattcag ctccgccatc gccgcttcca 1860

ctttttcccg cgttttcgca gaaacgtggc tggcctggtt caccacgcgg gaaacggtct 1920

gataagagac accggcatac tctgcgacat cgtataacgt tactggtttc acattcacca 1980

ccctgaattg actctcttcc gggcgctatc atgccatacc gcgaaaggtt ttgcgccatt 2040

cgatggtgtc cgggatctcg acgctctccc ttatgcgact cctgcattag gaagcagccc 2100

agtagtaggt tgaggccgtt gagcaccgcc gccgcaagga atggtgcatg ccggcatgcc 2160

gccctttcgt cttcaagaat taattcccaa ttccccaggc atcaaataaa acgaaaggct 2220

cagtcgaaag actgggcctt tcgttttatc tgttgtttgt cggtgaacgc tctcctgagt 2280

aggacaaatc cgccgggagc ggatttgaac gttgcgaagc aacggcccgg agggtggcgg 2340

gcaggacgcc cgccataaac tgccaggaat taattcccca ggcatcaaat aaaacgaaag 2400

gctcagtcga aagactgggc ctttcgtttt atctgttgtt tgtcggtgaa cgctctcctg 2460

agtaggacaa atccgccggg agcggatttg aacgttgcga agcaacggcc cggagggtgg 2520

cgggcaggac gcccgccata aactgccagg aattaattcc ccaggcatca aataaaacga 2580

aaggctcagt cgaaagactg ggcctttcgt tttatctgtt gtttgtcggt gaacgctctc 2640

ctgagtagga caaatccgcc gggagcggat ttgaacgttg cgaagcaacg gcccggaggg 2700

tggcgggcag gacgcccgcc ataaactgcc aggaattaat tccccaggca tcaaataaaa 2760

cgaaaggctc agtcgaaaga ctgggccttt cgttttatct gttgtttgtc ggtgaacgct 2820

ctcctgagta ggacaaatcc gccgggagcg gatttgaacg ttgcgaagca acggcccgga 2880

gggtggcggg caggacgccc gccataaact gccaggaatt aattccccag gcatcaaata 2940

aaacgaaagg ctcagtcgaa agactgggcc tttcgtttta tctgttgttt gtcggtgaac 3000

gctctcctga gtaggacaaa tccgccggga gcggatttga acgttgcgaa gcaacggccc 3060

ggagggtggc gggcaggacg cccgccataa actgccagga attggggatc ggaattaatt 3120

cccggtttaa accggggatc tcgatcccgc gaaattaata cgactcacta taggggaatt 3180

gtgagcggat aacaattccc ctctagaaat aattttgttt aactttaaga aggagatata 3240

ccatgggtga ttacaaggat cacgatggcg attacaagga tcacgatatc gattacaagg 3300

atgatgatga taagatggat aaaaagtatt ctattggttt agctatcggc acaaatagcg 3360

tcggatgggc ggtgatcact gatgaatata aggttccgtc taaaaagttc aaggttctgg 3420

gaaatacaga ccgccacagt atcaaaaaaa atcttatagg ggctctttta tttgacagtg 3480

gagagacagc ggaagcgact cgtctcaaac ggacagctcg tagaaggtat acacgtcgga 3540

agaatcgtat ttgttatcta caggagattt tttcaaatga gatggcgaaa gtagatgata 3600

gtttctttca tcgacttgaa gagtcttttt tggtggaaga agacaagaag catgaacgtc 3660

atcctatttt tggaaatata gtagatgaag ttgcttatca tgagaaatat ccaactatct 3720

atcatctgcg aaaaaaattg gtagattcta ctgataaagc ggatttgcgc ttaatctatt 3780

tggccttagc gcatatgatt aagtttcgtg gtcatttttt gattgaggga gatttaaatc 3840

ctgataatag tgatgtggac aaactattta tccagttggt acaaacctac aatcaattat 3900

ttgaagaaaa ccctattaac gcaagtggag tagatgctaa agcgattctt tctgcacgat 3960

tgagtaaatc aagacgatta gaaaatctca ttgctcagct ccccggtgag aagaaaaatg 4020

gcttatttgg gaatctcatt gctttgtcat tgggtttgac ccctaatttt aaatcaaatt 4080

ttgatttggc agaagatgct aaattacagc tttcaaaaga tacttacgat gatgatttag 4140

ataatttatt ggcgcaaatt ggagatcaat atgctgattt gtttttggca gctaagaatt 4200

tatcagatgc tattttactt tcagatatcc taagagtaaa tactgaaata actaaggctc 4260

ccctatcagc atcaatgatt aaacgctacg atgaacatca tcaagacttg actcttttaa 4320

aagctttagt tcgacaacaa cttccagaaa agtataaaga aatctttttt gatcaatcaa 4380

aaaacggata tgcaggttat attgatgggg gagctagcca agaagaattt tataaattta 4440

tcaaaccaat tttagaaaaa atggatggta ctgaggaatt attggtgaaa ctaaatcgtg 4500

aagatttgct gcgcaagcaa cggacctttg acaacggctc tattccccat caaattcact 4560

tgggtgagct gcatgctatt ttgagaagac aagaagactt ttatccattt ttaaaagaca 4620

atcgtgagaa gattgaaaaa atcttgactt ttcgaattcc ttattatgtt ggtccattgg 4680

cgcgtggcaa tagtcgtttt gcatggatga ctcggaagtc tgaagaaaca attaccccat 4740

ggaattttga agaagttgtc gataaaggtg cttcagctca atcatttatt gaacgcatga 4800

caaactttga taaaaatctt ccaaatgaaa aagtactacc aaaacatagt ttgctttatg 4860

agtattttac ggtttataac gaattgacaa aggtcaaata tgttactgaa ggaatgcgaa 4920

aaccagcatt tctttcaggt gaacagaaga aagccattgt tgatttactc ttcaaaacaa 4980

atcgaaaagt aaccgttaag caattaaaag aagattattt caaaaaaata gaatgttttg 5040

atagtgttga aatttcagga gttgaagata gatttaatgc ttcattaggt acctaccatg 5100

atttgctaaa aattattaaa gataaagatt ttttggataa tgaagaaaat gaagatatct 5160

tagaggatat tgttttaaca ttgaccttat ttgaagatag ggagatgatt gaggaaagac 5220

ttaaaacata tgctcacctc tttgatgata aggtgatgaa acagcttaaa cgtcgccgtt 5280

atactggttg gggacgtttg tctcgaaaat tgattaatgg tattagggat aagcaatctg 5340

gcaaaacaat attagatttt ttgaaatcag atggttttgc caatcgcaat tttatgcagc 5400

tgatccatga tgatagtttg acatttaaag aagacattca aaaagcacaa gtgtctggac 5460

aaggcgatag tttacatgaa catattgcaa atttagctgg tagccctgct attaaaaaag 5520

gtattttaca gactgtaaaa gttgttgatg aattggtcaa agtaatgggg cggcataagc 5580

cagaaaatat cgttattgaa atggcacgtg aaaatcagac aactcaaaag ggccagaaaa 5640

attcgcgaga gcgtatgaaa cgaatcgaag aaggtatcaa agaattagga agtcagattc 5700

ttaaagagca tcctgttgaa aatactcaat tgcaaaatga aaagctctat ctctattatc 5760

tccaaaatgg aagagacatg tatgtggacc aagaattaga tattaatcgt ttaagtgatt 5820

atgatgtcga tgccattgtt ccacaaagtt tccttaaaga cgattcaata gacaataagg 5880

tcttaacgcg ttctgataaa aatcgtggta aatcggataa cgttccaagt gaagaagtag 5940

tcaaaaagat gaaaaactat tggagacaac ttctaaacgc caagttaatc actcaacgta 6000

agtttgataa tttaacgaaa gctgaacgtg gaggtttgag tgaacttgat aaagctggtt 6060

ttatcaaacg ccaattggtt gaaactcgcc aaatcactaa gcatgtggca caaattttgg 6120

atagtcgcat gaatactaaa tacgatgaaa atgataaact tattcgagag gttaaagtga 6180

ttaccttaaa atctaaatta gtttctgact tccgaaaaga tttccaattc tataaagtac 6240

gtgagattaa caattaccat catgcccatg atgcgtatct aaatgccgtc gttggaactg 6300

ctttgattaa gaaatatcca aaacttgaat cggagtttgt ctatggtgat tataaagttt 6360

atgatgttcg taaaatgatt gctaagtctg agcaagaaat aggcaaagca accgcaaaat 6420

atttctttta ctctaatatc atgaacttct tcaaaacaga aattacactt gcaaatggag 6480

agattcgcaa acgccctcta atcgaaacta atggggaaac tggagaaatt gtctgggata 6540

aagggcgaga ttttgccaca gtgcgcaaag tattgtccat gccccaagtc aatattgtca 6600

agaaaacaga agtacagaca ggcggattct ccaaggagtc aattttacca aaaagaaatt 6660

cggacaagct tattgctcgt aaaaaagact gggatccaaa aaaatatggt ggttttgata 6720

gtccaacggt agcttattca gtcctagtgg ttgctaaggt ggaaaaaggg aaatcgaaga 6780

agttaaaatc cgttaaagag ttactaggga tcacaattat ggaaagaagt tcctttgaaa 6840

aaaatccgat tgacttttta gaagctaaag gatataagga agttaaaaaa gacttaatca 6900

ttaaactacc taaatatagt ctttttgagt tagaaaacgg tcgtaaacgg atgctggcta 6960

gtgccggaga attacaaaaa ggaaatgagc tggctctgcc aagcaaatat gtgaattttt 7020

tatatttagc tagtcattat gaaaagttga agggtagtcc agaagataac gaacaaaaac 7080

aattgtttgt ggagcagcat aagcattatt tagatgagat tattgagcaa atcagtgaat 7140

tttctaagcg tgttatttta gcagatgcca atttagataa agttcttagt gcatataaca 7200

aacatagaga caaaccaata cgtgaacaag cagaaaatat tattcattta tttacgttga 7260

cgaatcttgg agctcccgct gcttttaaat attttgatac aacaattgat cgtaaacgat 7320

atacgtctac aaaagaagtt ttagatgcca ctcttatcca tcaatccatc actggtcttt 7380

atgaaacacg cattgatttg agtcagctag gaggtgacag cggttccgaa actcccggta 7440

catcagaaag cgcgaccccc gaaagcatga ttaccagtgc actgcatcgt gcggcggatt 7500

gggcgaaaag cgtgttttct agtgctgcgc tgggtgatcc gcgtcgtacc gcgcgtctgg 7560

tgaatgttgc ggcgcaactg gccaaatata gcggcaaaag cattaccatt agcagcgaag 7620

gcagcaaagc catgcaggaa ggcgcgtatc gttttattcg taatccgaac gtgagcgcgg 7680

aagcgattcg taaagcgggt gccatgcaga ccgtgaaact ggcccaggaa tttccggaac 7740

tgctggcaat tgaagatacc acctctctga gctatcgtca tcaggtggcg gaagaactgg 7800

gcaaactggg tagcattcag gataaaagcc gtggttggtg ggtgcatagc gtgctgctgc 7860

tggaagcgac cacctttcgt accgtgggcc tgctgcatca agaatggtgg atgcgtccgg 7920

atgatccggc ggatgcggat gaaaaagaaa gcggcaaatg gctggccgct gctgcaactt 7980

cgcgtctgag aatgggcagc atgatgagca acgtgattgc ggtgtgcgat cgtgaagcgg 8040

atattcatgc gtatctgcaa gataaactgg cccataacga acgttttgtg gtgcgtagca 8100

aacatccgcg taaagatgtg gaaagcggcc tgtatctgta tgatcacctg aaaaaccagc 8160

cggaactggg cggctatcag attagcattc cgcagaaagg cgtggtggat aaacgtggca 8220

aacgtaaaaa ccgtccggcg cgtaaagcga gcctgagcct gcgtagcggc cgtattaccc 8280

tgaaacaggg caacattacc ctgaacgcgg tgctggccga agaaattaat ccgccgaaag 8340

gcgaaacccc gctgaaatgg ctgctgctga ccagcgagcc ggtggaaagt ctggcccaag 8400

cgctgcgtgt gattgatatt tatacccatc gttggcgcat tgaagaattt cacaaagcgt 8460

ggaaaacggg tgcgggtgcg gaacgtcagc gtatggaaga accggataac ctggaacgta 8520

tggtgagcat tctgagcttt gtggcggtgc gtctgctgca actgcgtgaa tcttttactc 8580

cgccgcaagc actgcgtgcg cagggcctgc tgaaagaagc ggaacacgtt gaaagccaga 8640

gcgcggaaac cgtgctgacc ccggatgaat gccaactgct gggctatctg gataaaggca 8700

aacgcaaacg caaagaaaaa gcgggcagcc tgcaatgggc gtatatggcg attgcgcgtc 8760

tgggcggctt tatggatagc aaacgtaccg gcattgcgag ctggggtgcg ctgtgggaag 8820

gttgggaagc gctgcaaagc aaactggatg gctttctggc cgcgaaagac ctgatggcgc 8880

agggcattaa aatctgcatc acgggagatg cactagttgc cctacccgag ggcgagtcgg 8940

tacgcatcgc cgacatcgtg ccgggtgcgc ggcccaacag tgacaacgcc atcgacctga 9000

aagtccttga ccggcatggc aatcccgtgc tcgccgaccg gctgttccac tccggcgagc 9060

atccggtgta cacggtgcgt acggtcgaag gtctgcgtgt gacgggcacc gcgaaccacc 9120

cgttgttgtg tttggtcgac gtcgccgggg tgccgaccct gctgtggaag ctgatcgacg 9180

aaatcaagcc gggcgattac gcggtgattc aacgcagcgc attcagcgtc gactgtgcag 9240

gttttgcccg cgggaaaccc gaatttgcgc ccacaaccta cacagtcggc gtccctggac 9300

tggtgcgttt cttggaagca caccaccgag acccggacgc ccaagctatc gccgacgagc 9360

tgaccgacgg gcggttctac tacgcgaaag tcgccagtgt caccgacgcc ggcgtgcagc 9420

cggtgtatag ccttcgtgtc gacacggcag accacgcgtt tatcacgaac gggttcgtca 9480

gccacgctac tggcctcacc ggtctgaact caggcctcac gacaaatcct ggtgtatccg 9540

cttggcaggt caacacagct tatactgcgg gacaattggt cacatataac ggcaagacgt 9600

ataaatgttt gcagccccac acctccttgg caggatggga accatccaac gttcctgcct 9660

tgtggcagct tcaatgactg caggaagggg atccggctgc taacaaagcc cgaaaggaag 9720

ctgagttggc tgctgccacc gctgagcaat aactagcata accccttggg gcctctaaac 9780

gggtcttgag gggttttttg ctgaaaggag gaactatatc cggataacta cgtcaggtgg 9840

cacttttcgg ggaaatgtgc gcggaacccc tatttgttta tttttctaaa tacattcaaa 9900

tatgtatccg ctcatgagac aataaccctg ataaatgctt caataatatt gaaaaaggaa 9960

gagtatgagt attcaacatt tccgtgtcgc ccttattccc ttttttgcgg cattttgcct 10020

tcctgttttt gctcacccag aaacgctggt gaaagtaaaa gatgctgaag atcagttggg 10080

tgcacgagtg ggttacatcg aactggatct caacagcggt aagatccttg agagttttcg 10140

ccccgaagaa cgtttcccaa tgatgagcac ttttaaagtt ctgctatgtg gcgcggtatt 10200

atcccgtgtt gacgccgggc aagagcaact cggtcgccgc atacactatt ctcagaatga 10260

cttggttgag tactcaccag tcacagaaaa gcatcttacg gatggcatga cagtaagaga 10320

attatgcagt gctgccataa ccatgagtga taacactgcg gccaacttac ttctgacaac 10380

gatcggagga ccgaaggagc taaccgcttt tttgcacaac atgggggatc atgtaactcg 10440

ccttgatcgt tgggaaccgg agctgaatga agccatacca aacgacgagc gtgacaccac 10500

gatgcctgta gcaatggcaa caacgttgcg caaactatta actggcgaac tacttactct 10560

agcttcccgg caacaattaa tagactggat ggaggcggat aaagttgcag gaccacttct 10620

gcgctcggcc cttccggctg gctggtttat tgctgataaa tctggagccg gtgagcgtgg 10680

gtctcgcggt atcattgcag cactggggcc agatggtaag ccctcccgta tcgtagttat 10740

ctacacgacg gggagtcagg caactatgga tgaacgaaat agacagatcg ctgagatagg 10800

tgcctcactg attaagcatt ggtaactgtc agaccaagtt tactcatata tactttagat 10860

tgatttaccc cggttgataa tcagaaaagc cccaaaaaca ggaagattgt ataagcaaat 10920

atttaaattg taaacgttaa tattttgtta aaattcgcgt taaatttttg ttaaatcagc 10980

tcatttttta accaataggc cgaaatcggc aaaatccctt ataaatcaaa agaatagccc 11040

gagatagggt tgagtgttgt tccagtttgg aacaagagtc cactattaaa gaacgtggac 11100

tccaacgtca aagggcgaaa aaccgtctat cagggcgatg gcccactacg tgaaccatca 11160

cccaaatcaa gttttttggg gtcgaggtgc cgtaaagcac taaatcggaa ccctaaaggg 11220

agcccccgat ttagagcttg acggggaaag ccggcgaacg tggcgagaaa ggaagggaag 11280

aaagcgaaag gagcgggcgc tagggcgctg gcaagtgtag cggtcacgct gcgcgtaacc 11340

accacacccg ccgcgcttaa tgcgccgcta cagggcgcgt aaaaggatct aggtgaagat 11400

cctttttgat aatctcatga ccaaaatccc ttaacgtgag ttttcgttcc actgagcgtc 11460

agaccccgta gaaaagatca aaggatcttc ttgagatcct ttttttctgc gcgtaatctg 11520

ctgcttgcaa acaaaaaaac caccgctacc agcggtggtt tgtttgccgg atcaagagct 11580

accaactctt tttccgaagg taactggctt cagcagagcg cagataccaa atactgtcct 11640

tctagtgtag ccgtagttag gccaccactt caagaactct gtagcaccgc ctacatacct 11700

cgctctgcta atcctgttac cagtggctgc tgccagtggc gataagtcgt gtcttaccgg 11760

gttggactca agacgatagt taccggataa ggcgcagcgg tcgggctgaa cggggggttc 11820

gtgcacacag cccagcttgg agcgaacgac ctacaccgaa ctgagatacc tacagcgtga 11880

gctatgagaa agcgccacgc ttcccgaagg gagaaaggcg gacaggtatc cggtaagcgg 11940

cagggtcgga acaggagagc gcacgaggga gcttccaggg ggaaacgcct ggtatcttta 12000

tagtcctgtc gggtttcgcc acctctgact tgagcgtcga tttttgtgat gctcgtcagg 12060

ggggcggagc ctatggaaaa acgccagcaa cgcggccttt ttacggttcc tggccttttg 12120

ctggcctttt gctcacatgt tctttcctgc gttatcccct gattctgtgg ataaccgtat 12180

taccgccttt gagtgagctg ataccgctcg ccgcagccga acgaccgagc gcagcgagtc 12240

agtgagcgag gaagctatgg tgcactctca gtacaatctg ctctgatgcc gcatagttaa 12300

gccagt 12306

<210> 3

<211> 11245

<212> DNA

<213> Artificial Sequence

<220>

<223> pET-Tn5-xTen-dCas9

<400> 3

caaggagatg gcgcccaaca gtcccccggc cacggggcct gccaccatac ccacgccgaa 60

acaagcgctc atgagcccga agtggcgagc ccgatcttcc ccatcggtga tgtcggcgat 120

ataggcgcca gcaaccgcac ctgtggcgcc ggtgatgccg gccacgatgc gtccggcgta 180

gaggatcgag atctcgatcc cgcgaaatta atacgactca ctatagggga attgtgagcg 240

gataacaatt cccctctaga aataattttg tttaacttta agaaggagat ataccatgat 300

taccagtgca ctgcatcgtg cggcggattg ggcgaaaagc gtgttttcta gtgctgcgct 360

gggtgatccg cgtcgtaccg cgcgtctggt gaatgttgcg gcgcaactgg ccaaatatag 420

cggcaaaagc attaccatta gcagcgaagg cagcaaagcc atgcaggaag gcgcgtatcg 480

ttttattcgt aatccgaacg tgagcgcgga agcgattcgt aaagcgggtg ccatgcagac 540

cgtgaaactg gcccaggaat ttccggaact gctggcaatt gaagatacca cctctctgag 600

ctatcgtcat caggtggcgg aagaactggg caaactgggt agcattcagg ataaaagccg 660

tggttggtgg gtgcatagcg tgctgctgct ggaagcgacc acctttcgta ccgtgggcct 720

gctgcatcaa gaatggtgga tgcgtccgga tgatccggcg gatgcggatg aaaaagaaag 780

cggcaaatgg ctggccgctg ctgcaacttc gcgtctgaga atgggcagca tgatgagcaa 840

cgtgattgcg gtgtgcgatc gtgaagcgga tattcatgcg tatctgcaag ataaactggc 900

ccataacgaa cgttttgtgg tgcgtagcaa acatccgcgt aaagatgtgg aaagcggcct 960

gtatctgtat gatcacctga aaaaccagcc ggaactgggc ggctatcaga ttagcattcc 1020

gcagaaaggc gtggtggata aacgtggcaa acgtaaaaac cgtccggcgc gtaaagcgag 1080

cctgagcctg cgtagcggcc gtattaccct gaaacagggc aacattaccc tgaacgcggt 1140

gctggccgaa gaaattaatc cgccgaaagg cgaaaccccg ctgaaatggc tgctgctgac 1200

cagcgagccg gtggaaagtc tggcccaagc gctgcgtgtg attgatattt atacccatcg 1260

ttggcgcatt gaagaatttc acaaagcgtg gaaaacgggt gcgggtgcgg aacgtcagcg 1320

tatggaagaa ccggataacc tggaacgtat ggtgagcatt ctgagctttg tggcggtgcg 1380

tctgctgcaa ctgcgtgaat cttttactcc gccgcaagca ctgcgtgcgc agggcctgct 1440

gaaagaagcg gaacacgttg aaagccagag cgcggaaacc gtgctgaccc cggatgaatg 1500

ccaactgctg ggctatctgg ataaaggcaa acgcaaacgc aaagaaaaag cgggcagcct 1560

gcaatgggcg tatatggcga ttgcgcgtct gggcggcttt atggatagca aacgtaccgg 1620

cattgcgagc tggggtgcgc tgtgggaagg ttgggaagcg ctgcaaagca aactggatgg 1680

ctttctggcc gcgaaagacc tgatggcgca gggcattaaa atcagcggtt ccgaaactcc 1740

cggtacatca gaaagcgcga cccccgaaag catggataaa aagtattcta ttggtttagc 1800

tatcggcaca aatagcgtcg gatgggcggt gatcactgat gaatataagg ttccgtctaa 1860

aaagttcaag gttctgggaa atacagaccg ccacagtatc aaaaaaaatc ttataggggc 1920

tcttttattt gacagtggag agacagcgga agcgactcgt ctcaaacgga cagctcgtag 1980

aaggtataca cgtcggaaga atcgtatttg ttatctacag gagatttttt caaatgagat 2040

ggcgaaagta gatgatagtt tctttcatcg acttgaagag tcttttttgg tggaagaaga 2100

caagaagcat gaacgtcatc ctatttttgg aaatatagta gatgaagttg cttatcatga 2160

gaaatatcca actatctatc atctgcgaaa aaaattggta gattctactg ataaagcgga 2220

tttgcgctta atctatttgg ccttagcgca tatgattaag tttcgtggtc attttttgat 2280

tgagggagat ttaaatcctg ataatagtga tgtggacaaa ctatttatcc agttggtaca 2340

aacctacaat caattatttg aagaaaaccc tattaacgca agtggagtag atgctaaagc 2400

gattctttct gcacgattga gtaaatcaag acgattagaa aatctcattg ctcagctccc 2460

cggtgagaag aaaaatggct tatttgggaa tctcattgct ttgtcattgg gtttgacccc 2520

taattttaaa tcaaattttg atttggcaga agatgctaaa ttacagcttt caaaagatac 2580

ttacgatgat gatttagata atttattggc gcaaattgga gatcaatatg ctgatttgtt 2640

tttggcagct aagaatttat cagatgctat tttactttca gatatcctaa gagtaaatac 2700

tgaaataact aaggctcccc tatcagcatc aatgattaaa cgctacgatg aacatcatca 2760

agacttgact cttttaaaag ctttagttcg acaacaactt ccagaaaagt ataaagaaat 2820

cttttttgat caatcaaaaa acggatatgc aggttatatt gatgggggag ctagccaaga 2880

agaattttat aaatttatca aaccaatttt agaaaaaatg gatggtactg aggaattatt 2940

ggtgaaacta aatcgtgaag atttgctgcg caagcaacgg acctttgaca acggctctat 3000

tccccatcaa attcacttgg gtgagctgca tgctattttg agaagacaag aagactttta 3060

tccattttta aaagacaatc gtgagaagat tgaaaaaatc ttgacttttc gaattcctta 3120

ttatgttggt ccattggcgc gtggcaatag tcgttttgca tggatgactc ggaagtctga 3180

agaaacaatt accccatgga attttgaaga agttgtcgat aaaggtgctt cagctcaatc 3240

atttattgaa cgcatgacaa actttgataa aaatcttcca aatgaaaaag tactaccaaa 3300

acatagtttg ctttatgagt attttacggt ttataacgaa ttgacaaagg tcaaatatgt 3360

tactgaagga atgcgaaaac cagcatttct ttcaggtgaa cagaagaaag ccattgttga 3420

tttactcttc aaaacaaatc gaaaagtaac cgttaagcaa ttaaaagaag attatttcaa 3480

aaaaatagaa tgttttgata gtgttgaaat ttcaggagtt gaagatagat ttaatgcttc 3540

attaggtacc taccatgatt tgctaaaaat tattaaagat aaagattttt tggataatga 3600

agaaaatgaa gatatcttag aggatattgt tttaacattg accttatttg aagataggga 3660

gatgattgag gaaagactta aaacatatgc tcacctcttt gatgataagg tgatgaaaca 3720

gcttaaacgt cgccgttata ctggttgggg acgtttgtct cgaaaattga ttaatggtat 3780

tagggataag caatctggca aaacaatatt agattttttg aaatcagatg gttttgccaa 3840

tcgcaatttt atgcagctga tccatgatga tagtttgaca tttaaagaag acattcaaaa 3900

agcacaagtg tctggacaag gcgatagttt acatgaacat attgcaaatt tagctggtag 3960

ccctgctatt aaaaaaggta ttttacagac tgtaaaagtt gttgatgaat tggtcaaagt 4020

aatggggcgg cataagccag aaaatatcgt tattgaaatg gcacgtgaaa atcagacaac 4080

tcaaaagggc cagaaaaatt cgcgagagcg tatgaaacga atcgaagaag gtatcaaaga 4140

attaggaagt cagattctta aagagcatcc tgttgaaaat actcaattgc aaaatgaaaa 4200

gctctatctc tattatctcc aaaatggaag agacatgtat gtggaccaag aattagatat 4260

taatcgttta agtgattatg atgtcgatgc cattgttcca caaagtttcc ttaaagacga 4320

ttcaatagac aataaggtct taacgcgttc tgataaaaat cgtggtaaat cggataacgt 4380

tccaagtgaa gaagtagtca aaaagatgaa aaactattgg agacaacttc taaacgccaa 4440

gttaatcact caacgtaagt ttgataattt aacgaaagct gaacgtggag gtttgagtga 4500

acttgataaa gctggtttta tcaaacgcca attggttgaa actcgccaaa tcactaagca 4560

tgtggcacaa attttggata gtcgcatgaa tactaaatac gatgaaaatg ataaacttat 4620

tcgagaggtt aaagtgatta ccttaaaatc taaattagtt tctgacttcc gaaaagattt 4680

ccaattctat aaagtacgtg agattaacaa ttaccatcat gcccatgatg cgtatctaaa 4740

tgccgtcgtt ggaactgctt tgattaagaa atatccaaaa cttgaatcgg agtttgtcta 4800

tggtgattat aaagtttatg atgttcgtaa aatgattgct aagtctgagc aagaaatagg 4860

caaagcaacc gcaaaatatt tcttttactc taatatcatg aacttcttca aaacagaaat 4920

tacacttgca aatggagaga ttcgcaaacg ccctctaatc gaaactaatg gggaaactgg 4980

agaaattgtc tgggataaag ggcgagattt tgccacagtg cgcaaagtat tgtccatgcc 5040

ccaagtcaat attgtcaaga aaacagaagt acagacaggc ggattctcca aggagtcaat 5100

tttaccaaaa agaaattcgg acaagcttat tgctcgtaaa aaagactggg atccaaaaaa 5160

atatggtggt tttgatagtc caacggtagc ttattcagtc ctagtggttg ctaaggtgga 5220

aaaagggaaa tcgaagaagt taaaatccgt taaagagtta ctagggatca caattatgga 5280

aagaagttcc tttgaaaaaa atccgattga ctttttagaa gctaaaggat ataaggaagt 5340

taaaaaagac ttaatcatta aactacctaa atatagtctt tttgagttag aaaacggtcg 5400

taaacggatg ctggctagtg ccggagaatt acaaaaagga aatgagctgg ctctgccaag 5460

caaatatgtg aattttttat atttagctag tcattatgaa aagttgaagg gtagtccaga 5520

agataacgaa caaaaacaat tgtttgtgga gcagcataag cattatttag atgagattat 5580

tgagcaaatc agtgaatttt ctaagcgtgt tattttagca gatgccaatt tagataaagt 5640

tcttagtgca tataacaaac atagagacaa accaatacgt gaacaagcag aaaatattat 5700

tcatttattt acgttgacga atcttggagc tcccgctgct tttaaatatt ttgatacaac 5760

aattgatcgt aaacgatata cgtctacaaa agaagtttta gatgccactc ttatccatca 5820

atccatcact ggtctttatg aaacacgcat tgatttgagt cagctaggag gtgaccacca 5880

ccaccaccac cactgagatc cggctgctaa caaagcccga aaggaagctg agttggctgc 5940

tgccaccgct gagcaataac tagcataacc ccttggggcc tctaaacggg tcttgagggg 6000

ttttttgctg aaaggaggaa ctatatccgg atatcccgca agaggcccgg cagtaccggc 6060

ataaccaagc ctatgcctac agcatccagg gtgacggtgc cgaggatgac gatgagcgca 6120

ttgttagatt tcatacacgg tgcctgactg cgttagcaat ttaactgtga taaactaccg 6180

cattaaagct agcttatcga tgataagctg tcaaacatga gaattaattc ttgaagacga 6240

aagggcctcg tgatacgcct atttttatag gttaatgtca tgataataat ggtttcttag 6300

acgtcaggtg gcacttttcg gggaaatgtg cgcggaaccc ctatttgttt atttttctaa 6360

atacattcaa atatgtatcc gctcatgaga caataaccct gataaatgct tcaataatat 6420

tgaaaaagga agagtatgag tattcaacat ttccgtgtcg cccttattcc cttttttgcg 6480

gcattttgcc ttcctgtttt tgctcaccca gaaacgctgg tgaaagtaaa agatgctgaa 6540

gatcagttgg gtgcacgagt gggttacatc gaactggatc tcaacagcgg taagatcctt 6600

gagagttttc gccccgaaga acgttttcca atgatgagca cttttaaagt tctgctatgt 6660

ggcgcggtat tatcccgtgt tgacgccggg caagagcaac tcggtcgccg catacactat 6720

tctcagaatg acttggttga gtactcacca gtcacagaaa agcatcttac ggatggcatg 6780

acagtaagag aattatgcag tgctgccata accatgagtg ataacactgc ggccaactta 6840

cttctgacaa cgatcggagg accgaaggag ctaaccgctt ttttgcacaa catgggggat 6900

catgtaactc gccttgatcg ttgggaaccg gagctgaatg aagccatacc aaacgacgag 6960

cgtgacacca cgatgcctgc agcaatggca acaacgttgc gcaaactatt aactggcgaa 7020

ctacttactc tagcttcccg gcaacaatta atagactgga tggaggcgga taaagttgca 7080

ggaccacttc tgcgctcggc ccttccggct ggctggttta ttgctgataa atctggagcc 7140

ggtgagcgtg ggtctcgcgg tatcattgca gcactggggc cagatggtaa gccctcccgt 7200

atcgtagtta tctacacgac ggggagtcag gcaactatgg atgaacgaaa tagacagatc 7260

gctgagatag gtgcctcact gattaagcat tggtaactgt cagaccaagt ttactcatat 7320

atactttaga ttgatttaaa acttcatttt taatttaaaa ggatctaggt gaagatcctt 7380

tttgataatc tcatgaccaa aatcccttaa cgtgagtttt cgttccactg agcgtcagac 7440

cccgtagaaa agatcaaagg atcttcttga gatccttttt ttctgcgcgt aatctgctgc 7500

ttgcaaacaa aaaaaccacc gctaccagcg gtggtttgtt tgccggatca agagctacca 7560

actctttttc cgaaggtaac tggcttcagc agagcgcaga taccaaatac tgtccttcta 7620

gtgtagccgt agttaggcca ccacttcaag aactctgtag caccgcctac atacctcgct 7680

ctgctaatcc tgttaccagt ggctgctgcc agtggcgata agtcgtgtct taccgggttg 7740

gactcaagac gatagttacc ggataaggcg cagcggtcgg gctgaacggg gggttcgtgc 7800

acacagccca gcttggagcg aacgacctac accgaactga gatacctaca gcgtgagcta 7860

tgagaaagcg ccacgcttcc cgaagggaga aaggcggaca ggtatccggt aagcggcagg 7920

gtcggaacag gagagcgcac gagggagctt ccagggggaa acgcctggta tctttatagt 7980

cctgtcgggt ttcgccacct ctgacttgag cgtcgatttt tgtgatgctc gtcagggggg 8040

cggagcctat ggaaaaacgc cagcaacgcg gcctttttac ggttcctggc cttttgctgg 8100

ccttttgctc acatgttctt tcctgcgtta tcccctgatt ctgtggataa ccgtattacc 8160

gcctttgagt gagctgatac cgctcgccgc agccgaacga ccgagcgcag cgagtcagtg 8220

agcgaggaag cggaagagcg cctgatgcgg tattttctcc ttacgcatct gtgcggtatt 8280

tcacaccgca atggtgcact ctcagtacaa tctgctctga tgccgcatag ttaagccagt 8340

atacactccg ctatcgctac gtgactgggt catggctgcg ccccgacacc caccaacacc 8400

cgctgacgcg ccctgacggg cttgtctgct cccggcatcc gcttacagac aagctgtgac 8460

cgtctccggg agctgcatgt gtcagaggtt ttcaccgtca tcaccgaaac gcgcgaggca 8520

gctgcggtaa agctcatcag cgtggtcgtg aagcgattca cagatgtctg cctgttcatc 8580

cgcgtccagc tcgttgagtt tctccagaag cgttaatgtc tggcttctga taaagcgggc 8640

catgttaagg gcggtttttt cctgtttggt cactgatgcc tccgtgtaag ggggatttct 8700

gttcatgggg gtaatgatac cgatgaaacg agagaggatg ctcacgatac gggttactga 8760

tgatgaacat gcccggttac tggaacgttg tgagggtaaa caactggcgg tatggatgcg 8820

gcgggaccag agaaaaatca ctcagggtca atgccagcgc ttcgttaata cagatgtagg 8880

tgttccacag ggtagccagc agcatcctgc gatgcagatc cggaacataa tggtgcaggg 8940

cgctgacttc cgcgtttcca gactttacga aacacggaaa ccgaagacca ttcatgttgt 9000

tgctcaggtc gcagacgttt tgcagcagca gtcgcttcac gttcgctcgc gtatcggtga 9060

ttcattctgc taaccagtaa ggcaaccccg ccagcctagc cgggtcctca acgacaggag 9120

cacgatcatg cgcacccgtg gccaggaccc aacgctgccc gagatgcgcc gcgtgcggct 9180

gctggagatg gcggacgcga tggatatgtt ctgccaaggg ttggtttgcg cattcacagt 9240

tctccgcaag aattgattgg ctccaattct tggagtggtg aatccgttag cgaggtgccg 9300

ccggcttcca ttcaggtcga ggtggcccgg ctccatgcac cgcgacgcaa cgcggggagg 9360

cagacaaggt atagggcggc gcctacaatc catgccaacc cgttccatgt gctcgccgag 9420

gcggcataaa tcgccgtgac gatcagcggt ccaatgatcg aagttaggct ggtaagagcc 9480

gcgagcgatc cttgaagctg tccctgatgg tcgtcatcta cctgcctgga cagcatggcc 9540

tgcaacgcgg gcatcccgat gccgccggaa gcgagaagaa tcataatggg gaaggccatc 9600

cagcctcgcg tcgcgaacgc cagcaagacg tagcccagcg cgtcggccgc catgccggcg 9660

ataatggcct gcttctcgcc gaaacgtttg gtggcgggac cagtgacgaa ggcttgagcg 9720

agggcgtgca agattccgaa taccgcaagc gacaggccga tcatcgtcgc gctccagcga 9780

aagcggtcct cgccgaaaat gacccagagc gctgccggca cctgtcctac gagttgcatg 9840

ataaagaaga cagtcataag tgcggcgacg atagtcatgc cccgcgccca ccggaaggag 9900

ctgactgggt tgaaggctct caagggcatc ggtcgagatc ccggtgccta atgagtgagc 9960

taacttacat taattgcgtt gcgctcactg cccgctttcc agtcgggaaa cctgtcgtgc 10020

cagctgcatt aatgaatcgg ccaacgcgcg gggagaggcg gtttgcgtat tgggcgccag 10080

ggtggttttt cttttcacca gtgagacggg caacagctga ttgcccttca ccgcctggcc 10140

ctgagagagt tgcagcaagc ggtccacgct ggtttgcccc agcaggcgaa aatcctgttt 10200

gatggtggtt aacggcggga tataacatga gctgtcttcg gtatcgtcgt atcccactac 10260

cgagatatcc gcaccaacgc gcagcccgga ctcggtaatg gcgcgcattg cgcccagcgc 10320

catctgatcg ttggcaacca gcatcgcagt gggaacgatg ccctcattca gcatttgcat 10380

ggtttgttga aaaccggaca tggcactcca gtcgccttcc cgttccgcta tcggctgaat 10440

ttgattgcga gtgagatatt tatgccagcc agccagacgc agacgcgccg agacagaact 10500

taatgggccc gctaacagcg cgatttgctg gtgacccaat gcgaccagat gctccacgcc 10560

cagtcgcgta ccgtcttcat gggagaaaat aatactgttg atgggtgtct ggtcagagac 10620

atcaagaaat aacgccggaa cattagtgca ggcagcttcc acagcaatgg catcctggtc 10680

atccagcgga tagttaatga tcagcccact gacgcgttgc gcgagaagat tgtgcaccgc 10740

cgctttacag gcttcgacgc cgcttcgttc taccatcgac accaccacgc tggcacccag 10800

ttgatcggcg cgagatttaa tcgccgcgac aatttgcgac ggcgcgtgca gggccagact 10860

ggaggtggca acgccaatca gcaacgactg tttgcccgcc agttgttgtg ccacgcggtt 10920

gggaatgtaa ttcagctccg ccatcgccgc ttccactttt tcccgcgttt tcgcagaaac 10980

gtggctggcc tggttcacca cgcgggaaac ggtctgataa gagacaccgg catactctgc 11040

gacatcgtat aacgttactg gtttcacatt caccaccctg aattgactct cttccgggcg 11100

ctatcatgcc ataccgcgaa aggttttgcg ccattcgatg gtgtccggga tctcgacgct 11160

ctcccttatg cgactcctgc attaggaagc agcccagtag taggttgagg ccgttgagca 11220

ccgccgccgc aaggaatggt gcatg 11245

<210> 4

<211> 20

<212> DNA

<213> Artificial sequence (Artificial Sequence)

<220>

<223> CD.Cas9.CVSJ0588.AF

<400> 4

gaaattaatg gtttaagctt 20
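
The short records in this listing follow the usual numeric-identifier layout (<210> sequence identifier, <211> length, <212> molecule type, <400> residues). As a minimal sketch, assuming the listing has been saved as plain text, the declared length of each DNA record can be checked against the residues actually printed. The function name `parse_dna_records` and the two-record sample are illustrative only and are not part of the application.

```python
import re

# Two DNA records copied from the listing (SEQ ID NOs: 4 and 5).
SAMPLE = """\
<210> 4
<211> 20
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<220>
<223> CD.Cas9.CVSJ0588.AF
<400> 4
gaaattaatg gtttaagctt 20
<210> 5
<211> 20
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<220>
<223> CD.Cas9.CVSJ0588.AA
<400> 5
aggtgagcaa gatttccatt 20
"""


def parse_dna_records(text):
    """Collect id, declared length and residues for each DNA record."""
    records = []
    current = None
    field = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        tag = re.match(r"<(\d{3})>\s*(.*)", line)
        if tag:
            field, value = tag.group(1), tag.group(2)
            if field == "210":
                current = {"id": int(value), "length": None,
                           "type": None, "residues": ""}
                records.append(current)
            elif current is not None and field == "211":
                current["length"] = int(value)
            elif current is not None and field == "212":
                current["type"] = value
        elif current is not None and field == "400" and current["type"] == "DNA":
            # Keep only the bases; this drops the spaces between blocks of
            # ten and the running position count at the end of the line.
            current["residues"] += re.sub(r"[^a-z]", "", line.lower())
    return records


records = parse_dna_records(SAMPLE)
mismatches = [r["id"] for r in records if len(r["residues"]) != r["length"]]
print(mismatches)
```

An empty list means every parsed record has as many bases as its <211> line declares. Protein records (<212> PRT) are skipped by this sketch.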

<210> 5

<211> 20

<212> DNA

<213> Artificial sequence (Artificial Sequence)

<220>

<223> CD.Cas9.CVSJ0588.AA

<400> 5

aggtgagcaa gatttccatt 20

<210> 6

<211> 20

<212> DNA

<213> Artificial sequence (Artificial Sequence)

<220>

<223> CD.Cas9.CVSJ0588.AC

<400> 6

tcaaggacat attctcctgt 20

<210> 7

<211> 20

<212> DNA

<213> Artificial sequence (Artificial Sequence)

<220>

<223> CD.Cas9.CVSJ0588.AJ

<400> 7

aatgtgctcc ataaggaatt 20

<210> 8

<211> 20

<212> DNA

<213> Artificial sequence (Artificial Sequence)

<220>

<223> CD.Cas9.CVSJ0588.AG

<400> 8

aactttcttc ttctgaggag 20

<210> 9

<211> 20

<212> DNA

<213> Artificial sequence (Artificial Sequence)

<220>

<223> CD.Cas9.CVSJ0588.AB

<400> 9

aagatcacac ctatgggaaa 20

<210> 10

<211> 20

<212> DNA

<213> Artificial sequence (Artificial Sequence)

<220>

<223> CD.Cas9.RNXS0617.AA

<400> 10

gctattttga ccatttcaat 20

<210> 11

<211> 20

<212> DNA

<213> Artificial sequence (Artificial Sequence)

<220>

<223> CD.Cas9.RNXS0617.AB

<400> 11

cggaggacaa atccatacca 20

<210> 12

<211> 20

<212> DNA

<213> Artificial sequence (Artificial Sequence)

<220>

<223> CD.Cas9.RNXS0617.AC

<400> 12

cagtttatcg ttattaccaa 20

<210> 13

<211> 20

<212> DNA

<213> Artificial sequence (Artificial Sequence)

<220>

<223> CD.Cas9.RNXS0617.AD

<400> 13

acttatacca tgctgaccat 20

<210> 14

<211> 20

<212> DNA

<213> Artificial sequence (Artificial Sequence)

<220>

<223> CD.Cas9.RZZJ8230.AA

<400> 14

ggacaacacc ctgaccatcc 20

<210> 15

<211> 20

<212> DNA

<213> Artificial sequence (Artificial Sequence)

<220>

<223> CD.Cas9.RZZJ8230.AC

<400> 15

gtctgacctc gactccatcc 20

<210> 16

<211> 20

<212> DNA

<213> Artificial sequence (Artificial Sequence)

<220>

<223> CD.Cas9.RZZJ8230.AD

<400> 16

gaacatcaaa ggtctgactc 20

<210> 17

<211> 20

<212> DNA

<213> Artificial sequence (Artificial Sequence)

<220>

<223> EXT1-1

<400> 17

atatcacgtc cataacgggg 20

<210> 18

<211> 20

<212> DNA

<213> Artificial sequence (Artificial Sequence)

<220>

<223> EXT1-4

<400> 18

cacttggcct gactacaccg 20

<210> 19

<211> 20

<212> DNA

<213> Artificial sequence (Artificial Sequence)

<220>

<223> BCL9-2

<400> 19

gggttggcat cggaaccacg 20

<210> 20

<211> 20

<212> DNA

<213> Artificial sequence (Artificial Sequence)

<220>

<223> BCL9-4

<400> 20

gatgccctct ccaaatgccg 20

<210> 21

<211> 20

<212> DNA

<213> Artificial sequence (Artificial Sequence)

<220>

<223> HOXA13-1

<400> 21

gtagccatag ggcagcgccg 20

<210> 22

<211> 20

<212> DNA

<213> Artificial sequence (Artificial Sequence)

<220>

<223> HOXA13-4

<400> 22

tttctctacg acaacggcgg 20

<210> 23

<211> 20

<212> DNA

<213> Artificial sequence (Artificial Sequence)

<220>

<223> HOXD11-1

<400> 23

gggcttcgac cagttctacg 20

<210> 24

<211> 20

<212> DNA

<213> Artificial sequence (Artificial Sequence)

<220>

<223> HOXD11-3

<400> 24

gggctacgct ccctactacg 20

<210> 25

<211> 20

<212> DNA

<213> Artificial sequence (Artificial Sequence)

<220>

<223> OLIG2-1

<400> 25

actggtgagc gagatctacg 20

<210> 26

<211> 20

<212> DNA

<213> Artificial sequence (Artificial Sequence)

<220>

<223> OLIG2-2

<400> 26

gcacgccgca catcaccccg 20

<210> 27

<211> 33

<212> DNA

<213> Artificial sequence (Artificial Sequence)

<220>

<223> adapter A

<400> 27

tcgtcggcag cgtcagatgt gtataagaga cag 33

<210> 28

<211> 34

<212> DNA

<213> Artificial sequence (Artificial Sequence)

<220>

<223> adapter B

<400> 28

gtctcgtggg ctcggagatg tgtataagag acag 34

<210> 29

<211> 19

<212> DNA

<213> Artificial sequence (Artificial Sequence)

<220>

<223> mosaic Ends

<400> 29

agatgtgtat aagagacag 19

<210> 30

<211> 15

<212> DNA

<213> Artificial sequence (Artificial Sequence)

<220>

<223> Illumina sequence primer site-SP 2

<400> 30

gtctcgtggg ctcgg 15

<210> 31

<211> 14

<212> DNA

<213> Artificial sequence (Artificial Sequence)

<220>

<223> Illumina sequence primer site-SP 1

<400> 31

tcgtcggcag cgtc 14
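
Read together, SEQ ID NOs: 27-31 show how the two adapters are laid out: as listed, adapter A is the SP 1 primer site followed by the mosaic end, and adapter B is the SP 2 primer site followed by the mosaic end. The sketch below only re-derives that relationship from the printed sequences; the helper name `split_adapter` is hypothetical.

```python
# Sequences copied from the listing, with spaces and counts removed.
MOSAIC_END = "agatgtgtataagagacag"                 # SEQ ID NO: 29
SP2_SITE = "gtctcgtgggctcgg"                       # SEQ ID NO: 30
SP1_SITE = "tcgtcggcagcgtc"                        # SEQ ID NO: 31
ADAPTER_A = "tcgtcggcagcgtcagatgtgtataagagacag"    # SEQ ID NO: 27
ADAPTER_B = "gtctcgtgggctcggagatgtgtataagagacag"   # SEQ ID NO: 28


def split_adapter(adapter, mosaic_end=MOSAIC_END):
    """Return (primer_site, mosaic_end) for an adapter ending in the mosaic end."""
    if not adapter.endswith(mosaic_end):
        raise ValueError("adapter does not end with the mosaic end sequence")
    return adapter[: -len(mosaic_end)], mosaic_end


site_a, _ = split_adapter(ADAPTER_A)
site_b, _ = split_adapter(ADAPTER_B)
print(site_a == SP1_SITE, len(ADAPTER_A))
print(site_b == SP2_SITE, len(ADAPTER_B))
```

The lengths agree with the <211> values: 14 + 19 = 33 for adapter A and 15 + 19 = 34 for adapter B.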

<210> 32

<211> 10

<212> PRT

<213> Artificial sequence (Artificial Sequence)

<220>

<223> (G)n

<220>

<221> Variant

<222> (2)..(10)

<223> may or may not be present

<400> 32

Gly Gly Gly Gly Gly Gly Gly Gly Gly Gly

1 5 10

<210> 33

<211> 20

<212> PRT

<213> Artificial sequence (Artificial Sequence)

<220>

<223> (GS)n

<220>

<221> Variant

<222> (3)..(20)

<223> may or may not be present

<400> 33

Gly Ser Gly Ser Gly Ser Gly Ser Gly Ser Gly Ser Gly Ser Gly Ser

1 5 10 15

Gly Ser Gly Ser

20

<210> 34

<211> 50

<212> PRT

<213> Artificial sequence (Artificial Sequence)

<220>

<223> (GSGGS)n

<220>

<221> Variant

<222> (6)..(50)

<223> may or may not be present

<400> 34

Gly Ser Gly Gly Ser Gly Ser Gly Gly Ser Gly Ser Gly Gly Ser Gly

1 5 10 15

Ser Gly Gly Ser Gly Ser Gly Gly Ser Gly Ser Gly Gly Ser Gly Ser

20 25 30

Gly Gly Ser Gly Ser Gly Gly Ser Gly Ser Gly Gly Ser Gly Ser Gly

35 40 45

Gly Ser

50

<210> 35

<211> 50

<212> PRT

<213> Artificial sequence (Artificial Sequence)

<220>

<223> (G4S)n

<220>

<221> Variant

<222> (6)..(50)

<223> may or may not be present

<400> 35

Gly Gly Gly Gly Ser Gly Gly Gly Gly Ser Gly Gly Gly Gly Ser Gly

1 5 10 15

Gly Gly Gly Ser Gly Gly Gly Gly Ser Gly Gly Gly Gly Ser Gly Gly

20 25 30

Gly Gly Ser Gly Gly Gly Gly Ser Gly Gly Gly Gly Ser Gly Gly Gly

35 40 45

Gly Ser

50

<210> 36

<211> 40

<212> PRT

<213> Artificial sequence (Artificial Sequence)

<220>

<223> (GGGS)n

<220>

<221> Variant

<222> (5)..(40)

<223> may or may not be present

<400> 36

Gly Gly Gly Ser Gly Gly Gly Ser Gly Gly Gly Ser Gly Gly Gly Ser

1 5 10 15

Gly Gly Gly Ser Gly Gly Gly Ser Gly Gly Gly Ser Gly Gly Gly Ser

20 25 30

Gly Gly Gly Ser Gly Gly Gly Ser

35 40
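
SEQ ID NOs: 32-36 each print the longest form of a repeat linker and mark every residue after the first repeat unit as "may or may not be present". One way to read this, offered here as an assumption rather than as the applicant's definition, is that the repeat count n runs from 1 to 10. The sketch below expands a repeat unit for a chosen n; the dictionary and the function name `expand_linker` are illustrative.

```python
# Repeat units in three-letter code, keyed by the <223> label of each record.
REPEAT_UNITS = {
    "(G)n": ["Gly"],                                  # SEQ ID NO: 32
    "(GS)n": ["Gly", "Ser"],                          # SEQ ID NO: 33
    "(GSGGS)n": ["Gly", "Ser", "Gly", "Gly", "Ser"],  # SEQ ID NO: 34
    "(G4S)n": ["Gly", "Gly", "Gly", "Gly", "Ser"],    # SEQ ID NO: 35
    "(GGGS)n": ["Gly", "Gly", "Gly", "Ser"],          # SEQ ID NO: 36
}

MAX_REPEATS = 10  # assumption: the listed maximum lengths correspond to n = 10


def expand_linker(label, n):
    """Return the linker as a space-separated three-letter string for n repeats."""
    if not 1 <= n <= MAX_REPEATS:
        raise ValueError("n must be between 1 and 10 for the listed records")
    return " ".join(REPEAT_UNITS[label] * n)


print(expand_linker("(GS)n", 2))
print(len(expand_linker("(G4S)n", 10).split()))
```

With n = 10 the expanded lengths are 10, 20, 50, 50 and 40 residues, matching the <211> values of the five records.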

<210> 37

<211> 4

<212> PRT

<213> Artificial sequence (Artificial Sequence)

<220>

<223> Glycine/serine spacer 1

<400> 37

Gly Gly Ser Gly

1

<210> 38

<211> 5

<212> PRT

<213> Artificial sequence (Artificial Sequence)

<220>

<223> Glycine/serine spacer 2

<400> 38

Gly Gly Ser Gly Gly

1 5

<210> 39

<211> 5

<212> PRT

<213> Artificial sequence (Artificial Sequence)

<220>

<223> Glycine/serine spacer 3

<400> 39

Gly Ser Gly Ser Gly

1 5

<210> 40

<211> 5

<212> PRT

<213> Artificial sequence (Artificial Sequence)

<220>

<223> Glycine/serine spacer 4

<400> 40

Gly Ser Gly Gly Gly

1 5

<210> 41

<211> 5

<212> PRT

<213> Artificial sequence (Artificial Sequence)

<220>

<223> Glycine/serine spacer 5

<400> 41

Gly Gly Gly Ser Gly

1 5

<210> 42

<211> 5

<212> PRT

<213> Artificial sequence (Artificial Sequence)

<220>

<223> Glycine/serine spacer 6

<400> 42

Gly Ser Ser Ser Gly

1 5

<210> 43

<211> 12

<212> PRT

<213> Artificial sequence (Artificial Sequence)

<220>

<223> XTEN linker 1

<400> 43

Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala

1 5 10

<210> 44

<211> 16

<212> PRT

<213> Artificial sequence (Artificial Sequence)

<220>

<223> XTEN linker 2

<400> 44

Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser

1 5 10 15

<210> 45

<211> 21

<212> PRT

<213> Artificial sequence (Artificial Sequence)

<220>

<223> XTEN linker 3

<400> 45

Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Gly

1 5 10 15

Gly Ser Gly Gly Ser

20
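
The spacer and XTEN linker records above are written in three-letter amino acid code, which is awkward to compare by eye. As a small sketch, the standard three-letter to one-letter mapping can be applied to the printed residues of SEQ ID NOs: 43-45; the function name `to_one_letter` is hypothetical, and position counts are omitted from the input strings.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Residues copied from the listing.
XTEN_LINKERS = {
    43: "Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala",
    44: "Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser",
    45: ("Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Gly "
         "Gly Ser Gly Gly Ser"),
}


def to_one_letter(three_letter_sequence):
    """Convert a space-separated three-letter sequence to one-letter code."""
    return "".join(THREE_TO_ONE[res] for res in three_letter_sequence.split())


for seq_id, residues in XTEN_LINKERS.items():
    print(seq_id, to_one_letter(residues))
```

In one-letter form it is easier to see that all three XTEN linkers share the first twelve residues and differ only in their C-terminal extensions.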

Claims

1. A composition comprising more than one protein complex, wherein each of the more than one protein complex comprises a transposome and a programmable DNA binding unit capable of specifically binding to a binding site on target double-stranded DNA (dsDNA), wherein the transposome comprises a transposase, a first adapter, and a second adapter, and wherein the binding site of each of the more than one protein complexes is different from each other.

2. The composition of claim 1, wherein at least two of the more than one protein complexes comprise the same transposomes.

3. The composition of claim 1, wherein the more than one protein complex all comprise the same transposomes.

4. The composition of any one of claims 1-3, wherein the more than one protein complex all comprise the same transposase.

5. The composition of any one of claims 1-4, wherein the first adapter and the second adapter in the same transposome are the same.

6. The composition of any one of claims 1-5, wherein the first adapter, the second adapter, or both in different transposomes are different.

7. The composition of any one of claims 1-6, wherein the first adapter, the second adapter, or both are dsDNA or RNA/DNA duplexes.

8. The composition of any one of claims 1-7, wherein the adapter is about 3-200 base pairs in length.

9. The composition of any one of claims 1-8, wherein the first adapter, the second adapter, or both are sequencing adapters.

10. The composition of claim 9, wherein the sequencing adapter comprises a P5 or P7 primer sequence.

11. The composition of any one of claims 1-10, wherein the binding sites of at least two of the more than one protein complexes are located on the same target dsDNA.

12. The composition of claim 11, wherein the binding sites of at least two of the more than one protein complexes are about 1-50000 nucleotides apart on the same target dsDNA.

13. The composition of claim 11, wherein the distance between the binding sites of one pair of the more than one protein complexes is substantially the same as the distance between the binding sites of another pair of the more than one protein complexes.

14. The composition of claim 11, wherein the distance between the binding sites of one pair of the more than one protein complex is different from the distance between the binding sites of another pair of the more than one protein complex.

15. The composition of any one of claims 1-11, wherein the binding sites of at least two of the more than one protein complexes are located on different strands of target dsDNA.

16. The composition of any one of claims 1-15, wherein at least two of the more than one protein complexes are capable of specifically binding to different target dsDNA.

17. The composition of any one of claims 1-15, wherein the more than one protein complex is capable of specifically binding to about 2-5000 target dsDNAs.

18. The composition of any one of claims 1-17, wherein the transposase is a Tn5 transposase, a Tn7 transposase, a mariner Tc1-like transposase, a Himar1C9 transposase, or a Sleeping Beauty transposase.

19. The composition of any one of claims 1-18, wherein the transposase is a hyperactive transposase.

20. The composition of any one of claims 1-19, wherein the programmable DNA binding unit comprises a nuclease-deficient CRISPR-associated protein (dCAS protein) and a guide RNA capable of specifically binding to a binding site of the target dsDNA.

21. The composition of claim 20, wherein the transposome is associated with the programmable DNA binding unit through a linker that connects the transposase and the dCAS protein.

22. The composition of claim 21, wherein the linker comprises a peptide linker, a chemical linker, or both.

23. The composition of claim 20, wherein the transposase is present as a fusion protein comprising the dCAS protein.

24. The composition of any one of claims 20-23, wherein the dCAS protein is dCAS9, dCAS12, dCAS13, dCAS14, or SpRY dCAS.

25. The composition of claim 24, wherein the dCAS13 protein is dCAS13a, dCAS13b, dCAS13c, or dCAS13d.

26. The composition of any one of claims 1-19, wherein the programmable DNA binding unit comprises a protein component capable of specifically binding to a binding site on the target dsDNA, wherein the protein component comprises an endonuclease-deficient zinc finger nuclease (ZFN), an endonuclease-deficient transcription activator-like effector nuclease (TALEN), an Argonaute protein, an endonuclease-deficient meganuclease, a recombinase, or a combination thereof.

27. The composition of claim 26, wherein the transposomes are associated with the programmable DNA binding unit through a linker connecting the transposase and the protein component.

28. The composition of claim 27, wherein the linker comprises a peptide linker, a chemical linker, or both.

29. The composition of claim 28, wherein the peptide linker comprises more than one glycine, serine, threonine, alanine, lysine, glutamine, or a combination thereof.

30. The composition of claim 29, wherein the peptide linker comprises a GS linker.

31. The composition of claim 28, wherein the peptide linker is an XTEN linker.

32. The composition of claim 26, wherein the protein component is present as a fusion protein comprising the transposase.

33. A reaction mixture comprising:

the composition of any one of claims 1-32; and

a sample nucleic acid suspected of comprising one or more target dsDNA.

34. The reaction mixture of claim 33, further comprising a DNA polymerase, dNTPs, or a combination thereof.

35. The reaction mixture of any one of claims 33-34, wherein the adapter is covalently attached to the target dsDNA or fragment thereof.

36. The reaction mixture of any one of claims 33-35, comprising more than one dsDNA fragment, each fragment comprising the first adapter and the second adapter of one of the more than one protein complexes at each terminus, respectively.

37. The reaction mixture of any one of claims 33-36, wherein the sample nucleic acid comprises eukaryotic DNA, bacterial DNA, viral DNA, fungal DNA, protozoan DNA, or a combination thereof.

38. The reaction mixture of any one of claims 33-37, wherein the target dsDNA is genomic DNA, mitochondrial DNA, plasmid DNA, or a combination thereof.

39. The reaction mixture of any one of claims 33-38, wherein the sample nucleic acid is from a biological sample, a clinical sample, an environmental sample, or a combination thereof.

40. The reaction mixture of claim 39, wherein the biological sample comprises stool, sputum, peripheral blood, plasma, serum, lymph nodes, respiratory tissue, exudates, body fluids, or combinations thereof.

41. A method of tagging nucleic acids, comprising:

contacting the composition of any one of claims 1-32 with a sample suspected of containing more than one target double-stranded DNA (dsDNA) to form a reaction mixture; and

incubating the reaction mixture to generate more than one dsDNA fragment, each fragment comprising the first adapter and the second adapter of one of the more than one protein complexes at each end, respectively.

42. A method for generating a sequencing library, comprising:

contacting the composition of any one of claims 1-32 with a sample suspected of containing more than one target double-stranded DNA (dsDNA) to form a reaction mixture;

incubating the reaction mixture to generate more than one dsDNA fragment, each fragment comprising the first adapter and the second adapter of one of the more than one protein complexes at each terminus, respectively; and

amplifying the more than one dsDNA fragment with primers capable of binding to the adapters at the ends of the dsDNA fragments to generate a sequencing library.

43. The method of claim 42, wherein each of the primers is about 5-80 nucleotides in length.

44. The method of any one of claims 42-43, wherein amplifying the more than one dsDNA fragment with the primers is performed using polymerase chain reaction (PCR).

45. The method of claim 44, wherein the PCR is loop-mediated isothermal amplification (LAMP), helicase-dependent amplification (HDA), recombinase polymerase amplification (RPA), strand displacement amplification (SDA), nucleic acid sequence-based amplification (NASBA), transcription-mediated amplification (TMA), nicking enzyme amplification reaction (NEAR), rolling circle amplification (RCA), multiple displacement amplification (MDA), ramification amplification (RAM), circular helicase-dependent amplification (cHDA), single primer isothermal amplification (SPIA), signal-mediated amplification of RNA technology (SMART), self-sustained sequence replication (3SR), genome exponential amplification reaction (GEAR), or isothermal multiple displacement amplification (IMDA).

46. The method of claim 44, wherein the PCR is real-time PCR or quantitative real-time PCR (qRT-PCR).

47. The method of any one of claims 41-46, wherein the sample comprises eukaryotic DNA, bacterial DNA, viral DNA, fungal DNA, protozoan DNA, or a combination thereof.

48. The method of any one of claims 41-47, wherein the more than one target dsDNA comprises genomic DNA, mitochondrial DNA, plasmid DNA, or a combination thereof.

49. The method of any one of claims 41-48, wherein the sample is or is derived from a biological sample, a clinical sample, an environmental sample, or a combination thereof.

50. The method of any one of claims 41-49, wherein the more than one target dsDNA comprises DNA from at least 2 different organisms.

51. The method of any one of claims 41-50, wherein the more than one target dsDNA comprises DNA from at least 2 different genes.

52. The method of any one of claims 41-51, further comprising producing said more than one target dsDNA from more than one target RNA with reverse transcriptase.

53. The method of any one of claims 41-51, wherein the more than one target dsDNA comprises a target dsDNA produced from a target RNA with a reverse transcriptase.

54. The method of any one of claims 41-53, wherein the more than one target dsDNA comprises a genetic feature of interest.

55. The method of claim 54, wherein the genetic feature of interest comprises one or more mutations of interest.

56. The method of claim 55, wherein the one or more mutations of interest comprise a point mutation, an inversion, a deletion, an insertion, a translocation, a duplication, a copy number variation, or a combination thereof.

57. The method of claim 55, wherein the one or more mutations of interest comprise nucleotide substitutions, deletions, insertions, or combinations thereof.

58. The method of any one of claims 54-57, wherein the genetic feature of interest is indicative of pathogen identification, antibiotic resistance, or antibiotic susceptibility of the organism from which the target dsDNA is derived.

59. The method of any one of claims 54-57, wherein the genetic feature of interest is indicative of a cancer status of the organism from which the target dsDNA is derived.

60. The method of any one of claims 54-57, wherein the genetic feature of interest is indicative of a state of a genetic disease of the organism from which the target dsDNA is derived.

61. The method of claim 60, wherein the genetic disease is a monogenic disorder.

62. The method of claim 60, wherein the genetic disease is cystic fibrosis, Huntington's disease, sickle cell anemia, hemophilia, Duchenne muscular dystrophy, thalassemia, fragile X syndrome, familial hypercholesterolemia, polycystic kidney disease, neurofibromatosis type I, hereditary spherocytosis, Marfan syndrome, Tay-Sachs disease, phenylketonuria, mucopolysaccharidosis, lysosomal acid lipase deficiency, glycogen storage disease, galactosemia, or hemochromatosis.

63. The method of any one of claims 41-62, wherein contacting the more than one target dsDNA with the more than one protein complex pair occurs at about 25 ℃ to about 80 ℃.

64. The method of any one of claims 41-63, wherein incubating the reaction mixture comprises incubating the reaction mixture at about 37 ℃ to about 55 ℃.

65. The method of any one of claims 41-64, wherein the more than one protein complex pair and the more than one target dsDNA are present in the reaction mixture at a molecular ratio of about 2:1 to about 2,000:1.

66. The method of any one of claims 41-64, wherein the more than one protein complex pair and the more than one target dsDNA are present in the reaction mixture at a molecular ratio of about 2:1 to about 200:1.

67. The method of any one of claims 41-66, further comprising labeling one or both ends of one or more of the more than one dsDNA fragments.

68. The method of any one of claims 41-66, comprising labeling both ends of one or more of said more than one dsDNA fragments differently.

69. The method of any one of claims 67-68, wherein said labeling comprises labeling with an anionic label, a cationic label, a neutral label, an electrochemical label, a protein label, a fluorescent label, a magnetic label, or a combination thereof.

70. The method of any one of claims 67-69, further comprising enriching for the labeled dsDNA fragments, capturing the labeled dsDNA fragments, isolating the labeled dsDNA fragments, and/or visualizing the labeled dsDNA fragments.