CN117897502A

CN117897502A - Compositions and methods for detecting genetic features

Info

Publication number: CN117897502A
Application number: CN202280058627.5A
Authority: CN
Inventors: T·卢尼
Original assignee: Strange Genomics Systems Inc
Current assignee: Strange Genomics Systems Inc
Priority date: 2021-07-06
Filing date: 2022-06-29
Publication date: 2024-04-16

Abstract

Disclosed herein, inter alia, are compositions and methods that provide a sequencing-efficient solution for detecting genetic features and aberrations.

Description

Compositions and methods for detecting genetic features

Cross reference to related applications

The present application claims the benefit of U.S. Provisional Application No. 63/218,794, filed on July 6, 2021; U.S. Provisional Application No. 63/297,078, filed on January 6, 2022; and U.S. Provisional Application No. 63/348,939, filed on 3 of 2022; each of which is incorporated by reference herein in its entirety for all purposes.

Reference to a "Sequence Listing," a table, or a computer program listing appendix submitted as an ASCII file

The Sequence Listing written in file 051385-548001wo_seq_st25.Txt, created on month 29 of 2022, 547 bytes, machine format IBM-PC, MS Windows operating system, is incorporated herein by reference.

Background

Gene fusions are somatic alterations that may lead to cancer. Translocations, copy number changes, and inversions may give rise to gene fusions, as well as to deregulated gene expression and novel molecular functions. Next-generation sequencing (NGS) methods for gene fusion detection may employ non-targeted sequencing (e.g., whole-genome or whole-transcriptome sequencing) or targeted sequencing of the fusion genes of interest. Targeted methods for gene fusion detection can simplify analysis and reduce cost. Popular methods for targeted sequencing of gene fusions include multiplex PCR, wherein primer sets are designed to generate PCR amplicons spanning known breakpoint junctions; anchored multiplex PCR (AMP); and hybridization capture methods for enriching breakpoint regions of interest. Multiplex PCR, however, cannot identify fusions involving novel breakpoints and partners; AMP has relatively high input requirements and a more complex workflow, and is often limited to RNA analysis; and hybridization capture has a relatively complex workflow and reduced sensitivity compared to PCR-based methods. For both targeted and non-targeted approaches, robustness to sample degradation is often critical because FFPE-preserved tissue and cfDNA are widely used as input materials.

Disclosure of Invention

In view of the above, there is a need for a method to achieve high sensitivity targeted analysis of gene fusion with minimal workflow complexity and input requirements, as well as robustness to highly degraded materials. Solutions to these and other problems in the art are described herein, among other things.

In one aspect, there is provided a method of differentially amplifying a polynucleotide comprising a fusion gene relative to a polynucleotide not comprising the fusion gene, the method comprising: i) Circularizing a plurality of linear nucleic acid molecules to form a plurality of circular template polynucleotides, wherein one or more of the linear nucleic acid molecules comprises a fusion gene, thereby forming one or more fusion gene circular template polynucleotides, and wherein one or more of the linear nucleic acid molecules does not comprise a fusion gene, thereby forming one or more non-fusion gene circular template polynucleotides; ii) binding a blocking element to the one or more non-fused circular template polynucleotides; and iii) hybridizing a first primer and a second primer to the one or more non-fusion circular template polynucleotides and the one or more fusion circular template polynucleotides and extending with a polymerase to produce a first amount of non-fusion polynucleotide amplification product and a second amount of fusion polynucleotide amplification product, wherein the first amount is detectably less than the second amount; thereby differentially amplifying the polynucleotide comprising the fusion gene.

In one aspect, there is provided a method of amplifying a polynucleotide comprising a fusion gene, the method comprising: i) Binding a blocking element to a non-fusion circular template polynucleotide, wherein the non-fusion circular template does not comprise a fusion gene; ii) hybridizing the first primer and the second primer to the non-fused circular template polynucleotide; and hybridizing the first primer and the second primer to a fusion circular template polynucleotide, wherein the fusion circular template polynucleotide comprises a fusion gene; and iii) extending the first primer and the second primer with a non-strand displacement polymerase to produce a fusion polynucleotide amplification product.

In one aspect, a kit is provided comprising: a circularizing agent, wherein the circularizing agent is capable of binding the 5' and 3' ends of a linear nucleic acid molecule; a blocking element capable of binding to one or more circular polynucleotides; a first primer and a second primer; and a polymerase.

Drawings

FIG. 1 shows outward-facing primers (indicated by arrows) designed to target regions of a fusion partner of interest adjacent to a breakpoint location of interest. An element referred to as a blocking element (e.g., a non-extendable oligomer used in conjunction with a non-strand-displacing polymerase) targets unrearranged sequence adjacent to the outward-facing primers and prevents polymerase extension. The blocking element selectively inhibits amplification of unrearranged templates, resulting in preferential amplification of templates containing the fusion.

FIGS. 2A-2B illustrate a blocked inverse PCR method. FIG. 2A illustrates a method that uses: (a) an outward-facing inverse PCR primer pair; (b) a 5' blocking oligomer that selectively binds to the unrearranged template adjacent to the inverse PCR primer pair and upstream of the expected fusion breakpoint region; and (c) an optional second, 3' blocking oligomer positioned 3' of the expected fusion junction. The relative positioning of the blocking oligomers is indicated in the figure. A 5' blocking oligomer refers to an oligonucleotide that binds on the 5' side of an exon junction; similarly, a 3' blocking oligomer refers to an oligonucleotide that binds on the 3' side of an exon junction. In embodiments, and under suitable conditions, the 5' blocking oligomer is not bound, such that the circularized template can be amplified (e.g., when the cDNA contains a fusion junction). In embodiments, and under suitable conditions, the 3' blocking oligomer prevents amplification of fragments with insufficient coverage of the fusion junction. FIG. 2B shows in detail an example of an outward-facing primer containing a target-specific sequence (A) and, optionally, a sequence (B) for downstream library preparation and analysis.

FIG. 3 shows the strategy of FIG. 1 applied to a template having a fusion (i.e., a polynucleotide having a sequence of a first region fused to a sequence of a second region at a fusion junction). The 5' blocking oligomer does not bind adjacent to the outward-facing primers, allowing selective amplification of junction-containing templates from fragmented material. A 5' blocking oligomer refers to an oligonucleotide that binds on the 5' side of an exon junction; similarly, a 3' blocking oligomer refers to an oligonucleotide that binds on the 3' side of an exon junction. In embodiments, and under suitable conditions, the 5' blocking oligomer prevents amplification of unrearranged templates (e.g., cDNAs that do not contain fusion junctions). In embodiments, and under suitable conditions, the 3' blocking oligomer prevents amplification of fragments with insufficient coverage of the fusion junction.
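The selectivity described for FIGS. 1-3 can be illustrated with a minimal string-level sketch. It is a toy model rather than the disclosed method: the function name, the site sequences, and the simplification that a bound blocker anywhere on the extension path abolishes the product (as might be expected with a non-strand-displacing polymerase) are all assumptions made for illustration.

```python
def blocked_inverse_pcr_product(circle, left_site, right_site, blocker_site):
    """Return the top-strand product of outward-facing primers on a circular
    template, or None if no product is expected.

    All arguments are top-strand sequences (hypothetical toy inputs):
    - right_site: binding site of the primer that extends downstream
    - left_site: binding site of the primer that extends upstream
    - blocker_site: unrearranged sequence bound by the blocking oligomer
    """
    n = len(circle)
    unrolled = circle * 3  # unroll the circle so wrap-around paths are contiguous
    start = unrolled.find(right_site)
    if start < 0:
        return None  # primer site absent: nothing to amplify
    end = unrolled.find(left_site, start + len(right_site))
    if end < 0:
        return None
    # Path runs from the right site, through the circularization junction,
    # and back to the left site.
    product = unrolled[start:end + len(left_site)]
    if len(product) > n:
        return None  # sites overlap in a way this toy model does not handle
    if blocker_site in product:
        return None  # assumed: bound blocker halts a non-strand-displacing polymerase
    return product


# Toy templates: identical primer sites, different upstream sequence.
WILD_TYPE = "GGGTTTCCC" + "ACCTGA" + "CTGGAT" + "CAGTAA"  # upstream = blocker site
FUSION = "TTACGCAAT" + "ACCTGA" + "CTGGAT" + "CAGTAA"     # upstream = partner gene

print(blocked_inverse_pcr_product(WILD_TYPE, "ACCTGA", "CTGGAT", "GGGTTTCCC"))
print(blocked_inverse_pcr_product(FUSION, "ACCTGA", "CTGGAT", "GGGTTTCCC"))
```

In this sketch the unrearranged circle yields no product because the blocker site lies on the extension path, while the fusion circle, whose upstream sequence comes from the partner gene, is amplified across the circularization junction.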

FIG. 4 shows a circularized template comprising a fusion junction. In embodiments, the circularized template comprises two junctions: 1) a junction resulting from the fusion present in the sample, and 2) a junction resulting from circularization of the 5' and 3' ends of the linear nucleic acid molecule. In embodiments, the latter (i.e., the junction resulting from circularization) may be used to quantify or estimate template abundance and/or to perform error correction.
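One way the circularization junction could serve for template counting and error correction is sketched below. This is an assumption-laden illustration, not the disclosed pipeline: the `(five_prime_end, three_prime_end)` junction representation, the function name, and the simple majority-vote consensus are hypothetical choices.

```python
from collections import Counter, defaultdict


def count_templates(reads):
    """Group reads by circularization junction and build a per-template consensus.

    `reads` is an iterable of (junction, sequence) pairs, where `junction` is a
    hypothetical (five_prime_end, three_prime_end) coordinate pair describing
    where the ends of the original linear molecule were joined.
    """
    groups = defaultdict(list)
    for junction, seq in reads:
        groups[junction].append(seq)

    consensus = {}
    for junction, seqs in groups.items():
        length = min(len(s) for s in seqs)
        # Majority vote at each position suppresses isolated sequencing errors.
        consensus[junction] = "".join(
            Counter(s[i] for s in seqs).most_common(1)[0][0] for i in range(length)
        )
    # Each distinct junction is treated as one original template molecule.
    return len(groups), consensus


toy_reads = [
    ((100, 250), "ACGT"),
    ((100, 250), "ACGT"),
    ((100, 250), "ACTT"),  # same template, one sequencing error
    ((300, 480), "GGCA"),
]
n_templates, consensus = count_templates(toy_reads)
print(n_templates, consensus)
```

Because independently fragmented molecules rarely share both end positions, reads with the same junction are assumed here to derive from one template; real data would likely need tolerance for near-identical junction coordinates.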

FIG. 5 illustrates an exemplary overview for detecting translocations. After amplification and sequencing, the sequencing reads are mapped to a reference. Translocation events may result in an excess of inter-genic mapping reads that partially align to the non-targeted 5' fusion gene (gene A) and to the targeted fusion partner (gene B) near the breakpoint.

FIG. 6 illustrates a bioinformatics workflow for breakpoint mapping. Briefly, sequencing reads of a target of interest are identified, for example, by k-mer matching or alignment. Circularization junctions are then identified by k-mer matching or alignment. In some embodiments, k-mer matching may be achieved using a k-mer index reflecting the circularization junctions of nucleic acids produced by known fusions. Next, reads are classified as having intra-genic or inter-genic junctions, and the mapped positions and density of the mapped reads are determined. Direct alignment of reads to breakpoints is not necessary, but may aid analysis.
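The k-mer matching and read classification steps can be sketched as follows. This is a simplified illustration under stated assumptions: the reference sequences are toy strings, the function names are hypothetical, and a read is labeled only by which references its uniquely assignable k-mers hit, rather than by locating the junction itself.

```python
def build_kmer_index(references, k):
    """Map each k-mer to the set of reference names that contain it."""
    index = {}
    for name, seq in references.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(name)
    return index


def classify_read(read, index, k):
    """Label a read by the references its uniquely assignable k-mers hit."""
    hits = set()
    for i in range(len(read) - k + 1):
        names = index.get(read[i:i + k])
        if names and len(names) == 1:  # ignore k-mers shared between references
            hits |= names
    if not hits:
        return "off-target"
    return "intra-genic" if len(hits) == 1 else "inter-genic"


# Toy references standing in for a targeted gene and a fusion partner.
references = {"geneA": "ACGTACGGTCAAGCTT", "geneB": "TTGGCCATATCGGAGC"}
index = build_kmer_index(references, k=6)

for read in ("ACGTACGGTC", "GGTCAAGCCCATATCGG", "TTTTTTTTTT"):
    print(read, classify_read(read, index, k=6))
```

A read whose k-mers all come from one reference is treated as carrying an intra-genic junction, while a read with k-mers from two references is a candidate inter-genic (fusion) read. Mapped positions and read density, as described above, would be derived in a later step not shown here.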

FIG. 7 illustrates an embodiment of a method described herein applied to the analysis of IGH V(D)J rearrangements. (A) Traditional methods of amplifying IGH rearrangements involve multiplex PCR primers targeting variable gene framework regions in combination with one or more joining gene primers. Such methods are limited by the need for complex primer pools, the inability to detect rearrangements with somatic hypermutation within the primer binding sites, and the inability to identify translocations involving the IGHJ genes. (B) In contrast, blocked inverse PCR of the IGH locus uses outward-facing primers targeting rarely mutated joining gene regions. The method minimizes the number of primers required, avoids dropout due to somatic hypermutation, enables detection of IGHJ translocations, and allows estimation of template copy number by analysis of circularization junctions. The inclusion of blocking elements increases the proportion of rearrangement-containing amplicons, thereby facilitating downstream sequencing analysis.

FIG. 8 illustrates an embodiment of a design strategy for the method described herein applied to IGH rearrangements. Outward-facing primers are designed to amplify each IGHJ gene, while blocking oligomers target the region upstream of and adjacent to each joining gene.

FIG. 9 illustrates an embodiment of a workflow for analyzing B cell rearrangements by the methods described herein. Amplification of the IGH, IGK, and IGL loci is followed by next-generation sequencing. The resulting reads are filtered to remove short and off-target products, circularization junctions are identified, unique sequences are collapsed, and the presence of V(D)J rearrangements is then annotated with IgBLAST or a similar tool. Reads with valid V(D)J rearrangements are used to determine the frequency and template count of each rearrangement and to identify clonal rearrangements consistent with the presence of a B cell malignancy. Reads lacking a V(D)J rearrangement are assessed for the presence or absence of a translocation using k-mer analysis or methods known in the art (e.g., GeneFuse). A final report is generated indicating the V(D)J clonality and translocation status of the sample.
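The filtering, collapsing, and frequency steps of such a workflow might look like the following sketch. It assumes reads have already been annotated upstream (for example by IgBLAST, which is not called here); the dictionary keys, the `min_length` threshold, and the annotation strings are hypothetical placeholders.

```python
from collections import Counter


def clonotype_frequencies(annotated_reads, min_length=50):
    """Collapse reads to unique templates and report rearrangement frequencies.

    `annotated_reads` is an iterable of dicts with hypothetical keys:
    'sequence', 'junction' (circularization junction), and 'rearrangement'
    (a V(D)J annotation string, or None if no rearrangement was called).
    Returns {rearrangement: (template_count, frequency)}.
    """
    templates = set()
    for read in annotated_reads:
        if len(read["sequence"]) < min_length:
            continue  # drop short products
        if read["rearrangement"] is None:
            continue  # left for separate translocation analysis
        # Reads sharing rearrangement and junction collapse to one template.
        templates.add((read["rearrangement"], read["junction"]))

    counts = Counter(rearrangement for rearrangement, _ in templates)
    total = sum(counts.values())
    return {r: (n, n / total) for r, n in counts.items()} if total else {}


toy_reads = [
    {"sequence": "ACGTACGT", "junction": (10, 90), "rearrangement": "IGHV3-23/IGHJ4"},
    {"sequence": "ACGTACGT", "junction": (10, 90), "rearrangement": "IGHV3-23/IGHJ4"},
    {"sequence": "ACGTACGA", "junction": (15, 99), "rearrangement": "IGHV3-23/IGHJ4"},
    {"sequence": "TTGACCAA", "junction": (5, 70), "rearrangement": "IGHV1-69/IGHJ6"},
    {"sequence": "AC", "junction": (1, 2), "rearrangement": "IGHV1-69/IGHJ6"},
    {"sequence": "GGGGCCCC", "junction": (3, 50), "rearrangement": None},
]
print(clonotype_frequencies(toy_reads, min_length=4))
```

A rearrangement that dominates the template counts would be a candidate clonal rearrangement; the threshold for calling clonality is not specified here and would need to be chosen separately.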

FIG. 10 illustrates an embodiment in which outward-facing primers (shown as a pair of arrows pointing away from each other), designed to target regions of a fusion partner of interest adjacent to a breakpoint location of interest, are used in combination with inward-facing primers (shown as a pair of arrows pointing towards each other) designed to target somatic mutations (e.g., single nucleotide polymorphisms (SNPs), insertions, deletions, copy number variations (CNVs), etc.). An element referred to as a blocking element (e.g., a non-extendable oligomer used in conjunction with a non-strand-displacing polymerase) targets unrearranged sequence adjacent to the outward-facing primers and prevents polymerase extension. The blocking element selectively inhibits amplification of unrearranged templates, resulting in preferential amplification of templates containing the fusion. For example, after circularization and PCR amplification with the inward-facing primers, the SNP-containing region is amplified.

FIGS. 11A-11C illustrate amplification of a region of interest (e.g., a single region of interest or tandem repeats of a region of interest) using a single-pool multiplex amplification reaction (e.g., a single-pool multiplex PCR reaction). FIG. 11A shows an example in which two pairs of overlapping inward-facing primers (e.g., 1F and 1R, and 2F and 2R) are used to amplify a target region, resulting in three amplification products (e.g., three PCR products): amplicon 1 (the amplification product of the 1F and 1R primer pair), amplicon 2 (the amplification product of the 2F and 2R primer pair), and the largest amplicon (the amplification product of the 1F and 2R primer pair), as described in U.S. Patent Publication US2016/0340746. Amplification products of overlapping inward-facing primers may be compared between linear and circular templates; a product may be amplified with lower efficiency as a result of stable secondary structure. When the template contains tandem repeats of the region of interest, an additional, duplication-specific amplicon is produced (e.g., the amplification product of the 2R and 1F primer pair). The duplication-specific amplicon is identified by the presence of a unique primer pair in the amplicon and of a circularization junction within the amplicon (represented by dashed lines).

FIG. 12 shows a graph highlighting the temporal aspect of monitoring measurable residual disease (MRD) in acute lymphoblastic leukemia (ALL). Each line represents the level of residual disease over time for a different hypothetical patient, monitored at different time points following therapeutic intervention (e.g., radiation and/or chemotherapy). The response curves include: DP (disease persistence), VER (very early relapse), ER (early relapse), LR (late relapse), VLR (very late relapse), and NR (no relapse). 10^-2 represents the proportion of leukemic cells that corresponds to the approximate lower limit of detection for VER.

FIG. 13 shows blocking element efficiency as determined by gel electrophoresis. Synthetic oligomers were generated to represent an IGH rearrangement (fusion, F) and the unrearranged IGHJ6 gene (wild type, W). PCR amplification of each template (as shown in FIG. 1) was performed using inverse PCR primers in the presence or absence (indicated +/-) of a non-extendable blocking oligomer able to hybridize to the W template but not to the F template. The PCR amplification products were then visualized on an agarose gel. Arrows indicate the location of the expected product.

FIG. 14 shows the results of bioinformatic reconstruction of a breakpoint region detected within the BCL2 locus on chromosome 18 using the methods described herein. Each grey horizontal line represents one sequenced fragment, and a visual representation of coverage is shown at the top.

Detailed Description

Described herein are novel methods for detecting gene fusions within and across different independent chromosomes.

I. Definitions

Practice of the techniques described herein will employ, unless indicated to the contrary, conventional methods of chemistry, biochemistry, organic chemistry, molecular biology, microbiology, recombinant DNA technology, genetics, immunology and cell biology within the skill of the art, many of which are described below for purposes of illustration. Examples of such techniques are available in the literature. Methods, devices, and materials similar or equivalent to those described herein can be used in the practice of the present invention.

All patents, patent applications, articles and publications mentioned herein, including above and below, are hereby expressly incorporated by reference in their entirety.

Unless defined otherwise herein, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. Various scientific dictionaries containing the terms contained herein are well known and available to those of skill in the art. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present disclosure, some preferred methods and materials are described. Accordingly, the terms defined immediately below are more fully described by reference to the entire specification. It is to be understood that this disclosure is not limited to the particular methods, protocols, and reagents described, as these may vary depending on the context in which they are used by those skilled in the art. The following definitions are provided to facilitate understanding of certain terms used frequently herein and are not meant to limit the scope of the present disclosure.

As used herein, the singular terms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. Reference throughout this specification to "one embodiment," "an embodiment," "another embodiment," "a particular embodiment," "a related embodiment," or "other embodiments," or combinations thereof, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, appearances of the foregoing phrases throughout the specification do not necessarily all refer to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

As used herein, the term "about" means a range of values including the specified value, which a person of ordinary skill in the art would consider reasonably similar to the specified value. In embodiments, the term "about" means within a standard deviation using measurements generally acceptable in the art. In embodiments, about means a range extending to +/-10% of the specified value. In embodiments, about includes the specified value.

Throughout this specification, unless the context requires otherwise, the words "comprise", "comprising", and "include" will be understood to imply the inclusion of a stated step or element or group of steps or elements but not the exclusion of any other step or element or group of steps or elements. "Consisting of" means including, and limited to, whatever follows the phrase "consisting of". Thus, the phrase "consisting of" indicates that the listed elements are required or mandatory and that no other elements may be present. "Consisting essentially of" means including any elements listed after the phrase, and limited to other elements that do not interfere with or contribute to the activity or action specified in the disclosure for the listed elements. Thus, the phrase "consisting essentially of" indicates that the listed elements are required or mandatory, but that other elements are optional and may or may not be present depending on whether they affect the activity or action of the listed elements.

As used herein, the term "control" or "control experiment" is used in accordance with its plain and ordinary meaning and refers to an experiment in which the subjects or reagents of the experiment are treated as in a parallel experiment, except for omission of a procedure, reagent, or variable of the experiment. In some instances, the control is used as a standard of comparison in evaluating experimental effects.

As used herein, the term "complement" is used in accordance with its plain and ordinary meaning and refers to a nucleotide (e.g., an RNA nucleotide or a DNA nucleotide) or a nucleotide sequence capable of base pairing with a complementary nucleotide or nucleotide sequence. As described herein and well known in the art, the complementary (matching) nucleotide of adenosine is thymidine in DNA, or alternatively, in RNA, the complementary (matching) nucleotide of adenosine is uracil; and the complementary (matching) nucleotide of guanine is cytosine. Thus, a complement may comprise a nucleotide sequence that base pairs with the corresponding complementary nucleotides of a second nucleic acid sequence. The nucleotides of a complement may partially or completely match the nucleotides of the second nucleic acid sequence. When the nucleotides of the complement completely match each nucleotide of the second nucleic acid sequence, the complement forms a base pair with each nucleotide of the second nucleic acid sequence. When the nucleotides of the complement partially match the nucleotides of the second nucleic acid sequence, only some of the nucleotides of the complement form base pairs with nucleotides of the second nucleic acid sequence. Examples of complementary sequences include coding and non-coding sequences, wherein the non-coding sequence contains nucleotides complementary to the coding sequence and thus forms the complement of the coding sequence. Further examples of complementary sequences are sense and antisense sequences, wherein the sense sequence contains nucleotides complementary to the antisense sequence and thus forms the complement of the antisense sequence. "Double-stranded" means that at least two fully or partially complementary oligonucleotides and/or polynucleotides undergo Watson-Crick type base pairing between all or most of their nucleotides, thereby forming a stable complex.

As described herein, complementarity of sequences may be partial, where only some of the nucleic acids match according to base pairing, or complete, where all of the nucleic acids match according to base pairing. Thus, two sequences that are complementary to each other may have a specified percentage of nucleotides that complement one another (e.g., about 60%, preferably 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or higher complementarity over a specified region). In embodiments, two sequences are complementary when they are completely complementary, having 100% complementarity. In embodiments, the sequences in a pair of complementary sequences form portions of a single polynucleotide with non-base-pairing nucleotides (e.g., as in a hairpin structure, with or without an overhang) or portions of separate polynucleotides. In embodiments, one or both sequences in a pair of complementary sequences form portions of longer polynucleotides, which may or may not include additional regions of complementarity.
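As a concrete illustration of partial versus complete complementarity, the short sketch below computes the percentage of positions at which two DNA strands pair. The function names are hypothetical, and the sketch assumes equal-length, ungapped DNA sequences written 5' to 3' and paired in antiparallel orientation.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(seq.upper()))


def percent_complementarity(seq_a, seq_b):
    """Percent of positions at which seq_a base pairs with seq_b.

    Both sequences are written 5'->3' and are paired antiparallel, so
    seq_a is compared with the reverse complement of seq_b.
    """
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    partner = reverse_complement(seq_b)
    matches = sum(a == b for a, b in zip(seq_a.upper(), partner))
    return 100.0 * matches / len(seq_a)


print(reverse_complement("ATGC"))               # GCAT
print(percent_complementarity("ATGC", "GCAT"))  # 100.0
print(percent_complementarity("ATGC", "GCAA"))  # 75.0
```

The second comparison has one mismatched position out of four, giving 75% complementarity over the compared region.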

As used herein, the term "contacting" is used in accordance with its plain and ordinary meaning and refers to the process of bringing at least two distinct species (e.g., chemical compounds including biomolecules or cells) into sufficient proximity to react, interact, or physically touch. It should be appreciated, however, that the resulting reaction product may be produced directly from a reaction between the added reagents or from an intermediate from one or more of the added reagents that may be produced in the reaction mixture. The term "contacting" may include allowing two species, which may be compounds, nucleic acids, proteins, or enzymes (e.g., a DNA polymerase), to react, interact, or physically touch.

As used herein, the term "nucleic acid" is used in accordance with its plain and ordinary meaning and refers to nucleotides (e.g., deoxyribonucleotides or ribonucleotides) and polymers thereof, or complements thereof, in single-stranded, double-stranded, or multi-stranded form. The terms "polynucleotide", "oligonucleotide", "oligomer", and the like refer, in the usual and customary sense, to a sequence of nucleotides. The term "nucleotide" refers, in the usual and customary sense, to a single unit of a polynucleotide, i.e., a monomer. A nucleotide may be a ribonucleotide, a deoxyribonucleotide, or a modified version thereof. Examples of polynucleotides contemplated herein include single- and double-stranded DNA, single- and double-stranded RNA, having a linear or circular framework, and hybrid molecules having mixtures of single- and double-stranded DNA and RNA. Non-limiting examples of polynucleotides include genes, gene fragments, exons, introns, intergenic DNA (including, without limitation, heterochromatic DNA), messenger RNAs (mRNAs), transfer RNAs, ribosomal RNAs, ribozymes, cDNAs, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of a sequence, isolated RNA of a sequence, nucleic acid probes, and primers. Polynucleotides useful in the methods of the present disclosure may comprise natural nucleic acid sequences and variants thereof, artificial nucleic acid sequences, or combinations of such sequences. "Nucleosides" are structurally similar to nucleotides, but lack the phosphate moiety. An example of a nucleoside analog is one in which a label is attached to the base and no phosphate group is attached to the sugar molecule. As used herein, the terms "nucleic acid oligomer" and "oligonucleotide" are used interchangeably and are intended to include, but are not limited to, nucleic acids 200 nucleotides or less in length. In some embodiments, the oligonucleotide is a nucleic acid of 2 to 200 nucleotides, 2 to 150 nucleotides, 5 to 150 nucleotides, or 5 to 100 nucleotides in length.

As used herein, the term "primer" is defined as one or more nucleic acid fragments that can specifically hybridize to a nucleic acid template, be bound by a polymerase, and be extended in a template-directed process of nucleic acid synthesis. A primer may be of any length, depending on the particular technique for which it is to be used. For example, PCR primers are typically between 10 and 40 nucleotides in length. In some embodiments, the primer is 200 nucleotides or less in length. In certain embodiments, the primer is 10 to 150 nucleotides, 15 to 150 nucleotides, 5 to 100 nucleotides, 5 to 50 nucleotides, or 10 to 50 nucleotides in length. The length and complexity of the nucleic acid immobilized on the nucleic acid template is not critical. The skilled artisan can adjust these factors to provide optimal hybridization and signal generation for a given hybridization procedure and to provide the desired resolution between different genes or genomic locations. A primer permits the addition of nucleotide residues thereto, or the synthesis of oligonucleotides or polynucleotides therefrom, under suitable conditions known in the art. In one embodiment, the primer is a DNA primer, i.e., a primer consisting of, or consisting essentially of, deoxyribonucleotide residues. The primer is designed to have a sequence complementary to the region of the template/target DNA to which the primer hybridizes. Addition of a nucleotide residue to the 3' end of the primer by formation of a phosphodiester bond results in a DNA extension product. Addition of a nucleotide residue to the 3' end of the DNA extension product by formation of a phosphodiester bond results in a further DNA extension product. In another embodiment, the primer is an RNA primer. In embodiments, the primer is hybridized to a target polynucleotide. A "primer" is complementary to a polynucleotide template and complexes with the template by hydrogen bonding or hybridization to give a primer/template complex for initiation of synthesis by a polymerase; the primer is extended during DNA synthesis by the addition of covalently bonded bases, complementary to the template, at its 3' end.

As used herein, the terms "solid support", "substrate", and "solid surface" refer to a discrete solid or semi-solid surface to which a plurality of primers may be attached. The solid support may encompass any type of solid, porous, or hollow sphere, cylinder, or other similar configuration composed of a plastic, ceramic, metal, or polymeric material (e.g., a hydrogel) to which nucleic acids may be immobilized (e.g., covalently or non-covalently). The solid support may include discrete particles, which may be spherical (e.g., microspheres) or have a non-spherical or irregular shape, such as cubic, rectangular, pyramidal, cylindrical, conical, elliptical, or disc-shaped, and the like. Solid supports in the form of discrete particles may be referred to herein as "beads," which alone does not imply or require any particular shape. The shape of a bead may be non-spherical. The solid support may further comprise a polymer or hydrogel on the surface to which the primers are attached (e.g., a splint primer is covalently attached to the polymer, wherein the polymer is in direct contact with the solid support). Exemplary solid supports include, but are not limited to, glass and modified or functionalized glass, plastics (including acrylics, polystyrene and copolymers of styrene and other materials, polypropylene, polyethylene, polybutylene, polyurethanes, Teflon™, cyclic olefin copolymers, polyimides, etc.), nylon, ceramics, resins, Zeonor, silica or silica-based materials (including silicon and modified silicon), carbon, metals, inorganic glasses, fiber bundles, photopatterned dry film resists, UV-cured adhesives, and polymers. The solid support of some embodiments has at least one surface located within a flow cell. The solid support, or regions thereof, may be substantially flat. The solid support may have surface features such as wells, pits, channels, ridges, raised regions, pegs, posts, and the like. The term solid support encompasses a substrate (e.g., a flow cell) having a surface comprising a polymer coating covalently attached thereto. In embodiments, the solid support is a flow cell. The term "flow cell" as used herein refers to a chamber including a solid surface across which one or more fluid reagents can be flowed. Examples of flow cells and related fluidic systems and detection platforms that can be readily used in the methods of the present disclosure are described, for example, in Bentley et al., Nature 456:53-59 (2008).

In some embodiments, the nucleic acid comprises a capture nucleic acid. A capture nucleic acid refers to a nucleic acid that is attached to a substrate (e.g., covalently attached). In some embodiments, the capture nucleic acid comprises a primer. In some embodiments, the capture nucleic acid is a nucleic acid configured to specifically hybridize to a portion of one or more nucleic acid templates (e.g., templates of a library). In some embodiments, a capture nucleic acid configured to specifically hybridize to a portion of one or more nucleic acid templates is substantially complementary to a suitable portion of the nucleic acid templates, or amplicons thereof. In some embodiments, the capture nucleic acid is configured to specifically hybridize to an adapter or a portion thereof. In some embodiments, the capture nucleic acid, or a portion thereof, is substantially complementary to a portion of an adapter or a complement thereof. In some embodiments, the capture nucleic acid is a probe oligonucleotide. Typically, a probe oligonucleotide is complementary to a target polynucleotide or a portion thereof, and further comprises a label (e.g., a binding moiety) or is attached to a surface, such that hybridization to the probe oligonucleotide permits selective separation of probe-bound polynucleotides from unbound polynucleotides in a population. The probe oligonucleotide may or may not also be used as a primer.

Nucleic acids, including, for example, nucleic acids having phosphorothioate backbones, may comprise one or more reactive moieties. As used herein, the term reactive moiety comprises any group capable of reacting with another molecule, e.g., a nucleic acid or polypeptide, through covalent, non-covalent, or other interactions. For example, a nucleic acid may comprise an amino acid reactive moiety that reacts with an amino acid on a protein or polypeptide by covalent, non-covalent, or other interactions.

Polynucleotides are typically composed of a specific sequence of four nucleotide bases: adenine (A); cytosine (C); guanine (G); and thymine (T) (uracil (U) takes the place of thymine (T) when the polynucleotide is RNA). Thus, the term "polynucleotide sequence" is the alphabetical representation of a polynucleotide molecule; alternatively, the term may be applied to the polynucleotide molecule itself. This alphabetical representation can be entered into a database in a computer having a central processing unit and used for bioinformatic applications such as functional genomics and homology searches. The polynucleotide may optionally comprise one or more non-standard nucleotides, nucleotide analogs, and/or modified nucleotides.
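
By way of non-limiting illustration only, the alphabetical representation described above may be handled in software as an ordinary character string. The following Python sketch is an assumption made for illustration and is not part of the disclosed methods; the names `transcribe` and `reverse_complement` are hypothetical.

```python
# Illustrative sketch only: a polynucleotide sequence treated as a string
# over the letters A, C, G, and T. All names here are hypothetical.
DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def transcribe(dna: str) -> str:
    """Return the RNA spelling of a DNA sequence (U written in place of T)."""
    return dna.upper().replace("T", "U")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna.upper()))
```

Under these assumptions, `transcribe("ATGC")` yields the RNA spelling of the same sequence, and `reverse_complement("ATGC")` yields the sequence of the strand that would hybridize to it.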

As used herein, the term "template nucleic acid" refers to any polynucleotide molecule that can be bound by a polymerase and used as a template for nucleic acid synthesis. The template nucleic acid may be a target nucleic acid. In general, the term "target nucleic acid" refers to a nucleic acid molecule or polynucleotide in a starting population of nucleic acid molecules having a target sequence whose presence, amount, and/or nucleotide sequence or changes in one or more of them need to be determined. In general, the term "target sequence" refers to a nucleic acid sequence on a single strand of nucleic acid. The target sequence may be a portion of a gene, a regulatory sequence, genomic DNA, cDNA, RNA comprising mRNA, miRNA, rRNA, or the like. The target sequence may be a target sequence from a sample or a secondary target, such as the product of an amplification reaction. The target nucleic acid need not be any single molecule or sequence. For example, depending on the reaction conditions, the target nucleic acid may be any of a variety of target nucleic acids in a reaction, or all nucleic acids in a given reaction. For example, in a nucleic acid amplification reaction with random primers, all polynucleotides in the reaction may be amplified. As a further example, a pool of targets may be determined simultaneously in a single reaction using polynucleotide primers directed to multiple targets. As yet another example, all or a subset of the polynucleotides in a sample may be modified by the addition of primer binding sequences (e.g., by ligating adaptors containing primer binding sequences) such that each modified polynucleotide becomes a target nucleic acid in a reaction with a corresponding primer polynucleotide. In the context of selective sequencing, a "target nucleic acid" refers to a subset of nucleic acids sequenced from within an initial population of nucleic acids.

The term "polynucleotide fusion" is used in accordance with its plain and ordinary meaning and refers to a polynucleotide formed by the joining of two regions of a reference sequence (e.g., a reference genome) that are not so joined in the reference sequence, thereby forming a fusion junction between the two regions that is not present in the reference sequence. Polynucleotide fusions can be formed through a number of processes, including inter-chromosomal translocations, intra-chromosomal translocations, and other chromosomal rearrangements (e.g., inversions and duplications). A polynucleotide fusion may involve a fusion between two gene sequences, which is referred to as a "gene fusion" and results in a "fusion gene". In some cases, the fusion gene is expressed as a fusion transcript (e.g., a fusion mRNA transcript) comprising the sequences of both genes or portions thereof.
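
By way of non-limiting illustration only, the notion of a fusion junction that is not present in the reference sequence can be sketched in Python as follows. The sketch assumes a toy reference held as a single string, 0-based end-exclusive coordinates, and exact string matching; the function names `make_fusion` and `junction_is_novel` and the `flank` parameter are hypothetical and do not describe the detection method of the disclosure.

```python
from typing import Tuple


def make_fusion(reference: str,
                left: Tuple[int, int],
                right: Tuple[int, int]) -> Tuple[str, int]:
    """Join two reference regions and return (fusion sequence, junction index).

    Coordinates are 0-based and end-exclusive (an assumption of this sketch).
    """
    left_seq = reference[left[0]:left[1]]
    right_seq = reference[right[0]:right[1]]
    return left_seq + right_seq, len(left_seq)


def junction_is_novel(reference: str, fusion: str,
                      junction: int, flank: int = 4) -> bool:
    """Return True if the window spanning the junction is absent from the reference."""
    window = fusion[max(0, junction - flank):junction + flank]
    return window not in reference
```

In this toy model, joining two regions that are already adjacent in the reference produces a junction window that is found in the reference, whereas joining two distant regions produces a window that is not.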

"fusion gene" is used in accordance with its ordinary meaning in the art and refers to a hybrid gene or portion thereof formed from two previously independent genes or portions thereof (e.g., in a cell). A "fusion junction" is a point in the sequence of a fusion gene between two previously independent genes or portions thereof. Hybrid genes may be caused by translocation of the gene or gene portion, interstitial deletions and/or chromosomal inversion. An "exon junction" is a point or position in a fusion gene sequence between two previously independent exon sequences or portions thereof.

The nucleic acid may be amplified by a suitable method. The term "amplification" as used herein refers to a process of linearly or exponentially generating an amplicon nucleic acid having the same or substantially the same nucleotide sequence as a target nucleic acid or a segment thereof in a sample, and/or the complement thereof. In some embodiments, the amplification reaction comprises a suitable thermostable polymerase. Thermostable polymerases are known in the art and, in contrast to common polymerases found in most mammals, are stable for extended periods of time at temperatures above 80 °C. In certain embodiments, the term "amplification" refers to a method comprising the polymerase chain reaction (PCR). The conditions conducive to amplification (i.e., amplification conditions) typically comprise at least a suitable polymerase, a suitable template, a suitable primer or set of primers, suitable nucleotides (e.g., dNTPs), a suitable buffer, and suitable annealing, hybridization, and/or extension times and temperatures. In certain embodiments, the amplification product (e.g., amplicon) may contain one or more additional and/or different nucleotides than the template sequence or portion thereof from which the amplicon was generated (e.g., the primer may contain "additional" nucleotides, such as a 5' portion that does not hybridize to the template, or one or more mismatched bases within the hybridizing portion of the primer).

As used herein, "differential amplification" (or "differentially amplifying") refers to amplification of a gene of interest to a greater degree than a reference gene, thereby resulting in a greater amount of amplified product from the gene of interest relative to the amount of amplified product from the reference gene. In embodiments, the gene of interest comprises a polynucleotide sequence that includes a fusion gene, and the reference gene comprises a polynucleotide that does not include a fusion gene.

As used herein, the term "rolling circle amplification (RCA)" refers to a nucleic acid amplification reaction that amplifies a circular nucleic acid template (e.g., a single-stranded DNA circle) by a rolling circle mechanism. A rolling circle amplification reaction is initiated by hybridization of a primer to a circular (usually single-stranded) nucleic acid template. The nucleic acid polymerase then extends the primer hybridized to the circular nucleic acid template by continuously progressing around the template, replicating its sequence over and over again (rolling circle mechanism). Rolling circle amplification generally produces concatemers comprising tandem repeat units of the complement of the circular nucleic acid template sequence. Rolling circle amplification may be linear RCA (LRCA), which exhibits linear amplification kinetics (e.g., RCA using a single specific primer), or exponential RCA (ERCA), which exhibits exponential amplification kinetics. Rolling circle amplification can also be performed using multiple primers (multiply primed rolling circle amplification, or MPRCA) to generate hyperbranched concatemers. For example, in a double-primed RCA, one primer may be complementary to the circular nucleic acid template, as in linear RCA, while the other may be complementary to the tandem repeat unit sequence of the RCA product. Thus, a double-primed RCA can proceed as a chain reaction with exponential (geometric) amplification kinetics, characterized by a branching cascade of multiple hybridization, primer extension, and strand displacement events involving both primers. This typically produces a discrete set of concatemeric double-stranded nucleic acid amplification products. Rolling circle amplification can be performed in vitro under isothermal conditions using a suitable nucleic acid polymerase, such as Phi29 DNA polymerase. 
RCA may be performed using a suitable DNA polymerase known in the art (e.g., Phi29 DNA polymerase, Bst DNA polymerase, or SD polymerase).
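
By way of non-limiting illustration only, the tandem-repeat structure of a linear RCA product can be sketched in Python. The sketch assumes an idealized, error-free polymerase that completes a whole number of passes around the circle and a primer positioned so that synthesis begins at the start of the repeat unit; the function names `reverse_complement` and `rca_concatemer` and the `laps` parameter are hypothetical.

```python
# Illustrative sketch only: an idealized rolling circle amplification product.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def rca_concatemer(circle_unit: str, laps: int) -> str:
    """Return the product of `laps` full passes around a circular template.

    `circle_unit` is the circular template sequence written 5' to 3' as a
    linear string. The product is a tandem repeat of its reverse complement.
    """
    if laps < 0:
        raise ValueError("laps must be non-negative")
    return reverse_complement(circle_unit) * laps
```

In this simplified model the product length grows in direct proportion to the number of passes, which corresponds to the linear kinetics of single-primer RCA; the branching cascade of double-primed RCA is not modeled.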

The nucleic acid may be amplified by a thermal cycling method or an isothermal amplification method. In some embodiments, rolling circle amplification methods are used. In some embodiments, the amplification occurs on a solid support (e.g., within a flow cell) to which the nucleic acid, nucleic acid library, or portion thereof is immobilized. In some sequencing methods, a nucleic acid library is added to a flow cell and immobilized to an anchor by hybridization under appropriate conditions. This type of nucleic acid amplification is commonly referred to as solid phase amplification. In some embodiments of solid phase amplification, all or part of the amplified product is synthesized by extension initiated from an immobilized primer. The solid phase amplification reaction is similar to standard solution phase amplification except that at least one of the amplification oligonucleotides (e.g., primers) is immobilized on a solid support.

In some embodiments, the solid phase amplification comprises a nucleic acid amplification reaction comprising only one species of oligonucleotide primer immobilized to a surface or substrate. In certain embodiments, the solid phase amplification comprises a plurality of different immobilized oligonucleotide primer species. In some embodiments, the solid phase amplification may comprise a nucleic acid amplification reaction comprising an oligonucleotide primer of one species immobilized on a solid surface and a second, different oligonucleotide primer species in solution. Immobilized primers or solution-based primers of a variety of different species may be used. Non-limiting examples of solid phase nucleic acid amplification reactions include interfacial amplification, bridge PCR amplification, emulsion PCR, WildFire amplification (e.g., U.S. patent publication No. US 20130012399), and the like, or combinations thereof.

In embodiments, the target nucleic acid is a cell-free nucleic acid. In general, the terms "cell-free," "circulating," and "extracellular" as applied to nucleic acids (e.g., "cell-free DNA" (cfDNA) and "cell-free RNA" (cfRNA)) are used interchangeably to refer to nucleic acids present in a sample from a subject or portion thereof that can be isolated or otherwise manipulated without applying a lysis step to the sample as originally collected (e.g., as in extraction from cells or viruses). Thus, cell-free nucleic acids are unencapsulated, or "free" from the cells or viruses from which they originate, even before a sample of the subject is collected. Cell-free nucleic acids can be produced as a byproduct of cell death (e.g., apoptosis or necrosis) or cell shedding, which releases the nucleic acids into the surrounding body fluid or circulation. Thus, cell-free nucleic acids may be isolated from non-cellular fractions of blood (e.g., serum or plasma), other bodily fluids (e.g., urine), or non-cellular fractions of other types of samples.

As used herein, the term "analog" when referring to a chemical compound refers to a compound that has a structure similar to that of another chemical compound but differs from it in that one or more atoms, functional groups, or substructures are replaced by one or more other atoms, functional groups, or substructures. In the context of nucleotides, "nucleotide analog" and "modified nucleotide" refer to a compound that, like the nucleotide of which it is an analog, can be incorporated into a nucleic acid molecule (e.g., an extension product) by a suitable polymerase, e.g., a DNA polymerase in the context of a nucleotide analog. The term also encompasses nucleic acids containing known nucleotide analogs or modified backbone residues or linkages, which are synthetic, naturally occurring, or non-naturally occurring, which have similar binding properties as the reference nucleic acid, and which are metabolized in a manner similar to the reference nucleotides. Examples of such analogs include, but are not limited to, phosphodiester derivatives including, for example, phosphoramidate, phosphorodiamidate, phosphorothioate (also known as phosphothioate, having double-bonded sulfur replacing oxygen in the phosphate), phosphorodithioate, phosphonocarboxylic acids, phosphonocarboxylates, phosphonoacetic acid, phosphonoformic acid, methyl phosphonate, boron phosphonate, or O-methylphosphoroamidite linkages (see, e.g., Eckstein, OLIGONUCLEOTIDES AND ANALOGUES: A PRACTICAL APPROACH, Oxford University Press), as well as modifications to the nucleotide bases such as in 5-methylcytidine or pseudouridine; and peptide nucleic acid backbones and linkages. 
Other analog nucleic acids include those having positive backbones; non-ionic backbones; modified sugars; and non-ribose backbones (e.g., phosphorodiamidate morpholino oligonucleotides or locked nucleic acids (LNAs)), including those described in U.S. Pat. Nos. 5,235,033 and 5,034,506, and in Chapters 6 and 7 of ASC Symposium Series 580, CARBOHYDRATE MODIFICATIONS IN ANTISENSE RESEARCH, Sanghvi and Cook, eds. Nucleic acids containing one or more carbocyclic sugars are also included within one definition of nucleic acid. Modification of the ribose-phosphate backbone can be performed for a variety of reasons, for example, to increase the stability and half-life of such molecules in physiological environments, or for use as probes on biochips. Mixtures of naturally occurring nucleic acids and analogs can be prepared; alternatively, mixtures of different nucleic acid analogs can be prepared. In embodiments, the internucleotide linkages in the DNA are phosphodiester, phosphodiester derivatives, or a combination of both.

As used herein, "natural" nucleotide is used in accordance with its plain and ordinary meaning and refers to a naturally occurring nucleotide that does not contain an exogenous label (e.g., a fluorescent dye or other label) or a chemical modification such as may characterize a nucleotide analog (e.g., a reversible terminating moiety). Examples of natural nucleotides that can be used to perform the procedures described herein include: dATP (2'-deoxyadenosine-5'-triphosphate); dGTP (2'-deoxyguanosine-5'-triphosphate); dCTP (2'-deoxycytidine-5'-triphosphate); dTTP (2'-deoxythymidine-5'-triphosphate); and dUTP (2'-deoxyuridine-5'-triphosphate).

As used herein, the term "modified nucleotide" refers to a nucleotide that is modified in some way. Typically, a nucleotide contains a single 5-carbon sugar moiety, a single nitrogenous base moiety, and 1 to 3 phosphate moieties. In embodiments, the nucleotide may comprise a blocking moiety (alternatively referred to herein as a reversible terminator moiety) and/or a labeling moiety. The blocking moiety on a nucleotide prevents a covalent bond from forming between the 3' hydroxyl moiety of the nucleotide and the 5' phosphate of another nucleotide. The blocking moiety on a nucleotide may be reversible, whereby the blocking moiety may be removed or modified to allow the 3' hydroxyl group to form a covalent bond with the 5' phosphate of another nucleotide. The blocking moiety may be effectively irreversible under the particular conditions used in the methods set forth herein. In embodiments, the blocking moiety is attached to the 3' oxygen of the nucleotide and is independently -NH₂, -CN, -CH₃, C₂-C₆ allyl (e.g., -CH₂-CH=CH₂), methoxyalkyl (e.g., -CH₂-O-CH₃), or -CH₂N₃. In embodiments, the blocking moiety is attached to the 3' oxygen of the nucleotide and is independently [chemical structure not reproduced]. The labeling moiety of a nucleotide may be any moiety that allows for detection of the nucleotide, for example, using spectroscopic methods. Exemplary label moieties are fluorescent labels, mass labels, chemiluminescent labels, electrochemical labels, detectable labels, and the like. One or more of the above moieties may be absent from the nucleotides used in the methods and compositions set forth herein. For example, the nucleotides may lack a labeling moiety or a blocking moiety, or both. 
Examples of nucleotide analogs include, but are not limited to, 7-deaza-adenine, 7-deaza-guanine, the analogs of deoxynucleotides shown herein, analogs in which a label is attached through a cleavable linker to the 5-position of cytosine or thymine or to the 7-position of deaza-adenine or deaza-guanine, and analogs in which a small chemical moiety is used to cap the -OH group at the 3' position of deoxyribose. Nucleotide analogs and DNA polymerase-based DNA sequencing are also described in U.S. patent No. 6,664,079, which is incorporated herein by reference in its entirety for all purposes.

In embodiments, the nucleotides of the present disclosure use cleavable linkers to attach the label to the nucleotide. The use of a cleavable linker ensures that the label can be removed (if desired) after detection, avoiding any interfering signal with any labeled nucleotide incorporated subsequently. The use of the term "cleavable linker" is not intended to imply that the entire linker needs to be removed from the nucleotide base. The cleavage site may be located at a position on the linker that ensures that a portion of the linker remains attached to the nucleotide base after cleavage. The linker can be attached at any position on the nucleotide base as long as Watson-Crick base pairing is still possible. In the context of purine bases, it is preferred that the linker is attached via the 7-position of the purine or of the preferred deazapurine analog, via an 8-modified purine, or via an N-6 modified adenosine or an N-2 modified guanine. For pyrimidines, attachment is preferably via the 5-position on cytidine, thymidine, or uracil, or the N-4 position on cytosine. The term "cleavable linker" or "cleavable moiety" as used herein refers to a divalent or monovalent moiety, respectively, that is capable of being separated into distinct entities (e.g., detached, split, disconnected, hydrolyzed, or having a stable bond within the moiety broken). Cleavable linkers are cleavable (e.g., specifically cleavable) in response to an external stimulus (e.g., an enzyme, a nucleophilic/basic reagent, a reducing agent, light irradiation, an electrophilic/acidic reagent, an organometallic or metallic reagent, or an oxidizing agent). A chemically cleavable linker refers to a linker that is capable of being split in response to the presence of a chemical (e.g., acid, base, oxidizing agent, reducing agent, Pd(0), tris-(2-carboxyethyl)phosphine, dilute nitrous acid, fluoride, tris(3-hydroxypropyl)phosphine, sodium dithionite (Na₂S₂O₄), or hydrazine (N₂H₄)). Chemically cleavable linkers are not enzymatically cleavable. 
In an embodiment, the cleavable linker is cleaved by contacting the cleavable linker with a cleaving agent. In an embodiment, the cleaving agent is a phosphine-containing reagent (e.g., TCEP or THPP), sodium dithionite (Na₂S₂O₄), a weak acid, hydrazine (N₂H₄), Pd(0), or light irradiation (e.g., ultraviolet radiation). In an embodiment, cleaving includes removing. In the context of polynucleotides, a "cleavable site" or "scissile linkage" is a site that allows for controlled cleavage of a polynucleotide strand (e.g., a linker, primer, or polynucleotide) by chemical, enzymatic, or photochemical means known in the art and described herein. A scissile site may refer to the linkage of a nucleotide between two other nucleotides in a nucleotide strand (i.e., an internucleosidic linkage). In embodiments, the scissile linkage may be located at any position within one or more nucleic acid molecules, including at or near a terminus (e.g., the 3' end of an oligonucleotide), or in an internal position of the one or more nucleic acid molecules. In embodiments, the conditions suitable for separating a scissile linkage comprise adjusting pH and/or temperature. In embodiments, the scissile site may comprise at least one acid-labile linkage. For example, the acid-labile linkage may comprise a phosphoramidate linkage. In an example, the phosphoramidate linkage can be hydrolyzed under acidic conditions, including weakly acidic conditions such as trifluoroacetic acid and a suitable temperature (e.g., 30 °C), or other conditions known in the art, such as those described in Matthias Mag et al., Tetrahedron Letters, Volume 33, Issue 48, 1992, 7319-7322. In an embodiment, the scissile site may comprise at least one photolabile internucleosidic linkage (e.g., an o-nitrobenzyl linkage, as described in Walker et al., J. Am. Chem. Soc. 1988, 110, 21, 7170-7177), such as o-nitrobenzyloxymethyl or p-nitrobenzyloxymethyl. In embodiments, the scissile site comprises at least one uracil nucleobase. 
In embodiments, uracil nucleobases can be cleaved with uracil DNA glycosylase (UDG) or formamidopyrimidine DNA glycosylase (Fpg). In embodiments, the scissile linkage comprises a sequence-specific nicking site having a nucleotide sequence that is recognized and nicked by a nicking endonuclease or a uracil DNA glycosylase.

As used herein, the term "removable" group, such as a label, a blocking group, or a protecting group, is used in accordance with its plain and ordinary meaning and refers to a chemical group that can be removed from a nucleotide analog such that a DNA polymerase can extend a nucleic acid (e.g., a primer or extension product) by incorporating at least one additional nucleotide. Removal may be by any suitable method, including enzymatic, chemical, or photolytic cleavage. Removal of a removable group, such as a blocking group, need not remove the entire removable group, but need only remove a sufficient portion thereof so that the DNA polymerase can extend the nucleic acid by incorporating at least one additional nucleotide or nucleotide analog.

As used herein, the terms "blocking moiety", "reversible blocking group", "reversible terminator" and "reversible terminator moiety" are used in accordance with their plain and ordinary meanings and refer to a cleavable moiety that does not interfere with incorporation of the nucleotide to which it is attached by a polymerase (e.g., DNA polymerase, modified DNA polymerase), but prevents further strand extension until it is removed ("unblocked"). For example, a reversible terminator may refer to a blocking moiety located, for example, at the 3' position of a nucleotide, and may be a chemically cleavable moiety, such as allyl, azidomethyl, or methoxymethyl, or may be an enzymatically cleavable group, such as a phosphate. Suitable nucleotide blocking moieties are described in applications WO 2004/018497, U.S. Pat. Nos. 7,057,026, 7,541,444, WO 96/07669, U.S. Pat. Nos. 5,763,594, 5,808,045, 5,872,244, and 6,232,465, the contents of which are incorporated herein by reference in their entirety. Nucleotides may be labeled or unlabeled. The nucleotide may be modified with a reversible terminator useful in the methods provided herein, and may be a 3'-O-blocked reversible terminator or a 3'-unblocked reversible terminator. In nucleotides with 3'-O-blocked reversible terminators, the blocking group may be represented as -OR [reversible terminating (capping) group], wherein O is the oxygen atom of the 3'-OH of the pentose and R is the blocking group, while the label is linked to the base, which acts as a reporter and can be cleaved. The 3'-O-blocked reversible terminators are known in the art and may be, for example, a 3'-ONH₂ reversible terminator, a 3'-O-allyl reversible terminator, or a 3'-O-azidomethyl reversible terminator. 
In an embodiment, the reversible terminator moiety is [chemical structure not reproduced]. As described herein, the term "allyl" refers to an unsubstituted methylene attached to a vinyl group (i.e., -CH=CH₂), having the formula [chemical structure not reproduced]. In an embodiment, the reversible terminator moiety is as described in US 10,738,072 (which is incorporated herein by reference for all purposes). For example, a nucleotide comprising a reversible terminator moiety can be represented by the formula:

wherein the nucleobase is adenine or an adenine analog, thymine or a thymine analog, guanine or a guanine analog or cytosine or a cytosine analog.

As used herein, the term "label" is used in accordance with its plain and ordinary meaning and refers to a molecule that is capable of generating or causing a detectable signal, either directly or indirectly, by itself or by interaction with another molecule. Non-limiting examples of detectable labels include fluorescent dyes, biotin, digoxigenin, haptens, and epitopes. In general, a dye is a molecule, compound, or substance that can provide an optically detectable signal, such as a colorimetric, luminescent, bioluminescent, chemiluminescent, phosphorescent, or fluorescent signal. In an embodiment, the label is a dye. In an embodiment, the dye is a fluorescent dye. Non-limiting examples of dyes, some of which are commercially available, include CF dyes (Biotium, Inc.), Alexa Fluor dyes (Thermo Fisher), DyLight dyes (Thermo Fisher), Cy dyes (GE Healthcare), IRDyes (Li-Cor Biosciences, Inc.), and HiLyte dyes (Anaspec, Inc.). In embodiments, a particular nucleotide type is associated with a particular label such that identifying the label identifies the nucleotide with which it is associated. In embodiments, the label is luciferin, which reacts with luciferase to produce a detectable signal in response to one or more bases being incorporated into an elongated complementary strand, as in pyrosequencing. In embodiments, the nucleotide comprises a label (e.g., a dye). In an embodiment, the label is not associated with any particular nucleotide, but detection of the label identifies whether one or more nucleotides of known identity were added during the extension step (as in the case of pyrosequencing).

In an embodiment, the detectable label is a fluorescent dye. In an embodiment, the detectable label is a fluorescent dye (e.g., a Fluorescence Resonance Energy Transfer (FRET) chromophore) capable of exchanging energy with another fluorescent dye.

In an embodiment, the detectable moiety is part of a derivative of one of the immediately above described detectable moieties, wherein the derivative differs from one of the immediately above described detectable moieties in the modification resulting from conjugation of the detectable moiety to a compound described herein.

The term "cyanine" or "cyanine moiety" as described herein refers to a detectable moiety containing two nitrogen groups separated by a polymethine chain. In an embodiment, the cyanine moiety has 3 methine structures (i.e., cyanine 3 or Cy3). In an embodiment, the cyanine moiety has 5 methine structures (i.e., cyanine 5 or Cy5). In an embodiment, the cyanine moiety has 7 methine structures (i.e., cyanine 7 or Cy7).

As used herein, the terms "DNA polymerase" and "nucleic acid polymerase" are used in accordance with their plain and ordinary meanings and refer to enzymes capable of synthesizing nucleic acid molecules from nucleotides (e.g., deoxyribonucleotides). Typically, a DNA polymerase adds nucleotides to the 3' end of a DNA strand, one nucleotide at a time. In embodiments, the DNA polymerase is a Pol I DNA polymerase, Pol II DNA polymerase, Pol III DNA polymerase, Pol IV DNA polymerase, Pol V DNA polymerase, Pol β DNA polymerase, Pol μ DNA polymerase, Pol λ DNA polymerase, Pol σ DNA polymerase, Pol α DNA polymerase, Pol δ DNA polymerase, Pol ε DNA polymerase, Pol η DNA polymerase, Pol ι DNA polymerase, Pol κ DNA polymerase, Pol ζ DNA polymerase, Pol γ DNA polymerase, Pol θ DNA polymerase, Pol ν DNA polymerase, or a thermophilic nucleic acid polymerase (e.g., Therminator γ, 9°N polymerase (exo-), Therminator II, Therminator III, or Therminator IX). In embodiments, the DNA polymerase is a modified archaeal DNA polymerase. In embodiments, the polymerase is a reverse transcriptase. In an embodiment, the polymerase is a mutant P. abyssi polymerase (e.g., a mutant P. abyssi polymerase as described in WO 2018/148723 or WO 2020/056044).

As used herein, the term "exonuclease activity" is used according to its ordinary meaning in the art and refers to the removal of nucleotides from a nucleic acid by a DNA polymerase. For example, during polymerization, nucleotides are added to the 3' end of the primer strand. Occasionally, a DNA polymerase incorporates an incorrect nucleotide at the 3'-OH end of the primer strand, wherein the incorrect nucleotide cannot form hydrogen bonds with the corresponding base in the template strand. Such erroneously added nucleotides are removed from the primer by the 3' to 5' exonuclease activity of the DNA polymerase. In embodiments, the exonuclease activity may be referred to as "proofreading". When referring to 3'-5' exonuclease activity, it is understood that the DNA polymerase promotes a hydrolysis reaction that breaks the phosphodiester bond at the 3' end of the polynucleotide strand to excise the nucleotide. In an embodiment, 3'-5' exonuclease activity refers to the sequential removal of nucleotides in single-stranded DNA in the 3'→5' direction, thereby releasing deoxyribonucleoside 5'-monophosphates one by one. Methods for quantifying exonuclease activity are known in the art; see, for example, Southworth et al., PNAS, Vol. 93, 8281-8285 (1996).

As used herein, the term "incorporating" or "chemically incorporating," when used in reference to a primer and a cognate nucleotide, refers to the process of joining the cognate nucleotide to the primer or extension product thereof by forming a phosphodiester bond.

As used herein, the term "selectivity" or the like of a compound refers to the ability of the compound to distinguish between molecular targets. When used in the context of sequencing, as in "selective sequencing," this term refers to sequencing one or more target polynucleotides from an original starting polynucleotide population, rather than sequencing non-target polynucleotides from the starting population. Typically, selectively sequencing one or more target polynucleotides involves manipulating the target polynucleotides differentially based on known sequences. For example, the target polynucleotide may be hybridized to a probe oligonucleotide, which may be labeled (e.g., with a member of a binding pair) or bound to a surface. In an embodiment, hybridizing the target polynucleotide to the probe oligonucleotide comprises the step of displacing one strand of the double-stranded nucleic acid. The target polynucleotide to which the probe hybridizes may then be separated from the non-hybridized polynucleotides, such as by removing probe-bound polynucleotides from the starting population or by washing away non-probe-bound polynucleotides. The result is a selected subset of the initial population of polynucleotides, which are then sequenced, thereby selectively sequencing the one or more target polynucleotides.
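
By way of non-limiting illustration only, the selection step described above can be sketched in Python as a partition of a starting population into probe-bound and unbound polynucleotides. The sketch assumes single-stranded sequences written 5' to 3' and models hybridization as an exact match to the reverse complement of the probe, which is a simplification of real hybridization behavior; the function names `reverse_complement` and `select_targets` are hypothetical.

```python
from typing import List, Tuple

# Illustrative sketch only: hybridization modeled as exact complementarity.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def select_targets(population: List[str],
                   probe: str) -> Tuple[List[str], List[str]]:
    """Split a population into (probe-bound, unbound) polynucleotides.

    A polynucleotide counts as bound when it contains the exact reverse
    complement of the probe (an assumption of this sketch).
    """
    site = reverse_complement(probe)
    bound = [p for p in population if site in p.upper()]
    unbound = [p for p in population if site not in p.upper()]
    return bound, unbound
```

Under these assumptions, only the bound subset returned by `select_targets` would be carried forward to sequencing, while the unbound subset corresponds to the polynucleotides that are washed away.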

As used herein, the term "specificity" or the like of a compound refers to the ability of the compound to exert a specific effect (e.g., binding) on a particular molecular target with little or no effect on other proteins in the cell.

As used herein, the terms "bind" and "binding" are used in accordance with their plain and ordinary meanings and refer to an association between atoms or molecules. The association may be direct or indirect. For example, bound atoms or molecules may be directly bound to one another, such as by a covalent bond or a non-covalent bond (e.g., electrostatic interactions (e.g., ionic bonds, hydrogen bonds, halogen bonds), van der Waals interactions (e.g., dipole-dipole, dipole-induced dipole, London dispersion), ring stacking (pi effects), hydrophobic interactions, and the like). As a further example, two molecules may be bound indirectly to one another by way of direct binding to one or more intermediate molecules, thereby forming a complex.

As used herein, the terms "sequencing," "sequence determination," "determining a nucleotide sequence," and the like encompass the determination of partial as well as complete sequence information, including the identification, ordering, or location of the nucleotides that make up the polynucleotide being sequenced, and encompass the physical processes used to generate such sequence information. That is, the terms encompass sequence comparisons, fingerprinting, and similar levels of information about a target polynucleotide, as well as the explicit identification and ordering of nucleotides in the target polynucleotide. The terms also encompass the identification, ordering, and location of one, two, or three of the four types of nucleotides within a target polynucleotide. Sequencing methods, such as those outlined in U.S. Pat. No. 5,302,509, can be performed using the nucleotides described herein. The sequencing method is preferably performed with the target polynucleotide arrayed on a solid substrate. A plurality of target polynucleotides may be immobilized on a solid support via a linker molecule, or may be attached to particles (e.g., microspheres) that may in turn be attached to a solid substrate. The solid substrate is in the form of a chip, bead, well, capillary, slide, wafer, filter, fiber, porous medium, or column. In embodiments, the solid substrate is gold, quartz, silica, plastic, glass, diamond, silver, metal, or polypropylene. In an embodiment, the solid substrate is porous.

As used herein, the term "sequencing reaction mixture" is used in accordance with its plain and ordinary meaning and refers to an aqueous mixture containing the reagents necessary for a DNA polymerase to add nucleotides (e.g., dNTPs or dNTP analogues) to a DNA strand. In an embodiment, the sequencing reaction mixture comprises a buffer. In embodiments, the buffer comprises an acetate buffer, a 3-(N-morpholino)propanesulfonic acid (MOPS) buffer, an N-(2-acetamido)-2-aminoethanesulfonic acid (ACES) buffer, a phosphate-buffered saline (PBS) buffer, a 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES) buffer, an N-(1,1-dimethyl-2-hydroxyethyl)-3-amino-2-hydroxypropanesulfonic acid (AMPSO) buffer, a borate buffer (e.g., borate-buffered saline, sodium borate buffer, boric acid buffer), a 2-amino-2-methyl-1,3-propanediol (AMPD) buffer, an N-cyclohexyl-2-hydroxy-3-aminopropanesulfonic acid (CAPSO) buffer, a 2-amino-2-methyl-1-propanol (AMP) buffer, a 4-(cyclohexylamino)-1-butanesulfonic acid (CABS) buffer, a glycine-NaOH buffer, an N-cyclohexyl-2-aminoethanesulfonic acid (CHES) buffer, a tris(hydroxymethyl)aminomethane (Tris) buffer, or an N-cyclohexyl-3-aminopropanesulfonic acid (CAPS) buffer. In an embodiment, the buffer is a borate buffer. In an embodiment, the buffer is a CHES buffer. In an embodiment, the sequencing reaction mixture comprises nucleotides, wherein the nucleotides comprise a reversible terminating moiety and a label covalently linked to the nucleotide through a cleavable linker. In an embodiment, the sequencing reaction mixture comprises a buffer, a DNA polymerase, a detergent (e.g., Triton X), a chelating agent (e.g., EDTA), or a salt (e.g., ammonium sulfate, magnesium chloride, sodium chloride, or potassium chloride).

As used herein, the term "sequencing cycle" is used in accordance with its plain and ordinary meaning and refers to incorporating one or more nucleotides (e.g., nucleotide analogues) at the 3' end of a polynucleotide with a polymerase, and detecting one or more labels that identify the incorporated nucleotide or nucleotides. Sequencing can be accomplished by, for example, sequencing by synthesis, pyrosequencing, and the like. In embodiments, the sequencing cycle comprises extending a complementary polynucleotide by incorporating a first nucleotide using a polymerase, wherein the polynucleotide is hybridized to a template nucleic acid, detecting the first nucleotide, and identifying the first nucleotide. In an embodiment, to begin the sequencing cycle, one or more differently labeled nucleotides and a DNA polymerase may be introduced. Following nucleotide addition, the resulting signal can be detected (e.g., by excitation and emission of a detectable label) to determine the identity of the incorporated nucleotide (based on the label on the nucleotide). Reagents may then be added to remove the 3' reversible terminator and to remove the label from each incorporated base. Reagents, enzymes, and other substances can be removed between steps by washing. Cycles may include repeating these steps, with the sequence of each cluster read over multiple repetitions.

"hybridization" shall mean the attachment of a single stranded nucleic acid sequence (e.g., a primer) to another nucleic acid sequence based on well known principles of sequence complementarity. In one embodiment, the other nucleic acid sequence is a single stranded nucleic acid. The propensity for hybridization between nucleic acid sequences depends on the temperature and ionic strength of their environment, the length of the nucleic acid, and the degree of complementarity. The effect of these parameters on hybridization is described, for example, in Sambrook J, fritsch E.F., maniatisT., molecular cloning: a laboratory Manual (Molecular cloning: a laboratory manual), described in Cold spring harbor laboratory Press (Cold Spring Harbor Laboratory Press, new York) (1989). As used herein, hybridization of a primer or DNA extension product, respectively, can be extended by creating a phosphodiester bond with an available nucleotide or nucleotide analogue capable of forming a phosphodiester bond therewith. For example, hybridization may be performed at a temperature in the range of 15℃to 95 ℃. In some embodiments, hybridization is performed at about 20 ℃, about 25 ℃, about 30 ℃, about 35 ℃, about 40 ℃, about 45 ℃, about 50 ℃, about 55 ℃, about 60 ℃, about 65 ℃, about 70 ℃, about 75 ℃, about 80 ℃, about 85 ℃, about 90 ℃, or about 95 ℃. In other embodiments, the stringency of hybridization can be further altered by adding or removing components of the buffer solution. In some embodiments, nucleic acids or portions thereof configured to hybridize are generally about 80% or more, 81% or more, 82% or more, 83% or more, 84% or more, 85% or more, 86% or more, 87% or more, 88% or more, 89% or more, 90% or more, 91% or more, 92% or more, 93% or more, 94% or more, 95% or more, 96% or more, 97% or more, 98% or more, 99% or more, or 100% complementary to each other over consecutive portions of the nucleic acid sequence. 
Specific hybridization distinguishes non-specific hybridization interactions (e.g., two nucleic acids that are not configured for specific hybridization, e.g., two nucleic acids that are 80% or less, 70% or less, 60% or less, or 50% or less) by about 2-fold or more, typically about 10-fold or more, and sometimes about 100-fold or more, 1000-fold or more, 10,000-fold or more, 100,000-fold or more, or 1,000,000-fold or more. Two strands of nucleic acid hybridized to each other may form a duplex comprising a double-stranded portion of nucleic acid.
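
For illustration only, the percent complementarity referred to above can be computed as in the following minimal Python sketch. The function names, the restriction to the four DNA bases, and the ungapped, equal-length, antiparallel comparison are assumptions made for this example and are not part of the disclosure.

```python
# Hypothetical helper names; a simplified, ungapped comparison of two DNA strands.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def percent_complementarity(strand_a, strand_b):
    """Percent of positions at which strand_a base-pairs with strand_b.

    Both strands are written 5' to 3' and are assumed to have equal length,
    so strand_a is compared against the reverse complement of strand_b.
    """
    if not strand_a or len(strand_a) != len(strand_b):
        raise ValueError("strands must be non-empty and of equal length")
    partner = reverse_complement(strand_b)
    paired = sum(a == b for a, b in zip(strand_a.upper(), partner))
    return 100.0 * paired / len(strand_a)


if __name__ == "__main__":
    primer = "ACGTACGTAC"
    print(percent_complementarity(primer, "GTACGTACGT"))  # fully complementary
    print(percent_complementarity(primer, "ATACGTACGT"))  # one mismatched position
```

Under these assumptions, a pair of strands would be screened against a chosen threshold (for example, 80% or more) before being treated as configured to hybridize.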

As used herein, the term "extension" or "elongation" is used in accordance with its plain and ordinary meaning and refers to the synthesis, by a polymerase, of a new polynucleotide strand complementary to a template strand by adding free nucleotides (e.g., dNTPs) from a reaction mixture that are complementary to the template in the 5' to 3' direction. Extension involves condensing the 5'-phosphate group of a dNTP with the 3'-hydroxyl group at the end of the nascent (elongating) DNA strand.

As used herein, the term "sequencing read" is used in accordance with its plain and ordinary meaning and refers to an inferred sequence of base pairs (or base pair probabilities) corresponding to all or part of a single DNA fragment. Sequencing techniques produce reads of varying lengths. Sequencing reads can comprise 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, or more nucleotide bases. Reads of 20-40 base pairs (bp) in length are known as ultrashort. Typical sequencers produce reads ranging from 100 to 500bp in length. The length of the reads is a factor that may affect the outcome of the biological study. For example, longer read lengths increase the resolution of de novo genome assembly and structural variant detection. In some embodiments, a sequencing read may comprise 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, or more nucleotide bases.

As used herein, the term "k-mer" is used in accordance with its plain and ordinary meaning and refers to a subsequence of a larger sequence string, wherein each k-mer has a length of k. Algorithms for determining overlap between sequence data may involve identifying k-mers shared between reads. Without being bound by theory, sequences that share a large number of k-mers are likely to derive from the same region of the sequence to be identified, e.g., a genomic sequence. The value of k is the length of the matching region and is typically about 10-30 base pairs. Such regions can be found rapidly using data structures such as suffix trees or hash tables. For two overlapping reads to share a k-mer, the reads must generally either have a low error rate or be long enough to compensate for a high error rate. For sequencing reads with relatively frequent errors, however, the approach can be modified to tolerate errors within the k-mers. For example, previously developed algorithms use spaced k-mers with "don't care" positions to tolerate substitutions and to increase sensitivity relative to contiguous k-mers. Algorithms for such spaced k-mers are described, for example, in Navarro, G. (2001) ACM Computing Surveys 33:31-88; and Farach-Colton et al. (2007) Journal of Computer and System Sciences (J. Comput. Syst. Sci.) 73:1035-1044, the disclosures of which are incorporated herein by reference in their entirety for all purposes.
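
As a non-limiting illustration of the hash-table approach mentioned above, the following minimal Python sketch indexes contiguous k-mers and counts the distinct k-mers shared by each pair of reads. The function names and the example reads are hypothetical, and the sketch does not implement spaced k-mers or any error tolerance.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return all contiguous substrings of length k, in order of occurrence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_kmer_index(reads, k):
    """Hash table mapping each k-mer to the set of read indices that contain it."""
    index = defaultdict(set)
    for read_id, seq in enumerate(reads):
        for kmer in kmers(seq, k):
            index[kmer].add(read_id)
    return index


def shared_kmer_counts(reads, k):
    """Count the distinct k-mers shared by each pair of reads (pairs with none are omitted)."""
    counts = defaultdict(int)
    for read_ids in build_kmer_index(reads, k).values():
        ordered = sorted(read_ids)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                counts[(ordered[i], ordered[j])] += 1
    return dict(counts)


if __name__ == "__main__":
    example_reads = ["ACGTACGGA", "TACGGATTC", "GGGGCCCCA"]
    # Reads 0 and 1 overlap; read 2 shares no 4-mer with either.
    print(shared_kmer_counts(example_reads, k=4))
```

Pairs with many shared k-mers are candidates for the same underlying region; a real pipeline would typically follow this screen with an alignment step.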

As used herein, "single cell" refers to one cell. Individual cells useful in the methods described herein may be obtained from a tissue of interest, or from a biopsy, blood sample, or cell culture. In addition, cells from a particular organ, tissue, tumor, neoplasm, etc., may be obtained and used in the methods described herein. In general, cells from any population can be used in the method, such as a population of prokaryotic or eukaryotic organisms, including bacteria or yeast.

The term "cellular component" is used in accordance with its ordinary meaning in the art and refers to any organelle, nucleic acid, protein, or analyte found in a prokaryotic, eukaryotic, archaebacterial, or other organism cell type. Examples of cellular components (e.g., components of a cell) include RNA transcripts, proteins, membranes, lipids, and other analytes.

"Gene" refers to a polynucleotide sequence capable of conferring a biological function upon transcription and/or translation. Functionally, the genome is subdivided into genes. Each gene is a nucleic acid sequence encoding an RNA or polypeptide. Genes are transcribed from DNA to RNA, which may be non-coding (ncRNA) with direct function, or intermediate messengers (mRNA) that are subsequently translated into proteins. Typically, a gene comprises a plurality of sequence elements, such as coding elements (i.e., sequences encoding functional proteins), non-coding elements, and regulatory elements. Each element can be as short as a few bp to 5kb. In embodiments, the gene is a protein coding sequence of RNA. Non-limiting examples of genes include developmental genes (e.g., adhesion molecules, cyclin kinase inhibitors, wnt family members, pax family members, winged-helix family members, hox family members, cytokines/lymphokines and their receptors, growth/differentiation factors and their receptors, neurotransmitters and their receptors); oncogenes (e.g., ABL1, BCL2, BCL6, CBFA2, CBL, CSF1R, ERBA, ERBB, EBRB2, ETS1, ETV6, FGR, FOS, FYN, HCR, HRAS, JUN, KRAS, LCK, LYN, MDM2, MLL, MYB, MYC, MYCL1, MYCN, NRAS, PIM1, PML, RET, SRC, TAL1, TCL3, and YES); tumor suppressor genes (e.g., APC, BRCA1, BRCA2, MADH4, MCC, NF1, NF2, RB1, TP53, and WT 1); and enzymes (e.g., ACC synthase and oxidase, ACP desaturase and hydroxylase, ADP-glucose pyrophosphorylase, atpase, alcohol dehydrogenase, amylase, amyloglucosidase, catalase, cellulase, chalcone synthase, chitinase, cyclooxygenase, decarboxylase, dextrinase, DNA and RNA polymerase, galactosidase, glucanase, glucose oxidase, granule-bound starch synthase, gtpase, helicase, hemicellulase, integrase, inulin enzyme, invertase, isomerase, kinase, lactase, lipase, lipoxygenase, lysozyme, nopaline synthase, octopine synthase, pectinase, peroxidase, phosphatase, phospholipase, phosphorylase, phytase, plant growth regulator synthase, 
polygalacturonase, protease and peptidase, pullulanase, recombinase, reverse transcriptase, RUBISCO, topoisomerase, and xylanase). In embodiments, the gene comprises at least one mutation associated with a disease or condition mediated by a mutated form of the gene.

Provided herein are methods and compositions for analyzing a sample (e.g., sequencing nucleic acids within a sample). A sample (e.g., a sample comprising nucleic acids) may be obtained from a suitable subject. The sample may be isolated or obtained directly from the subject or a portion thereof. In some embodiments, the sample is obtained indirectly from an individual or a medical professional. The sample may be any specimen isolated or obtained from a subject or a portion thereof. The sample may be any specimen isolated or obtained from a plurality of subjects. Non-limiting examples of a sample include fluid or tissue from a subject, including but not limited to blood or blood products (e.g., serum, plasma, platelets, buffy coat, etc.), umbilical cord blood, chorionic villi, amniotic fluid, cerebrospinal fluid, spinal fluid, lavage fluid (e.g., pulmonary, gastric, peritoneal, ductal, ear, arthroscopic), biopsy samples, laparoscopy samples, cells (blood cells, lymphocytes, placental cells, stem cells, bone marrow-derived cells, embryonic or fetal cells) or portions thereof (e.g., mitochondria, nuclei, extracts, etc.), urine, stool, sputum, saliva, nasal mucus, prostate fluid, semen, lymph fluid, bile, tears, sweat, milk, breast milk, etc., or combinations thereof. The fluid or tissue sample from which the nucleic acid is extracted may be acellular (e.g., cell-free). Non-limiting examples of tissue include organ tissue (e.g., liver, kidney, lung, thymus, adrenal gland, skin, bladder, reproductive organ, intestine, colon, spleen, brain, etc., or portions thereof), epithelial tissue, hair follicles, ducts, canals, bone, eye, nose, mouth, throat, ear, nails, etc., portions thereof, or combinations thereof. The sample may comprise cells or tissue that are normal, healthy, diseased (e.g., infected), and/or cancerous (e.g., cancer cells). 
Samples obtained from a subject may comprise cells or cellular material (e.g., nucleic acids) of a variety of organisms (e.g., viral nucleic acids, fetal nucleic acids, bacterial nucleic acids, parasite nucleic acids).

In some embodiments, the sample comprises a nucleic acid or fragment thereof. The sample may comprise nucleic acids obtained from one or more subjects. In some embodiments, the sample comprises nucleic acid obtained from a single subject. In some embodiments, the sample comprises a mixture of nucleic acids. The mixture of nucleic acids may comprise two or more nucleic acid species having different nucleotide sequences, different fragment lengths, different sources (e.g., genomic sources, cellular or tissue sources, subject sources, etc., or combinations thereof), or combinations thereof. The sample may comprise synthetic nucleic acids.

The subject may be any living or non-living organism, including but not limited to a human, a non-human animal, a plant, a bacterium, a fungus, a virus, or a protozoan. The subject may be of any age (e.g., embryo, fetus, infant, child, adult). The subject may be of any sex (e.g., male, female, or a combination thereof). The subject may have become pregnant. In some embodiments, the subject is a mammal. In some embodiments, the subject is a human subject. The subject may be a patient (e.g., a human patient). In some embodiments, the subject is suspected of having a genetic variation or a disease or condition associated with a genetic variation.

As used herein, the term "consensus sequence" refers to a sequence showing the nucleotide most commonly found at each position among a set of nucleic acid sequences (e.g., a set of sequencing reads) aligned at that position. A consensus sequence is typically "assembled" from shorter sequence reads that overlap at least in part. Where two sequences contain overlapping sequence information aligned at one end and non-overlapping sequence information at the opposite ends, the consensus sequence formed from the two sequences will be longer than either sequence alone. Aligning multiple such sequences allows many short sequences to be assembled into longer consensus sequences representing longer sample polynucleotides. In embodiments, the aligned sequences used to generate a consensus sequence may contain gaps (e.g., representing nucleotides that are absent from a given read because they were incorporated during a dark cycle and were not identified).
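
By way of a simplified example, a per-position majority vote over pre-aligned reads may be sketched in Python as follows. The function name, the use of "-" to mark gaps, and the use of "N" for positions covered by no read are assumptions of this sketch; it presumes the reads have already been aligned and padded to a common width.

```python
from collections import Counter


def consensus(aligned_reads, gap="-"):
    """Majority-vote consensus over reads that are already aligned.

    Every read must be padded to the same width; `gap` marks positions a read
    does not cover. Columns covered by no read are reported as "N". When two
    bases tie, the one seen first (in read order) is kept.
    """
    if not aligned_reads:
        return ""
    width = len(aligned_reads[0])
    if any(len(read) != width for read in aligned_reads):
        raise ValueError("aligned reads must all have the same width")
    result = []
    for column in zip(*aligned_reads):
        votes = Counter(base for base in column if base != gap)
        result.append(votes.most_common(1)[0][0] if votes else "N")
    return "".join(result)


if __name__ == "__main__":
    reads = [
        "ACGT--",
        "ACGTAC",
        "-CTTAC",  # carries a single disagreeing base in the third column
        "--GTAC",
    ]
    print(consensus(reads))
```

In the example, the four partially overlapping reads yield a consensus longer than the shortest reads, and the single disagreeing base is outvoted.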

In some embodiments, the nucleic acid (e.g., an adapter, linear nucleic acid molecule, or primer) comprises a molecular identifier or a molecular barcode. As used herein, the term "molecular barcode" (which may be referred to as a "tag," "barcode," "molecular identifier," "identifier sequence," or "unique molecular identifier" (UMI)) refers to any material (e.g., a nucleotide sequence or a nucleic acid molecular feature) that is capable of distinguishing an individual molecule in a large heterogeneous population of molecules. In embodiments, a barcode is unique within a pool of barcodes that differ from one another in sequence, or is uniquely associated with a particular sample polynucleotide in a pool of sample polynucleotides. In embodiments, each barcode in a pool of adapters is unique, such that sequencing reads comprising the barcode can be identified as originating from a single sample polynucleotide molecule on the basis of the barcode alone. In other embodiments, an individual barcode sequence may be used more than once, but adapters comprising the duplicated barcodes are associated with different sequences and/or with different combinations of barcoded adapters, such that sequencing reads may still be uniquely identified as originating from a single sample polynucleotide molecule on the basis of a barcode and adjacent sequence information (e.g., the sample polynucleotide sequence and/or one or more adjacent barcodes). In embodiments, the length of the barcode is about or at least about 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 75, or more nucleotides. In embodiments, the length of the barcode is shorter than 20, 15, 10, 9, 8, 7, 6, or 5 nucleotides. In embodiments, the barcode is about 10 to about 50 nucleotides in length, such as about 15 to about 40 or about 20 to about 30 nucleotides in length. In a pool of different barcodes, the barcodes may be of the same or different lengths. 
Typically, barcodes are of sufficient length and comprise sequences that are sufficiently different to allow identification of sequencing reads derived from the same sample polynucleotide molecule. In embodiments, each barcode of the plurality of barcodes differs from every other barcode of the plurality in at least three nucleotide positions, such as at least 3, 4, 5, 6, 7, 8, 9, 10, or more nucleotide positions. In some embodiments, a substantially degenerate barcode may be referred to as random. In some embodiments, the barcode may comprise a nucleic acid sequence from a pool of known sequences. In some embodiments, the barcode may be predefined.
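
The requirement that barcodes differ in at least three nucleotide positions can be checked, for equal-length barcodes, with a pairwise Hamming distance computation. The following Python sketch is offered as an assumption-laden example: the function names, the example barcode pool, and the one-mismatch tolerance are hypothetical. A minimum pairwise distance of three or more means that a read barcode carrying a single substitution is still closer to its true barcode than to any other barcode in the pool.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two barcodes in the pool."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def assign_barcode(observed, barcodes, max_mismatches=1):
    """Return the single pool barcode within max_mismatches of the observed
    sequence, or None if there is no such barcode or more than one."""
    hits = [bc for bc in barcodes if hamming(observed, bc) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


if __name__ == "__main__":
    pool = ["AAAAAA", "AACCGG", "CCAATT", "GGTTAA"]  # hypothetical barcodes
    print(min_pairwise_distance(pool))     # smallest distance in the pool
    print(assign_barcode("AACCGT", pool))  # one substitution away from AACCGG
    print(assign_barcode("TTTTTT", pool))  # too far from every barcode
```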

In embodiments, the nucleic acid (e.g., an adapter, linear nucleic acid molecule, or primer) comprises a sample barcode. Typically, a "sample barcode" is a nucleotide sequence that is sufficiently different from other sample barcodes to allow the sample source to be identified on the basis of the sample barcode sequence with which it is associated. In embodiments, a plurality of polynucleotides (e.g., all polynucleotides from a particular sample source or a sub-sample thereof) is joined to a first sample barcode, while a different plurality of polynucleotides (e.g., all polynucleotides from a different sample source or a different sub-sample) is joined to a second sample barcode, thereby associating each plurality of polynucleotides with a different sample barcode indicative of its sample source. In embodiments, each sample barcode of the plurality of sample barcodes differs from every other sample barcode of the plurality in at least three nucleotide positions, such as at least 3, 4, 5, 6, 7, 8, 9, 10, or more nucleotide positions. In some embodiments, a substantially degenerate sample barcode may be referred to as random. In some embodiments, the sample barcode may comprise a nucleic acid sequence from a pool of known sequences. In some embodiments, the sample barcode may be predefined. In an embodiment, the sample barcode comprises about 1 to about 10 nucleotides. In embodiments, the sample barcode comprises about 3, 4, 5, 6, 7, 8, 9, or about 10 nucleotides. In an embodiment, the sample barcode comprises about 3 nucleotides. In an embodiment, the sample barcode comprises about 5 nucleotides. In an embodiment, the sample barcode comprises about 7 nucleotides. In an embodiment, the sample barcode comprises about 10 nucleotides. In an embodiment, the sample barcode comprises about 6 to about 10 nucleotides.
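
As one possible illustration of identifying a sample source from a sample barcode, the Python sketch below sorts reads into per-sample bins. It assumes, purely for the example, that the sample barcode occupies the first bases of each read, that all sample barcodes share one length, and that only exact matches are accepted; the function name, barcode sequences, and sample names are hypothetical.

```python
def demultiplex(reads, sample_barcodes):
    """Group reads by an exact-match sample barcode at the start of each read.

    sample_barcodes maps barcode sequence -> sample name. Returns a pair
    (bins, unassigned): bins maps each sample name to its reads with the
    barcode trimmed off, and unassigned lists reads with no matching barcode.
    """
    lengths = {len(barcode) for barcode in sample_barcodes}
    if len(lengths) != 1:
        raise ValueError("this sketch requires barcodes of one common length")
    n = lengths.pop()
    bins = {name: [] for name in sample_barcodes.values()}
    unassigned = []
    for read in reads:
        name = sample_barcodes.get(read[:n])
        if name is None:
            unassigned.append(read)
        else:
            bins[name].append(read[n:])
    return bins, unassigned


if __name__ == "__main__":
    barcodes = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}
    reads = ["ACGTACGGGTTT", "TGCATGAAACCC", "ACGTACTTTT", "NNNNNNAAAA"]
    bins, unassigned = demultiplex(reads, barcodes)
    print(bins)
    print(unassigned)
```

Exact matching is the simplest design choice; a mismatch-tolerant assignment could be substituted where the sample barcodes differ from one another in at least three positions.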

Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limits of that range, and any other stated or intervening value in that stated range, is encompassed within the invention. The upper and lower limits of any such smaller ranges may independently be included in the smaller ranges and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.

The term "kit" is used in accordance with its plain and ordinary meaning and refers to any delivery system for delivering materials or reagents for carrying out a method of the invention. Such delivery systems include systems that allow for the storage, transport, or delivery of reaction reagents (e.g., nucleotides, enzymes, nucleic acid templates, etc. in appropriate containers) and/or supporting materials (e.g., buffers, written instructions for performing the reaction, etc.) from one location to another. For example, a kit comprises one or more enclosures (e.g., boxes) containing the relevant reaction reagents and/or supporting materials. Such contents may be delivered to the intended recipient together or separately. For example, a first container may contain an enzyme, while a second container contains nucleotides. In embodiments, the kit comprises a vessel containing one or more enzymes, primers, adapters, or other reagents as described herein. The vessel may comprise any structure capable of supporting or containing a liquid or solid material, and may comprise a tube, vial, jar, container, tip, or the like. In an embodiment, a wall of the vessel may permit the transmission of light through the wall. In an embodiment, the vessel may be optically clear. The kit may comprise the enzymes and/or nucleotides in a buffer.

The methods and kits of the present disclosure can be applied, mutatis mutandis, to RNA sequencing or for determining the identity of ribonucleotides.

An aqueous solution herein refers to a liquid comprising at least 20vol% water. In embodiments, the aqueous solution comprises at least 50vol%, such as at least 75vol%, at least 95vol%, greater than 98vol%, or 100vol% water as the continuous phase.

The term "nucleic acid sequencing device" or the like refers to an integrated system of one or more chambers, ports, and channels that are interconnected and in fluid communication and designed for carrying out an analytical reaction or process, either alone or in cooperation with an appliance or instrument that provides support functions, such as sample introduction, fluid and/or reagent driving means, temperature control, detection systems, and data collection and/or integration systems, for the purpose of determining the nucleic acid sequence of a template polynucleotide. The nucleic acid sequencing device may further comprise valves, pumps, and specialized functional coatings on interior walls. The nucleic acid sequencing device may comprise a receiving unit, or platen, that orients the flow cell such that a maximal surface area of the flow cell is available for exposure to an optical lens. Other nucleic acid sequencing devices include those provided by Illumina™, Inc. (e.g., HiSeq™, MiSeq™, NextSeq™, or NovaSeq™ systems), Life Technologies™ (e.g., ABI PRISM™ or SOLiD™ systems), Pacific Biosciences (e.g., systems using SMRT™ technology, such as the Sequel™ or RS II™ systems), or Qiagen (e.g., Genereader™ system).

"disease" or "condition" or "disease state" refers to any abnormal organism or condition of a cell, tissue or organism. A disease may refer to a survival state or health condition of a patient or subject. In some embodiments, the disease is a disease associated with (e.g., caused by) an activated or overactive kinase or abnormal kinase activity. The disease state may be the result of, inter alia, environmental pathogens, such as viral infections (e.g., HIV/AIDS, hepatitis b, hepatitis c, influenza, measles, etc.), bacterial infections, parasitic infections, fungal infections, or infections of some other organism. The disease state may also be the result of some other environmental factor, such as a chemical toxin or a chemical carcinogen. As used herein, a disease state further comprises a genetic disorder in which one or more copies of a gene are altered or disrupted, thereby affecting its biological function. Exemplary genetic diseases include, but are not limited to polycystic kidney disease, familial multiple endocrine tumor type I, neurofibromatosis, tay-Sachs disease (Huntington's disease), sickle cell anemia, thalassemia and Down's syndrome (Down's syndrome), among others (see, e.g., metabolic and molecular basis of genetic diseases (The Metabolic andMolecular Bases of Inherited Diseases), 7 th edition, mcGraw-Hill inc (New York)). Other exemplary diseases include, but are not limited to, cancer, hypertension, alzheimer's disease, neurodegenerative diseases, and neuropsychiatric disorders such as bipolar disorder or paranoid schizophrenia. The disease state is monitored to determine the level or severity (e.g., stage or progression) of one or more disease states in the subject, and more particularly, to detect a change in a biological state of the subject associated with the one or more disease states (see, e.g., U.S. patent No. 6,218,122, which is incorporated herein by reference in its entirety). 
In embodiments, the methods provided herein are also applicable to monitoring a disease state of a subject undergoing one or more therapies. Thus, in some embodiments, the present disclosure also provides methods for determining or monitoring the efficacy of one or more therapies on a subject (i.e., determining the level of therapeutic effect). In embodiments, the methods of the present disclosure may be used to assess treatment efficacy in clinical trials, e.g., as early surrogate markers of success or failure in such clinical trials. Within eukaryotic cells, there are hundreds to thousands of interconnected signaling pathways. Thus, perturbation of intracellular protein function has many effects on transcription of other proteins and other genes linked by primary, secondary, and sometimes tertiary pathways. This wide interconnection between the functions of the various proteins means that changes in any one protein may result in compensatory changes in a large number of other proteins. In particular, partial disruption of even a single protein within a cell, such as by exposure to a drug or by a disease state that regulates gene copy number (e.g., genetic mutation), results in sufficiently many other characteristic compensatory changes in gene transcription that can be used to define "signatures" of specific transcript alterations associated with functional disruption, e.g., specific disease states or therapies, even at stages where changes in protein activity are undetectable.

As used herein, the term "neurodegenerative disease" refers to a disease or condition in which the function of the subject's nervous system is impaired. Examples of neurodegenerative diseases that can be detected by the methods described herein include Alexander's disease, Alzheimer's disease, amyotrophic lateral sclerosis, ataxia telangiectasia, Batten disease (also known as Spielmeyer-Vogt-Sjogren-Batten disease), bovine spongiform encephalopathy (BSE), Canavan disease, Cockayne syndrome, corticobasal degeneration, Creutzfeldt-Jakob disease, frontotemporal dementia, Gerstmann-Sträussler-Scheinker syndrome, Huntington's disease, HIV-associated dementia, Kennedy's disease, Krabbe's disease, kuru, Lewy body dementia, Machado-Joseph disease (spinocerebellar ataxia type 3), multiple sclerosis, multiple system atrophy, narcolepsy, neuroborreliosis, Parkinson's disease, Pelizaeus-Merzbacher disease, Pick's disease, primary lateral sclerosis, prion diseases, Refsum's disease, Sandhoff's disease, Schilder's disease, and spinocerebellar ataxia.

As used herein, the term "autoimmune disease" refers to a disease or condition in which the subject's immune system responds abnormally to one or more components (e.g., biomolecules, proteins, cells, tissues, organs, etc.) of the subject. In some embodiments, the autoimmune disease is a condition in which the subject's immune system responds abnormally to one or more components of the subject as if the components were non-self. Examples of autoimmune diseases that can be detected using the methods provided herein include acute disseminated encephalomyelitis (ADEM), acute necrotizing hemorrhagic leukoencephalitis, Addison's disease, agammaglobulinemia, asthma, allergic rhinitis, alopecia areata, amyloidosis, ankylosing spondylitis, anti-GBM/anti-TBM nephritis, antiphospholipid syndrome (APS), arthritis, autoimmune aplastic anemia, autoimmune dysautonomia, autoimmune hepatitis, autoimmune hyperlipidemia, autoimmune immunodeficiency, autoimmune inner ear disease (AIED), autoimmune myocarditis, autoimmune pancreatitis, autoimmune retinopathy, autoimmune thrombocytopenic purpura (ATP), autoimmune thyroid disease, axonal or neuronal neuropathies, Balo disease, Behcet's disease, bullous pemphigoid, cardiomyopathy, Castleman disease, celiac disease, Chagas disease, chronic inflammatory demyelinating polyneuropathy (CIDP), chronic recurrent multifocal osteomyelitis (CRMO), Churg-Strauss syndrome, cicatricial pemphigoid/benign mucosal pemphigoid, Crohn's disease, Cogan's syndrome, cold agglutinin disease, congenital heart block, chronic demyelinating polyneuropathy (CIDP), coxsackie myocarditis, CREST disease, essential mixed cryoglobulinemia, demyelinating neuropathies, dermatitis herpetiformis, dermatomyositis, 
devic's disease (neuromyelitis optica), discoid lupus erythematosus, deretsler's syndrome (Dressler's syndrome), endometriosis, eosinophilic fasciitis, erythema nodosum, experimental allergic encephalomyelitis, ewens syndrome (Evanslyndrome), fibroalveolitis, giant cell arteritis (temporal arteritis), glomerulonephritis, goodpasture's syndrome (Graves ' disease), gravey ' ophtalmopathy, gravey's eye disease (Grave's) Gravey's hand syndrome, green-Barlish syndrome, behcet's encephalitis, gravee's disease Hashimoto thyroiditis, hemolytic anemia, kennock-Lin Zidian (Henoch-Schonlein purpura), herpes gestation, hypogammaglobulinemia, ichthyosis, idiopathic Thrombocytopenic Purpura (ITP), igA nephropathy, igG 4-related sclerosing diseases, immunoregulatory lipoproteins, inclusion body myositis, inflammatory bowel disease, insulin dependent diabetes mellitus (type 1), interstitial cystitis, juvenile arthritis, juvenile diabetes, kawasaki syndrome (Kawasaki syndrome), lambert-Eatone's syndrome (Lambert-Eatonsyndrome), white cell disruption vasculitis, lichen planus, lichen sclerosus, wood-like conjunctivitis, linear IgA disease (LAD), lupus (SLE), lyme disease (Lyme disease), chronic disease, meniere's disease, microscopic polyangiitis, mixed tienchymosis (MCTD), mo Lunshi ulcers (Mooren's ulcer), muha-beemann disease (Mucha-haemanndisease), multiple sclerosis, myasthenia gravis, myositis, narcolepsy, neuromyelitis optica (Devic's), neutropenia, ocular scarring pemphigoid, optic neuritis, recurrent rheumatism, PANDAS (childhood autoimmune neuropsychiatric conditions associated with streptococci), paraneoplastic cerebellar degeneration, paroxysmal sleep hemoglobinuria (PNH), paro's syndrome (Parry Romberg syndrome), pal Sang Nage-tescens syndrome (Parsonna-Turnersyndrome), ciliary body flatitis (Pars) peripheral uveitis, pemphigus, peripheral neuropathy, PANDAS venous encephalomyelitis (Perivenous encephalomyelitis), pernicious anemia, POEMS syndrome, polyarteritis nodosa, 
autoimmune polyadenylic syndrome type I, type II and type III, polymyositis rheumatica, polymyositis, post myocardial infarction syndrome, post pericardial osteotomy syndrome (Postpericardiotomy syndrome), autoimmune progesterone dermatitis, primary biliary cirrhosis, primary sclerosing cholangitis, psoriasis, psoriatic arthritis, idiopathic pulmonary fibrosis, pyoderma gangrene, pure red cell aplasia, raynaud's phenomenon (Raynauds phenomenon), reflex sympathetic dystrophia, lyter's syndrome (Reiter's syndrome), recurrent polyadenylic chondritis, polymorphous leg syndrome, post peritoneal fibrosis, rheumatic fever, rheumatoid arthritis, sarcoidosis, schmidt syndrome (Schmidt syndrome), scleritis, scleroderma, sjogren's syndrome (Sjogren's ssyndrome), sperm and testis autoimmunity, stiff person syndrome, subacute Bacterial Endocarditis (SBE), su Saike syndrome (Susac's syndrome), sympathogenic ophthalmitis, large arteritis (Takayasu's) temporal arteritis/giant cell arteritis, thrombocytopenic purpura (TTP), tolosha-hunter syndrome (Tolosa-Hunt syndrome), transverse myelitis, ulcerative colitis, undifferentiated Connective Tissue Disease (UCTD), uveitis, vasculitis, vesicular dermatosis, vitiligo or Wegener's granulomatosis (Wegener's).

Primary immunodeficiency diseases (PIDDs) comprise rare genetic conditions that impair the immune system. Without a functional immune response, people with a PIDD may suffer chronic, debilitating infections, such as Epstein-Barr virus (EBV) infection, which increases the risk of developing cancer. Non-limiting examples of primary immunodeficiency diseases include autoimmune lymphoproliferative syndrome (ALPS), APS-1 (APECED), BENTA disease, caspase eight deficiency state (CEDS), CARD9 deficiency and other syndromes of susceptibility to candidiasis, chronic granulomatous disease (CGD), common variable immunodeficiency (CVID), congenital neutropenia syndromes, CTLA4 deficiency, DOCK8 deficiency, GATA2 deficiency, glycosylation disorders with immunodeficiency, hyper-immunoglobulin E syndrome (HIES), hyper-immunoglobulin M syndrome, interferon gamma, interleukin 12, and interleukin 23 deficiency, leukocyte adhesion deficiency (LAD), LRBA deficiency, PI3 kinase disease, PLCG2-associated antibody deficiency and immune dysregulation (PLAID), severe combined immunodeficiency (SCID), STAT3 dominant-negative disease, STAT3 gain-of-function disease, warts, hypogammaglobulinemia, infections, and myelokathexis (WHIM) syndrome, Wiskott-Aldrich syndrome (WAS), X-linked agammaglobulinemia (XLA), and X-linked lymphoproliferative disease (XLP).

As used herein, the term "cardiovascular disease" refers to a disease or condition that affects the heart or blood vessels. In embodiments, the cardiovascular disease comprises a disease caused by or exacerbated by atherosclerosis. Exemplary cardiovascular diseases that can be detected using the methods provided herein include alcoholic cardiomyopathy, coronary artery disease, congenital heart disease, arrhythmogenic right ventricular cardiomyopathy, restrictive cardiomyopathy, non-obstructive cardiomyopathy, diabetes mellitus, hypertension, hyperhomocysteinemia, hypercholesterolemia, atherosclerosis, ischemic heart disease, heart failure, pulmonary heart disease, hypertensive heart disease, left ventricular hypertrophy, coronary heart disease, (congestive) heart failure, hypertensive cardiomyopathy, cardiac arrhythmias, inflammatory heart disease, endocarditis, inflammatory cardiac hypertrophy, myocarditis, valvular heart disease, stroke, or myocardial infarction. In embodiments, the disease is a cardiovascular disease associated with a gene fusion. Genome-wide association (GWA) studies have revealed many potentially disease-modifying gene fusion events; see, e.g., Paone et al., Frontiers in Cardiovascular Medicine (Front. Cardiovasc. Med.), June 1, 2018, which is incorporated herein by reference.

As used herein, the term "cancer" refers to all types of cancers, neoplasms, or malignant tumors found in mammals, including leukemia, carcinoma, and sarcoma. Exemplary cancers that can be detected using the methods provided herein include thyroid cancer, endocrine system cancer, brain cancer, breast cancer, cervical cancer, colon cancer, head and neck cancer, liver cancer, kidney cancer, lung cancer, non-small cell lung cancer, melanoma, mesothelioma, ovarian cancer, pancreatic cancer, sarcoma, gastric cancer, uterine cancer, or medulloblastoma. Further examples include Hodgkin's Disease, non-Hodgkin's Lymphoma, multiple myeloma, neuroblastoma, glioma, glioblastoma multiforme, ovarian cancer, rhabdomyosarcoma, primary thrombocythemia, primary macroglobulinemia, primary brain tumor, malignant pancreatic cancer, malignant carcinoid, bladder cancer, precancerous skin lesions, testicular cancer, lymphoma, thyroid cancer, neuroblastoma, esophageal cancer, genitourinary tract cancer, malignant hypercalcemia, endometrial cancer, adrenal cortex cancer, endocrine or exocrine pancreatic neoplasm, medullary thyroid cancer, melanoma, colorectal cancer, papillary thyroid cancer, hepatocellular carcinoma or prostate cancer.

The term "leukemia" refers to a progressive, malignant disease of the blood-forming organs and is generally characterized by the dysregulated proliferation and development of leukocytes and their precursors in the blood and bone marrow. Leukemia is generally classified clinically on the basis of: (1) the duration and character of the disease (acute or chronic); (2) the type of cell involved (myeloid (myelogenous), lymphoid (lymphogenous), or monocytic); and (3) the increase or non-increase in the number of abnormal cells in the blood (leukemic or aleukemic (subleukemic)). Exemplary leukemias that can be detected using the methods provided herein include, for example, acute non-lymphoblastic leukemia, chronic lymphocytic leukemia, acute myelogenous leukemia, chronic myelogenous leukemia, acute promyelocytic leukemia, adult T-cell leukemia, aleukemic leukemia, aleukocythemic leukemia, basophilic leukemia, blast cell leukemia, bovine leukemia, chronic myelogenous leukemia, leukemia cutis, embryonal leukemia, eosinophilic leukemia, Gross' leukemia, hairy cell leukemia, hemoblastic leukemia, hemocytoblastic leukemia, histiocytic leukemia, stem cell leukemia, acute monocytic leukemia, leukopenic leukemia, lymphogenous leukemia, lymphoid leukemia, lymphosarcoma cell leukemia, mast cell leukemia, megakaryoblastic leukemia, micromyeloblastic leukemia, monocytic leukemia, myeloblastic leukemia, myelogenous leukemia, myelomonocytic leukemia, Naegeli leukemia, plasma cell leukemia, multiple myeloma, plasma cell leukemia, promyelocytic leukemia, Rieder cell leukemia, Schilling's leukemia, stem cell leukemia, subleukemic leukemia, or undifferentiated cell leukemia.

The term "sarcoma" generally refers to a tumor that consists of a substance similar to embryonic connective tissue and is generally composed of tightly packed cells embedded in a fibrillar or homogeneous substance. Sarcomas which can be detected by the methods provided herein include chondrosarcoma, fibrosarcoma, lymphosarcoma, melanoma, myxosarcoma, osteosarcoma, abbe's sarcoma (Abemethyl's sarcoma), liposarcoma, acinoid soft tissue sarcoma, ameloblastic sarcoma, glucagonoma, green carcinoma sarcoma, choriocarcinoma, embryonal sarcoma, wilms 'tumor sarcoma (Wilms' tur sarcoma), endometrial sarcoma, interstitial sarcoma, ewing's sarcoma (Ewing's sarcoma), fascia sarcoma, fibroblast sarcoma, giant cell sarcoma, granulocytosarcoma, hodgkin's sarcoma, idiopathic multiple pigmentation hemorrhagic sarcoma, B cell immunoblastic sarcoma, lymphomas, T cell immunoblastic sarcoma, zhan Enxun's sarcoma (Jensen's sarcomas), kaposi's sarcomas), propox (Kupffer cell sarcoma), vascular sarcoma, leukemia sarcoma, kaposi's sarcomas, reticuloma, capillary sarcoma, emotion sarcoma (EW) or hemangiosarcoma (Edrum's), hemangiosarcoma (35, equipped sarcoma, or hemangiosarcoma

The term "melanoma" means a tumor arising from the melanocytic system of the skin and other organs. Melanomas that can be detected using the methods provided herein include, for example, acral-lentiginous melanoma, amelanotic melanoma, benign juvenile melanoma, Cloudman's melanoma, S91 melanoma, Harding-Passey melanoma, juvenile melanoma, lentigo maligna melanoma, malignant melanoma, nodular melanoma, subungual melanoma, or superficial spreading melanoma.

The term "carcinoma" refers to a malignant new growth made up of epithelial cells that tend to infiltrate the surrounding tissue and give rise to metastases. Exemplary carcinomas that can be detected using the methods provided herein include, for example, medullary thyroid carcinoma, familial medullary thyroid carcinoma, acinar carcinoma, adenocystic carcinoma, adenoid cystic carcinoma, adenocarcinoma, adrenocortical carcinoma, alveolar cell carcinoma, basaloid carcinoma, basosquamous cell carcinoma, bronchioloalveolar carcinoma, bronchiolar carcinoma, cerebriform carcinoma, cholangiocellular carcinoma, choriocarcinoma, mucinous carcinoma, comedo carcinoma, corpus carcinoma, cribriform carcinoma, carcinoma en cuirasse, carcinoma cutaneum, columnar cell carcinoma, ductal carcinoma, carcinoma durum, embryonal carcinoma, medullary carcinoma, epidermoid carcinoma, carcinoma epitheliale adenoides, exophytic carcinoma, carcinoma ex ulcere, carcinoma fibrosum, gelatiniform carcinoma, gelatinous carcinoma, giant cell carcinoma, glandular carcinoma, granulosa cell carcinoma, hair-matrix carcinoma, hematoid carcinoma, hepatocellular carcinoma, Hurthle cell carcinoma, hyaline carcinoma, hypernephroid carcinoma, infantile embryonal carcinoma, carcinoma in situ, intraepidermal carcinoma, intraepithelial carcinoma, Krompecher's carcinoma, Kulchitzky-cell carcinoma, large cell carcinoma, lenticular carcinoma, carcinoma lenticulare, lipomatous carcinoma, lymphoepithelial carcinoma, medullary carcinoma, melanotic carcinoma, carcinoma molle, mucinous carcinoma, carcinoma muciparum, carcinoma mucocellulare, mucoepidermoid carcinoma, carcinoma mucosum, mucous carcinoma, carcinoma myxomatodes, nasopharyngeal carcinoma, oat cell carcinoma, carcinoma ossificans, osteoid carcinoma, papillary carcinoma, periportal carcinoma, preinvasive carcinoma, prickle cell carcinoma, pultaceous carcinoma, renal cell carcinoma, reserve cell carcinoma, carcinoma sarcomatodes, Schneiderian carcinoma, scirrhous carcinoma, carcinoma scroti, signet-ring cell carcinoma, carcinoma simplex, small cell carcinoma, solanoid carcinoma, spheroidal cell carcinoma, spindle cell carcinoma, carcinoma spongiosum, squamous carcinoma, squamous cell carcinoma, string carcinoma, carcinoma telangiectaticum, carcinoma telangiectodes, transitional cell carcinoma, carcinoma tuberosum, tuberous carcinoma, verrucous carcinoma, or carcinoma villosum.

As used herein, the term "abnormal" refers to a difference from normal. When used to describe enzymatic activity, abnormal refers to activity that is greater or less than the average activity of a normal control or normal non-diseased control sample. Abnormal activity may refer to the amount of activity that causes a disease, wherein returning the abnormal activity to a normal or non-disease related amount (e.g., by administering a compound) results in a decrease in the disease or one or more symptoms of the disease.

"blocking element" refers to an agent (e.g., polynucleotide, protein, nucleotide) that reduces and/or inhibits nucleotide incorporation (i.e., extension of a primer) relative to the absence of the blocking element. In embodiments, the blocking element is a non-extendable oligomer (e.g., a 3' -blocked oligonucleotide). The blocking element on a nucleotide may be reversible, whereby the blocking moiety may be removed or modified to allow the 3 'hydroxyl group to form a covalent bond with the 5' phosphate of another nucleotide. For example, a reversible terminator may refer to a blocking moiety located, for example, at the 3' position of a nucleotide, and may be a chemically cleavable moiety, such as allyl, azidomethyl, or methoxymethyl. In embodiments, the blocking moiety is irreversible (e.g., a blocking element comprising the blocking moiety irreversibly prevents extension). In embodiments, the blocking element comprises an oligonucleotide with a 3' dideoxynucleotide or similar modification to prevent polymerase extension and is used in conjunction with a non-strand displacement polymerase. In another example embodiment, the blocking element comprises one or more modified nucleotides comprising a cleavable linker (e.g., linked to a 5', 3', or nucleobase) comprising PEG, thereby blocking extension. In another example embodiment, the blocking element comprises one or more modified nucleotides that are linked to biotin to which a protein (e.g., streptavidin) can bind, thereby blocking polymerase extension. In another exemplary embodiment, the blocking element comprises modified nucleotides that are complementary to each other, such as iso dGTP or iso dCTP. In a polymerization reaction lacking the appropriate complementary modified nucleotide, the extension of the primer is stopped. 
In another exemplary embodiment, the blocking element comprises one or more sequences that are recognized and bound by one or more single-stranded DNA binding proteins, thereby blocking polymerase extension at the binding site. In another exemplary embodiment, the blocking element comprises one or more sequences that are recognized and bound by one or more short RNA or PNA oligonucleotides, thereby blocking extension by a DNA polymerase that is incapable of displacing the bound RNA or PNA strand.

The term "clonotype" is used in accordance with its ordinary meaning in the art and refers to a recombined nucleic acid encoding an immune receptor or a portion thereof. For example, a clonotype refers to a recombined nucleic acid, typically extracted from a T cell or B cell but optionally derived from a cell-free source, that encodes a T cell receptor (TCR) or B cell receptor (BCR) or a portion thereof. In embodiments, the clonotype may encode all or a portion of a VDJ rearrangement of IgH, a DJ rearrangement of IgH, a VJ rearrangement of IgK, a VJ rearrangement of IgL, a VDJ rearrangement of TCRβ, a DJ rearrangement of TCRβ, a VJ rearrangement of TCRα, a VJ rearrangement of TCRγ, a VDJ rearrangement of TCRδ, a VD rearrangement of TCRδ, a Kde-V rearrangement, and the like. Clonotypes may also encode translocation breakpoint regions that involve immune receptor genes, such as Bcl1-JH or Bcl2-JH. In one aspect, clonotypes have sequences long enough to represent or reflect the diversity of the immune molecules from which they are derived; therefore, the length of a clonotype may vary greatly. In some embodiments, the clonotypes range in length from 25 to 400 nucleotides; in other embodiments, the clonotypes range in length from 25 to 200 nucleotides.

It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims. All publications, patents, and patent applications cited herein are hereby incorporated by reference in their entirety for all purposes.

II. Methods

In one aspect, a method of detecting a genetic feature in one or more nucleic acid molecules is provided, the method comprising: a) Providing one or more linear nucleic acid molecules; b) Circularizing the one or more linear nucleic acid molecules to form one or more circular template polynucleotides, each comprising a continuous strand lacking free 5' and 3' ends, and amplifying the one or more circular template polynucleotides to produce a plurality of amplification products; c) Sequencing the plurality of amplification products to produce a plurality of sequencing reads; d) Identifying the presence or absence of a genetic feature in a nucleic acid molecule by analyzing the plurality of sequencing reads (e.g., analyzing the plurality of sequencing reads relative to a control or reference); and e) detecting the genetic feature in the one or more nucleic acid molecules when the presence of the genetic feature is identified in the plurality of sequencing reads, wherein the genetic feature comprises an intrachromosomal rearrangement or a gene fusion. In embodiments, the genetic feature is a clonotype. In embodiments, the genetic feature is a polynucleotide fusion (e.g., a fusion gene).

In one aspect, a method is provided for detecting a polynucleotide fusion comprising a sequence of a first region fused to a sequence of a second region at a fusion junction. In an embodiment, the method comprises: (a) Circularizing one or more linear nucleic acid molecules to form a circular template polynucleotide comprising a continuous strand lacking free 5 'and 3' ends; (b) Amplifying a circular template polynucleotide comprising a fusion junction in an amplification reaction comprising a first primer, a second primer, a blocking element, and a polymerase to produce a fusion amplification product; and (c) detecting the fusion amplification product, thereby detecting polynucleotide fusion. In an embodiment, the method comprises: (a) Circularizing one or more linear nucleic acid molecules to form a circular template polynucleotide comprising a continuous strand lacking free 5 'and 3' ends; (b) Amplifying a circular template polynucleotide comprising a fusion junction in an amplification reaction comprising a first primer, a second primer, a blocking element, and a polymerase to produce a fusion amplification product, wherein: (i) The first region comprises a first strand comprising, from 5 'to 3', a sequence that specifically binds to the blocking element, a sequence that specifically hybridizes to the first primer, and a sequence that is complementary to the sequence that specifically hybridizes to the second primer; (ii) The fusion junction is located between the sequence that specifically binds to the blocking element and the sequence that specifically hybridizes to the first primer; (iii) The blocking element inhibits extension of the polymerase along the sequence to which it binds; and (iv) the circular template polynucleotide comprising the fusion junction does not comprise a sequence or complement thereof that specifically binds to the blocking element; and (c) detecting the fusion amplification product, thereby detecting polynucleotide fusion.

In another aspect, a method of differentially amplifying a polynucleotide comprising a fusion gene relative to a polynucleotide not comprising the fusion gene is provided. In an embodiment, the method comprises: i) Circularizing a plurality of linear nucleic acid molecules to form a plurality of circular template polynucleotides, wherein one or more of the linear nucleic acid molecules comprises a fusion gene, thereby forming one or more fusion gene circular template polynucleotides, and wherein one or more of the linear nucleic acid molecules does not comprise a fusion gene, thereby forming one or more non-fusion gene circular template polynucleotides; ii) binding a blocking element to the one or more non-fusion gene circular template polynucleotides; and iii) hybridizing a first primer and a second primer to the one or more non-fusion gene circular template polynucleotides and the one or more fusion gene circular template polynucleotides and extending with a polymerase to produce a first amount of non-fusion polynucleotide amplification product and a second amount of fusion polynucleotide amplification product, wherein the first amount is detectably less than the second amount; thereby differentially amplifying polynucleotides comprising fusion genes (e.g., fusion genes comprising fusion junctions). In embodiments, each circular template polynucleotide comprises a continuous strand lacking free 5' and 3' ends. In embodiments, the first amount is a quantity or a number. In embodiments, the second amount is a quantity or a number. In embodiments, the first amount is a plurality. In embodiments, the second amount is a plurality.

In one aspect, a method of differentially amplifying a polynucleotide comprising a fusion gene relative to a polynucleotide not comprising the fusion gene is provided. In an embodiment, the method comprises: i) Binding a blocking element to one or more non-fusion circular template polynucleotides; ii) hybridizing a first primer and a second primer to the one or more non-fusion circular template polynucleotides; iii) hybridizing the first primer and the second primer to one or more fusion circular template polynucleotides; and iv) extending with a polymerase to produce a first amount of non-fusion polynucleotide amplification product and a second amount of fusion polynucleotide amplification product, wherein the first amount is detectably less than the second amount; thereby differentially amplifying polynucleotides comprising fusion genes (e.g., fusion genes comprising fusion junctions). In embodiments, each circular template polynucleotide comprises a continuous strand lacking free 5' and 3' ends. In embodiments, prior to step i) (i.e., binding the blocking element), the method further comprises circularizing a plurality of linear nucleic acid molecules to form a plurality of circular template polynucleotides, wherein one or more of the linear nucleic acid molecules comprises a fusion gene, thereby forming one or more fusion gene circular template polynucleotides, and wherein one or more of the linear nucleic acid molecules does not comprise a fusion gene, thereby forming one or more non-fusion gene circular template polynucleotides.

In one aspect, there is provided a method of amplifying a polynucleotide comprising a fusion gene, the method comprising: i) Binding a blocking element to a non-fusion circular template polynucleotide, wherein the non-fusion circular template does not comprise a fusion gene; ii) hybridizing a first primer and a second primer to the non-fusion circular template polynucleotide; and hybridizing the first primer and the second primer to a fusion circular template polynucleotide, wherein the fusion circular template polynucleotide comprises a fusion gene; and iii) extending the first primer and the second primer with a non-strand displacement polymerase to produce a fusion polynucleotide amplification product.

In another aspect, a method of amplifying a plurality of polynucleotides is provided, the method comprising: circularizing a plurality of linear nucleic acid molecules to form a plurality of circular template polynucleotides, wherein one or more of the linear nucleic acid molecules comprises a target sequence (e.g., a sequence of interest, such as a gene, SNV, CNV, indel, or fusion gene); binding a blocking element to one or more circular template polynucleotides that do not contain the target sequence; and hybridizing a first primer and a second primer to the circular template polynucleotides and extending with a polymerase to produce amplification products, wherein the amount of amplification product comprising the target sequence is greater than the amount of amplification product not comprising the target sequence. In embodiments, the target sequences comprise cancer somatic mutations, copy number variations, and gene fusions, including those involving novel partners or breakpoints.

In yet another aspect, a method of amplifying a polynucleotide comprising an unknown sequence is provided. In an embodiment, the method comprises: contacting a plurality of circular nucleic acid molecules with a plurality of blocking elements, wherein one or more of the circular nucleic acid molecules comprises an unknown sequence and one or more of the circular nucleic acid molecules comprises a known sequence, and wherein a blocking element binds to a known sequence; contacting the plurality of circular nucleic acid molecules with a plurality of first primers and a plurality of second primers; and extending the first primer and the second primer to produce a plurality of amplification products comprising known and unknown sequences, wherein a greater amount of amplification product comprising the unknown sequence is produced relative to the amplification product comprising the known sequence. In embodiments, the method further comprises detecting (e.g., sequencing) an amplification product comprising an unknown sequence.

In one aspect, a method of differentially amplifying a polynucleotide comprising a first fusion gene relative to a polynucleotide comprising a second fusion gene is provided. In an embodiment, the method comprises: i) Circularizing a plurality of linear nucleic acid molecules to form a plurality of circular template polynucleotides, wherein one or more of the linear nucleic acid molecules comprises a first fusion gene, thereby forming one or more first fusion gene circular template polynucleotides, and wherein one or more of the linear nucleic acid molecules comprises a second fusion gene, thereby forming one or more second fusion gene circular template polynucleotides; ii) binding a blocking element to the one or more second fusion gene circular template polynucleotides; and iii) hybridizing a first primer and a second primer to the one or more second fusion gene circular template polynucleotides and the one or more first fusion gene circular template polynucleotides and extending with a polymerase to produce a first amount of second fusion gene polynucleotide amplification product and a second amount of first fusion gene polynucleotide amplification product, wherein the first amount is detectably less than the second amount; thereby differentially amplifying the polynucleotide comprising the first fusion gene. In embodiments, each circular template polynucleotide comprises a continuous strand lacking free 5' and 3' ends.

In one aspect, a method is provided for identifying the frequency of convergence of a subject's immune repertoire (e.g., for predicting a clinical response of a subject to a therapy by identifying the frequency of convergence of the subject's immune repertoire prior to receiving the therapy). In an embodiment, the method comprises: a) Obtaining a sample from a subject, the sample comprising one or more linear nucleic acid molecules comprising an immunoreceptor sequence (e.g., a T cell receptor (TCR), B cell receptor (BCR), or antibody (Ab) target); b) Circularizing the one or more linear nucleic acid molecules to form one or more circular template polynucleotides, each comprising a continuous strand lacking free 5' and 3' ends, and amplifying the one or more circular template polynucleotides to produce a plurality of amplification products comprising the immunoreceptor sequence; c) Sequencing the plurality of amplification products to produce a plurality of sequencing reads; d) Identifying immune receptor clones by analyzing the plurality of sequencing reads; and e) detecting convergent immune receptor clones among the immune receptor clones, wherein the convergent immune receptor clones have similar or identical amino acid sequences and different nucleotide sequences. In embodiments, the method comprises hybridizing a blocking element to the one or more circular template polynucleotides prior to amplification. In embodiments, the method does not comprise hybridizing a blocking element to the one or more circular template polynucleotides. In embodiments, the method further comprises determining the frequency of convergent immune receptor clones in the sample. In embodiments, the method further comprises treating the subject with immunotherapy when the frequency of convergent immune receptor clones in the sample is greater than a convergence frequency cutoff value, wherein the sequence identifying the convergent immune receptor clones comprises a CDR3 sequence.

As used herein, the term "immune repertoire" refers to a collection of T cell receptors and B cell receptors (e.g., immunoglobulins) that make up the adaptive immune system of an organism. As used herein, "frequency of convergence" refers to the aggregate frequency (excluding allele information) of clones sharing variable genes.

In an embodiment, the amplification comprises a multiplex amplification reaction comprising a plurality of amplification primer pairs comprising a plurality of joining (J) gene primers for a majority of the J genes of the target immune receptor (i.e., the primer pairs comprise sequences complementary to the J genes). The methods described herein allow the J genes to be targeted with outward-facing primers, thereby detecting V(D)J regions without directly targeting each V gene. In embodiments, V gene identity and sequences comprising CDR3 amino acid sequences are used to identify convergent immune receptor clones. In embodiments, the sequences identifying the convergent immune receptor clones comprise CDR1 and CDR3 sequences or CDR2 and CDR3 sequences. In an embodiment, the convergent immune receptor clones have the same CDR3 amino acid sequence. In embodiments, the target immunoreceptor nucleic acid molecule comprises the FR1, CDR1, FR2, CDR2, FR3, and CDR3 coding regions of the target immunoreceptor.
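
As a rough illustration of why outward-facing primers on a circularized template recover the sequence flanking the primed region, the following Python sketch computes the expected amplicon in silico. The function names, the toy sequences, and the simplifying assumptions (exact primer matches, both primer sites on the same linear molecule, neither site spanning the ligation junction) are hypothetical and are not taken from the present disclosure.

```python
def revcomp(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def outward_amplicon(linear, fwd_primer, rev_primer):
    """Top-strand amplicon expected after circularizing `linear`.

    Assumptions (illustrative only):
      * `fwd_primer` matches the top strand and extends rightward.
      * `rev_primer` matches the bottom strand, so its reverse complement
        is found on the top strand, and it extends leftward.
      * The reverse-primer site lies upstream of the forward-primer site,
        i.e. the primers face away from each other on the linear molecule.
    Returns None if a primer site is missing or the primers are not
    outward-facing.
    """
    fwd_start = linear.find(fwd_primer)
    rev_site = revcomp(rev_primer)
    rev_start = linear.find(rev_site)
    if fwd_start < 0 or rev_start < 0:
        return None
    rev_end = rev_start + len(rev_site)
    if rev_end > fwd_start:
        return None  # primers face each other; not an outward-facing pair
    # Extension runs from the forward primer to the 3' end of the original
    # molecule, across the ligation junction, and on to the reverse site.
    return linear[fwd_start:] + linear[:rev_end]


if __name__ == "__main__":
    # Toy molecule: flank | reverse site | spacer | forward site | flank
    linear = "GGGG" + "ACGTAC" + "TTTT" + "CATGCA" + "CCCC"
    print(outward_amplicon(linear, "CATGCA", "GTACGT"))
```

In this toy case the amplicon contains both flanks joined at the ligation junction and omits the spacer between the two primer sites, which is the behavior that lets primers placed in a known region report on adjacent, unprimed sequence.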

As used herein, a "convergent TCR set" is a set of T cell receptors (TCRs) that are similar in amino acid sequence and functionally equivalent, or that are identical or presumed to be identical in amino acid sequence. Because of this amino acid similarity, the members of a convergent TCR set are generally assumed to recognize the same antigen. In some embodiments, the members of a convergent TCR set are identical or presumed to be identical in variable gene and CDR3 amino acid sequence, despite having different nucleotide sequences. Members of a convergent TCR set may arise from differences in the non-templated nucleotide bases at the VDJ junctions that occur during the generation of productive TCR gene rearrangements. To assess TCR convergence, for example, TCR β chains that are identical in amino acid sequence but have different nucleotide sequences are identified.
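
One way to picture this assessment is the following Python sketch, which groups clones by variable gene (with allele information removed) and CDR3 amino acid sequence, and treats a group as convergent when it contains more than one distinct nucleotide sequence. The tuple layout, the `*` allele separator in gene names, the read-count weighting of the frequency, and all function names are assumptions made for illustration only; they are not specified in the present disclosure.

```python
from collections import defaultdict


def strip_allele(v_gene):
    """Drop allele information, e.g. 'TRBV7-2*01' -> 'TRBV7-2' (assumed naming)."""
    return v_gene.split("*", 1)[0]


def convergence_frequency(clones):
    """Return (frequency, convergent_groups).

    `clones` is an iterable of (v_gene, cdr3_aa, cdr3_nt, read_count).
    A group is keyed by (V gene without allele, CDR3 amino acid sequence)
    and counts as convergent when it holds at least two different CDR3
    nucleotide sequences. The frequency is the fraction of all reads that
    belong to convergent groups (an assumed weighting).
    """
    groups = defaultdict(dict)  # key -> {cdr3_nt: summed read count}
    total = 0
    for v_gene, cdr3_aa, cdr3_nt, count in clones:
        total += count
        key = (strip_allele(v_gene), cdr3_aa)
        groups[key][cdr3_nt] = groups[key].get(cdr3_nt, 0) + count
    if total == 0:
        return 0.0, {}
    convergent = {k: nts for k, nts in groups.items() if len(nts) > 1}
    convergent_reads = sum(sum(nts.values()) for nts in convergent.values())
    return convergent_reads / total, convergent


if __name__ == "__main__":
    example = [
        ("TRBV7-2*01", "CASS", "TGTGCCAGCAGC", 30),  # same protein ...
        ("TRBV7-2*02", "CASS", "TGTGCTAGCAGT", 10),  # ... different codons
        ("TRBV5-1*01", "CASR", "TGTGCCAGCAGA", 60),  # unrelated clone
    ]
    freq, groups = convergence_frequency(example)
    print(freq, sorted(groups))
```

With the toy input, the two TRBV7-2 clones encode the same CDR3 amino acids with different codons and together account for 40 of 100 reads.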

In some embodiments, the subject is treated with a therapy in a manner that depends on the frequency of convergent immune receptor clones. For example, in some embodiments, a frequency of convergent immune receptor clones greater than a convergence frequency cutoff value indicates that the subject is a candidate for the therapy, and a frequency of convergent immune receptor clones less than the convergence frequency cutoff value indicates that the subject is not a candidate for the therapy. In some embodiments, provided methods comprise identifying convergent immune receptor clones from immune receptor clones present in a sample at a frequency of greater than 1/50,000. In some embodiments, the convergence frequency cutoff value is a frequency greater than 0.01. In some embodiments, the subject has cancer and is a candidate for immunotherapy. In other embodiments, the subject is a candidate for vaccination against an infectious agent or infectious disease. In other embodiments, the subject is a candidate for treatment with an autoimmune inhibitor.

In some embodiments, provided methods comprise using V gene identity and sequences comprising CDR3 amino acid sequences to identify convergent immune receptor clones. In some embodiments, provided methods comprise identifying a convergent immune receptor clone using a sequence comprising CDR3 sequences, CDR1 and CDR3 sequences, or CDR2 and CDR3 sequences.
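As an illustration only, the following Python sketch shows one way a frequency of convergent clones could be computed from a clone table, treating clones as convergent when they share a V gene and a CDR3 amino acid sequence but differ in nucleotide sequence. The function name, the tuple layout, and the example clones are hypothetical and are not taken from this disclosure.

```python
from collections import defaultdict

def convergence_frequency(clones):
    """Aggregate frequency of convergent clones.

    clones: iterable of (v_gene, cdr3_nt, cdr3_aa, count) tuples (assumed layout).
    A (v_gene, cdr3_aa) group is treated as convergent when it is encoded by
    more than one distinct CDR3 nucleotide sequence.
    """
    groups = defaultdict(dict)  # (v_gene, cdr3_aa) -> {cdr3_nt: count}
    total = 0
    for v_gene, cdr3_nt, cdr3_aa, count in clones:
        key = (v_gene, cdr3_aa)
        groups[key][cdr3_nt] = groups[key].get(cdr3_nt, 0) + count
        total += count
    if total == 0:
        return 0.0
    convergent = sum(
        sum(nt_counts.values())
        for nt_counts in groups.values()
        if len(nt_counts) > 1
    )
    return convergent / total

# Hypothetical example data: two nucleotide variants encode the same CDR3 (CASSL).
example = [
    ("TRBV7-9", "TGTGCCAGCAGCTTA", "CASSL", 30),
    ("TRBV7-9", "TGCGCCAGCAGCTTG", "CASSL", 10),
    ("TRBV5-1", "TGTGCCAGCAGCCAA", "CASSQ", 60),
]
freq = convergence_frequency(example)
print(f"convergence frequency: {freq:.2f}")
```

The resulting value could then be compared against a chosen convergence frequency cutoff value; the cutoff itself is a parameter of the method and is not fixed by this sketch.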

In some embodiments, provided methods comprise identifying convergent TCR clones as those having TCR variable gene and CDR3 rearrangements that are similar or identical in amino acid sequence but different in nucleotide sequence. For example, a significant portion of TCRs that differ from each other by one amino acid residue may have similar or identical specificity for an antigen, and thus such TCRs may be considered convergent.

In some embodiments, the change in convergent TCR clone frequency during treatment with a therapy can be used as a predictor of response to the therapy. Depending on the type of disease and the treatment, in some embodiments, responders may be distinguished from non-responders by an increase in the frequency of convergent TCR clones during therapy. For example, in cancers (or chronic viral infections) in which the convergent TCR clones consist primarily of T cells having a progenitor exhausted phenotype, a terminally exhausted phenotype, or an effector phenotype, among other effector T cell phenotypes, an increase in the frequency of convergent TCR clones during treatment may be indicative of an increase in anti-cancer (or anti-viral) T cell activity. In other cancers, the convergent TCR clones may have predominantly a T regulatory phenotype, and an increase in the frequency of convergent TCR clones during therapy may indicate a poor prognosis.

In some embodiments, the measurement or determination of the frequency of convergent TCR clones is combined with other T cell repertoire features, such as a measurement of T cell clonal expansion, to improve prediction of clinical responsiveness. In some embodiments, a measurement or determination of the frequency of convergent TCR clones is combined with a measurement of B cell repertoire features, such as B cell clonal expansion, to improve prediction of clinical responsiveness. In some embodiments, a measurement or determination of the frequency of convergent TCR clones is combined with measurement or detection of expression of one or more genes associated with immune responses to improve prediction of clinical responsiveness. Such immune response related genes include, but are not limited to, PD-1 and/or PD-L1 genes, interferon-gamma pathway genes, and myeloid-derived suppressor cell related genes. Procedures and reagents for detecting or measuring such gene expression are known in the art and include, but are not limited to, quantitative or semi-quantitative PCR assays, comparative hybridization methods, and sequencing procedures, as well as reagents and kits for such use, including, but not limited to, TaqMan assays and the Oncomine Immune Response Research Assay (Thermo Fisher Scientific).

In embodiments, the method further comprises identifying a clonotype. In embodiments, the method further comprises quantifying the clonotypes present in the sample (e.g., generating a clonotype profile). "Clonotype profile" refers to a collection of different clonotypes derived from a lymphocyte population and their relative abundances, which may be expressed, for example, as frequencies (i.e., values between 0 and 1) in a given population. Typically, the lymphocyte population is obtained from a tissue sample. The term "clonotype profile" is related to, but more general than, the immunological concept of the immune "repertoire" as described in: Arstila et al., Science 286:958-961 (1999); and Kedzierska et al., Molecular Immunology (Mol. Immunol.), 45(3):607-618 (2008).

In an embodiment, the clonotype profile comprises at least 10³ different clonotypes. In an embodiment, the clonotype profile comprises at least 10⁸ different clonotypes. In an embodiment, the clonotype profile comprises at least 10⁵ different clonotypes. In an embodiment, the clonotype profile comprises at least 10⁶ different clonotypes. In embodiments, such clonotype profiles may further comprise the abundance (i.e., quantification) or relative frequency of each different clonotype. In an embodiment, the clonotype profile is a set of different recombined nucleotide sequences (and abundances thereof), or fragments thereof, encoding a T cell receptor (TCR) or B cell receptor (BCR) in a lymphocyte population of an individual, wherein the nucleotide sequences of the set have a correspondence (e.g., a 1:1 correspondence) with different lymphocytes, or clonal subpopulations thereof, for substantially all lymphocytes of the population.
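By way of a minimal, non-limiting sketch, a clonotype profile expressed as relative frequencies between 0 and 1 could be tabulated from per-read clonotype assignments as follows. The function name and the input format (one clonotype label per read) are assumptions made for illustration.

```python
from collections import Counter

def clonotype_profile(read_clonotypes):
    """Map each distinct clonotype to its relative frequency (0 to 1).

    read_clonotypes: iterable of clonotype labels, one per sequencing read
    (hypothetical input format).
    """
    counts = Counter(read_clonotypes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {clonotype: n / total for clonotype, n in counts.items()}

# Hypothetical reads assigned to two clonotypes.
profile = clonotype_profile(["CASSL", "CASSL", "CASSL", "CASSQ"])
print(profile)
```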

In embodiments, the first primer hybridizes to one or more non-fused circular template polynucleotides and the second primer hybridizes to one or more fused circular template polynucleotides. In embodiments, the second primer hybridizes to one or more non-fused circular template polynucleotides and the first primer hybridizes to one or more fused circular template polynucleotides. In an embodiment, the plurality of first primers hybridizes to the plurality of non-fused circular template polynucleotides. In an embodiment, the plurality of second primers hybridizes to the plurality of fusion circular template polynucleotides.

In embodiments, the one or more linear nucleic acid molecules comprise DNA, RNA, or cDNA; optionally wherein the DNA or RNA is cell-free nucleic acid. In embodiments, the one or more linear nucleic acid molecules comprise RNA or cDNA and the fusion junction comprises an exon junction. In embodiments, the one or more linear nucleic acid molecules comprise cDNA and the fusion junctions comprise exon junctions. In embodiments, the one or more linear nucleic acid molecules comprise RNA and the fusion junction comprises an exon junction. In embodiments, the one or more linear nucleic acid molecules comprise DNA and the fusion junction comprises an exon junction. In embodiments, the one or more linear nucleic acid molecules comprise a sample barcode sequence, a molecular identifier sequence, or both a sample barcode sequence and a molecular identifier sequence.

In embodiments, the fusion gene comprises an interchromosomal translocation (e.g., a fusion junction joining two different chromosomes) or an intrachromosomal translocation (e.g., a fusion junction within the same chromosome). In embodiments, the fusion gene comprises an interchromosomal translocation. In embodiments, the fusion gene comprises an intrachromosomal translocation. In embodiments, the chromosomal translocation comprises a partially or fully rearranged B cell or T cell antigen receptor. In embodiments, the intrachromosomal translocation comprises a partially rearranged B cell antigen receptor. In embodiments, the intrachromosomal translocation comprises a partially rearranged T cell antigen receptor. In embodiments, the intrachromosomal translocation comprises a fully rearranged B cell antigen receptor. In embodiments, the intrachromosomal translocation comprises a fully rearranged T cell antigen receptor.

In embodiments, the sequence of the first region comprises the sequence of a first gene (e.g., the entire gene sequence or a portion thereof) and the sequence of the second region comprises the sequence of a second gene (e.g., the entire gene sequence or a portion thereof). In embodiments, the location at which the first gene is linked to the second gene by an internucleoside linkage is a fusion junction.

In an embodiment, the linear nucleic acid molecules are obtained from a peripheral blood sample using conventional techniques. For example, white blood cells can be isolated from a blood sample using conventional techniques, such as the RosetteSep kit. The volume of the blood sample may range from 100 μL to 10 mL. In embodiments, the volume of the blood sample ranges from 100 μL to 2 mL. Nucleic acid molecules (e.g., DNA and/or RNA) can then be extracted from such blood samples using conventional techniques, such as the DNeasy Blood & Tissue Kit. Optionally, subsets of leukocytes, such as lymphocytes, may be further isolated using conventional techniques, such as fluorescence-activated cell sorting (FACS) or magnetically activated cell sorting (MACS). Cell-free DNA molecules may also be extracted from peripheral blood samples using conventional techniques as described in US 6,258,540 or Huang et al., Methods in Molecular Biology (Methods Mol. Biol.), 444:203-208 (2008), each of which is incorporated herein by reference. For example, peripheral blood may be collected in EDTA tubes and then fractionated into plasma, white blood cell, and red blood cell components by centrifugation. DNA from the cell-free plasma fraction (e.g., 0.5 to 2.0 mL) can be extracted using a QIAamp DNA Blood Mini Kit according to the manufacturer's protocol. Various methods and commercially available kits for isolating different subpopulations of T cells and B cells are known in the art and include, but are not limited to, subset-selection immunomagnetic bead separation or flow cytometric cell sorting using antibodies specific for one or more of any of a variety of known T cell and B cell surface markers. Illustrative markers include, but are not limited to, one or a combination of the following: CD2, CD3, CD4, CD8, CD14, CD19, CD20, CD25, CD28, CD45RO, CD45RA, CD54, CD62L, CDw137 (4-1BB), CD154, GITR, and FoxP3.
For example, and as known to those of skill in the art, cell surface markers such as CD2, CD3, CD4, CD8, CD14, CD19, CD20, CD45RA, and CD45RO can be used to determine T, B and monocyte lineages and subpopulations in flow cytometry. Similarly, forward light scattering, side scattering, and/or cell surface markers, such as CD25, CD62L, CD, CD137, CD154, may be used to determine the activation status and functional characteristics of the cells. The linear nucleic acid molecules (e.g., DNA or RNA) can be extracted from cells in a sample, such as a blood or lymph sample or other sample from a subject known to have or suspected of having a disease (e.g., a lymphohematologic malignancy), using standard methods known in the art or commercially available kits.

In embodiments, the blocking element comprises an oligonucleotide, a protein, or a combination thereof. In an embodiment, the blocking element comprises an oligonucleotide. In an embodiment, the blocking element is an oligonucleotide. In an embodiment, the blocking element is an oligonucleotide having 5-25 nucleotides. In an embodiment, the blocking element is an oligonucleotide having 10-50 nucleotides. In an embodiment, the blocking element is an oligonucleotide having 20-75 nucleotides. In embodiments, the blocking element is an oligonucleotide having about 5, about 10, about 20, about 25, about 50, or about 75 nucleotides. In an embodiment, the blocking element is a non-extendable oligomer. In embodiments, the blocking element comprises two or more oligonucleotides arranged in tandem. In embodiments, the blocking element comprises an oligonucleotide and a second oligonucleotide that is the reverse complement or partial reverse complement of the first (e.g., producing a pair of partially overlapping oligonucleotides). In an embodiment, the blocking element is a single-stranded oligonucleotide having a 5' end and a 3' end. In an embodiment, the blocking element comprises a 3'-blocked oligonucleotide. In an embodiment, the blocking element comprises a blocking moiety on the 3' nucleotide. The blocking moiety on a nucleotide may be reversible, whereby the blocking moiety may be removed or modified to allow the 3' hydroxyl group to form a covalent bond with the 5' phosphate of another nucleotide. For example, a reversible terminator may refer to a blocking moiety located, for example, at the 3' position of a nucleotide, and may be a chemically cleavable moiety, such as allyl, azidomethyl, or methoxymethyl, or may be an enzymatically cleavable group, such as a phosphate. In embodiments, the blocking moiety is irreversible (e.g., a blocking element comprising the blocking moiety irreversibly prevents extension).

In an embodiment, the blocking element is a non-extendable oligonucleotide. Blocking groups known in the art may be placed at or near the 3' end of an oligonucleotide (e.g., a primer) to prevent extension, as described in US 2010/0167353. Primers or other oligonucleotides may be modified at the 3' terminal nucleotide to prevent or inhibit the initiation of DNA synthesis by, for example, adding a 3' deoxyribonucleotide residue (e.g., cordycepin), a 2',3'-dideoxyribonucleotide residue, a non-nucleotide linkage, or an alkane-diol modification (see, e.g., U.S. Pat. No. 5,554,516). Alkane-diol modifications that can be used to inhibit or block primer extension are also described by Wilk et al. (1990, Nucleic Acids Res. 18(8):2065) and Arnold et al. (U.S. Pat. No. 6,031,091). Further examples of suitable blocking groups include 3' hydroxyl substitutions (e.g., 3'-phosphate, 3'-triphosphate, or 3'-phosphate diesters with alcohols, such as 3-hydroxypropyl), 2',3'-cyclic phosphate, and 2' hydroxyl substitutions of a terminal RNA base (e.g., phosphate or sterically bulky groups, such as triisopropylsilyl (TIPS) or tert-butyldimethylsilyl (TBDMS)). 2'-Alkylsilyl groups substituted at the 3' end of oligonucleotides, such as TIPS and TBDMS, are described in US 2007/0218490, which is incorporated herein by reference. Bulky substituents may also be incorporated on the base of the 3' terminal residue of the oligonucleotide to block primer extension.

In embodiments, the blocking element comprises an oligonucleotide with a 3' dideoxynucleotide or similar modification that prevents polymerase extension, and is used in conjunction with a non-strand-displacing polymerase. In some embodiments, the blocking oligomer contains one or more non-natural bases (e.g., LNA bases) that facilitate hybridization of the blocking oligomer to the target sequence. In some embodiments, the blocking oligomer contains additional modifications that increase resistance to exonuclease digestion (e.g., one or more phosphorothioate linkages). In an embodiment, the blocking element is an oligonucleotide comprising one or more modified nucleotides that are complementary to each other, such as iso-dGTP or iso-dCTP; in a polymerization reaction lacking the complementary modified nucleotide, extension is blocked. In another embodiment, the blocking element is an oligonucleotide comprising a 3' cleavable linker comprising PEG, thereby blocking extension. In another embodiment, the blocking element is an oligonucleotide comprising one or more sequences recognized and bound by one or more short RNA or PNA oligonucleotides, thereby blocking extension by a strand-displacing DNA polymerase that is unable to displace RNA or PNA. In embodiments, the blocking element is a modified nucleotide (e.g., a nucleotide comprising a reversible terminator, such as a 3'-reversible terminating moiety).

In embodiments, the blocking element comprises an oligonucleotide, a protein, or a combination thereof. In an embodiment, the blocking element comprises a protein. In embodiments, the blocking element comprises one or more proteins. The blocking element need not be an oligomer; in some embodiments, for example, the blocking element is a protein that selectively binds to the target sequence and prevents polymerase extension. In embodiments, the blocking element is an oligonucleotide comprising one or more modified nucleotides. In embodiments, the blocking element is an oligonucleotide comprising one or more modified nucleotides, wherein the one or more modified nucleotides are linked to biotin to which a protein (e.g., streptavidin) can bind, thereby blocking polymerase extension. In embodiments, the blocking element comprises one or more sequences that are recognized and bound by one or more single-stranded DNA binding proteins, thereby blocking polymerase extension at the binding site.

In embodiments, the blocking element comprises a CRISPR-Cas9 complex. For example, guide RNAs that specifically target non-fusion sequences are introduced into samples containing circularized ssDNA. The CRISPR-Cas9 complex then targets and cleaves the non-fusion sequences present in any circular ssDNA molecule. After the non-fused circular ssDNA molecules are linearized by the CRISPR complexes, an exonuclease digestion can be performed to digest the linear ssDNA molecules, thereby enriching for the circular ssDNA molecules containing the fusion gene (e.g., lacking the non-fused gene sequence targeted by the guide RNA).
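The logic of this depletion step can be sketched in silico as follows. This is a deliberately simplified illustration: it assumes perfect cleavage of every circle containing the guide target and complete exonuclease digestion of every linearized molecule, and it ignores PAM requirements, strand orientation, and mismatches. The function names and sequences are hypothetical.

```python
def contains_circular(circle, target):
    """Return True if target occurs in a circular sequence.

    The search wraps around the ligation point by appending the first
    len(target) - 1 bases of the circle to its end.
    """
    if not target or len(target) > len(circle):
        return False
    return target in circle + circle[: len(target) - 1]

def enrich_fusion_circles(circles, guide_target):
    """Keep only circles lacking the guide-targeted non-fusion sequence.

    Circles containing the target are assumed to be linearized and then
    digested by exonuclease, so they are removed from the pool.
    """
    return [c for c in circles if not contains_circular(c, guide_target)]

# Hypothetical non-fusion target and circular templates.
guide_target = "ACGTAC"
circles = [
    "TTACGTACGG",  # target present internally -> cleaved and digested
    "TACGGTTACG",  # target spans the ligation point -> cleaved and digested
    "TTGGCCAATT",  # target absent -> retained (fusion-containing in this toy model)
]
retained = enrich_fusion_circles(circles, guide_target)
print(retained)
```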

In an embodiment, the blocking element comprises biotin. For example, after circularization, the biotinylated blocking element is hybridized to a non-fusion gene sequence. The circular ssDNA molecules hybridized to the biotinylated blocking elements are then pulled down using, for example, streptavidin-coated magnetic beads, thereby depleting non-fused circular molecules from the sample prior to amplification.

In embodiments, the blocking element comprises a restriction site. For example, the blocking element acts as a splint that enables restriction enzyme-mediated digestion of non-fused circular ssDNA molecules into non-amplifiable linear fragments. A methylated blocking oligomer can be used in combination with a methylation-sensitive restriction enzyme (e.g., NotI, NaeI, NsbI, SalI, HapII, or HaeII).

In an embodiment, binding the blocking element comprises binding the blocking element upstream of the first primer. The terms "upstream" and "downstream" are used in accordance with their ordinary meaning in the art and, when referring to a nucleic acid, refer to positioning toward the 5' end (upstream) or toward the 3' end (downstream). In an embodiment, the blocking element binds about 1 to 150 nucleotides upstream of the first primer. In an embodiment, the blocking element binds about 1 to 15 nucleotides upstream of the first primer. In embodiments, the blocking element binds about 10 to about 25 nucleotides upstream of the first primer.

In embodiments, the first primer hybridizes to the one or more fusion circular template polynucleotides about 1 to 100 nucleotides downstream of the fusion junction within the fusion gene. In embodiments, the first primer hybridizes to the one or more fusion circular template polynucleotides about 10 to about 50 nucleotides downstream of the fusion junction within the fusion gene. In embodiments, the first primer hybridizes to the one or more fusion circular template polynucleotides about 50 to about 200 nucleotides downstream of the fusion junction within the fusion gene. In embodiments, the first primer hybridizes to the one or more fusion circular template polynucleotides about 50 to about 100 nucleotides downstream of the fusion junction within the fusion gene. In embodiments, the first primer hybridizes to the one or more fusion circular template polynucleotides about 25 to about 50 nucleotides downstream of the fusion junction within the fusion gene. In embodiments, the first primer hybridizes to the one or more fusion circular template polynucleotides about 50 nucleotides downstream of the fusion junction within the fusion gene. In embodiments, the first primer hybridizes to the one or more fusion circular template polynucleotides about 25 nucleotides downstream of the fusion junction within the fusion gene. In embodiments, the first primer hybridizes to the one or more fusion circular template polynucleotides about 10 nucleotides downstream of the fusion junction within the fusion gene.

In embodiments, the method further comprises binding a second blocking element to the one or more non-fusion circular template polynucleotides downstream of the second primer. In embodiments, the second blocking element binds about 100 to about 300 nucleotides downstream of the second primer. In embodiments, the second blocking element binds about 75 to about 150 nucleotides downstream of the second primer. In embodiments, the second blocking element binds about 50 to about 300 nucleotides downstream of the second primer. In embodiments, the second blocking element binds about 100 to about 400 nucleotides downstream of the second primer.

In an embodiment, the method further comprises repeating steps ii) and iii). In an embodiment, the method further comprises repeating the following: ii) binding a blocking element to the one or more non-fused circular template polynucleotides; and iii) hybridizing a first primer and a second primer to the one or more non-fusion circular template polynucleotides and the one or more fusion circular template polynucleotides and extending with a polymerase to produce a first amount of non-fusion polynucleotide amplification product and a second amount of fusion polynucleotide amplification product, wherein the first amount is detectably less than the second amount; thereby differentially amplifying polynucleotides comprising fusion genes (e.g., fusion genes comprising fusion junctions).

In embodiments, the first primer and the second primer hybridize to complementary sequences of the one or more fused circular template polynucleotides and the one or more non-fused circular template polynucleotides, wherein the first primer and the second primer are about 1 to about 50 nucleotides apart. In embodiments, the first primer and the second primer hybridize to complementary sequences of the one or more fused circular template polynucleotides and the one or more non-fused circular template polynucleotides, wherein the first primer and the second primer are about 1 to about 10 nucleotides apart. In embodiments, the first primer and the second primer hybridize to complementary sequences of the one or more fused circular template polynucleotides and the one or more non-fused circular template polynucleotides, wherein the first primer and the second primer are about 5 to about 25 nucleotides apart. In an embodiment, the first primer and the second primer are about 10 nucleotides apart. In an embodiment, the first primer and the second primer are about 25 nucleotides apart. In an embodiment, the first primer and the second primer are about 50 nucleotides apart. In an embodiment, the first primer and the second primer are about 75 nucleotides apart. In an embodiment, the first primer and the second primer are separated by about 100 nucleotides.

In embodiments, the second amount is about 1%, about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 40%, about 50%, or about 75% greater than the first amount. In embodiments, the second amount is about 0.01%, about 0.05%, about 0.010%, about 0.015%, about 0.020%, about 0.025%, about 0.030%, about 0.040%, about 0.050%, or about 0.075% greater than the first amount. In embodiments, the second amount is about 0.1%, about 0.5%, about 0.10%, about 0.15%, about 0.20%, about 0.25%, about 0.30%, about 0.40%, about 0.50%, or about 0.75% greater than the first amount. In an embodiment, the second amount is greater than the first amount. In embodiments, the first amount is about 1%, about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 40%, about 50%, or about 75% less than the second amount. In embodiments, the first amount is about 0.01%, about 0.05%, about 0.010%, about 0.015%, about 0.020%, about 0.025%, about 0.030%, about 0.040%, about 0.050%, or about 0.075% less than the second amount. In embodiments, the first amount is about 0.1%, about 0.5%, about 0.10%, about 0.15%, about 0.20%, about 0.25%, about 0.30%, about 0.40%, about 0.50%, or about 0.75% less than the second amount.

In embodiments, the second amount is about 2 times, at least about 1.5 times, at least about 2.0 times, at least about 2.5 times, at least about 5 times, at least about 10 times, or more than about 10 times the first amount. In an embodiment, the second amount is about 1.0 times the first amount. In an embodiment, the second amount is about 2.0 times the first amount. In an embodiment, the second amount is about 5.0 times the first amount. In an embodiment, the second amount is about 20 times the first amount.

In an embodiment, the second amount quantified after one extension cycle is measurably higher than the first amount. In an embodiment, the method produces a first amount of non-fusion polynucleotide amplification product and a second amount of fusion polynucleotide amplification product at a ratio of 1.00:1.01. In an embodiment, the ratio of the first amount to the second amount is 1.00:1.02. In an embodiment, the ratio of the first amount to the second amount is 1.00:1.05. In an embodiment, the ratio of the first amount to the second amount is 1.00:1.10. For example, at a per-cycle ratio of 1.00:1.02, 35 extension cycles (e.g., 35 PCR cycles, each comprising the steps of primer hybridization, primer extension, and denaturation) enrich the second amount by about 1.999-fold relative to the first amount, wherein the fold enrichment is 1.02³⁵. In an embodiment, the second amount quantified after a plurality of extension cycles (e.g., 5, 10, 15, or 20) is measurably higher than the first amount. In embodiments, the second amount quantified after 1 minute, 2 minutes, 3 minutes, 4 minutes, 5 minutes, 10 minutes, 15 minutes, or 20 minutes of amplification (e.g., eRCA) is measurably higher than the first amount.
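The compounding of a small per-cycle advantage can be checked with a short calculation. The sketch below assumes an idealized model in which the same per-cycle ratio applies at every cycle, so that the cumulative fold enrichment is the per-cycle ratio raised to the number of cycles; the function name is illustrative only.

```python
def fold_enrichment(per_cycle_ratio, cycles):
    """Cumulative enrichment of fusion over non-fusion product.

    Assumes a constant per-cycle ratio (an idealization), so the result is
    per_cycle_ratio ** cycles.
    """
    return per_cycle_ratio ** cycles

# Per-cycle ratios recited in the embodiments above, each over 35 cycles.
for ratio in (1.01, 1.02, 1.05, 1.10):
    print(f"1.00:{ratio:.2f} per cycle -> {fold_enrichment(ratio, 35):.3f}-fold after 35 cycles")
```

Under this model, a per-cycle ratio of 1.00:1.02 gives approximately twofold enrichment after 35 cycles, consistent with the value of 1.02³⁵ stated above.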

In embodiments, the one or more linear nucleic acid molecules are about 20 to about 1000 nucleotides in length, about 100 to about 300 nucleotides in length, about 300 to about 500 nucleotides in length, or about 500 to about 1000 nucleotides in length. In embodiments, the one or more linear nucleic acid molecules are about 20 to 1000 nucleotides in length. In embodiments, the one or more linear nucleic acid molecules are about 100 to about 300 nucleotides in length. In embodiments, the one or more linear nucleic acid molecules are about 300 to about 500 nucleotides in length. In embodiments, the one or more linear nucleic acid molecules are about 500 to about 1000 nucleotides in length. In embodiments, the one or more linear nucleic acid molecules are about 20, about 50, about 75, about 100, about 150, about 200, about 250, about 300, about 350, about 400, about 450, about 500, about 550, about 600, about 650, about 700, about 750, about 800, about 850, about 900, about 950, or about 1000 nucleotides in length.

In an embodiment, the linear molecule is derived from a biological sample. In an embodiment, the linear molecule is derived from a sample. In an embodiment, the linear molecule is derived from a patient suffering from a disease. In embodiments, the linear molecule is derived from a cancer patient. "Patient" refers to a living organism (i.e., a subject) suffering from or susceptible to a disease or condition. Non-limiting examples include humans, other mammals, bovines, rats, mice, dogs, monkeys, goats, sheep, cows, deer, and other non-mammalian animals. In some embodiments, the patient is a human.

In embodiments, the one or more linear nucleic acid molecules comprise DNA, RNA, or cDNA; optionally wherein the DNA or RNA is a cell-free nucleic acid molecule. In embodiments, the one or more linear nucleic acid molecules comprise RNA or cDNA and the fusion junction is located at an exon junction. In embodiments, the one or more linear nucleic acid molecules comprise RNA or cDNA and the fusion gene comprises an exon junction formed by alternative splicing. In embodiments, the one or more linear nucleic acid molecules comprise RNA or cDNA and the fusion gene comprises an exon junction formed by a splice defect.

In embodiments, the one or more linear nucleic acid molecules comprise a barcode sequence. In embodiments, a plurality of linear nucleic acid molecules (e.g., all linear nucleic acid molecules from a particular sample source or sub-sample thereof) are conjugated to a first barcode sequence, while a different plurality of linear nucleic acid molecules (e.g., all linear nucleic acid molecules from a different sample source or different sub-sample) are conjugated to a second barcode sequence, thereby correlating each of the plurality of linear nucleic acid molecules with a different barcode sequence indicative of the sample source. In embodiments, each barcode sequence of the plurality of barcode sequences differs from each other barcode sequence of the plurality of barcode sequences by at least three nucleotide positions, such as at least 3, 4, 5, 6, 7, 8, 9, 10, or more nucleotide positions. In some embodiments, a substantially degenerate barcode sequence may be referred to as random. In some embodiments, the barcode sequence may comprise a nucleic acid sequence from a pool of known sequences. In some embodiments, the barcode sequence may be predefined. In embodiments, the barcode sequence comprises about 1 to about 10 nucleotides. In embodiments, the barcode sequence comprises about 3, 4, 5, 6, 7, 8, 9, or about 10 nucleotides. In an embodiment, the barcode sequence comprises about 3 nucleotides. In an embodiment, the barcode sequence comprises about 5 nucleotides. In an embodiment, the barcode sequence comprises about 7 nucleotides. In an embodiment, the barcode sequence comprises about 10 nucleotides. In embodiments, the barcode sequence comprises about 6 to about 10 nucleotides.
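The requirement that every barcode differ from every other barcode at a minimum number of nucleotide positions can be verified by computing pairwise Hamming distances. The following is a minimal sketch assuming equal-length barcodes; the function names and the example barcode set are hypothetical.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))

def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two barcodes in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))

# Hypothetical 6-nucleotide barcode set.
barcodes = ["AAAAAA", "AAATTT", "TTTAAA", "TTTTTT"]
min_distance = min_pairwise_distance(barcodes)
print(f"minimum pairwise distance: {min_distance}")
print("meets 3-position requirement:", min_distance >= 3)
```

A set whose minimum pairwise distance is at least three tolerates a single sequencing error per barcode without one barcode being mistaken for another.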

FIG. 1 and Example 1 describe an example of how cDNA can be fragmented to produce linear nucleic acid molecules. In embodiments, the polynucleotide is fragmented to an average length of about 150, about 250, or about 350 base pairs prior to circularizing one or more linear nucleic acid molecules. Fragmentation can be achieved by methods known in the art (e.g., enzymatic fragmentation or acoustic fragmentation). In embodiments, the polynucleotide is fragmented using enzymatic or acoustic fragmentation to produce linear nucleic acid molecules. In embodiments, the input polynucleotide is derived from a fresh or fresh-frozen sample and is minimally degraded prior to fragmentation. Next, the ssDNA fragments are circularized using CircLigase™ or other methods described herein. In some embodiments, circularization is facilitated by denaturing the nucleic acid prior to circularization. Residual linear DNA molecules may optionally be digested. This can be accomplished by methods known in the art (e.g., treatment with Exo I and/or Exo III enzymes).

In embodiments, circularization comprises intramolecular joining of the 5' and 3' ends of the linear nucleic acid molecules. In an embodiment, circularizing comprises a ligation reaction. In an embodiment, the two ends of the linear nucleic acid molecule are directly ligated together. In an embodiment, the two ends of the linear nucleic acid molecule are ligated together with the aid of a bridging oligonucleotide (sometimes referred to as a splint oligonucleotide) that is complementary to the two ends of the linear nucleic acid molecule. Methods for forming circular DNA templates are known in the art; e.g., linear polynucleotides are circularized in a non-template-driven reaction with a circularizing ligase, such as CircLigase™, CircLigase™ II, Taq DNA ligase, HiFi Taq DNA ligase, T4 DNA ligase, or Ampligase DNA ligase. In some embodiments, circularization is facilitated by denaturing the double-stranded linear nucleic acid prior to circularization. Residual linear DNA molecules may optionally be digested. In some embodiments, circularization is promoted by chemical ligation (e.g., click chemistry, such as a copper-catalyzed reaction of an alkyne (e.g., a 3' alkyne) and an azide (e.g., a 5' azide)). In an embodiment, the linear DNA fragment is A-tailed (e.g., A-tailed using Taq DNA polymerase) prior to circularization.

In an embodiment, the circularization of the linear nucleic acid molecule is performed with CircLigase™ enzyme. In embodiments, circularization of the linear nucleic acid molecule is performed with a thermostable RNA ligase or a mutant thereof. In an embodiment, the circularization of the linear nucleic acid molecule is performed with RNA ligase from phage TS2126 or mutants thereof. For example, the RNA ligase may be a TS2126 RNA ligase as described in U.S. patent publication 2005/0266439, which is incorporated herein by reference in its entirety.

In embodiments, circularization comprises ligating a first hairpin adaptor and a second hairpin adaptor to a linear nucleic acid molecule, thereby forming a circular polynucleotide.

In embodiments, the hairpin adaptors comprise a single nucleic acid strand comprising a stem-loop structure. The hairpin adaptors may be of any suitable length. In some embodiments, the hairpin adaptors are at least 40, at least 50, or at least 100 nucleotides in length. In some embodiments, the hairpin adaptors are in the range of 45 to 500 nucleotides, 75 to 500 nucleotides, 45 to 250 nucleotides, 60 to 250 nucleotides, or 45 to 150 nucleotides in length. In some embodiments, the hairpin adaptors comprise a nucleic acid having a 5' end, a 5' portion, a loop, a 3' portion, and a 3' end (e.g., arranged in a 5' to 3' orientation). In some embodiments, the 5' portion of the hairpin adapter anneals to and/or hybridizes with the 3' portion of the hairpin adapter, thereby forming the stem portion of the hairpin adapter. In some embodiments, the 5' portion of the hairpin adapter is substantially complementary to the 3' portion of the hairpin adapter. In certain embodiments, the hairpin adaptors comprise a stem portion (i.e., a stem) and a loop, wherein the stem portion is substantially double-stranded, thereby forming a duplex. In some embodiments, the loop of the hairpin adapter comprises a nucleic acid strand that is non-complementary (e.g., substantially non-complementary) to itself or any other portion of the hairpin adapter. In some embodiments, the second adapter comprises a sample barcode sequence, a molecular identifier sequence, or both a sample barcode sequence and a molecular identifier sequence. In some embodiments, the second adapter comprises a sample barcode sequence.
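The 5' portion, loop, and 3' portion arrangement described above can be illustrated in code. The following is a minimal sketch, assuming a fully complementary stem and DNA bases only; the function names and the example sequences are hypothetical, and the sketch does not check that the loop is non-complementary to itself.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def build_hairpin(stem_5p: str, loop: str) -> str:
    """Assemble 5' portion + loop + 3' portion, written 5' to 3'.

    The 3' portion is the reverse complement of the 5' portion, so the two
    portions can anneal to form the stem.
    """
    return stem_5p.upper() + loop.upper() + reverse_complement(stem_5p)


def stem_is_complementary(hairpin: str, stem_len: int) -> bool:
    """Check that the first and last stem_len nucleotides can form a duplex."""
    return hairpin[:stem_len] == reverse_complement(hairpin[-stem_len:])


# Hypothetical 20-nt stem arm and 25-nt loop, giving a 65-nt adapter.
stem_arm = "ACGGTCCATGGCTAAGCTTG"
loop_seq = "TTCTACACGACGCTCTTCCGATCTT"
adapter = build_hairpin(stem_arm, loop_seq)
print(len(adapter), stem_is_complementary(adapter, len(stem_arm)))
```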

In some embodiments, the duplex region or stem portion of the hairpin adapter comprises an end configured for ligation to an end of a double-stranded nucleic acid (e.g., a nucleic acid fragment, e.g., a library insert). In embodiments, the end of the duplex region or stem portion of the hairpin adapter comprises a 5' overhang or 3' overhang that is complementary to a 3' overhang or 5' overhang of one end of the double-stranded nucleic acid. In some embodiments, one end of the duplex region or stem portion of the hairpin adapter comprises a blunt end that can be ligated to a blunt end of a double-stranded nucleic acid. In certain embodiments, the end of the duplex region or stem portion of the hairpin adapter comprises a phosphorylated 5' end. In some embodiments, the stem portion of the hairpin adapter is at least 15, at least 25, or at least 40 nucleotides in length. In some embodiments, the stem portion of the hairpin adapter ranges in length from 15 to 500 nucleotides, 15 to 250 nucleotides, 15 to 200 nucleotides, 15 to 150 nucleotides, 20 to 100 nucleotides, or 20 to 50 nucleotides.

In some embodiments, the loop of the hairpin adapter comprises one or more of the following: primer binding sites, capture nucleic acid binding sites (e.g., nucleic acid sequences complementary to capture nucleic acids), UMI, sample barcodes, sequencing adaptors, tags, and the like, or a combination thereof. In certain embodiments, the loop of the hairpin adapter comprises a primer binding site. In certain embodiments, the loop of the hairpin adapter comprises a primer binding site and UMI. In certain embodiments, the loop of the hairpin adapter comprises a binding motif.

In some embodiments, the predicted, calculated, average, mean, or absolute melting temperature (Tm) of the loop of the hairpin adapter is greater than 50 ℃, greater than 55 ℃, greater than 60 ℃, greater than 65 ℃, greater than 70 ℃, or greater than 75 ℃. In some embodiments, the predicted, estimated, calculated, average, mean, or absolute melting temperature (Tm) of the loop of the hairpin adapter is in the range of 50-100 ℃, 55-100 ℃, 60-100 ℃, 65-100 ℃, 70-100 ℃, 55-95 ℃, 65-95 ℃, 70-95 ℃, 55-90 ℃, 65-90 ℃, 70-90 ℃, or 60-85 ℃. In an embodiment, the Tm of the loop is about 65 ℃. In an embodiment, the Tm of the loop is about 75 ℃. In an embodiment, the Tm of the loop is about 85 ℃. The Tm of the loop of the hairpin adapter can be altered (e.g., increased) to the desired Tm using suitable methods, such as by altering (e.g., increasing) GC content, altering (e.g., increasing) length, and/or by including modified nucleotides, nucleotide analogs, and/or modified nucleotide linkages, non-limiting examples of which include locked nucleic acids (LNA, e.g., bicyclic nucleic acids), bridged nucleic acids (BNA, e.g., constrained nucleic acids), C5-modified pyrimidine bases (e.g., 5-methyl-dC, propynyl pyrimidines, etc.), and alternative backbone chemistries, such as peptide nucleic acids (PNA), morpholinos, etc., or combinations thereof. Thus, in some embodiments, the loop of the hairpin adapter comprises one or more modified nucleotides, nucleotide analogs, and/or modified nucleotide linkages.

In some embodiments, the loops of the hairpin adaptors independently comprise a GC content of greater than 40%, greater than 50%, greater than 55%, greater than 60%, greater than 65%, or greater than 70%. In certain embodiments, the loops of the hairpin adaptors independently comprise a GC content in the range of 40-100%, 50-100%, 60-100%, or 70-100%. In embodiments, the GC content of the loop is about or greater than about 40%. In embodiments, the GC content of the loop is about or greater than about 50%. In embodiments, the GC content of the loop is about or greater than about 60%. Non-base modifiers can also be incorporated into the loop of the hairpin adapter to increase Tm, non-limiting examples of which include minor groove binders (MGBs), spermine, G-clamp, Uaq anthraquinone caps, and the like, or combinations thereof. The loop of the hairpin adapter may be of any suitable length. In some embodiments, the loop of the hairpin adapter is at least 15, at least 25, or at least 40 nucleotides in length. In some embodiments, the loop of the hairpin adapter is in the range of 15 to 500 nucleotides, 15 to 250 nucleotides, 20 to 200 nucleotides, 30 to 150 nucleotides, or 50 to 100 nucleotides in length.
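The GC content thresholds recited for the loop can be evaluated directly from a candidate loop sequence. The following is a minimal sketch, assuming unmodified DNA bases; the function names, the default threshold, and the example loop sequence are hypothetical.

```python
def gc_percent(seq: str) -> float:
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def exceeds_gc_threshold(seq: str, threshold_percent: float = 40.0) -> bool:
    """True if the GC content is greater than the given threshold."""
    return gc_percent(seq) > threshold_percent


# Hypothetical 25-nt loop sequence (illustration only).
loop_seq = "GCCGTACGGCTAGCCGGATCCGGCA"
print(round(gc_percent(loop_seq), 1), exceeds_gc_threshold(loop_seq, 60.0))
```

Modified nucleotides and non-base modifiers such as those listed above also shift Tm, and a plain GC count does not capture their effect.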

In certain embodiments, the predicted, estimated, calculated, average, mean or absolute Tm of the duplex region or stem region of the hairpin adapter is in the range of 30-70 ℃, 35-65 ℃, 35-60 ℃, 40-65 ℃, 40-60 ℃, 35-55 ℃, 40-55 ℃, 45-50 ℃ or 40-50 ℃. In embodiments, the Tm of the stem region is about or above about 35 ℃. In embodiments, the Tm of the stem region is about or above about 40 ℃. In embodiments, the Tm of the stem region is about or above about 45 ℃. In embodiments, the Tm of the stem region is about or above about 50 ℃.
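A predicted Tm for a stem or loop sequence can be approximated from base composition. The following is a rough sketch that uses the common empirical approximation Tm = 64.9 + 41 × (G + C − 16.4) / N for oligonucleotides longer than about 13 nucleotides; this formula is an assumption chosen for illustration, not the prediction method of the disclosure, and it ignores salt concentration, modified nucleotides, and the intramolecular nature of a hairpin stem. The function names and example sequence are hypothetical.

```python
def estimate_tm_celsius(seq: str) -> float:
    """Estimate Tm (in degrees C) from GC count and length.

    Uses Tm = 64.9 + 41 * (G + C - 16.4) / N, a rough approximation for
    unmodified DNA oligonucleotides longer than about 13 nucleotides.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 14:
        raise ValueError("approximation is intended for sequences > 13 nt")
    gc = sum(base in "GC" for base in seq)
    return 64.9 + 41.0 * (gc - 16.4) / n


def tm_in_range(seq: str, low: float, high: float) -> bool:
    """True if the estimated Tm falls within [low, high] degrees C."""
    return low <= estimate_tm_celsius(seq) <= high


# Hypothetical 20-nt stem arm with 50% GC content.
stem_arm = "ACGTACGTACGTACGTACGT"
print(round(estimate_tm_celsius(stem_arm), 2), tm_in_range(stem_arm, 30.0, 70.0))
```

A nearest-neighbor thermodynamic model would generally give a more reliable prediction than this composition-only estimate.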

In embodiments, circularization comprises contacting the double-stranded polynucleotide with at least one protelomerase (prokaryotic telomerase). In embodiments, the double-stranded polynucleotide comprises complementary protelomerase target sequences at both ends (e.g., the 5' and 3' ends of each strand comprise a protelomerase recognition sequence or complement thereof). For example, a double-stranded protelomerase recognition sequence is attached at each end of a target double-stranded DNA molecule (e.g., a double-stranded protelomerase recognition sequence, such as a TelN protelomerase recognition sequence, has been ligated to each end of the dsDNA molecule). Then, for example, E. coli phage N15 protelomerase (TelN) acts on the recognition sequences at both ends of the target double-stranded DNA molecule to produce a circularized form of the target double-stranded DNA molecule. The TelN recognition sequence is TATCAGCACACAATTGCCCATTATACGCGCGTATAATGGACTATTGTGTGCTGATA (SEQ ID NO: 1). TelN cleaves this sequence at its midpoint and joins the ends of the complementary strands to form covalently closed ends. Additional protelomerases and methods for protelomerase-mediated circularization are disclosed in PCT patent publications WO2021236792 and WO2021/078947 and U.S. patent publication 2013/0216562, each of which is incorporated herein by reference in its entirety.

In embodiments, circularization comprises hybridizing a splint to both ends of the linear nucleic acid molecule and either i) ligating adjacent ends, or ii) extending the 3' end of the linear nucleic acid molecule along the splint to create a splint complementary sequence, and ligating the 3' end of the complementary sequence to the 5' end of the linear nucleic acid molecule. In an embodiment, the splint comprises a barcode. In embodiments, the splint comprises primer binding sites (e.g., sequences complementary to amplification or sequencing primers).
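For option i), in which the splint holds the two ends adjacent for ligation, the splint sequence follows from the ends of the linear molecule. The following is a minimal sketch, assuming a splint with no barcode or primer binding site between its two arms; the function names, the arm length, and the example sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_splint(linear: str, arm_len: int = 10) -> str:
    """Return a splint (5' to 3') complementary to both ends of `linear`.

    After ligation the 3' end is joined to the 5' end, so the junction reads
    (last arm_len nt) + (first arm_len nt) in the 5' to 3' direction. The
    splint is the reverse complement of that junction sequence.
    """
    linear = linear.upper()
    if len(linear) < 2 * arm_len:
        raise ValueError("molecule is too short for the requested arm length")
    junction = linear[-arm_len:] + linear[:arm_len]
    return reverse_complement(junction)


# Hypothetical 14-nt molecule with 3-nt arms, kept short for readability.
print(design_splint("AAGCTTTTTTCCGA", arm_len=3))
```

For option ii), the splint would additionally carry an internal sequence (e.g., a barcode) between the two arms, which the extension step copies into the molecule before ligation.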

In one embodiment, enzymes are used to ligate the two ends of the linear nucleic acid molecule. For example, linear polynucleotides are circularized in a non-template-driven reaction using a circularizing ligase (e.g., CircLigase™ enzyme, Taq DNA ligase, HiFi Taq DNA ligase, T4 DNA ligase, PBCV-1 DNA ligase (also known as SplintR ligase), or Ampligase DNA ligase). Non-limiting examples of ligases include DNA ligases such as DNA ligase I, DNA ligase II, DNA ligase III, DNA ligase IV, T4 DNA ligase, T7 DNA ligase, T3 DNA ligase, E. coli DNA ligase, PBCV-1 DNA ligase (also known as SplintR ligase), or Taq DNA ligase. In embodiments, the ligase comprises a T4 DNA ligase, T4 RNA ligase 1, T4 RNA ligase 2, T3 DNA ligase, or T7 DNA ligase. In an embodiment, the enzymatic ligation is performed by a mixture of ligases. In embodiments, the ligase is selected from the group consisting of: T4 DNA ligase, T4 RNA ligase 1, T4 RNA ligase 2, RtcB ligase, T3 DNA ligase, T7 DNA ligase, Taq DNA ligase, PBCV-1 DNA ligase, thermostable DNA ligases (e.g., 5' AppDNA/RNA ligases), ATP-dependent DNA ligases, RNA-dependent DNA ligases (e.g., SplintR ligase), and combinations thereof. In an embodiment, the two ends of the template polynucleotide are ligated together with the aid of splint primers that are complementary to the two ends of the template polynucleotide. For example, the T4 DNA ligase reaction may be performed by combining a linear polynucleotide, ligation buffer, ATP, T4 DNA ligase, and water, and incubating the mixture at between about 20 ℃ and about 45 ℃ for about 5 minutes to about 30 minutes. In some embodiments, the T4 ligation reaction is incubated at 37 ℃ for 30 minutes. In some embodiments, the T4 ligation reaction is incubated at 45 ℃ for 30 minutes. In embodiments, the ligase reaction is terminated by adding Tris buffer with high EDTA and incubating for 1 minute.

In embodiments, the linear nucleic acid molecule may undergo intramolecular circularization (by ligation or annealing) without being ligated to a circularization adaptor (e.g., self-circularization). Circularization (without circularization adaptors) can be achieved with a ligase at about 4 ℃ to about 35 ℃. In an embodiment, a linear nucleic acid molecule of interest may be ligated to a loxP adaptor, and circularization may be mediated by a Cre recombinase reaction at about 4-35 ℃; see, e.g., US 6,465,254, which is incorporated herein by reference.

In embodiments, the circular polynucleotide is about 100 to about 1000 nucleotides in length, about 100 to about 300 nucleotides in length, about 300 to about 500 nucleotides in length, or about 500 to about 1000 nucleotides in length. In embodiments, the circular polynucleotide is about 300 to about 600 nucleotides in length. In embodiments, the circular polynucleotides are about 100-1000 nucleotides, about 150-950 nucleotides, about 200-900 nucleotides, about 250-850 nucleotides, about 300-800 nucleotides, about 350-750 nucleotides, about 400-700 nucleotides, or about 450-650 nucleotides in length. In embodiments, the circular polynucleotide molecule is about 100-1000 nucleotides in length. In embodiments, the circular polynucleotide molecule is about 100-300 nucleotides in length. In embodiments, the circular polynucleotide molecule is about 300-500 nucleotides in length. In embodiments, the circular polynucleotide molecule is about 500-1000 nucleotides in length. In an embodiment, the circular polynucleotide molecule is about 100 nucleotides in length. In an embodiment, the circular polynucleotide molecule is about 300 nucleotides in length. In an embodiment, the circular polynucleotide molecule is about 500 nucleotides in length. In an embodiment, the circular polynucleotide molecule is about 1000 nucleotides in length. The circular polynucleotides may be conveniently isolated using conventional purification columns, by digesting non-circular DNA with one or more suitable exonucleases, or both.

In embodiments, the sequence that specifically binds to the blocking element, the sequence that specifically hybridizes to the first primer, or both is about 1 to about 100 nucleotides from the fusion junction. In embodiments, the sequence that specifically binds to the blocking element, the sequence that specifically hybridizes to the first primer, or both, is about 5 to about 100 nucleotides from the fusion junction. In embodiments, the sequence that specifically binds to the blocking element, the sequence that specifically hybridizes to the first primer, or both is about 10 to about 100 nucleotides from the fusion junction. In embodiments, the sequence that specifically binds to the blocking element, the sequence that specifically hybridizes to the first primer, or both, is about 25 to about 100 nucleotides from the fusion junction. In embodiments, the sequence that specifically binds to the blocking element, the sequence that specifically hybridizes to the first primer, or both, is about 50 to about 100 nucleotides from the fusion junction. In embodiments, the sequence that specifically binds to the blocking element, the sequence that specifically hybridizes to the first primer, or both is about 75 to about 100 nucleotides from the fusion junction. In embodiments, the sequence that specifically binds to the blocking element, the sequence that specifically hybridizes to the first primer, or both is about 1, about 5, about 10, about 25, about 50, about 75, or about 100 nucleotides from the fusion junction. In an embodiment, the sequence that specifically hybridizes to the first primer and the sequence that specifically hybridizes to the blocking element do not overlap. In embodiments, the sequence that specifically hybridizes to the first primer and the sequence that specifically hybridizes to the blocking element are about 5, about 10, or about 20 nucleotides apart. 
In an embodiment, the sequence that specifically binds to the blocking element and the sequence that specifically hybridizes to the first primer are about the same distance from the fusion junction. In an embodiment, the sequence that specifically binds to the blocking element and the sequence that specifically hybridizes to the first primer are at different distances from the fusion junction.
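The spacing constraints above (distance of each binding sequence from the fusion junction, no overlap between the two sequences, and the gap between them) can be checked on template coordinates. The following is a minimal sketch, assuming half-open `(start, end)` intervals on a single template strand and a junction given as the index of the first nucleotide after the breakpoint; all names and coordinates are hypothetical.

```python
from typing import Tuple

Interval = Tuple[int, int]  # half-open [start, end) in template coordinates


def distance_to_junction(site: Interval, junction: int) -> int:
    """Nucleotides between the nearest edge of `site` and the junction."""
    start, end = site
    if end <= junction:       # site lies entirely before the junction
        return junction - end
    if start >= junction:     # site lies entirely after the junction
        return start - junction
    return 0                  # site spans the junction


def sites_overlap(a: Interval, b: Interval) -> bool:
    """True if the two half-open intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def gap_between(a: Interval, b: Interval) -> int:
    """Nucleotides separating two sites (0 if they overlap or abut)."""
    if sites_overlap(a, b):
        return 0
    return max(a[0], b[0]) - min(a[1], b[1])


# Hypothetical layout: primer site, then blocker site, then fusion junction.
primer_site = (150, 170)
blocker_site = (175, 195)
junction = 200
print(distance_to_junction(primer_site, junction),
      distance_to_junction(blocker_site, junction),
      gap_between(primer_site, blocker_site))
```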

In embodiments, the sequence that specifically hybridizes to the first primer is about 1 to about 50 nucleotides apart from the sequence that is complementary to the sequence that specifically hybridizes to the second primer. In embodiments, the sequence that specifically hybridizes to the first primer is about 5 to about 50 nucleotides apart from the sequence that is complementary to the sequence that specifically hybridizes to the second primer. In embodiments, the sequence that specifically hybridizes to the first primer is about 10 to about 50 nucleotides apart from the sequence that is complementary to the sequence that specifically hybridizes to the second primer. In embodiments, the sequence that specifically hybridizes to the first primer is about 20 to about 50 nucleotides apart from the sequence that is complementary to the sequence that specifically hybridizes to the second primer. In embodiments, the sequence that specifically hybridizes to the first primer is about 30 to about 50 nucleotides apart from the sequence that is complementary to the sequence that specifically hybridizes to the second primer. In embodiments, the sequence that specifically hybridizes to the first primer is about 40 to about 50 nucleotides apart from the sequence that is complementary to the sequence that specifically hybridizes to the second primer. In embodiments, the sequence that specifically hybridizes to the first primer is separated by about 1, about 5, about 10, about 20, about 30, about 40, or about 50 nucleotides from the sequence that is complementary to the sequence that specifically hybridizes to the second primer.

In an embodiment, the sequence that specifically hybridizes to the first primer and the sequence that is complementary to the sequence that specifically hybridizes to the second primer are located within the same exon of the target gene. In an embodiment, the sequence that specifically hybridizes to the first primer and the sequence that is complementary to the sequence that specifically hybridizes to the second primer are located within different exons of the target gene. In an embodiment, the sequence that specifically hybridizes to the first primer and the sequence that is complementary to the sequence that specifically hybridizes to the second primer are located within adjacent exons of the target gene. Specific hybridization is distinguished from non-specific hybridization interactions (e.g., two nucleic acids that are not configured for specific hybridization, e.g., two nucleic acids that are 80% or less, 70% or less, 60% or less, or 50% or less complementary) by about 2-fold or more, typically about 10-fold or more, and sometimes about 100-fold or more, 1000-fold or more, 10,000-fold or more, 100,000-fold or more, or 1,000,000-fold or more. Two strands of nucleic acid hybridized to each other may form a duplex comprising a double-stranded portion of nucleic acid.

In embodiments, the linear nucleic acid molecule is a single-stranded nucleic acid molecule. In embodiments, the linear nucleic acid molecule is a double-stranded nucleic acid molecule. In embodiments, the method comprises less than 200 ng of linear nucleic acid molecules. In embodiments, the method comprises less than 100 ng of linear nucleic acid molecules. In embodiments, the method comprises less than 50 ng of linear nucleic acid molecules. In embodiments, the method comprises less than 20 ng of linear nucleic acid molecules. In embodiments, the method comprises less than 10 ng of linear nucleic acid molecules. In embodiments, the method comprises about 200 ng of linear nucleic acid molecules. In embodiments, the method comprises about 100 ng of linear nucleic acid molecules. In embodiments, the method comprises about 50 ng of linear nucleic acid molecules. In embodiments, the method comprises about 20 ng of linear nucleic acid molecules. In embodiments, the method comprises about 10 ng of linear nucleic acid molecules.
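To relate these input masses to the number of template molecules, a mass can be converted to an approximate copy number. The following is a minimal sketch, assuming the conventional average masses of about 650 g/mol per base pair for double-stranded DNA and about 330 g/mol per nucleotide for single-stranded DNA; these averages and the function name are assumptions for illustration, not values from the disclosure.

```python
AVOGADRO = 6.02214076e23          # molecules per mole
G_PER_MOL_PER_BP_DSDNA = 650.0    # assumed average mass of one base pair
G_PER_MOL_PER_NT_SSDNA = 330.0    # assumed average mass of one nucleotide


def copies_from_mass(mass_ng: float, length: int, double_stranded: bool = True) -> float:
    """Approximate the number of molecules in `mass_ng` nanograms of DNA.

    `length` is in base pairs for double-stranded DNA and in nucleotides for
    single-stranded DNA.
    """
    if mass_ng < 0 or length <= 0:
        raise ValueError("mass must be non-negative and length positive")
    unit_mass = G_PER_MOL_PER_BP_DSDNA if double_stranded else G_PER_MOL_PER_NT_SSDNA
    grams = mass_ng * 1e-9
    return grams * AVOGADRO / (length * unit_mass)


# Example: 10 ng of 300-bp double-stranded fragments.
print(f"{copies_from_mass(10, 300):.3e}")
```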

In some embodiments, the double-stranded nucleic acid comprises two complementary strands of nucleic acid. In certain embodiments, the double-stranded nucleic acid comprises a first strand and a second strand that are complementary or substantially complementary to each other. The first strand of double-stranded nucleic acid is sometimes referred to herein as the forward strand, and the second strand of double-stranded nucleic acid is sometimes referred to herein as the reverse strand. In some embodiments, the double-stranded nucleic acid comprises two opposite ends. Thus, a double-stranded nucleic acid typically comprises a first end and a second end. The ends of the double-stranded nucleic acids may comprise 5' overhangs, 3' overhangs, or blunt ends. In some embodiments, one or both ends of the double-stranded nucleic acid are blunt ends. In certain embodiments, one or both ends of the double-stranded nucleic acid are manipulated using a suitable method to comprise a 5' overhang, a 3' overhang, or a blunt end. In some embodiments, one or both ends of the double-stranded nucleic acid are manipulated during library preparation, using a suitable method, such that one or both ends of the double-stranded nucleic acid are configured for ligation to adaptors. For example, one or both ends of the double-stranded nucleic acid may be digested with a restriction enzyme, polished, end-repaired, filled in, phosphorylated (e.g., by addition of a 5'-phosphate), dT-tailed, dA-tailed, or the like, or a combination thereof.

In embodiments, (i) the first primer comprises a 5' sequence that does not hybridize under amplification conditions to the first strand of the first region; and/or (ii) the second primer comprises a 5' sequence that does not hybridize under amplification conditions to the complement of the first strand of the first region. In embodiments, (i) the first primer comprises a 5' sequence that does not hybridize under amplification conditions to the first strand of the first region; and (ii) the second primer comprises a 5' sequence that does not hybridize under amplification conditions to the complement of the first strand of the first region. In embodiments, (i) the first primer comprises a 5' sequence that does not hybridize under amplification conditions to the first strand of the first region; or (ii) the second primer comprises a 5' sequence that does not hybridize under amplification conditions to the complement of the first strand of the first region. In some embodiments, the 5' sequence of the first primer that does not hybridize to the first strand of the first region comprises a primer binding site for secondary amplification. In some embodiments, the 5' sequence of the first primer that does not hybridize to the first strand of the first region comprises a first sequencing adapter for clustering templates on a flow cell. In some embodiments, the 5' sequence of the first primer that does not hybridize to the first strand of the first region comprises a sample barcode. In some embodiments, the 5' sequence of the second primer that does not hybridize to the complement of the first strand of the first region comprises a primer binding site for secondary amplification. In some embodiments, the 5' sequence of the second primer that does not hybridize to the complement of the first strand of the first region comprises a second sequencing adapter for clustering templates on the flow cell. 
In some embodiments, the 5' sequence of the second primer that does not hybridize to the complement of the first strand of the first region comprises a sample barcode.

In embodiments, (i) the amplification reaction further comprises a second blocking element that inhibits polymerase extension along the sequence to which it binds, and (ii) the first region comprises a first strand comprising, from 5 'to 3', a sequence complementary to the sequence that specifically hybridizes to the second primer, and a sequence complementary to the sequence to which the second blocking element specifically binds. In embodiments, the sequence complementary to the sequence that specifically hybridizes to the second primer is about 100 to about 300 nucleotides apart from the sequence complementary to the sequence that specifically binds to the second blocking element. In embodiments, the sequence complementary to the sequence that specifically hybridizes to the second primer is about 100 to about 200 nucleotides apart from the sequence complementary to the sequence that specifically binds to the second blocking element. In embodiments, the sequence complementary to the sequence that specifically hybridizes to the second primer is about 100 to about 150 nucleotides apart from the sequence complementary to the sequence that specifically binds to the second blocking element. In embodiments, the sequence complementary to the sequence that specifically hybridizes to the second primer is about 100, about 150, about 200, or about 300 nucleotides apart from the sequence complementary to the sequence that specifically binds to the second blocking element.

In an embodiment, the method further comprises: iv) amplifying the one or more non-fused circular template polynucleotides to produce a third amount of non-fused polynucleotide amplification products; and amplifying the one or more fusion circular template polynucleotides to produce a fourth quantity of fusion polynucleotide amplification products, wherein the third quantity and the fourth quantity are substantially the same. In embodiments, amplifying the one or more non-fused circular template polynucleotides comprises hybridizing a third primer and a fourth primer to the one or more non-fused circular template polynucleotides and extending both primers with a polymerase, and wherein amplifying the one or more fused circular template polynucleotides comprises hybridizing a third primer and a fourth primer to the one or more fused circular template polynucleotides and extending both primers with a polymerase. In embodiments, the third primer hybridizes upstream (e.g., in the 5 'direction) and the fourth primer hybridizes downstream (e.g., in the 3' direction) of the target sequence, wherein the target sequence comprises a single nucleotide variant, an insertion, a deletion, an internal tandem repeat, or a copy number variant. In embodiments, the target sequence comprises one or more single nucleotide variants, one or more insertions, one or more deletions, one or more internal tandem repeats, and/or one or more copy number variants. In an embodiment, the method further comprises repeating steps ii), iii) and iv).

In an embodiment, amplification of the circularized or linear polynucleotide comprises a plurality of cycles comprising the steps of primer hybridization, primer extension, and denaturation in the presence of a first primer, a blocking element, and a second primer. While each cycle will contain each of these three events (hybridization, extension, and denaturation), the events within a cycle may or may not be discrete. For example, each step may have different reagents and/or reaction conditions (e.g., temperature). Alternatively, some steps may be performed without changing the reaction conditions. For example, the extension may be performed under the same conditions (e.g., the same temperature) as the hybridization. After extension, the conditions are changed to start a new cycle with a new denaturation step, thereby amplifying the polynucleotide. Primer extension products from earlier cycles can serve as templates for later amplification cycles. In an embodiment, the plurality of cycles is from about 5 to about 50 cycles. In an embodiment, the plurality of cycles is from about 10 to about 45 cycles. In an embodiment, the plurality of cycles is from about 10 to about 20 cycles. In an embodiment, the plurality of cycles is about 20 to about 30 cycles. In an embodiment, the plurality of cycles is 10 to 45 cycles. In an embodiment, the plurality of cycles is 10 to 20 cycles. In an embodiment, the plurality of cycles is 20 to 30 cycles.
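Because extension products from earlier cycles serve as templates in later cycles, product accumulates geometrically with cycle number. The following is a minimal sketch of the standard idealized model, in which the copy number after n cycles is N0 × (1 + E)^n for a per-cycle efficiency E between 0 and 1; the function names and the example values are hypothetical, and the model ignores plateau effects and any suppression by a blocking element.

```python
def fold_amplification(cycles: int, efficiency: float = 1.0) -> float:
    """Idealized fold increase after `cycles` cycles: (1 + efficiency) ** cycles."""
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return (1.0 + efficiency) ** cycles


def copies_after(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after cycling, ignoring plateau effects."""
    return initial_copies * fold_amplification(cycles, efficiency)


# Compare the ends of the recited 10 to 45 cycle range at 90% efficiency.
for n in (10, 20, 30, 45):
    print(n, f"{fold_amplification(n, 0.9):.3e}")
```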

In embodiments, amplifying comprises exponentially amplifying circular template polynucleotides comprising fusion junctions. In embodiments, the amplification comprises exponential rolling circle amplification (eRCA). Exponential RCA is similar to the linear process, except that it uses a second primer having the same sequence as at least a portion of the circular template (Lizardi et al., Nature Genet. 19:225 (1998)). This two-primer system achieves isothermal, exponential amplification. Exponential RCA has been applied to the amplification of non-circular DNA by using linear probes that bind at both of their ends to contiguous regions of the target DNA, followed by circularization using DNA ligase (Nilsson et al., Science 265(5181):2085 (1994)). In an embodiment, the amplification comprises hyperbranched rolling circle amplification (HRCA). Hyperbranched RCA uses a second primer complementary to the first amplification product. This allows replication of the product by a strand displacement mechanism, which can produce substantial amplification in an isothermal reaction (Lage et al., Genome Research 13:294-307 (2003), which is incorporated herein by reference in its entirety).

In embodiments, methods for amplification include, but are not limited to, polymerase chain reaction (PCR), strand displacement amplification (SDA), transcription-mediated amplification (TMA), and nucleic acid sequence-based amplification (NASBA), e.g., as described in U.S. patent No. 8,003,354, which is incorporated herein by reference in its entirety. The amplification methods described above may be used to amplify one or more nucleic acids of interest. For example, PCR, multiplex PCR, SDA, TMA, NASBA, and the like can be used to amplify immobilized nucleic acid fragments resulting from the first amplification method of the two-step methods described herein.

In embodiments, the amplifying comprises bridge amplification, as exemplified by the disclosures of U.S. Pat. Nos. 5,641,658; 7,115,400; and 7,790,418; and U.S. patent publication No. 2008/0009420, each of which is incorporated herein by reference in its entirety. Typically, bridge amplification uses repeated steps of primer annealing to the template, primer extension, and separation of the extended primer from the template. Because the forward primer and the reverse primer are attached to the solid support, the extension product released upon separation from the initial template is also attached to the solid support. Both strands are preferably immobilized on the solid support at the 5' end by covalent attachment. The 3' end of the amplified product is then allowed to anneal to a nearby reverse primer, thereby forming a "bridge" structure. The reverse primer is then extended to produce an additional template molecule that can form another bridge. During bridge PCR, additional chemical additives may be included in the reaction mixture, wherein the DNA strands are denatured by flowing a denaturant over the DNA, thereby chemically denaturing the complementary strands. The denaturant is then washed out and the polymerase is reintroduced under buffer conditions that allow the primer to anneal and extend.

In an embodiment, the amplification comprises thermal bridge polymerase chain reaction (t-bPCR) amplification. In an embodiment, t-bPCR amplification comprises incubation in an additive that reduces the denaturation temperature of the DNA. In embodiments, the additive is betaine, dimethyl sulfoxide (DMSO), ethylene glycol, formamide, glycerol, guanidine thiocyanate, 4-methylmorpholine-4-oxide (NMO), or a mixture thereof. In embodiments, the additive is betaine, DMSO, ethylene glycol, or mixtures thereof. In embodiments, the additive is betaine, DMSO, or ethylene glycol.

In an embodiment, the amplification comprises chemical bridge polymerase chain reaction (c-bPCR) amplification. In an embodiment, the c-bPCR amplification comprises denaturation using a chemical denaturant. In embodiments, the c-bPCR amplification comprises denaturation using acetic acid, hydrochloric acid, nitric acid, formamide, guanidine, sodium salicylate, sodium hydroxide, dimethyl sulfoxide (DMSO), propylene glycol, urea, or mixtures thereof. In an embodiment, the chemical denaturant is sodium hydroxide or formamide. Chemical bridge polymerase chain reaction involves fluid circulation of a denaturing agent (e.g., formamide) and maintaining the temperature within a narrow temperature range (e.g., +/-5 ℃). In contrast, a thermal bridge polymerase chain reaction comprises a thermal cycle between a high temperature (e.g., 85 ℃ to 95 ℃) and a low temperature (e.g., 60 ℃ to 70 ℃). The thermal bridge polymerase chain reaction may also contain denaturing agents, typically at a much lower concentration than conventional chemical bridge polymerase chain reactions.

In an embodiment, the amplification comprises fluidic cycling between an extension mixture, which comprises a polymerase and dNTPs, and a chemical denaturant. In embodiments, the polymerase is a strand-displacing polymerase or a non-strand-displacing polymerase. In an embodiment, the solution is thermally cycled between about 40 ℃ and about 65 ℃ during fluidic cycling of the extension mixture and the chemical denaturant. For example, the extension cycle is maintained at a temperature of 55 ℃ to 65 ℃ and the denaturation cycle is then maintained at a temperature of 40 ℃ to 65 ℃, or the temperature of the denaturation step begins at 60 ℃ to 65 ℃ and drops to 40 ℃ prior to exchanging reagents. In an embodiment, amplifying comprises adjusting the reaction temperature before starting the next cycle. In embodiments, the denaturation cycle and/or extension cycle is maintained at a temperature for a sufficient time, and the temperature is adjusted (e.g., increased or decreased relative to the starting temperature) before starting the next cycle. In an embodiment, the denaturation cycle is carried out at a temperature of 60 ℃ to 65 ℃ for about 5 to 45 seconds, and the temperature is then reduced (e.g., to about 40 ℃) prior to the initiation of the extension cycle (i.e., prior to the introduction of the extension mixture). The reduced temperature facilitates primer hybridization in the subsequent step, when the amplicon is exposed to conditions that promote hybridization, even in the presence of a chemical denaturant. In embodiments, the extension cycle is performed at a temperature of 50 ℃ to 60 ℃ for about 0.5 to 2 minutes, and the temperature is then raised (e.g., to between about 60 ℃ and about 70 ℃, or between about 65 ℃ and about 72 ℃) after the extension mixture is introduced.

In embodiments, the cycling between the extension mixture and the chemical denaturant is performed at least 5 times, at least 10 times, at least 20 times, at least 30 times, at least 40 times, at least 50 times, at least 75 times, at least 100 times, or at least 200 times. In embodiments, the cycling between the extension mixture and the chemical denaturant is performed about 5 times, about 10 times, about 20 times, about 30 times, about 40 times, about 50 times, about 75 times, about 100 times, or about 200 times. In embodiments, the cycling between the extension mixture and the chemical denaturant is performed a total of 5, 10, 20, 30, 40, 50, 75, 100, 200, or more times. In an embodiment, the fluidic cycling is performed in the presence of about 2 mM to about 15 mM Mg2+. In embodiments, the fluidic cycling is performed in the presence of about 2, about 3, about 4, about 5, about 6, about 7, about 8, about 9, about 10, about 11, about 12, about 13, about 14, or about 15 mM Mg2+.
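
By way of non-limiting illustration only, the fluidic and thermal cycling described above can be written out as an ordered list of steps. The following Python sketch is an assumption made for illustration and is not part of the disclosed method: the names `Step` and `build_schedule` are hypothetical, and the default values are simply single points chosen from within the ranges recited above (denaturation at 60 ℃ to 65 ℃ for about 5 to 45 seconds, cooling to about 40 ℃ before reagent exchange, and extension at 50 ℃ to 60 ℃ for about 0.5 to 2 minutes).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One step of a hypothetical fluidic/thermal schedule."""
    reagent: str      # "denaturant" or "extension_mix"
    temp_c: float     # set-point temperature in degrees Celsius
    hold_s: float     # hold time in seconds (0 means: ramp only)


def build_schedule(n_cycles, denature_temp_c=62.0, denature_s=30.0,
                   cool_temp_c=40.0, extend_temp_c=55.0, extend_s=60.0):
    """Return the ordered steps for n_cycles of denature -> cool -> extend."""
    if n_cycles < 1:
        raise ValueError("n_cycles must be at least 1")
    steps = []
    for _ in range(n_cycles):
        # Chemical denaturation at the higher temperature.
        steps.append(Step("denaturant", denature_temp_c, denature_s))
        # Lower the temperature before the reagents are exchanged.
        steps.append(Step("denaturant", cool_temp_c, 0.0))
        # Introduce the extension mixture (polymerase and dNTPs) and extend.
        steps.append(Step("extension_mix", extend_temp_c, extend_s))
    return steps
```

For example, `build_schedule(30)` lists the 90 steps of a 30-cycle run under these assumed set points; actual temperatures, hold times, and cycle numbers would be selected from the ranges described herein.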

In embodiments, detecting the fusion amplification product comprises detecting (e.g., quantifying) the length of the fusion amplification product, detecting one or more probes bound to the fusion amplification product, or sequencing the fusion amplification product. In an embodiment, detecting the fusion amplification product comprises sequencing the fusion amplification product to generate a sequencing read.

In embodiments, the method comprises detecting a first amount of non-fusion polynucleotide amplification product and a second amount of fusion polynucleotide amplification product. In an embodiment, the method comprises: detecting the length of the non-fusion polynucleotide amplification product and the length of the fusion polynucleotide amplification product; detecting one or more probes bound to the non-fusion polynucleotide amplification product and the fusion polynucleotide amplification product; or sequencing the non-fusion polynucleotide amplification product and the fusion polynucleotide amplification product.

In embodiments, sequencing comprises hybridizing one or more sequencing primers to the fusion amplification product and extending the one or more sequencing primers (e.g., extending the one or more sequencing primers with modified, labeled nucleotides and detecting incorporation of the modified, labeled nucleotides).

In embodiments, sequencing the non-fusion polynucleotide amplification product and the fusion polynucleotide amplification product produces one or more sequencing reads. In embodiments, the method further comprises aligning a substring of the one or more sequencing reads to a reference sequence and quantifying the number of sequencing reads of circular template polynucleotides comprising the fusion junction. In embodiments, the method further comprises quantifying the number of sequencing reads of the fusion gene circular template polynucleotide, wherein quantifying comprises aligning a substring of the sequencing reads to the reference sequence. In embodiments, the method further comprises aligning the one or more sequencing reads to a reference sequence.

In an embodiment, the method comprises comparing k-mer substrings of the one or more sequencing reads to a k-mer table of a fusion gene reference. In embodiments, the method comprises quantifying the number of k-mer substrings shared (i.e., measured and/or detected) between a sequencing read and the fusion gene reference. In an embodiment, the method comprises: (i) grouping the one or more sequencing reads based on a barcode sequence and/or a sequence comprising the fusion junction; and (ii) within each group, aligning the reads and forming a consensus sequence of the reads having the same barcode sequence and/or sequence comprising the fusion junction. In embodiments, sequencing further comprises generating sequencing reads that span the circularized junction formed between the 5' and 3' ends of the linear nucleic acid molecule, and quantifying the number of different circularized junction sequences containing the fusion gene (fusion gene circular template polynucleotides).

In embodiments, sequencing comprises sequencing by synthesis, sequencing by binding, sequencing by hybridization, sequencing by ligation, or pyrosequencing. A variety of sequencing methods may be used, such as sequencing by synthesis (SBS), pyrosequencing, sequencing by ligation (SBL), or sequencing by hybridization (SBH). Pyrosequencing detects the release of inorganic pyrophosphate (PPi) as particular nucleotides are incorporated into a nascent nucleic acid strand (Ronaghi et al., Analytical Biochemistry, 242(1), 84-9 (1996); Ronaghi, Genome Research, 11(1), 3-11 (2001); Ronaghi et al., Science, 281(5375), 363 (1998); and U.S. Patent Nos. 6,210,891; 6,258,568; and 6,274,320, each of which is incorporated herein by reference in its entirety). In pyrosequencing, released PPi can be detected by its conversion to adenosine triphosphate (ATP) by ATP sulfurylase, and the level of ATP produced can be detected via light produced by luciferase. In this way, the sequencing reaction can be monitored by a luminescence detection system. In both SBL and SBH methods, target nucleic acids and their amplicons present at features of an array are subjected to repeated cycles of oligonucleotide delivery and detection. SBL methods include those described in Shendure et al., Science, 309:1728-1732 (2005); U.S. Patent No. 5,599,675; and U.S. Patent No. 5,750,341, each of which is incorporated herein by reference in its entirety. SBH methods include those described in Bains et al., Journal of Theoretical Biology, 135(3), 303-7 (1988); Drmanac et al., Nature Biotechnology, 16, 54-58 (1998); Fodor et al., Science, 251(4995), 767-773 (1995); and WO 1989/10977, each of which is incorporated herein by reference in its entirety.

In SBS, extension of a nucleic acid primer along a nucleic acid template is monitored to determine the sequence of nucleotides in the template. The underlying chemical process can be catalyzed by a polymerase, wherein fluorescently labeled nucleotides are added to the primer (thereby extending the primer) in a template-dependent manner, such that detection of the order and type of nucleotides added to the primer can be used to determine the sequence of the template. A plurality of different nucleic acid fragments that have been attached at different locations of an array can be subjected to an SBS technique under conditions where events occurring for different templates can be distinguished by their location in the array. In embodiments, the sequencing step comprises annealing and extending a sequencing primer to incorporate a detectable label that indicates the identity of a nucleotide in the target polynucleotide, detecting the detectable label, and repeating the extending and detecting steps. In embodiments, the methods comprise sequencing one or more bases of a target nucleic acid by extending a sequencing primer hybridized to the target nucleic acid (e.g., an amplification product produced by an amplification method described herein). In an embodiment, the sequencing step may be accomplished by a sequencing-by-synthesis (SBS) process. In an embodiment, sequencing comprises a sequencing-by-synthesis process, wherein individual nucleotides are identified iteratively as they are polymerized to form a growing complementary strand. In an embodiment, the nucleotides added to the growing complementary strand comprise both a label and a reversible chain terminator that prevents further extension, such that each nucleotide can be identified by its label before the terminator is removed to allow another nucleotide to be added and identified. Such reversible chain terminators comprise a removable 3' blocking group, for example as described in U.S. Patent Nos. 7,541,444, 7,057,026, and 10,738,072. Once such a modified nucleotide has been incorporated into the growing polynucleotide strand complementary to the region of the template being sequenced, no free 3'-OH group is available to direct further sequence extension, and therefore the polymerase cannot add further nucleotides. Once the identity of the base incorporated into the growing strand has been determined, the 3' block can be removed to allow addition of the next successive nucleotide. By ordering the products derived using these modified nucleotides, it is possible to deduce the DNA sequence of the DNA template.

Sequencing can be performed using any suitable sequencing-by-synthesis (SBS) technique in which modified nucleotides are added successively to a free 3' hydroxyl group, typically initially provided by a sequencing primer, resulting in synthesis of a polynucleotide strand in the 5' to 3' direction. In embodiments, sequencing comprises detecting a sequence of signals. In an embodiment, sequencing comprises extending the sequencing primer with labeled nucleotides. Examples of sequencing include, but are not limited to, sequencing-by-synthesis (SBS) processes in which reversibly terminated nucleotides carrying fluorescent dyes are incorporated into a growing strand that is complementary to the target strand being sequenced. In an embodiment, the nucleotides are labeled with up to four unique fluorescent dyes. In embodiments, the nucleotides are labeled with at least two unique fluorescent dyes. In an embodiment, the readout is accomplished by epifluorescence imaging. Non-limiting examples of suitable labels are described in U.S. Patent No. 8,178,360; U.S. Patent No. 5,188,934 (4,7-dichlorofluorescein dyes); U.S. Patent No. 5,366,860 (spectrally resolvable rhodamine dyes); U.S. Patent No. 5,847,162 (4,7-dichlororhodamine dyes); U.S. Patent No. 4,318,846 (ether-substituted fluorescein dyes); U.S. Patent No. 5,800,996 (energy transfer dyes); U.S. Patent No. 5,066,580 (xanthene dyes); U.S. Patent No. 5,688,648 (energy transfer dyes); and the like.

In embodiments, generating the first sequencing read or the second sequencing read comprises sequencing by binding (see, e.g., U.S. Patent Publications US2017/0022553 and US2019/0048404, each of which is incorporated herein by reference in its entirety). As used herein, "sequencing by binding" refers to a sequencing technique in which specific binding of a polymerase and a cognate nucleotide to a primed template nucleic acid molecule (e.g., a blocked primed template nucleic acid molecule) is used to identify the next correct nucleotide to be incorporated into the primer strand of the primed template nucleic acid molecule. The specific binding interaction need not result in chemical incorporation of the nucleotide into the primer. In some embodiments, the specific binding interaction can precede chemical incorporation of the nucleotide into the primer strand, or can precede chemical incorporation of an analogous next correct nucleotide into the primer. Thus, detection of the next correct nucleotide can take place without incorporation of the next correct nucleotide. As used herein, the "next correct nucleotide" (sometimes referred to as the "cognate" nucleotide) is the nucleotide having a base complementary to the base of the next template nucleotide. The next correct nucleotide will hybridize at the 3' end of the primer to complement the next template nucleotide. The next correct nucleotide can be, but need not be, capable of being incorporated at the 3' end of the primer. For example, the next correct nucleotide can be a member of a ternary complex that will complete an incorporation reaction or, alternatively, the next correct nucleotide can be a member of a stabilized ternary complex that does not catalyze an incorporation reaction. A nucleotide having a base that is not complementary to the next template base is referred to as an "incorrect" (or "non-cognate") nucleotide.

The sequencing methods outlined above are non-limiting examples, as essentially any sequencing method that relies on the successive incorporation of nucleotides into a polynucleotide strand can be used. Suitable alternative techniques include, for example, pyrosequencing, FISSEQ (fluorescent in situ sequencing), MPSS (massively parallel signature sequencing), and ligation-based sequencing methods.

In embodiments, sequencing comprises multiple sequencing cycles. In embodiments, a sequencing cycle comprises extending a complementary polynucleotide, which is hybridized to the template nucleic acid, by incorporating a first nucleotide using a polymerase, detecting the first nucleotide, and identifying the first nucleotide. In an embodiment, to begin a sequencing cycle, one or more differently labeled nucleotides and a DNA polymerase may be introduced. After nucleotide addition, the resulting signal can be detected (e.g., by excitation and emission of a detectable label) to determine the identity of the incorporated nucleotide (based on the label on the nucleotide). Reagents may then be added to remove the 3' reversible terminator and to remove the label from each incorporated base. Reagents, enzymes, and other substances can be removed between steps by washing. Cycling may involve repeating these steps and reading the sequence of each cluster over multiple iterations. In an embodiment, the reads generated by sequencing are greater than 25 bp in length. In an embodiment, the reads generated by sequencing are greater than 50 bp in length. In an embodiment, the reads generated by sequencing are greater than 75 bp in length. In an embodiment, the reads generated by sequencing are greater than 100 bp in length. In an embodiment, the reads generated by sequencing are greater than 150 bp in length. In embodiments, generating a sequencing read comprises determining the identity of a nucleotide in a template polynucleotide.
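
By way of non-limiting illustration only, the per-cycle identification step described above can be sketched as selecting, in each cycle, the label channel with the strongest signal. The Python sketch below is an assumption made for illustration: the function name `call_bases`, the four-channel layout (one channel per nucleotide type, in the order A, C, G, T), and the example intensity values are hypothetical, and schemes using fewer dyes would require different decoding logic.

```python
def call_bases(cycle_intensities, channels="ACGT"):
    """Call one base per sequencing cycle from per-channel signal intensities.

    cycle_intensities: iterable of sequences, one per cycle, each holding one
    intensity per channel in the same order as `channels`.
    """
    read = []
    for intensities in cycle_intensities:
        if len(intensities) != len(channels):
            raise ValueError("expected one intensity per channel")
        # The incorporated nucleotide is taken to be the brightest channel.
        brightest = max(range(len(channels)), key=lambda i: intensities[i])
        read.append(channels[brightest])
    return "".join(read)
```

Under these assumptions, three cycles with intensities `(0.9, 0.1, 0.0, 0.1)`, `(0.1, 0.1, 0.8, 0.2)`, and `(0.0, 0.1, 0.1, 0.7)` would be read as `AGT`.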

In an embodiment, the sequencing method relies on the use of modified nucleotides that can act as reversible terminators. Once a modified nucleotide has been incorporated into the growing polynucleotide strand complementary to the region of the template being sequenced, no free 3'-OH group is available to direct further sequence extension, and therefore the polymerase cannot add further nucleotides. Once the identity of the base incorporated into the growing strand has been determined, the 3' reversible terminator can be removed to allow addition of the next successive nucleotide. These reactions can be performed in a single experiment if each modified nucleotide has attached to it a different label, known to correspond to a particular base, so that the bases added at each incorporation step can be distinguished. Alternatively, a separate reaction may be carried out for each modified nucleotide.

The modified nucleotide may carry a label (e.g., a fluorescent label) to facilitate its detection. Each nucleotide type may carry a different fluorescent label. However, the detectable label need not be a fluorescent label. Any label that allows detection of the incorporated nucleotide may be used. A method for detecting fluorescently labeled nucleotides comprises using a laser of a wavelength specific to the labeled nucleotides, or using other suitable illumination sources. Fluorescence from the label on the nucleotide may be detected (e.g., by a CCD camera or other suitable detection means).

In embodiments, a method of sequencing a nucleic acid comprises extending a complementary polynucleotide (e.g., a primer) hybridized to a nucleic acid by incorporating a first nucleotide (e.g., a modified, labeled nucleotide). In embodiments, the method comprises a buffer exchange or wash step. In an embodiment, a method of sequencing a nucleic acid comprises a sequencing solution. The sequencing solution comprises (a) adenine nucleotides or analogs thereof; (b) (i) a thymine nucleotide or analogue thereof, or (ii) a uracil nucleotide or analogue thereof; (c) a cytosine nucleotide or analog thereof; and (d) guanine nucleotide or analog thereof.

In an embodiment, sequencing comprises extending the sequencing primer by incorporating labeled nucleotides or labeled nucleotide analogs, and detecting the label to generate a signal for each incorporated nucleotide or nucleotide analog, wherein the sequencing primer hybridizes to one of the fusion amplification products.

In an embodiment, detecting the fusion amplification product comprises aligning the substring of each sequencing read with a reference sequence and quantifying the number of aligned sequencing reads of the fusion gene circular template polynucleotide.

In an embodiment, detecting the fusion amplification product comprises comparing the k-mer substring of each sequencing read to a k-mer table of fusion junction references, and quantifying the number of k-mers shared between the sequencing reads and the fusion junction references. The term "fusion junction reference" refers to a collection of previously detected fusion sequences involving the one or more genes of interest.
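
By way of non-limiting illustration only, the k-mer comparison described above can be sketched as follows. The Python code is an assumption made for illustration and is not a definitive implementation: the function names (`kmers`, `build_kmer_table`, `shared_kmer_counts`), the reference name `fusion_1`, and the short example sequences are hypothetical. The sketch builds a table mapping each k-mer to the fusion junction references that contain it, and then counts, for a given read, how many of its distinct k-mers are shared with each reference.

```python
def kmers(seq, k):
    """Return the set of distinct k-mer substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_kmer_table(references, k):
    """Map each k-mer to the names of the fusion junction references containing it."""
    table = {}
    for name, seq in references.items():
        for kmer in kmers(seq, k):
            table.setdefault(kmer, set()).add(name)
    return table


def shared_kmer_counts(read, table, k):
    """Count the distinct k-mers of a read that are shared with each reference."""
    counts = {}
    for kmer in kmers(read, k):
        for name in table.get(kmer, ()):
            counts[name] = counts.get(name, 0) + 1
    return counts
```

For example, with `table = build_kmer_table({"fusion_1": "ACGTACGGTTCA"}, 4)`, the read `TACGGTT` shares all four of its 4-mers with `fusion_1`, whereas a read sharing no k-mers with any reference yields an empty result. A threshold on the shared count, which is not specified here, could then be used to assign reads to fusion junctions.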

In an embodiment, detecting the fusion amplification product comprises: (i) grouping sequencing reads based on a barcode sequence and/or a sequence comprising the fusion junction; and (ii) within each group, aligning the reads and forming a consensus sequence of the reads having the same barcode sequence and/or sequence comprising the fusion junction.
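
By way of non-limiting illustration only, the grouping and consensus steps (i) and (ii) above can be sketched in Python as follows. The sketch rests on simplifying assumptions that are not drawn from the disclosure: the barcode is taken to be a fixed-length prefix of each read, reads within a group are treated as already aligned without insertions or deletions, and the consensus is formed by a per-position majority vote. The function name `consensus_by_barcode` is hypothetical.

```python
from collections import Counter, defaultdict


def consensus_by_barcode(reads, barcode_len):
    """Group reads by barcode prefix and form a majority-vote consensus per group."""
    groups = defaultdict(list)
    for read in reads:
        # (i) Group on the barcode; keep the remainder of the read as the insert.
        groups[read[:barcode_len]].append(read[barcode_len:])

    consensus = {}
    for barcode, inserts in groups.items():
        # (ii) Compare the grouped reads position by position (ungapped) and
        # keep the most frequent base at each position.
        length = min(len(insert) for insert in inserts)
        consensus[barcode] = "".join(
            Counter(insert[i] for insert in inserts).most_common(1)[0][0]
            for i in range(length)
        )
    return consensus
```

For example, the reads `AAACGTA`, `AAACGTA`, and `AAACCTA` share the barcode `AAA` and yield the consensus `CGTA`, so that the single discordant base in the third read is outvoted.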

In embodiments, sequencing further comprises generating sequencing reads comprising circularized junctions formed between the 5' and 3' ends of the linear nucleic acid molecules, and quantifying the number of different circularized junction sequences comprising the fusion junctions.

In embodiments, the method further comprises quantifying the fusion amplification product. The molecular count of the fusion amplification product may be used for diagnostic purposes. As described herein, polynucleotides containing fusions are preferentially amplified, enabling accurate quantification over a large background. Conventional bioinformatic analysis can be used to quantify fusion amplification products. In some embodiments, the bioinformatic analysis may involve counting the number of unique circularized junctions associated with a particular fusion amplification product. In other embodiments, quantification of the fusion amplification product is achieved by comparing the number of sequencing reads or circularized junctions corresponding to the fusion amplification product to the number corresponding to a control (e.g., a spike-in control) that is present at a predetermined number of template copies. In still other embodiments, quantification may be performed by qPCR or semi-quantitative PCR.
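
By way of non-limiting illustration only, counting unique circularized junctions and scaling against a spike-in control can be sketched in Python as follows. The sketch is an assumption made for illustration: each read is represented as a hypothetical pair of (circularized junction sequence, read sequence), a read is assigned to the fusion or to the control by exact substring match against a marker sequence, and the copy estimate assumes that unique junction counts scale linearly with input copies. The function names and example sequences are hypothetical.

```python
def unique_junctions(records, marker):
    """Return the distinct circularized junctions of reads containing marker."""
    return {junction for junction, read_seq in records if marker in read_seq}


def estimate_copies(records, fusion_junction, control_marker, control_copies):
    """Estimate fusion template copies relative to a spike-in control.

    control_copies is the predetermined number of control template copies.
    """
    n_fusion = len(unique_junctions(records, fusion_junction))
    n_control = len(unique_junctions(records, control_marker))
    if n_control == 0:
        raise ValueError("no control junctions observed")
    # Duplicate reads of the same molecule share a circularized junction and
    # are counted once, so the ratio compares molecules rather than reads.
    return n_fusion * control_copies / n_control
```

For example, if two unique circularized junctions are observed for the fusion and two for a control spiked in at 50 copies, the sketch returns an estimate of 50 fusion template copies, regardless of how many duplicate reads each molecule produced.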

In embodiments, the one or more linear nucleic acid molecules are derived from a sample of the subject, optionally wherein the sample is an FFPE sample. In an example, FFPE samples were incubated with xylene and washed with ethanol to remove embedded wax, followed by treatment with proteinase K to permeabilize the tissue. In embodiments, the one or more linear nucleic acid molecules are derived from a liquid biopsy (e.g., plasma).

In embodiments, the polynucleotide fusion is a biomarker for cancer, autoimmune disease, primary immunodeficiency, or infectious disease. In embodiments, the polynucleotide fusion is a biomarker for cancer. In embodiments, the polynucleotide fusion is a biomarker for lymphoid malignancies. In embodiments, the polynucleotide fusion is a biomarker of primary immunodeficiency. In embodiments, the polynucleotide fusion is a biomarker for infectious disease. A "biomarker" is a substance associated with a particular property, such as a disease or condition. The change in biomarker levels may be associated with the risk or progression of the disease or the susceptibility of the disease to a given treatment.

In embodiments, the fusion gene causes a disease in a subject in whom the fusion gene is found. In embodiments, the fusion gene is associated with a disease. In embodiments, the disease is cancer, an autoimmune disease, a primary immunodeficiency, or an infectious disease. In some embodiments, the disease is an infectious disease, an autoimmune disease, a genetic disease, or cancer. In embodiments, the disease is an acute disease, a chronic disease (e.g., a disease that exists for more than 6 months), an idiopathic disease, or a syndrome (e.g., Down syndrome). In embodiments, the disease is a recurrent disease (e.g., a disease detectable after an undetectable period of time).

In embodiments, the infectious disease is a disease or disorder associated with infection from a pathogenic organism. In embodiments, the infectious disease is African sleeping sickness (African trypanosomiasis), AIDS (acquired immunodeficiency syndrome), amebiasis, anaplasmosis, angiostrongyliasis, anisakiasis, anthrax, Arcanobacterium haemolyticum infection, Argentine hemorrhagic fever, ascariasis, aspergillosis, astrovirus infection, babesiosis, Bacillus cereus infection, bacterial meningitis, bacterial pneumonia, bacterial vaginosis, Bacteroides infection, balantidiasis, bartonellosis, Baylisascaris infection, BK virus infection, black piedra, blastomycosis, Bolivian hemorrhagic fever, botulism (and infant botulism), Brazilian hemorrhagic fever, brucellosis, bubonic plague, Burkholderia infection, Buruli ulcer, calicivirus infection (norovirus and sapovirus), campylobacteriosis, candidiasis (moniliasis; thrush), capillariasis, Carrion's disease, cat scratch disease, cellulitis, Chagas disease (American trypanosomiasis), chancroid, chickenpox, chikungunya fever, Chlamydia pneumoniae infection (Taiwan acute respiratory agent, or TWAR), cholera, chromoblastomycosis, chytridiomycosis, clonorchiasis, Clostridium difficile colitis, coccidioidomycosis, Colorado tick fever (CTF), common cold (acute viral rhinopharyngitis; acute coryza), coronavirus disease 2019 (COVID-19), Creutzfeldt-Jakob disease (CJD), Crimean-Congo hemorrhagic fever (CCHF), cryptococcosis, cryptosporidiosis, cutaneous larva migrans (CLM), cyclosporiasis, cysticercosis, cytomegalovirus infection, dengue fever, Desmodesmus infection, dientamoebiasis, diphtheria, diphyllobothriasis, dracunculiasis, Ebola hemorrhagic fever, echinococcosis, ehrlichiosis, enterobiasis (pinworm infection), Enterococcus infection, enterovirus infection, epidemic typhus, erythema infectiosum (fifth disease), exanthem subitum (sixth disease), fascioliasis, fasciolopsiasis, fatal familial insomnia (FFI), filariasis, food poisoning caused by Clostridium perfringens, free-living amebic infection, Fusobacterium infection, gas gangrene (clostridial myonecrosis), geotrichosis, Gerstmann-Sträussler-Scheinker syndrome (GSS), giardiasis, glanders, gnathostomiasis, gonorrhea, granuloma inguinale (donovanosis), group A streptococcal infection, group B streptococcal infection, Haemophilus influenzae infection, hand, foot and mouth disease (HFMD), hantavirus pulmonary syndrome (HPS), Heartland virus disease, Helicobacter pylori infection, hemolytic uremic syndrome (HUS), hemorrhagic fever with renal syndrome (HFRS), Hendra virus infection, hepatitis A, hepatitis B, hepatitis C, hepatitis D, hepatitis E, herpes simplex, histoplasmosis, hookworm infection, human bocavirus infection, human ewingii ehrlichiosis, human granulocytic anaplasmosis (HGA), human metapneumovirus infection, human monocytic ehrlichiosis, human papillomavirus (HPV) infection, human parainfluenza virus infection, hymenolepiasis, Epstein-Barr virus infectious mononucleosis (mono), influenza (flu), isosporiasis, Kawasaki disease, keratitis, Kingella kingae infection, kuru, Lassa fever, legionellosis (Legionnaires' disease), Pontiac fever, leishmaniasis, leprosy, leptospirosis, listeriosis, Lyme disease (Lyme borreliosis), lymphatic filariasis (elephantiasis), lymphocytic choriomeningitis, malaria, Marburg hemorrhagic fever (MHF), measles, Middle East respiratory syndrome (MERS), melioidosis (Whitmore's disease), meningitis, meningococcal disease, metagonimiasis, microsporidiosis, molluscum contagiosum (MC), monkeypox, mumps, murine typhus (endemic typhus), Mycoplasma pneumonia, Mycoplasma genitalium infection, mycetoma, myiasis, neonatal conjunctivitis (ophthalmia neonatorum), Nipah virus infection, norovirus, variant Creutzfeldt-Jakob disease (vCJD, nvCJD), nocardiosis, onchocerciasis (river blindness), opisthorchiasis, paracoccidioidomycosis (South American blastomycosis), paragonimiasis, pasteurellosis, pediculosis capitis (head lice), pediculosis corporis (body lice), pediculosis pubis (pubic lice, crab lice), pelvic inflammatory disease (PID), pertussis (whooping cough), plague, pneumococcal infection, Pneumocystis pneumonia (PCP), pneumonia, poliomyelitis, Prevotella infection, primary amoebic meningoencephalitis (PAM), progressive multifocal leukoencephalopathy, psittacosis, Q fever, rabies, relapsing fever, respiratory syncytial virus infection, rhinosporidiosis, rhinovirus infection, rickettsialpox, Rift Valley fever (RVF), Rocky Mountain spotted fever (RMSF), rotavirus infection, rubella, salmonellosis, severe acute respiratory syndrome (SARS), scabies, scarlet fever, schistosomiasis, sepsis, shigellosis (bacillary dysentery), shingles, smallpox, sporotrichosis, staphylococcal food poisoning, staphylococcal infection, strongyloidiasis, subacute sclerosing panencephalitis, bejel, syphilis, yaws, taeniasis, tetanus (lockjaw), tinea barbae (barber's itch), tinea capitis (ringworm of the scalp), tinea corporis (ringworm of the body), tinea cruris, tinea manuum, tinea nigra, tinea pedis, tinea unguium (onychomycosis), tinea versicolor (pityriasis versicolor), toxic shock syndrome (TSS), toxocariasis (ocular larva migrans (OLM)), toxocariasis (visceral larva migrans (VLM)), toxoplasmosis, trachoma, trichinosis, trichomoniasis, trichuriasis (whipworm infection), tuberculosis, tularemia, typhoid fever, Ureaplasma urealyticum infection, valley fever, Venezuelan equine encephalitis, Vibrio vulnificus infection, Vibrio parahaemolyticus enteritis, viral pneumonia, West Nile fever, white piedra (tinea blanca), Yersinia pseudotuberculosis infection, yersiniosis, yellow fever, zeaspora, Zika fever, or zygomycosis.

In embodiments, the disease is an autoimmune disease. In embodiments, the autoimmune disease is arthritis, rheumatoid arthritis, psoriatic arthritis, juvenile idiopathic arthritis, multiple sclerosis, systemic lupus erythematosus (SLE), myasthenia gravis, juvenile onset diabetes, type 1 diabetes, Guillain-Barre syndrome, Hashimoto's encephalitis, Hashimoto's thyroiditis, ankylosing spondylitis, psoriasis, Sjogren's syndrome, vasculitis, glomerulonephritis, autoimmune thyroiditis, Behcet's disease, Crohn's disease, ulcerative colitis, bullous pemphigoid, sarcoidosis, ichthyosis, Graves' ophthalmopathy, inflammatory bowel disease, Addison's disease, vitiligo, asthma, allergic asthma, acne vulgaris, celiac disease, chronic prostatitis, inflammatory bowel disease, pelvic inflammatory disease, reperfusion injury, ischemia-reperfusion injury, stroke, sarcoidosis, transplant rejection, interstitial cystitis, atherosclerosis, scleroderma, or atopic dermatitis.
In embodiments, the autoimmune disease is achalasia, addison's disease, adult Shi Dier disease (Addison's disease), human agammaglobulins, alopecia areata, amyloidosis, ankylosing spondylitis, anti-GBM/anti-TBM nephritis, antiphospholipid syndrome, autoimmune angioedema, autoimmune familial autonomic nerve abnormality, autoimmune encephalomyelitis, autoimmune hepatitis, autoimmune Inner Ear Disease (AIED), autoimmune myocarditis, autoimmune oophoritis, autoimmune orchitis, autoimmune pancreatitis, autoimmune retinopathy, autoimmune urticaria, axons and neuronal neuropathy (an), bal, bullous disease (Bal amese), white plug disease (Behcet's disease), benign pemphigomphosis, bullous disease, karst disease (Castleman disease, CD), disease, gazewal disease (chase), chronic demyelinating disease (chronic demyelinating disease) (chronic myelopathy), chronic myelogenous inflammation (chronic demyelinating disease) Allergic granulomatosis syndrome (CSS) or Eosinophilic Granulomatosis (EGPA), cicatricial pemphigus, crohn's syndrome (Cogan's syndrome), condensed set disease, congenital heart block, coxsackie viral myocarditis (Coxsackie myocarditis), CREST syndrome, crohn's disease, dermatitis herpetiformis, dermatomyositis, devickers disease (Devic's disease) (neuromyelitis), discoid lupus, deler's syndrome, endometriosis, eosinophilic esophagitis (EoE), eosinophilic fasciitis, erythema nodosum, mixed condensed globulinemia, evans syndrome (Evans syndrome), fibromyalgia, fibrotic inflammation, giant cell arteritis, giant cell myositis, glomerulonephritis, pneumococcal syndrome (Goodyear's disease), schodder's disease, multiple sclerosis-finger-haemopoiesis, graves-barren's disease, graves-barren's syndrome (Graves-Barre) disease, HSP), herpes gestation or Pemphigoid Gestation (PG), hidradenitis Suppurativa (HS) (acne), hypogammaglobulinemia, igA nephropathy, igG 4-related sclerosing diseases, immune Thrombocytopenic Purpura (ITP), inclusion Body Myositis (IBM), interstitial 
Cystitis (IC), and, juvenile arthritis, juvenile diabetes (type 1 diabetes), juvenile Myositis (JM), kawasaki disease (Kawasaki disease), lambert-Eaton syndrome (Lambert-Eaton syndrome), leukocyte-fragmenting vasculitis, lichen planus, lichen sclerosus, conjunctivitis, linear IgA disease (LAD), lupus, chronic lyme disease (Lyme disease chronic), meniere's disease, microscopic multiple vasculitis (MPA), mixed Connective Tissue Disease (MCTD), silkworm erosion keratoulcer (Mooren's ulcer), mu Haer disease (Mucha-Habermann disease), multifocal Motor Neuropathy (MMN) or ncb, multiple sclerosis, myasthenia gravis, myositis, narcolepsy, neonatal lupus, neuromyelitis, PR neutropenia, ocular cicatrix, optic neuritis, recurrent rheumatism (PANDAS), PANDAS, secondary tumors), and nocturnal degeneration (nocturnal) of the blood of the human eye (PNH), pampers Luo Zeng syndrome (Parry Romberg syndrome), parson-Turner syndrome (parson-Turner syndrome), pemphigus, peripheral neuropathy, perivenous encephalomyelitis, pernicious Anemia (PA), POEMS syndrome, polyarteritis nodosa, type I, II, III polyadenylic syndrome, polymyalgia rheumatica, polymyositis, post myocardial infarction syndrome, post pericardial opening syndrome, primary biliary cirrhosis, primary sclerosing cholangitis, progesterone dermatitis, psoriasis, psoriatic arthritis, pure red cell aplastic anemia (PRCA), pyoderma gangrene, raynaud's phenomenon, reactive arthritis, reflex neurotrophic malnutrition, recurrent polyarthritis, polyarteritis nodosa (RLS), retroperitoneal fibrosis, rheumatic arthritis, sarcoidosis, schmidkinetosyndt, schmidday syndrome, scleroderma, sympathocritic picornase, sclerodermasyndrome), sperm and testis autoimmunity, stiff Person Syndrome (SPS), subacute Bacterial Endocarditis (SBE), sosaxogram syndrome (Susac 'ssyndrome), sympathogenic Ophthalmitis (SO), aortic inflammation (Takayasu's arttis), temporal arteritis/giant cell arteritis, thrombocytopenic purpura (TTP), thyroiditis (TED), 
Tolosa-Hunt syndrome (THS), transverse myelitis, type 1 diabetes, ulcerative colitis (UC), undifferentiated connective tissue disease (UCTD), uveitis, vasculitis, vitiligo, or Vogt-Koyanagi-Harada disease.

In embodiments, the disease is a genetic disease. In embodiments, the genetic disease is cystic fibrosis, alpha thalassemia, beta thalassemia, sickle cell anemia (sickle cell disease), Marfan syndrome, fragile X syndrome, Huntington's disease, or hemochromatosis.

In an embodiment, the amplification reaction further comprises: (a) one or more different first primers that specifically hybridize to different portions of the first strand of the first region; (b) for each different first primer, a different second primer that specifically hybridizes to a complement of a portion of the first strand of the first region, the complement being located 3' relative to the site of specific hybridization of the corresponding different first primer; and (c) for each different first primer, a different blocking oligonucleotide that specifically hybridizes to a portion of the first strand of the first region at a position 5' relative to the site of specific hybridization of the different first primer.

In embodiments, the method further comprises detecting one or more different polynucleotide fusions, each different polynucleotide fusion comprising a fusion between a sequence of a different first region and a sequence of a different second region at a different fusion junction, wherein the amplification reaction further comprises a corresponding first primer, a corresponding second primer, and a corresponding blocking oligonucleotide for each different first region.

In embodiments, the polynucleotide fusion comprises a sequence of a first region fused to a sequence of a second region at a fusion junction, wherein the fusion is between two gene sequences and is referred to as a gene fusion. A fusion junction may represent a location where a first nucleotide sequence (e.g., a first gene sequence or gene fragment) meets or joins with a second nucleotide sequence (e.g., a second gene or gene fragment). In an embodiment, the polynucleotide fusion is a hybrid gene formed from two previously independent genes (or gene fragments). In some embodiments, the fusion junction is located between the sequence that specifically binds to the blocking element and the sequence that specifically binds to the first primer. In embodiments, the polynucleotide fusion comprises a gene fusion or a gene fragment of any of the following fusions: AGTRAP-BRAF, AKAP9-BRAF, ATIC-ALK, CCDC6-RET, CD74-NRG1, CD74-ROS1, CEP89-BRAF, CLCN6-BRAF, DCTN1-ALK, EML4-ALK, EZR-ROS1, FAM131B-BRAF, FCHSD1-BRAF, GATM-BRAF, GNAI1-BRAF, GOLGA5-RET, GOPC-ROS1, HIP1-ALK, HOOK3-RET, KIF5B-ALK, KIF5B-RET, KTN1-RET, LRIG3-ROS1, LSM14A-BRAF, MKRN1-BRAF, MSN-ALK, MYO5A-ROS1, NCOA4-RET, PCM1-RET, RANBP2-ALK, RELCH-RET, RNF130-BRAF, SDC4-ROS1, SLC34A2-ROS1, SLC3A2-NRG1, SLC45A3-BRAF, SQSTM1-ALK, STRN-ALK, TFG-ALK, TPM3-ROS1, TPR-ALK, TRIM24-BRAF, TRIM24-RET, TRIM27-RET, TRIM33-RET, VCL-ALK, WDCP-ALK, or ZCCHC8-ROS1.
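As an illustration only, the partner-gene naming used in the list above can be handled programmatically. The following minimal Python sketch splits fusion names of the form GENE1-GENE2 into their two partners and groups the partners of each gene. The function names and the five-fusion subset are hypothetical choices made for this sketch, and it assumes that gene symbols themselves contain no hyphen; it is not part of the disclosed method.

```python
from collections import defaultdict

# Arbitrary subset of fusion names taken from the list above.
FUSIONS = ["EML4-ALK", "KIF5B-ALK", "KIF5B-RET", "CCDC6-RET", "CD74-ROS1"]


def split_fusion(name):
    """Split 'GENE1-GENE2' into its two partner symbols.

    Assumes gene symbols contain no hyphen (an assumption of this sketch).
    """
    first, sep, second = name.partition("-")
    if not sep or not first or not second or "-" in second:
        raise ValueError(f"not a two-partner fusion name: {name!r}")
    return first, second


def partners_by_gene(fusions):
    """Map each gene symbol to the sorted list of genes it is fused with."""
    table = defaultdict(set)
    for name in fusions:
        first, second = split_fusion(name)
        table[first].add(second)
        table[second].add(first)
    return {gene: sorted(partners) for gene, partners in table.items()}


if __name__ == "__main__":
    for gene, partners in sorted(partners_by_gene(FUSIONS).items()):
        print(gene, "->", ", ".join(partners))
```

Grouping by gene in this way makes it easy to see, for example, which first regions would need a primer and blocking oligonucleotide when several fusions share a partner.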

In embodiments, the polynucleotide fusion comprises a gene fusion or a gene fragment of any of the following fusions: ACSL3-ETV1, ACTB-GLI1, AGPAT5-MCPH1, AGTRAP-BRAF, AKAP9-BRAF, ARID1A-MAST2, ATIC-ALK, BBS9-PKD1L1, BCR-JAK2, CBFA2T3-GLIS2, CCDC6-RET, CD74-NRG1, CD74-ROS1, CENPK-KMT2A, CEP89-BRAF, CLCN6-BRAF, COL1A1-PDGFB, COL1A2-PLAG1, CRTC3-MAML2, DCTN1-ALK, DDX5-ETV4, DHH-RHEBL1, DNAJB1-PRKACA, EIF3E-RSPO2, EIF3K-CYP39A1, EML4-ALK, EPC1-PHF1, ETV6-ITPR2, ETV6-JAK2, ETV6-PDGFRB, ETV6-RUNX1, EZR-ERBB4, EZR-ROS1, FAM131B-BRAF, FBXL18-RNF216, FCHSD1-BRAF, FUS-ATF1, FUS-CREB3L2, FUS-FEV, GATM-BRAF, GMDS-PDE8B, GNAI1-BRAF, GOLGA5-RET, GOPC-ROS1, HACL1-RAF1, HAS2-PLAG1, HIP1-ALK, HOOK3-RET, IL6R-ATP8B2, INTS4-GAB2, IRF2BP2-CDX1, JAZF1-PHF1, JAZF1-SUZ12, JPT1-USH1G, KIF5B-ALK, KIF5B-RET, KLK2-ETV1, KLK2-ETV4, KMT2A-ABI1, KMT2A-ACTN4, KMT2A-AFF3, KMT2A-AFF4, KMT2A-ARHGAP26, KMT2A-ARHGEF12, KMT2A-BTBD18, KMT2A-CASP8AP2, KMT2A-CBL, KMT2A-CEP170B, KMT2A-CIP2A, KMT2A-CREBBP, KMT2A-EEFSEC, KMT2A-ELL, KMT2A-EP300, KMT2A-EPS15, KMT2A-FOXO4, KMT2A-FRYL, KMT2A-GAS7, KMT2A-GMPS, KMT2A-GPHN, KMT2A-KNL1, KMT2A-LASP1, KMT2A-LPP, KMT2A-MAPRE1, KMT2A-MLLT11, KMT2A-MLLT3, KMT2A-MLLT6, KMT2A-MYO1F, KMT2A-NCKIPSD, KMT2A-NRIP3, KMT2A-PDS5A, KMT2A-PICALM, KMT2A-SARNP, KMT2A-SH3GL1, KMT2A-TET1, KMT2A-ZFYVE19, KTN1-RET, LIFR-PLAG1, LRIG3-ROS1, LSM14A-BRAF, MBOAT2-PRKCE, MBTD1-CXorf67, MEAF6-PHF1, MKRN1-BRAF, MN1-ETV6, MSN-ALK, MYO5A-ROS1, NAB2-STAT6, NCOA4-RET, NF1-ASIC2, NONO-TFE3, NOTCH1-GABBR2, NTN1-ACLY, NUP107-LGR5, NUP98-KDM5A, PAX3-FOXO1, PAX3-NCOA2, PAX5-JAK2, PAX7-FOXO1, PCM1-JAK2, PCM1-RET, PLA2R1-RBMS1, PLXND1-TMCC1, PML-RARA, PRCC-TFE3, RANBP2-ALK, RBM14-PACS1, RELCH-RET, RNF130-BRAF, SDC4-ROS1, SEC16A-NOTCH1, SFPQ-TFE3, SLC26A6-PRKAR2A, SLC34A2-ROS1, SLC3A2-NRG1, SLC45A3-BRAF, SLC45A3-ELK4, SLC45A3-ETV1, SLC45A3-ETV5, SND1-BRAF, SQSTM1-ALK, SRGAP3-RAF1, SS18-SSX1, SS18-SSX2, SS18-SSX4B, SS18L1-SSX1, STRN-ALK, TADA2A-MAST1, TBL1XR1-TP63, TCEA1-PLAG1, TCF3-PBX1, TFG-ALK, TPM3-ROS1, TPR-ALK, TRIM24-BRAF, TRIM24-RET, TRIM27-RET, TRIM33-RET, VCL-ALK, WDCP-ALK, YWHAE-NUTM2A, YWHAE-NUTM2B, ZC3H7B-BCOR, or ZCCHC8-ROS1. In embodiments, the polynucleotide fusion comprises a sequence of a first region fused to a sequence of a second region at a fusion junction, wherein the first region and the second region comprise different genes. In embodiments, the polynucleotide fusion comprises a gene fusion of CREBBP-SRGAP2B, DNAH-IKZF1, ETV6-SNUPN, or ETV6-NUFIP1. The genes described herein correspond to the registered genes identified in the Gene catalog of the National Center for Biotechnology Information (NCBI) at the National Library of Medicine, accessible at www.ncbi.nlm.nih.gov/gene/. Alternatively, the gene may be a fusion gene found in a database of known fusion genes, such as ChimerDB (e.g., Ye Eun Jang et al., Nucleic Acids Research, volume 48, issue D1, pages D817-D824) or FusionGDB (e.g., Kim P and Zhou X, Nucleic Acids Research, January 8, 2019; 47(D1):D994-D1004), each of which is incorporated herein by reference.

In embodiments, the polynucleotide fusion comprises a sequence of a first region fused to a sequence of a second region at a fusion junction, wherein the first region comprises any one of the following genes or a part thereof: ABI1, ACLY, ACSL3, ACTB, ACTN4, AFF3, AFF4, AGPAT5, AKAP9, ALK, ARHGAP26, ARHGEF12, ARID1A, ASIC2, ATF1, ATIC, ATP8B2, BBS9, BCOR, BCR, BRAF, BTBD18, CASP8AP2, CBFA2T3, CBL, CCDC6, CD74, CDX1, CENPK, CEP170B, CEP89, CIP2A, CLCN6, COL1A1, COL1A2, CREB3L1, CREB3L2, CREBBP, CRTC3, CXorf67, CYP39A1, DCTN1, DDX5, DHH, DNAJB1, EEFSEC, EIF3E, EIF3K, ELK4, ELL, EML4, EP300, EPC1, EPS15, ERBB4, ETV1, ETV4, ETV5, ETV6, EZR, FAM131B, FBXL18, FCHSD1, FEV, FOXO1, FOXO4, FRYL, FUS, GAB2, GABBR2, GAS7, GATM, GLI1, GLIS2, GMDS, GMPS, GNAI1, GOLGA5, GOPC, GPHN, HACL1, HAS2, HIP1, HOOK3, IL6R, INTS4, IRF2BP2, ITPR2, JAK2, JAZF1, JPT1, KDM5A, KIF5B, KLK2, KMT2A, KNL1, KTN1, LGR5, LIFR, LPP, LRIG3, LSM14A, MAML2, MAPRE1, MAST2, MBOAT2, MBTD1, MCPH1, MEAF6, MLLT1, MLLT11, MLLT6, MN1, MSN, MYO1F, MYO5A, NAB2, NCKIPSD, NCOA1, NCOA2, NCOA4, NF1, NONO, NOTCH1, NRG1, NRIP3, NTN1, NUP107, NUP98, NUTM2A, NUTM2B, PACS1, PAX3, PAX5, PAX7, PBX1, PCM1, PDE8B, PDGFB, PDS5A, PHF1, PICALM, PKD1L1, PLA2R1, PLAG1, PLXND1, PRKACA, PRKAR2A, PRKCE, RAF1, RANBP2, RARA, RBM14, RBMS1, RELCH, RET, RHEBL1, RNF130, RNF216, ROS1, RSPO2, RUNX1, SARNP, SDC4, SEC16A, SFPQ, SH3GL1, SLC26A6, SLC34A2, SLC3A2, SLC45A3, SND1, SQSTM1, SRGAP3, SS18L1, SSX2, SSX4, SSX4B, STRN, SUZ12, TADA2A, TBL1XR1, TCEA1, TCF3, TET1, TFE3, TFG, TMCC1, TP63, TPM3, TPR, TRIM24, TRIM27, TRIM33, USH1G, VCL, WDCP, YWHAE, ZC3H7B, ZCCHC8, or ZFYVE19.

In embodiments, the polynucleotide fusion comprises a sequence of a first region fused to a sequence of a second region at a fusion junction, wherein the second region comprises any one of the following genes or a part thereof: ABI1, ACLY, ACSL3, ACTN4, AFF3, AFF4, AGPAT5, AKAP9, ALK, ARHGAP26, ARHGEF12, ARID1A, ASIC2, ATF1, ATIC, ATP8B2, BBS9, BCOR, BRAF, BTBD18, CASP8AP2, CBFA2T3, CBL, CCDC6, CD74, CDX1, CENPK, CEP170B, CEP89, CIP2A, CLCN6, COL1A1, COL1A2, CREB3L1, CREBBP, CRTC3, CXorf67, CYP39A1, DCTN1, DDX5, DHH, DNAJB1, EEFSEC, EIF3E, EIF3K, ELK4, ELL, EML4, EP300, EPC1, EPS15, ERBB4, ETV1, ETV4, ETV6, EZR, FAM131B, FBXL18, FCHSD1, FEV, FOXO1, FOXO4, FRYL, FUS, GAB2, GABBR2, GAS7, GATM, GLI1, GLIS2, GMDS, GMPS, GNAI1, GOLGA5, GOPC, GPHN, HACL1, HAS2, HIP1, HOOK3, IL6R, INTS4, IRF2BP2, JAK2, JAZF1, KDM5A, KIF5B, KLK2, KMT2A, KNL1, KTN1, LASP1, LGR5, LIFR, LPP, LRIG3, LSM14A, MAML2, MAPRE1, MAST2, MBOAT2, MBTD1, MCPH1, MEAF6, MKRN1, MLLT11, MLLT3, MLLT6, MN1, MSN, MYO1F, MYO5A, NAB2, NCKIPSD, NCOA1, NCOA2, NCOA4, NF1, NONO, NOTCH1, NRG1, NRIP3, NUP107, NUP98, NUTM2A, NUTM2B, PACS1, PAX3, PAX5, PAX7, PBX1, PCM1, PDE8B, PDGFRB, PDS5A, PHF1, PICALM, PLA2R1, PLAG1, PLXND1, PML, PRCC, PRKACA, PRKAR2A, RAF1, RANBP2, RARA, RBM14, RBMS1, RELCH, RET, RHEBL1, RNF130, RNF216, ROS1, RSPO2, RUNX1, SARNP, SDC4, SEC16A, SFPQ, SH3GL1, SLC26A6, SLC34A2, SLC3A2, SLC45A3, SND1, SQSTM1, SRGAP3, SS18L1, SSX2, SSX4, STRN, SUZ12, TADA2A, TBL1XR1, TCEA1, TCF3, TET1, TFE3, TFG, TMCC1, TP63, TPM3, TPR, TRIM24, TRIM27, TRIM33, USH1G, VCL, WDCP, YWHAE, ZC3H7B, ZCCHC8, or ZFYVE19.

In embodiments, the fusion junction may be an unknown fusion junction event, as the methods disclosed herein do not require a priori knowledge of the exact nature of the genomic rearrangement to detect and characterize the fusion. In an embodiment, only the sequence of the first region is known prior to cyclization. In an embodiment, only the sequence of the second region is known prior to cyclization.

In embodiments, the first region and the second region are located on the same chromosome. In embodiments, the first region and the second region are located on different chromosomes.

In embodiments, the polynucleotide fusion comprises a gene encoding a kinase domain or a portion thereof. In embodiments, the polynucleotide fusion comprises a gene fusion of BCL1-JH, BCL2-JH, or MYC-IGL.

In embodiments, the polynucleotide fusion comprises a B-cell or T-cell intrachromosomal rearrangement. In embodiments, the polynucleotide fusion comprises a B cell intrachromosomal rearrangement. In embodiments, the polynucleotide fusion comprises a T cell intrachromosomal rearrangement.

In embodiments, the polynucleotide fusion comprises a fusion involving one of the following: a rearranged T cell antigen receptor or fragment thereof, a T cell receptor alpha variable (TRAV) gene or fragment thereof, a T cell receptor alpha joining (TRAJ) gene or fragment thereof, a T cell receptor alpha constant (TRAC) gene or fragment thereof, a T cell receptor beta variable (TRBV) gene or fragment thereof, a T cell receptor beta diversity (TRBD) gene or fragment thereof, a T cell receptor beta joining (TRBJ) gene or fragment thereof, a T cell receptor beta constant (TRBC) gene or fragment thereof, a T cell receptor gamma variable (TRGV) gene or fragment thereof, a T cell receptor gamma joining (TRGJ) gene or fragment thereof, a T cell receptor gamma constant (TRGC) gene or fragment thereof, a T cell receptor delta variable (TRDV) gene or fragment thereof, a T cell receptor delta diversity (TRDD) gene or fragment thereof, or a T cell receptor delta constant (TRDC) gene or fragment thereof.

In embodiments, the polynucleotide fusion comprises a fusion involving one of the following: a rearranged B cell antigen receptor or fragment thereof, an IGHV gene or fragment thereof, an IGHD gene or fragment thereof, an IGHJ gene or fragment thereof, an IGHJC gene or fragment thereof, an IGKV gene or fragment thereof, an IGKJ gene or fragment thereof, an IGKC gene or fragment thereof, an IGLV gene or portion thereof, an IGLJ gene or portion thereof, an IGLC gene or fragment thereof, an IGK kappa deleting element or portion thereof, or an IGK intron enhancer element or portion thereof. In embodiments, the polynucleotide fusion comprises a fusion involving one of the following: an ALK gene or a part thereof, a BRAF gene or a part thereof, an EGFR gene or a part thereof, an ERBB2 gene or a part thereof, a KRAS gene or a part thereof, a MET gene or a part thereof, an NRG1 gene or a part thereof, an FGFR2 gene or a part thereof, an FGFR3 gene or a part thereof, an NTRK1 gene or a part thereof, an NTRK2 gene or a part thereof, an NTRK3 gene or a part thereof, a RET gene or a part thereof, or a ROS1 gene or a part thereof.

III. Compositions and kits

In one aspect, a composition is provided that includes a blocking element, a first primer, and a second primer. In embodiments, the composition further comprises an annealing solution (alternatively referred to herein as a hybridization buffer or hybridization solution). In embodiments, the annealing solution comprises an aqueous solution, which may contain a buffer (e.g., saline-sodium citrate (SSC) or tris(hydroxymethyl)aminomethane ("Tris")), an aqueous salt (e.g., KCl or (NH₄)₂SO₄), chelating agents (e.g., EDTA), detergents, surfactants, crowding agents, or stabilizers (e.g., PEG, Tween-20, BSA). In an embodiment, the annealing solution comprises Tris and the pH is maintained at about 8.0 to about 9.0. In an embodiment, the composition comprises an extension solution. In embodiments, the extension solution comprises an aqueous solution, which may contain a buffer (e.g., saline-sodium citrate (SSC) or tris(hydroxymethyl)aminomethane ("Tris")), aqueous salts (e.g., KCl or MgSO₄), nucleotides, polymerases, detergents, chelating agents (e.g., EDTA), surfactants, crowding agents, or stabilizers (e.g., PEG, Tween-20, BSA). In embodiments, the composition further comprises an additive that reduces the denaturation temperature of DNA. In embodiments, the composition comprises an additive such as betaine, dimethyl sulfoxide (DMSO), ethylene glycol, formamide, glycerol, guanidine thiocyanate, 4-methylmorpholine 4-oxide (NMO), or mixtures thereof. In embodiments, the composition further comprises a denaturant. The denaturant may be acetic acid, hydrochloric acid, nitric acid, formamide, guanidine, sodium salicylate, sodium hydroxide, dimethyl sulfoxide (DMSO), propylene glycol, urea, or mixtures thereof.

In embodiments, the composition comprises a circularization solution (e.g., a circularizing agent). In an embodiment, the circularization solution comprises a circularization ligase, e.g., CircLigase™, Taq DNA ligase, HiFi Taq DNA ligase, T4 ligase, or Ampligase DNA ligase. In an embodiment, the circularization solution comprises a splint primer. "Splint primer" is used in accordance with its plain and ordinary meaning and refers to a primer having two or more sequences complementary to two or more portions of a template polynucleotide. In embodiments, the two sequences are complementary to adaptor sequences, wherein one sequence binds to (i.e., hybridizes to) the adaptor at the 5' portion of the template polynucleotide and the other binds to (i.e., hybridizes to) the adaptor at the 3' portion of the template polynucleotide. In an embodiment, the circularization solution comprises a crowding agent, such as PEG (e.g., 20%-25% PEG-8000). In an embodiment, the circularization solution comprises polyethylene glycol (PEG), such as PEG 4000 or PEG 6000, dextran, and/or Ficoll.

In embodiments, the splint primers are about 5 to about 25 nucleotides in length. In embodiments, the splint primers are about 10 to about 40 nucleotides in length. In embodiments, the splint primers are about 5 to about 100 nucleotides in length. In embodiments, the splint primers are about 20 to about 200 nucleotides in length. In embodiments, the splint primers are about or at least about 5, 6, 7, 8, 9, 10, 12, 15, 18, 20, 25, 30, 35, 40, 50 or more nucleotides in length. In embodiments, the splint primers are about or at least about 10 nucleotides in length. In embodiments, the splint primers are about or at least about 15 nucleotides in length. In embodiments, the splint primers are about or at least about 25 nucleotides in length.

In one aspect, a kit is provided comprising: a circularizing agent, wherein the circularizing agent is capable of binding the 5 'and 3' ends of a linear nucleic acid molecule; a blocking element capable of binding to one or more circular polynucleotides; a first primer and a second primer; and a polymerase. In an embodiment, the first primer and the second primer form a primer set. In an embodiment, the kit comprises a plurality of primer sets. In embodiments, the kit comprises 5, 10, 20, 25, 50 or more primer sets.

In an embodiment, the kit comprises at least 22 different primers, for example one forward primer (1F) and six reverse primers (6R) for the IGH locus; three forward primers (3F) and six reverse primers (6R) for the IGK locus; and one forward primer (1F) and five reverse primers (5R) for the IGL locus. In an embodiment, the kit comprises about 18 blocking elements (i.e., 18 blocking elements targeting 18 different regions). In an embodiment, the kit comprises primers targeting 7 different sequences of the IGH locus. In an embodiment, the kit comprises primers targeting 9 different sequences of the IGK locus. In an embodiment, the kit comprises primers targeting 6 different sequences of the IGL locus. In embodiments, the kit comprises a plurality of different populations of blocking elements, each population of blocking elements binding to a particular sequence.
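The primer counts in the preceding embodiment can be checked with simple arithmetic. The short Python sketch below tallies the forward and reverse primers per locus and the overall total. The dictionary layout and variable names are illustrative assumptions, not part of the described kit.

```python
# Forward (F) and reverse (R) primer counts per locus, as stated in the text.
PANEL = {
    "IGH": (1, 6),  # 1F + 6R
    "IGK": (3, 6),  # 3F + 6R
    "IGL": (1, 5),  # 1F + 5R
}

# Number of distinct primers (targeted sequences) per locus.
per_locus = {locus: fwd + rev for locus, (fwd, rev) in PANEL.items()}

# Total number of different primers in the kit.
total_primers = sum(per_locus.values())

if __name__ == "__main__":
    for locus, count in per_locus.items():
        print(f"{locus}: {count} primers")
    print(f"total: {total_primers} primers")
```

The per-locus sums (7, 9, and 6) match the numbers of targeted sequences given for the IGH, IGK, and IGL loci, and together they account for the 22 different primers.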

In one aspect, a kit is provided containing the components necessary to perform the methods as described herein, including the examples. Typically, the kit comprises one or more containers that provide the composition and one or more additional reagents (e.g., buffers suitable for polynucleotide extension). The kit may also comprise template nucleic acids (DNA and/or RNA), one or more primer polynucleotides, nucleotides (including, for example, deoxyribonucleotides, ribonucleotides, labeled nucleotides, and/or modified nucleotides), buffers, salts, and/or labels (e.g., fluorophores). In embodiments, the kit further comprises instructions. In embodiments, the kit comprises one or more housings (e.g., cassettes, bottles, cartridges) containing the relevant reagents and/or support materials. In embodiments, the kit comprises components useful for circularizing a template polynucleotide using chemical ligation techniques. In embodiments, the kit comprises components useful for circularizing the template polynucleotide using a ligase (e.g., CircLigase™ ligase, Taq DNA ligase, HiFi Taq DNA ligase, T4 DNA ligase, or Ampligase DNA ligase). In embodiments, the ligase is an RNA-dependent DNA ligase (e.g., SplintR ligase). For example, such a kit further comprises the following components: (a) a reaction buffer that controls pH and provides an optimized salt composition for the ligase (e.g., CircLigase™ ligase, Taq DNA ligase, HiFi Taq DNA ligase, T4 DNA ligase, or Ampligase DNA ligase), and (b) ligase cofactors. In embodiments, the kit further comprises instructions for its use.

In an embodiment, the kit comprises a plurality of primers, wherein the primers are capable of hybridizing to linear nucleic acid molecules. Nucleic acid hybridization techniques can be used to assess the hybridization specificity of the primers described herein. Hybridization techniques are well known in the art. For example, suitable moderately stringent conditions for testing hybridization of a polynucleotide as provided herein to other polynucleotides comprise pre-washing in a solution of 5x SSC, 0.5% SDS, 1.0 mM EDTA (pH 8.0); hybridizing in 5x SSC at 50 °C to 60 °C; and then washing twice at 65 °C for 20 minutes with each of 2x, 0.5x, and 0.2x SSC containing 0.1% SDS.

In an embodiment, the kit comprises a primer set. In an embodiment, the kit comprises a plurality of primer sets. The number of primers in a first set may be the same as or different from the number of primers in a second set. As used herein, "primer set" or "primer pair" refers to two or more primers that target two or more regions of a polynucleotide. Typically, the primer set comprises a first primer that hybridizes to a 5' portion of the polynucleotide and a second primer that hybridizes to a 3' portion of the polynucleotide. For example, the forward primer and the reverse primer flank the target region of the polynucleotide, and the forward primer and the reverse primer are collectively referred to as a primer set. In an embodiment, the kit comprises a first set of "upstream" or "forward" primers and a second set of "downstream" or "reverse" primers. In an embodiment, the kit further comprises forward and reverse primer sets for specifically amplifying recombinant nucleic acids encoding IgH (VDJ), IgH (DJ), and IgK. In some embodiments, the kit further comprises forward and reverse primer sets that specifically amplify recombinant nucleic acids encoding TCRβ, TCRδ, and TCRγ. In embodiments, the kit comprises a plurality of V segment primers (i.e., primers having a sequence complementary to the V coding region) and a plurality of J segment primers (i.e., primers having a sequence complementary to the J coding region), wherein the plurality of V segment primers and the plurality of J segment primers amplify substantially all combinations of V segments and J segments of the rearranged immunoreceptor locus. Substantially all combinations means at least 95%, 96%, 97%, 98%, 99% or more of all combinations of V and J segments of rearranged immunoreceptor loci. In certain embodiments, the plurality of V segment primers and the plurality of J segment primers amplify all combinations of V segments and J segments of the rearranged immunoreceptor locus. 
In embodiments, the primer may comprise or be at least about 15 nucleotides long, having a sequence identical to or complementary to a contiguous sequence of 15 nucleotides in length of the target V or J segment (i.e., a portion of a genomic polynucleotide encoding a V or J region polypeptide). Longer primers, such as about 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 45, or 50 nucleotide long primers having a sequence identical to or complementary to the contiguous sequence of the polynucleotide segment encoding the target V or J region, may also be used in the methods and kits described herein. In an embodiment, the kit comprises an inwardly facing primer. In an embodiment, the kit comprises an outward facing primer. The primer set may comprise more than two different primers, e.g., one forward primer (1F) and six reverse primers (6R) of the IGH locus, collectively referred to as a primer set for the IGH locus.
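By way of illustration only, the "substantially all combinations" threshold described above can be checked computationally for a candidate primer panel. The following Python sketch is not part of the disclosed kit; the segment names, the panel composition, and the function name `vj_coverage` are hypothetical, and the sketch assumes that a V-J combination is amplifiable whenever at least one primer targets its V segment and at least one primer targets its J segment.

```python
from itertools import product


def vj_coverage(v_segments, j_segments, v_primer_targets, j_primer_targets):
    """Return the fraction of V-J combinations covered by a primer panel.

    Assumption (illustrative only): a combination is covered when its V segment
    is targeted by at least one V primer and its J segment by at least one J primer.
    """
    all_pairs = set(product(v_segments, j_segments))
    if not all_pairs:
        raise ValueError("need at least one V segment and one J segment")
    covered = {
        (v, j)
        for v, j in all_pairs
        if v in v_primer_targets and j in j_primer_targets
    }
    return len(covered) / len(all_pairs)


# Hypothetical locus with 40 V segments and 6 J segments.
v_segments = [f"V{i}" for i in range(1, 41)]
j_segments = [f"J{i}" for i in range(1, 7)]

# Hypothetical panel: every J segment is targeted, one V segment is not.
v_targets = set(v_segments[:39])
j_targets = set(j_segments)

fraction = vj_coverage(v_segments, j_segments, v_targets, j_targets)
substantially_all = fraction >= 0.95  # lowest threshold recited above
print(f"covered fraction: {fraction:.3f}, substantially all: {substantially_all}")
```

In this hypothetical panel, 234 of 240 combinations are covered, which satisfies the 95% threshold but not the "all combinations" embodiment.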

In embodiments, the kit further comprises forward and reverse primer sets for amplifying one or more target sequences comprising single nucleotide variants, insertions, deletions, internal tandem repeats, and/or copy number variants. In embodiments, the kit further comprises forward and reverse primer sets for amplifying one or more target sequences comprising one or more single nucleotide variants, one or more insertions, one or more deletions, one or more internal tandem repeats, or one or more copy number variants.

In embodiments, the kit comprises at least 2, 4, 6, 8, 10, 20, 40, 60, 80, 100, 120, 140, 160, 180, 200 or more primer sets. In embodiments, the kit comprises 2 to 10, 10 to 40, 40 to 80, 80 to 150, 150 to 300 or more primer sets. The number of primer sets provided in the kit can be tailored for a particular application, e.g., detecting a known number of recombinant nucleic acids, and/or detecting a known number of single nucleotide variants, insertions, deletions, internal tandem repeats, and/or copy number variants. In embodiments, the kit comprises a plurality of primer sets for amplifying a single genomic feature.

In an embodiment, the kit comprises a sequencing polymerase and one or more amplification polymerases. In embodiments, the sequencing polymerase is capable of incorporating modified nucleotides. In an embodiment, the polymerase is a DNA polymerase. In embodiments, the DNA polymerase is a Pol I DNA polymerase, Pol II DNA polymerase, Pol III DNA polymerase, Pol IV DNA polymerase, Pol V DNA polymerase, Pol β DNA polymerase, Pol μ DNA polymerase, Pol λ DNA polymerase, Pol σ DNA polymerase, Pol α DNA polymerase, Pol δ DNA polymerase, Pol ε DNA polymerase, Pol η DNA polymerase, Pol ι DNA polymerase, Pol κ DNA polymerase, Pol ζ DNA polymerase, Pol γ DNA polymerase, Pol θ DNA polymerase, Pol ν DNA polymerase, or a thermophilic nucleic acid polymerase (e.g., Therminator γ, 9°N polymerase (exo-), Therminator II, Therminator III, or Therminator IX). In embodiments, the DNA polymerase is a thermophilic nucleic acid polymerase. In embodiments, the DNA polymerase is a modified archaeal DNA polymerase. In embodiments, the polymerase is a reverse transcriptase. In embodiments, the polymerase is a mutant P. abyssi polymerase (e.g., a mutant P. abyssi polymerase as described in WO 2018/148723 or WO 2020/056044, each of which is incorporated herein by reference for all purposes). In an embodiment, the kit comprises a strand-displacing polymerase. In embodiments, the kit comprises a strand-displacing polymerase, such as a phi29 polymerase, a Bst polymerase (e.g., Bst LF), a phi29 mutant polymerase, or a thermostable phi29 mutant polymerase.

In an embodiment, the kit comprises a buffer solution. Typically, the buffer solutions contemplated herein are made from a weak acid and its conjugate base or a weak base and its conjugate acid. For example, sodium acetate and acetic acid are buffers that may be used to form an acetate buffer. Other examples of buffers that may be used to prepare the buffer solution include, but are not limited to, Tris, bicine, tricine, HEPES, TES, MOPS, MOPSO, and PIPES. In addition, other buffers that may be used in enzymatic, hybridization, and detection reactions are known in the art. In an embodiment, the buffer solution may comprise Tris. With respect to the embodiments described herein, the pH of the buffer solution may be adjusted to allow for any of the described reactions. In some embodiments, the pH of the buffer solution may be greater than pH 7.0, greater than pH 7.5, greater than pH 8.0, greater than pH 8.5, greater than pH 9.0, greater than pH 9.5, greater than pH 10, greater than pH 10.5, greater than pH 11.0, or greater than pH 11.5. In other embodiments, the pH of the buffer solution may be in the range of, for example, about pH 6 to about pH 9, about pH 8 to about pH 10, or about pH 7 to about pH 9. In embodiments, the buffer solution may comprise one or more divalent cations. Examples of divalent cations include, but are not limited to, Mg²⁺, Mn²⁺, Zn²⁺, and Ca²⁺. In embodiments, the buffer solution may contain one or more divalent cations at a concentration sufficient to allow hybridization of the nucleic acids. In an embodiment, the kit comprises an annealing solution, an extension solution, and a chemical denaturant. In an embodiment, the kit further comprises an internal standard comprising a plurality of nucleic acids having a length and composition representative of the target nucleic acid, wherein the internal standard is provided at a known concentration.
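As a worked illustration of a buffer made from a weak acid and its conjugate base, the ratio of the two components needed for a target pH can be estimated with the Henderson-Hasselbalch relationship, pH = pKa + log10([base]/[acid]). The sketch below is illustrative only and is not a formulation from this disclosure; the function name is hypothetical, the pKa of acetic acid is taken as approximately 4.76 at 25 °C, and ideal behavior (no activity or temperature corrections) is assumed.

```python
def base_to_acid_ratio(target_ph, pka):
    """Molar ratio [conjugate base]/[weak acid] for a target pH.

    Rearranged Henderson-Hasselbalch equation; assumes ideal solution behavior.
    """
    return 10.0 ** (target_ph - pka)


ACETIC_ACID_PKA = 4.76  # approximate literature value at 25 °C (assumption)

# Acetate buffer example: sodium acetate (base) to acetic acid ratio at pH 5.0.
ratio = base_to_acid_ratio(5.0, ACETIC_ACID_PKA)
print(f"acetate:acetic acid ratio at pH 5.0 is about {ratio:.2f}")
```

At a pH equal to the pKa the ratio is 1, which is where the buffer resists pH change most strongly; this is one reason a buffer is usually chosen with a pKa near the intended working pH.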

The kit may further comprise one or more additional containers comprising PCR and sequencing buffers, diluents, subject sample extraction means (e.g., syringe, swab, etc.), and package inserts with instructions for use. Additionally, labels with instructions for use, such as those described above, may be provided on the containers; and/or instructions and/or other information may also be contained on the insert contained with the kit; and/or by the website address provided therein. The kit may also contain laboratory tools such as sample tubes, plate sealers, microcentrifuge tube openers, labels, magnetic particle separators, foam inserts, ice bags, dry ice bags, insulating materials, and the like. The kit may further comprise a pre-packaged or dedicated functionalized substrate as described herein to amplify and/or detect the library molecules. In embodiments, the substrate may comprise a surface suitable for performing a sequencing reaction therein.

In one aspect, a kit is provided, wherein the kit comprises: i) an enzyme that circularizes a nucleic acid (e.g., a circularizing agent as described herein, such as a thermostable ATP-dependent ligase that catalyzes intramolecular ligation of ssDNA templates having a 5'-phosphate and a 3'-hydroxyl); ii) a plurality of oligonucleotide primers; iii) a plurality of blocking elements (e.g., blocking elements as described herein); iv) a polymerase (e.g., a non-strand-displacing polymerase); and v) a plurality of nucleotides (e.g., dNTPs for amplification, extension, and/or sequencing in a suitable buffer).

In embodiments, the plurality of oligonucleotide primers comprises at least 7 primers (e.g., for the IGH locus). In an embodiment, a subset of the plurality of primers all target a joining (J) gene. In embodiments, the plurality of oligonucleotide primers comprises at least two different primer populations (e.g., first and second primer pairs, or primer sets). In embodiments, the plurality of oligonucleotide primers comprises about 1, 2, 3, 4, 5, 10, 15, 25, 50, 75, 100, 150, 200, 250, 300, 400, 500, 600, 700, 800, 900, or 1000 different primer sets. In embodiments, each primer set is provided at a concentration of about 25 nM to about 200 nM. In an embodiment, each primer set is provided at a concentration of about 100 nM. In an embodiment, one blocking element is provided per primer set.

In an embodiment, the plurality of blocking elements comprises at least two different populations of blocking elements. In embodiments, the plurality of blocking elements comprises at least 6 different blocking elements (e.g., for the IGH locus, 6 blocking elements are used, one targeting each joining (J) gene).

In embodiments, the polymerase is a high-fidelity DNA polymerase, Taq DNA polymerase, Bst DNA polymerase, T7 DNA polymerase, Sulfolobus DNA polymerase, or DNA polymerase I.

In embodiments, the kit further comprises a fragmenting enzyme (e.g., an enzyme capable of fragmenting a high molecular weight DNA sample into DNA fragments of about 200-300 bp). In some embodiments, the primers are used in a single-plex PCR reaction. In other embodiments, the primers are used in a multiplex PCR reaction.

In embodiments, the kit further comprises a restriction enzyme or a CRISPR/Cas9 protein for depleting wild-type (WT) DNA circles. For example, in an embodiment, WT DNA-specific depletion is mediated by a WT DNA-specific oligonucleotide (e.g., a blocking element); that is, Cas9 is guided by a "blocker" guide RNA (i.e., the blocking element is a guide RNA) to linearize the WT DNA circles, preventing exponential amplification from occurring in subsequent steps. In embodiments, the kit further comprises a plurality of adaptors. In embodiments, the kit further comprises instructions.
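The practical effect of preventing exponential amplification of WT circles can be illustrated with a toy calculation. The Python sketch below is a simplified model offered under stated assumptions and is not data from this disclosure: it assumes each cycle multiplies the template pool by (1 + efficiency), and the starting copy numbers and per-cycle efficiencies (0.9 for unblocked fusion circles, 0.05 for blocked or linearized WT circles) are hypothetical values chosen only to show the trend.

```python
def simulate_pcr(initial_copies, cycles, efficiency):
    """Copies after `cycles` rounds, each multiplying the pool by (1 + efficiency).

    `efficiency` is the per-cycle fraction of templates that are copied:
    1.0 is ideal doubling, values near 0 approximate blocked amplification.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# Hypothetical inputs: fusion circles are rare, WT circles are abundant.
fusion_copies = simulate_pcr(initial_copies=10, cycles=25, efficiency=0.9)
wt_copies = simulate_pcr(initial_copies=10_000, cycles=25, efficiency=0.05)

print(f"fusion product: {fusion_copies:.3g} copies")
print(f"WT product:     {wt_copies:.3g} copies")
print(f"fusion/WT ratio after amplification: {fusion_copies / wt_copies:.3g}")
```

Under these assumed values the fusion product ends up in excess even though it starts 1000-fold less abundant, which is the qualitative behavior the depletion or blocking step is intended to achieve.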

In an embodiment, the kit further comprises a blocking element comprising biotin. In embodiments, the kit further comprises a blocking element comprising a restriction site. In embodiments, the kit further comprises a methylation-sensitive restriction enzyme (e.g., NotI, NaeI, NsbI, SalI, HapII, or HaeII).

In one aspect, a microfluidic device is provided, wherein the microfluidic device is capable of performing any of the methods described herein, including embodiments. Microfluidic devices are suitable for amplifying, processing, and/or detecting samples of analytes of interest in flow cells. Throughout this application, reference is made to a fluidic system for nucleic acid sequencing (i.e., a genomic instrument) that allows nucleic acid molecules to be sequenced. However, the techniques disclosed herein may be applied to any system that uses a reaction vessel, such as a flow cell, to detect an analyte of interest and in which solutions are introduced into the system during preparation, reaction, detection, or any other process on or within the reaction vessel. The term "microfluidic device" refers to an integrated system having one or more chambers, ports, and channels that are interconnected and in fluid communication, and that is designed to carry out an analytical reaction or process for determining the nucleic acid sequence of a template polynucleotide, either alone or in cooperation with an appliance or instrument that provides support functions such as sample introduction, fluid and/or reagent driving means, temperature control, detection systems, and data collection and/or integration systems. In an embodiment, the device includes a light source that illuminates the sample, an objective lens, and a sensor array (e.g., a complementary metal oxide semiconductor (CMOS) array or a charge-coupled device (CCD) array). The nucleic acid sequencing device may further comprise valves, pumps, and specialized functional coatings on interior walls. For example, the microfluidic device is a nucleic acid sequencing device provided by: Singular Genomics™ (e.g., the G4™ sequencing platform), Illumina (e.g., HiSeq™, MiSeq™, NextSeq™, or NovaSeq™ systems), Life Technologies (e.g., ABI PRISM™ or SOLiD™ systems), Pacific Biosciences (e.g., systems using SMRT™ technology, such as the Sequel™ or RS II™ systems), or Qiagen (e.g., the GeneReader™ system).

P Embodiments

The present disclosure provides the following illustrative embodiments.

Embodiment P1. A method of detecting a polynucleotide fusion comprising a sequence of a first region fused to a sequence of a second region at a fusion junction, the method comprising: (a) circularizing one or more linear nucleic acid molecules to form a circular template polynucleotide comprising a contiguous strand lacking free 5' and 3' ends; (b) amplifying a circular template polynucleotide comprising the fusion junction in an amplification reaction comprising a first primer, a second primer, a blocking element, and a polymerase to produce a fusion amplification product, wherein: (i) the first region comprises a first strand comprising, from 5' to 3', a sequence that specifically binds to the blocking element, a sequence that specifically hybridizes to the first primer, and a sequence that is complementary to the sequence that specifically hybridizes to the second primer; (ii) the fusion junction is located between the sequence that specifically binds to the blocking element and the sequence that specifically hybridizes to the first primer; (iii) the blocking element inhibits polymerase extension along the sequence to which it binds; and (iv) the circular template polynucleotide comprising the fusion junction does not comprise the sequence that specifically binds to the blocking element, or the complement thereof; and (c) detecting the fusion amplification product, thereby detecting the polynucleotide fusion.

Embodiment P2 the method of embodiment P1, wherein the one or more linear nucleic acid molecules comprise DNA, RNA, or cDNA; optionally wherein said DNA or said RNA is cell-free nucleic acid.

Embodiment P3. The method of embodiment P2, wherein the one or more linear nucleic acid molecules comprises RNA or cDNA and the fusion junction is located at an exon junction.

Embodiment P4 the method of any one of embodiments P1-P3, wherein the fusion comprises an inter-or intra-chromosomal translocation.

Embodiment P5. The method of embodiment P4, wherein the intrachromosomal translocation comprises a partially or fully rearranged B cell or T cell antigen receptor.

Embodiment P6 the method of any one of embodiments P1 to P5, wherein the sequence of the first region comprises a sequence of a first gene and the sequence of the second region comprises a sequence of a second gene.

Embodiment P7 the method of any one of embodiments P1 to P6, wherein the blocking element comprises an oligonucleotide, a protein, or a combination thereof.

Embodiment P8 the method of any one of embodiments P1 to P7, wherein the one or more linear nucleic acid molecules are about 20 to about 1000 nucleotides in length, about 100 to about 300 nucleotides in length, about 300 to about 500 nucleotides in length, or about 500 to about 1000 nucleotides in length.

Embodiment P9 the method of any one of embodiments P1-P8, wherein the one or more linear nucleic acid molecules comprise a barcode sequence.

Embodiment P10. The method of any one of embodiments P1 to P9, wherein the circularizing comprises intramolecular joining of the 5' and 3' ends of the linear nucleic acid molecules.

Embodiment P11. The method of any one of embodiments P1 to P10, wherein the circularizing comprises a ligation reaction.

Embodiment P12 the method of any one of embodiments P1 to P11, wherein the sequence that specifically binds to the blocking element, the sequence that specifically hybridizes to the first primer, or both is about 1 to about 100 nucleotides from the fusion junction.

Embodiment P13 the method of any one of embodiments P1 to P12, wherein the sequence that specifically hybridizes to the first primer is separated from the sequence that is complementary to the sequence that specifically hybridizes to the second primer by about 1 to about 50 nucleotides.

Embodiment P14 the method of any one of embodiments P1 to P13, wherein the sequence that specifically hybridizes to the first primer and the sequence that is complementary to the sequence that specifically hybridizes to the second primer are located within the same exon of a target gene.

Embodiment P15 the method of any one of embodiments P1 to P14, wherein the linear nucleic acid molecule is single stranded.

Embodiment P16 the method of any one of embodiments P1 to P14, wherein the linear nucleic acid molecule is double stranded.

Embodiment P17 the method of any one of embodiments P1 to P16, wherein (i) the first primer comprises a 5' sequence that does not hybridize to the first strand of the first region under amplification conditions; and/or (ii) the second primer comprises a 5' sequence that does not hybridize under amplification conditions to the complement of the first strand of the first region.

Embodiment P18 the method of any one of embodiments P1 to P17, wherein (i) the amplification reaction further comprises a second blocking element that inhibits polymerase extension along the sequence to which it binds, and (ii) the first region comprises a first strand comprising, from 5 'to 3', a sequence complementary to the sequence that specifically hybridizes to the second primer, and a sequence complementary to the sequence that specifically binds to the second blocking element.

Embodiment P19 the method of embodiment P18, wherein the sequence complementary to the sequence specifically hybridizing to the second primer is separated from the sequence complementary to the sequence specifically binding to the second blocking element by about 100 to about 300 nucleotides.

Embodiment P20 the method of any one of embodiments P1 to P19, wherein the amplifying comprises a plurality of cycles comprising the steps of primer hybridization, primer extension, and denaturation in the presence of the first primer, the blocking element, and the second primer.

Embodiment P21. The method of any one of embodiments P1 to P20, wherein the amplifying comprises exponentially amplifying the circular template polynucleotide comprising the fusion junction.

Embodiment P22 the method of any one of embodiments P1 to P21, wherein detecting the fusion amplification product comprises detecting the length of the fusion amplification product, detecting one or more probes that bind to the fusion amplification product, or sequencing the fusion amplification product.

Embodiment P23 the method of any one of embodiments P1 to P21, wherein detecting the fusion amplification product comprises sequencing the fusion amplification product to generate sequencing reads of the sequences of the first region and the second region.

Embodiment P24 the method of embodiment P23, wherein the sequencing comprises hybridizing one or more sequencing primers to the fusion amplification product and extending the one or more sequencing primers.

Embodiment P25 the method of embodiment P23, wherein the sequencing comprises sequencing by synthesis, sequencing by hybridization, sequencing by ligation, or pyrosequencing.

Embodiment P26 the method of embodiment P23, wherein the sequencing comprises a plurality of sequencing cycles.

Embodiment P27. The method of embodiment P26, wherein the sequencing produces sequencing reads longer than 25 bp.

Embodiment P28 the method of embodiment P23, wherein the sequencing comprises extending a sequencing primer by incorporating labeled nucleotides or labeled nucleotide analogs and detecting the label to generate a signal for each incorporated nucleotide or nucleotide analog, wherein the sequencing primer hybridizes to one of the fusion amplification products.

Embodiment P29 the method of any one of embodiments P23 to P28, wherein detecting the fusion amplification product comprises aligning the substring of each sequencing read with a reference sequence and quantifying the number of sequencing reads of the circular template polynucleotide comprising the fusion junction.

Embodiment P30 the method of any one of embodiments P23 to P28, wherein detecting the fusion amplification product comprises comparing the k-mer substring of each sequencing read to a k-mer table of fusion junction references, and quantifying the number of k-mers shared between the sequencing reads and the fusion junction references.

Embodiment P31 the method of any one of embodiments P23 to P28, wherein detecting the fusion amplification product comprises (i) grouping sequencing reads based on a barcode sequence and/or a sequence comprising the fusion junction; and (ii) within each group, aligning reads and forming a consensus sequence of reads having the same barcode sequence and/or sequences comprising the fusion junction.

Embodiment P32. The method of any one of embodiments P23 to P31, wherein the sequencing further comprises generating sequencing reads spanning the circularization junction formed between the 5' and 3' ends of the linear nucleic acid molecule, and quantifying the number of different circularization junction sequences comprising the fusion junction.

Embodiment P33 the method of any one of embodiments P1 to P32, further comprising quantifying the fusion amplification product.

Embodiment P34 the method of any one of embodiments P1 to P33, wherein the one or more linear nucleic acid molecules are derived from a sample of a subject, optionally wherein the sample is an FFPE sample.

Embodiment P35 the method of any one of embodiments P1 to P34, wherein the polynucleotide fusion is a biomarker for cancer, autoimmune disease, primary immunodeficiency or infectious disease.

Embodiment P36 the method of embodiment P35, wherein the polynucleotide fusion is a biomarker for cancer.

Embodiment P37 the method of embodiment P35, wherein the polynucleotide fusion is a biomarker for lymphoid malignancies.

Embodiment P38. The method of any one of embodiments P1 to P37, wherein the amplification reaction further comprises: (a) one or more different first primers that specifically hybridize to different portions of the first strand of the first region; (b) for each different first primer, a different second primer that specifically hybridizes to a complement of a portion of the first strand of the first region, the portion being 3' relative to the position at which the corresponding different first primer specifically hybridizes; and (c) for each different first primer, a different blocking oligonucleotide that specifically hybridizes to a portion of the first strand of the first region that is 5' relative to the position at which the different first primer specifically hybridizes.

Embodiment P39 the method of any one of embodiments P1 to P38, further comprising detecting one or more different polynucleotide fusions, each different polynucleotide fusion comprising a fusion between a sequence of a different first region at a different fusion junction with a sequence of a different second region, wherein the amplification reaction further comprises a corresponding first primer, a corresponding second primer, and a corresponding blocking oligonucleotide for each different first region.

Embodiment P40. The method of any one of embodiments P1 to P39, wherein the polynucleotide fusion comprises AGTRAP-BRAF, AKAP9-BRAF, ATIC-ALK, CCDC6-RET, CD74-NRG1, CD74-ROS1, CEP89-BRAF, CLCN6-BRAF, DCTN1-ALK, EML4-ALK, EZR-ROS1, FAM131B-BRAF, FCHSD1-BRAF, GATM-BRAF, GNAI1-BRAF, GOLGA5-RET, GOPC-ROS1, HIP1-ALK, HOOK3-RET, KIF5B-ALK, KIF5B-RET, KTN1-RET, LRIG3-ROS1, LSM14A-BRAF, MKRN1-BRAF, MSN-ALK, MYO5A-ROS1, NCOA4-RET, PCM1-RET, RANBP2-ALK, RELCH-RET, RNF130-BRAF, SDC4-ROS1, SLC34A2-ROS1, SLC3A2-NRG1, SLC45A3-BRAF, SQSTM1-ALK, STRN-ALK, TFG-ALK, TPM3-ROS1, TPR-ALK, TRIM24-BRAF, TRIM24-RET, TRIM27-RET, TRIM33-RET, VCL-ALK, WDCP-ALK, or ZCCHC8-ROS1.

Embodiment P41 the method of any one of embodiments P1 to P39, wherein the polynucleotide fusion comprises a gene encoding a kinase domain or a portion thereof.

Embodiment P42 the method of any one of embodiments P1 to P39, wherein the polynucleotide fusion comprises a gene fusion of BCL1-JH, BCL2-JH, or MYC-IGL.

Embodiment P43. The method of any one of embodiments P1 to P39, wherein the polynucleotide fusion comprises a fusion of: a rearranged T cell antigen receptor or fragment thereof, a T cell receptor alpha variable (TRAV) gene or fragment thereof, a T cell receptor alpha joining (TRAJ) gene or fragment thereof, a T cell receptor alpha constant (TRAC) gene or fragment thereof, a T cell receptor beta variable (TRBV) gene or fragment thereof, a T cell receptor beta diversity (TRBD) gene or fragment thereof, a T cell receptor beta joining (TRBJ) gene or fragment thereof, a T cell receptor beta constant (TRBC) gene or fragment thereof, a T cell receptor gamma variable (TRGV) gene or fragment thereof, a T cell receptor gamma constant (TRGC) gene or fragment thereof, a T cell receptor delta variable (TRDV) gene or fragment thereof, a T cell receptor delta diversity (TRDD) gene or fragment thereof, or a T cell receptor delta constant (TRDC) gene or fragment thereof.

Embodiment P44. The method of any one of embodiments P1 to P39, wherein the polynucleotide fusion comprises a fusion of: a rearranged B cell antigen receptor or fragment thereof, an IGHV gene or fragment thereof, an IGHD gene or fragment thereof, an IGHJ gene or fragment thereof, an IGHJC gene or fragment thereof, an IGKV gene or fragment thereof, an IGKJ gene or fragment thereof, an IGKC gene or fragment thereof, an IGLV gene or portion thereof, an IGLJ gene or portion thereof, an IGLC gene or fragment thereof, an IGK kappa deleting element or portion thereof, or an IGK intron enhancer element or portion thereof.

Embodiment P45 the method of any one of embodiments P1 to P39, wherein the polynucleotide fusion comprises a fusion of: ALK gene or a part thereof, BRAF gene or a part thereof, EGFR gene or a part thereof, ERBB2 gene or a part thereof, KRAS gene or a part thereof, MET gene or a part thereof, NRG1 gene or a part thereof, FGFR2 gene or a part thereof, FGFR3 gene or a part thereof, NTRK1 gene or a part thereof, NTRK2 gene or a part thereof, NTRK3 gene or a part thereof, RET gene or a part thereof, or ROS1 gene or a part thereof.

Embodiment P46 the method of any one of embodiments P1 to P39, wherein the polynucleotide fusion comprises a B-cell or T-cell intrachromosomal rearrangement.

Embodiment P47. A method of differentially amplifying a polynucleotide comprising a fusion gene relative to a polynucleotide not comprising the fusion gene, the method comprising: i) circularizing a plurality of linear nucleic acid molecules to form a plurality of circular template polynucleotides, wherein one or more of the linear nucleic acid molecules comprises the fusion gene, thereby forming one or more fusion circular template polynucleotides, and wherein one or more of the linear nucleic acid molecules does not comprise the fusion gene, thereby forming one or more non-fusion circular template polynucleotides; ii) binding a blocking element to the one or more non-fusion circular template polynucleotides; and iii) hybridizing a first primer and a second primer to the one or more non-fusion circular template polynucleotides and the one or more fusion circular template polynucleotides and extending with a polymerase to produce a first amount of non-fusion polynucleotide amplification product and a second amount of fusion polynucleotide amplification product, wherein the first amount is detectably less than the second amount; whereby the polynucleotides comprising the fusion gene are differentially amplified.

Embodiment P48 the method of embodiment P47, wherein binding the blocking element comprises binding the blocking element upstream of the first primer.

Embodiment P49. The method of embodiment P47 or embodiment P48, wherein the second amount is about 1%, about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 40%, about 50%, or about 75% greater than the first amount.

Embodiment P50 the method of embodiment P47 or embodiment P48, wherein the second amount is about 2 times, at least about 1.5 times, at least about 2.0 times, at least about 2.5 times, at least about 5 times, at least about 10 times, or more than about 10 times the first amount.

Embodiment P51 the method of any one of embodiments P47-P50, further comprising detecting the first amount of non-fusion polynucleotide amplification product and the second amount of fusion polynucleotide amplification product.

Embodiment P52 the method of any one of embodiments P47-P51, wherein the one or more linear nucleic acid molecules comprises DNA, RNA, or cDNA; optionally wherein said DNA or said RNA is a cell-free nucleic acid molecule.

Embodiment P53 the method of any one of embodiments P47-P51, wherein the one or more linear nucleic acid molecules comprises RNA or cDNA and the fusion gene comprises an exon junction.

Embodiment P54 the method of any one of embodiments P47-P51, wherein the one or more linear nucleic acid molecules comprises RNA or cDNA and the fusion gene comprises an exon junction formed by alternative splicing.

Embodiment P55 the method of any one of embodiments P47-P51, wherein the one or more linear nucleic acid molecules comprises RNA or cDNA and the fusion gene comprises an exon junction formed by a splice defect.

Embodiment P56 the method of any one of embodiments P47-P55, wherein the fusion gene comprises an inter-or intra-chromosomal translocation.

Embodiment P57 the method of embodiment P56, wherein the intrachromosomal translocation comprises a partially or fully rearranged B cell or T cell antigen receptor.

Embodiment P58 the method of any one of embodiments P47-P57, wherein the blocking element comprises an oligonucleotide, a protein, or a combination thereof.

Embodiment P59 the method of any one of embodiments P47-P57, wherein the one or more linear nucleic acid molecules are about 20 to about 1000 nucleotides in length, about 100 to about 300 nucleotides in length, about 300 to about 500 nucleotides in length, or about 500 to about 1000 nucleotides in length.

Embodiment P60 the method of any one of embodiments P47-P59, wherein the blocking element binds to about 1 to 150 nucleotides upstream relative to the first primer.

Embodiment P61. The method of any one of embodiments P47-P59, wherein the first primer hybridizes to the one or more fusion circular template polynucleotides about 1 to about 100 nucleotides downstream of a fusion junction within the fusion gene.

Embodiment P62. The method of any one of embodiments P47-P59, wherein the first primer and the second primer hybridize to complementary sequences of the one or more fusion circular template polynucleotides and the one or more non-fusion circular template polynucleotides, wherein the first primer and the second primer are about 1 to about 50 nucleotides apart.

Embodiment P63 the method of any one of embodiments P47-P62, further comprising binding a second blocking element to the one or more non-fusion circular template polynucleotides downstream relative to the second primer.

Embodiment P64 the method of embodiment P63, wherein the second blocking element binds to about 100 to about 300 nucleotides downstream relative to the second primer.

Embodiment P65 the method of any one of embodiments P47-P64, further comprising repeating steps ii) and iii).

Embodiment P66 the method of any one of embodiments P47-P65, further comprising: detecting the length of the non-fusion polynucleotide amplification product and the length of the fusion polynucleotide amplification product; detecting one or more probes bound to the non-fusion polynucleotide amplification product and the fusion polynucleotide amplification product; or sequencing the non-fused polynucleotide amplification product and the fused polynucleotide amplification product.

Embodiment P67 the method of embodiment P66, wherein sequencing the non-fused polynucleotide amplification product and the fused polynucleotide amplification product produces one or more sequencing reads.

Embodiment P68 the method of embodiment P67, further comprising aligning the substring of one or more sequencing reads to a reference sequence.

Embodiment P69 the method of embodiment P67, further comprising comparing the k-mer substring of the one or more sequencing reads to a k-mer table of a fusion gene reference.

Embodiment P70 the method of embodiment P67, further comprising: grouping one or more sequencing reads based on the barcode sequence and/or the sequence comprising the fusion gene; and within the set, aligning the reads and forming a consensus sequence of reads having the same barcode sequence and/or sequences comprising the fusion gene.

Embodiment P71 the method of embodiment P66, wherein sequencing further comprises: generating one or more sequencing reads comprising a circularized junction formed between the 5' and 3' ends of the linear nucleic acid molecule; and quantifying the number of different circularized junction sequences comprising said fusion gene.

Further embodiments

The present disclosure provides the following additional illustrative embodiments.

Embodiment 1. A method of differentially amplifying a polynucleotide comprising a fusion gene relative to a polynucleotide not comprising the fusion gene, the method comprising: i) circularizing a plurality of linear nucleic acid molecules to form a plurality of circular template polynucleotides, wherein one or more of the linear nucleic acid molecules comprises the fusion gene, thereby forming one or more fusion gene circular template polynucleotides, and wherein one or more of the linear nucleic acid molecules does not comprise the fusion gene, thereby forming one or more non-fusion gene circular template polynucleotides; ii) binding a blocking element to the one or more non-fused circular template polynucleotides; and iii) hybridizing a first primer and a second primer to the one or more non-fusion circular template polynucleotides and the one or more fusion circular template polynucleotides and extending with a polymerase to produce a first amount of non-fusion polynucleotide amplification product and a second amount of fusion polynucleotide amplification product, wherein the first amount is detectably less than the second amount; whereby the polynucleotides comprising the fusion gene are differentially amplified.

Embodiment 2. The method of embodiment 1, wherein binding the blocking element comprises binding the blocking element upstream of the first primer.

Embodiment 3. The method of embodiment 1 or 2, wherein the second amount is about 1%, about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 40%, about 50%, or about 75% greater than the first amount.

Embodiment 4. The method of embodiment 1 or 2, wherein the second amount is about 2 times, at least about 1.5 times, at least about 2.0 times, at least about 2.5 times, at least about 5 times, at least about 10 times, or more than about 10 times the first amount.

Embodiment 5. The method of any one of embodiments 1 to 4, further comprising detecting the first amount of non-fusion polynucleotide amplification product and the second amount of fusion polynucleotide amplification product.

Embodiment 6. The method of any one of embodiments 1 to 5, wherein the one or more linear nucleic acid molecules comprise DNA, RNA, or cDNA; optionally wherein said DNA or said RNA is a cell-free nucleic acid molecule.

Embodiment 7. The method of any one of embodiments 1 to 5, wherein the one or more linear nucleic acid molecules comprise RNA or cDNA and the fusion gene comprises an exon junction.

Embodiment 8. The method of any one of embodiments 1 to 5, wherein the one or more linear nucleic acid molecules comprise RNA or cDNA and the fusion gene comprises an exon junction formed by alternative splicing.

Embodiment 9. The method of any one of embodiments 1 to 5, wherein the one or more linear nucleic acid molecules comprise RNA or cDNA and the fusion gene comprises an exon junction formed by a splice defect.

Embodiment 10. The method of any one of embodiments 1 to 9, wherein the fusion gene comprises an inter- or intra-chromosomal translocation.

Embodiment 11. The method of embodiment 10, wherein the intrachromosomal translocation comprises a partially or fully rearranged B cell or T cell antigen receptor.

Embodiment 12. The method of any one of embodiments 1 to 11, wherein the blocking element comprises an oligonucleotide, a protein, or a combination thereof.

Embodiment 13. The method of any one of embodiments 1 to 11, wherein the one or more linear nucleic acid molecules are about 20 to about 1000 nucleotides in length, about 100 to about 300 nucleotides in length, about 300 to about 500 nucleotides in length, or about 500 to about 1000 nucleotides in length.

Embodiment 14. The method of any one of embodiments 1 to 13, wherein the blocking element binds to about 1 to 150 nucleotides upstream relative to the first primer.

Embodiment 15. The method of any one of embodiments 1 to 13, wherein the first primer hybridizes to the one or more fusion circular template polynucleotides about 1 to 100 nucleotides downstream of a fusion junction within the fusion gene.

Embodiment 16. The method of any one of embodiments 1 to 13, wherein the first primer and the second primer hybridize to complementary sequences of the one or more fused circular template polynucleotides and the one or more non-fused circular template polynucleotides, wherein the first primer and the second primer are about 1 to about 50 nucleotides apart.

Embodiment 17. The method of any one of embodiments 1 to 16, further comprising binding a second blocking element to the one or more non-fusion circular template polynucleotides downstream relative to the second primer.

Embodiment 18. The method of embodiment 17 wherein the second blocking element binds to about 100 to about 300 nucleotides downstream relative to the second primer.

Embodiment 19. The method of any one of embodiments 1 to 18, further comprising repeating steps ii) and iii).

Embodiment 20. The method of any one of embodiments 1 to 19, further comprising iv) amplifying the one or more non-fusion circular template polynucleotides to produce a third amount of non-fusion polynucleotide amplification products; and amplifying the one or more fusion circular template polynucleotides to produce a fourth quantity of fusion polynucleotide amplification products, wherein the third quantity and the fourth quantity are substantially the same.

Embodiment 21. The method of embodiment 20, wherein amplifying the one or more non-fused circular template polynucleotides comprises hybridizing a third primer and a fourth primer to the one or more non-fused circular template polynucleotides and extending both primers with a polymerase, and wherein amplifying the one or more fused circular template polynucleotides comprises hybridizing a third primer and a fourth primer to the one or more fused circular template polynucleotides and extending both primers with a polymerase.

Embodiment 22. The method of embodiment 21 wherein the third primer hybridizes upstream of the target sequence and the fourth primer hybridizes downstream of the target sequence, wherein the target sequence comprises a single nucleotide variant, an insertion, a deletion, an internal tandem repeat, or a copy number variant.

Embodiment 23. The method of any of embodiments 1 to 22, further comprising: detecting the length of the non-fusion polynucleotide amplification product and the length of the fusion polynucleotide amplification product; detecting one or more probes bound to the non-fusion polynucleotide amplification product and the fusion polynucleotide amplification product; or sequencing the non-fused polynucleotide amplification product and the fused polynucleotide amplification product.

Embodiment 24. The method of embodiment 23, wherein sequencing the non-fused polynucleotide amplification product and the fused polynucleotide amplification product produces one or more sequencing reads.

Embodiment 25. The method of embodiment 24, further comprising aligning the substring of one or more sequencing reads to a reference sequence.

Embodiment 26. The method of embodiment 24, further comprising comparing the k-mer substring of the one or more sequencing reads to a k-mer table of a fusion gene reference.

Embodiment 27. The method of embodiment 24, further comprising: grouping one or more sequencing reads based on the barcode sequence and/or the sequence comprising the fusion gene; and within the set, aligning the reads and forming a consensus sequence of reads having the same barcode sequence and/or sequences comprising the fusion gene.

Embodiment 28. The method of embodiment 23, wherein sequencing further comprises: generating one or more sequencing reads comprising a circularized junction formed between the 5' and 3' ends of the linear nucleic acid molecule; and quantifying the number of different circularized junction sequences comprising said fusion gene.

Embodiment 29. A kit comprising: a circularizing agent, wherein the circularizing agent is capable of binding the 5' and 3' ends of a linear nucleic acid molecule; a blocking element capable of binding to one or more circular polynucleotides; a first primer and a second primer; and a polymerase.

Embodiment 30. A method of amplifying a polynucleotide comprising a fusion gene, the method comprising: i) binding a blocking element to a non-fusion circular template polynucleotide, wherein the non-fusion circular template polynucleotide does not include the fusion gene; ii) hybridizing a first primer and a second primer to the non-fusion circular template polynucleotide; and hybridizing the first primer and the second primer to a fusion circular template polynucleotide, wherein the fusion circular template polynucleotide comprises the fusion gene; and iii) extending the first primer and the second primer with a non-strand displacing polymerase to produce a fusion polynucleotide amplification product.

Embodiment 31. The method of embodiment 30, wherein binding the blocking element comprises binding the blocking element upstream of the first primer.

Embodiment 32. The method of any one of embodiments 30 to 31, further comprising detecting the fusion polynucleotide amplification product.

Embodiment 33. The method of any one of embodiments 30 to 32, wherein the circular template polynucleotide (e.g., non-fused circular template polynucleotide and/or the fused circular template polynucleotide) comprises DNA, RNA, or cDNA; optionally wherein said DNA or said RNA is a cell-free nucleic acid molecule.

Embodiment 34. The method of any one of embodiments 30 to 32, wherein the circular template polynucleotide (e.g., non-fused circular template polynucleotide and/or the fused circular template polynucleotide) is RNA or cDNA and the fusion gene comprises an exon junction.

Embodiment 35. The method of any one of embodiments 30 to 32, wherein the circular template polynucleotide (e.g., non-fused circular template polynucleotide and/or the fused circular template polynucleotide) is RNA or cDNA and the fusion gene comprises an exon junction formed by alternative splicing.

Embodiment 36. The method of any one of embodiments 30 to 32, wherein the circular template polynucleotide (e.g., non-fused circular template polynucleotide and/or the fused circular template polynucleotide) is RNA or cDNA and the fusion gene comprises an exon junction formed by a splice defect.

Embodiment 37. The method of any one of embodiments 30 to 36, wherein the fusion gene comprises an inter- or intra-chromosomal translocation.

Embodiment 38. The method of embodiment 37, wherein the intrachromosomal translocation comprises a partially or fully rearranged B cell or T cell antigen receptor.

Embodiment 39. The method of any one of embodiments 30 to 38, wherein the blocking element comprises an oligonucleotide, a protein, or a combination thereof.

Embodiment 40. The method of any one of embodiments 30 to 39, wherein the blocking element binds to about 1 to 150 nucleotides upstream relative to the first primer.

Embodiment 41. The method of any one of embodiments 30 to 40, wherein the first primer hybridizes to the fusion circular template polynucleotide about 1 to 100 nucleotides downstream of a fusion junction within the fusion gene.

Embodiment 42. The method of any one of embodiments 30 to 40, wherein the first primer and the second primer hybridize to complementary sequences of the fusion circular template polynucleotide and the non-fusion circular template polynucleotide, wherein the first primer and the second primer are about 1 to about 50 nucleotides apart.

Embodiment 43. The method of any one of embodiments 30 to 42, further comprising binding a second blocking element to the non-fusion circular template polynucleotide downstream relative to the second primer.

Embodiment 44. The method of embodiment 43, wherein the second blocking element binds to about 100 to about 300 nucleotides downstream relative to the second primer.

Embodiment 45. The method of any of embodiments 30 to 44 further comprising repeating steps i), ii) and iii).

Embodiment 46. The method of any of embodiments 30 to 45, further comprising: iv) removing the blocking element and amplifying the non-fused circular template polynucleotide to produce a plurality of non-fused polynucleotide amplification products; and amplifying the fused circular template polynucleotide to produce an additional fused polynucleotide amplification product.

Embodiment 47. The method of embodiment 46, wherein amplifying the non-fused circular template polynucleotide comprises hybridizing a third primer and a fourth primer to the non-fused circular template polynucleotide and extending both primers with a polymerase, and wherein amplifying the fused circular template polynucleotide comprises hybridizing a third primer and a fourth primer to the fused circular template polynucleotide and extending both primers with a polymerase.

Embodiment 48. The method of embodiment 47, wherein the third primer hybridizes upstream of the target sequence and the fourth primer hybridizes downstream of the target sequence, wherein the target sequence comprises a single nucleotide variant, an insertion, a deletion, an internal tandem repeat, or a copy number variant.

Embodiment 49. The method of any one of embodiments 30 to 48, further comprising detecting the length of the fusion polynucleotide amplification product, detecting one or more probes that bind to the fusion polynucleotide amplification product, or sequencing the fusion polynucleotide amplification product.

Embodiment 50. The method of embodiment 49, wherein the sequencing of the fusion polynucleotide amplification product results in one or more sequencing reads.

Embodiment 51. The method of embodiment 50, further comprising aligning the substring of one or more sequencing reads to a reference sequence.

Embodiment 52. The method of embodiment 50, further comprising comparing the k-mer substring of the one or more sequencing reads to a k-mer table of a fusion gene reference.

Embodiment 53. The method of embodiment 49, further comprising: grouping one or more sequencing reads based on the barcode sequence and/or the sequence comprising the fusion gene; and within the set, aligning the reads and forming a consensus sequence of reads having the same barcode sequence and/or sequences comprising the fusion gene.

Embodiment 54. The method of embodiment 49, wherein sequencing further comprises: generating one or more sequencing reads, the one or more sequencing reads comprising a circularized junction; and quantifying the number of different circularized junction sequences comprising said fusion gene.

Embodiment 55. The method of any one of embodiments 30 to 49, wherein prior to step i), the method comprises circularizing a plurality of linear nucleic acid molecules to form a plurality of circular template polynucleotides, wherein one or more of the linear nucleic acid molecules comprises the fusion gene, thereby forming one or more fusion gene circular template polynucleotides, and wherein one or more of the linear nucleic acid molecules does not comprise the fusion gene, thereby forming one or more non-fusion gene circular template polynucleotides.

Examples

Example 1 fusion detection by template cyclization and multiplex PCR

Gene fusions are somatic alterations that can lead to cancer; they are associated with up to 20% of cancer incidence and play a carcinogenic role in blood, soft tissue, and solid tumors (Foltz SM et al., Nature Comm. 2020;11:2666). Translocations, copy number changes, and inversions may lead to gene fusions, deregulation of gene expression, and novel molecular functions. Next generation sequencing (NGS) methods for gene fusion detection may employ non-targeted sequencing (e.g., whole genome or whole transcriptome sequencing) or targeted sequencing of the fusion gene of interest. Targeted methods for gene fusion detection can simplify analysis and reduce cost, and have thus become the leading approach in clinical applications. Popular methods for targeted sequencing of gene fusions include multiplex PCR, in which primer sets are designed to generate PCR amplicons spanning known breakpoint junctions (e.g., Maher CA et al., Nature 2009;458(7234):97-101, and Oncomine tests); anchored multiplex PCR (AMP), in which one or more target-specific primers are used in combination with a ligated universal adapter to enable PCR amplification of the breakpoint of interest (e.g., ArcherDX); and methods that enrich breakpoint regions of interest using hybridization capture. Among targeted methods, multiplex PCR provides high sensitivity and sequencing efficiency but fails to identify fusions involving novel breakpoints and partners; AMP enables detection of known and novel fusions but has relatively high input requirements and a more complex workflow, and is typically limited to RNA analysis; and hybridization capture has a relatively complex workflow and reduced sensitivity compared to PCR-based methods. For both targeted and non-targeted approaches, robustness to sample degradation is often critical because FFPE-preserved tissue and cfDNA are widely used as input materials.
Thus, there is a need for a method that enables highly sensitive targeted analysis of gene fusion with minimal workflow complexity and input requirements and robustness to highly degraded materials.

The compositions and methods described herein provide a sequencing-efficient solution for detecting genetic variations such as SNVs, insertions/deletions, and gene fusions, including targeted sequencing of gene fusions that involve novel partners or arise from novel breakpoints. The method achieves high detection sensitivity for degraded materials through a simplified workflow. Importantly, the method can be applied to nucleic acids extracted in bulk from a sample source (e.g., cfDNA from plasma, nucleic acids from FFPE-preserved tissue samples, or nucleic acids extracted from peripheral blood leukocytes) or to materials derived from common single cell library preparation systems. As described in detail herein in various embodiments, the method consists of the steps of: (1) circularizing nucleic acids derived from the sample; (2) amplifying circularized nucleic acids derived from one or more targets of interest; and (3) analyzing the amplified fragments by next generation sequencing (NGS).

In one embodiment, a workflow is presented to achieve targeted amplification of nucleic acids for analysis of gene fusions, including gene fusions involving novel partners or breakpoints. Briefly, the workflow begins with bulk extraction of nucleic acids from a sample. RNA, DNA, or total nucleic acid (RNA and DNA) may be extracted using methods known in the art. If RNA is extracted, the RNA can be converted to cDNA using methods known in the art (e.g., oligo-dT cDNA synthesis, cDNA synthesis via random hexamers, or targeted cDNA synthesis via gene-specific primers). The DNA molecules may optionally be fragmented to an average length of about 150 base pairs. Fragmentation can be achieved by methods known in the art (e.g., enzymatic fragmentation, acoustic fragmentation). Next, the ssDNA fragments are circularized by enzymatic ligation of the 5' and 3' ends using methods known in the art (e.g., CircLigase™) or the methods described herein. In some embodiments, circularization is facilitated by denaturing the double-stranded nucleic acid prior to circularization. In an embodiment, the linear DNA fragments are A-tailed (e.g., using Taq DNA polymerase) prior to circularization. Residual linear DNA molecules may optionally be digested. This can be accomplished by methods known in the art (e.g., treatment with Exo I and/or Exo III).

Following circularization, nucleic acids derived from the fusion gene of interest are amplified using outward-facing oligonucleotide primers that target the fusion gene partner of interest adjacent to the expected breakpoint location (e.g., similar to an inverse PCR reaction), in combination with a 5' blocking element (e.g., a non-extendible oligonucleotide) that specifically binds to the sequence of the unrearranged fusion gene partner of interest adjacent to and opposite the expected breakpoint junction (FIGS. 1-3). The blocking element will not bind to a template containing a translocation at the expected breakpoint. Optionally, an additional 3' blocking element targeting the gene of interest distal to the breakpoint junction may be included (FIGS. 2 and 3). Typically, the blocking element has a Tm similar to or higher than that of the outward-facing primer to ensure that it can bind the template and prevent extension of the primer. The 5' blocker may be within about 50 bp of the fusion junction, and in some embodiments, the optional 3' blocker may be within about 100 bp to about 200 bp of the fusion junction. Typically, the optional 3' blocker is farther from the fusion junction than the 5' blocker. PCR thus achieves preferential amplification of templates containing rearrangements. The resulting amplicon contains a junction derived from template circularization ("circularization junction") and a junction corresponding to the sample breakpoint (FIG. 4). The circularization junctions can be used to quantify the number of template copies and, optionally, for error correction.
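The geometry described above can be illustrated with a toy model. The following is a minimal sketch, not the disclosed workflow: the sequences `GENE_A` and `GENE_B`, the primer coordinates, and the helper `outward_amplicon` are hypothetical and serve only to show why an amplicon generated by outward-facing primers on a circularized template contains both the sample breakpoint and the circularization junction.

```python
# Toy model of inverse-PCR-style amplification on a circularized fragment.
# All sequences and coordinates below are hypothetical illustrations.

GENE_B = "TTGACCGTAAGC"          # hypothetical (possibly novel) fusion partner
GENE_A = "GGCATCGATCCGTAGCTA"    # hypothetical known, targeted fusion partner

# Linear fragment carrying the fusion: partner B joined to partner A.
fragment = GENE_B + GENE_A


def outward_amplicon(circle: str, rev_5prime: int, fwd_5prime: int) -> str:
    """Top-strand amplicon from outward-facing primers on a circular template.

    The string is treated as circular (its last base is ligated to its first).
    fwd_5prime: index of the forward primer's 5' base; it extends rightward.
    rev_5prime: top-strand index bounding the reverse primer, which extends
                leftward, away from the forward primer.
    """
    if not 0 <= rev_5prime < fwd_5prime < len(circle):
        raise ValueError("primers must face outward: reverse upstream of forward")
    # Extension runs from the forward primer to the end of the fragment,
    # across the ligation point, and on to the reverse primer.
    return circle[fwd_5prime:] + circle[: rev_5prime + 1]


# Both primers sit in GENE_A, a few bases downstream of the breakpoint.
bp = len(GENE_B)
amplicon = outward_amplicon(fragment, rev_5prime=bp + 5, fwd_5prime=bp + 8)

breakpoint_junction = GENE_B[-4:] + GENE_A[:4]            # sample breakpoint
circularization_junction = fragment[-4:] + fragment[:4]   # ligated fragment ends

print(amplicon)
print(breakpoint_junction in amplicon, circularization_junction in amplicon)
```

In this toy case both membership checks evaluate to `True`: the product spans the ligated fragment ends and the B-to-A breakpoint even though neither primer targets `GENE_B`.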

Amplification of unfused genes: as an internal control, and to further assess the relative abundance of amplified fusion gene nucleic acids, nucleic acids derived from one or more unrearranged (e.g., control) templates of interest can be amplified within the same PCR reaction using outward-facing primers while omitting the blocking elements described above. In some embodiments, it may be advantageous to include such a positive control to avoid false negative results. Furthermore, in some embodiments, outward-facing primers are included that target a region of the human genome or cDNA in which clinically relevant SNVs, insertions/deletions, or copy number variants are known to occur. In some embodiments, the region of interest may comprise cDNA derived from a gene whose expression is deregulated in cancer, and/or a gene whose expression is largely unchanged (e.g., a housekeeping gene), to aid in the analysis of gene expression. Analysis of such targets can be performed in the same PCR reaction using outward-facing primers while omitting the blocking oligomers described above. In still other embodiments, the outward-facing primers targeting the fusion of interest are used, as part of a multiplex PCR set, in combination with inward-facing primers targeting a region of the human genome or cDNA in which clinically relevant SNVs, insertions/deletions, internal tandem repeats, or copy number variants are known to occur. FIG. 11A shows an example in which two pairs of overlapping inward-facing primers (e.g., 1F and 1R, and 2F and 2R) are used to amplify a target region, resulting in three amplification products (e.g., three PCR products: amplicon 1, the amplification product of the 1F and 1R primer pair; amplicon 2, the amplification product of the 2F and 2R primer pair; and the largest amplicon, the amplification product of the 1F and 2R primer pair), as described in U.S. Patent Publication No. 2016/0340746, which is incorporated herein by reference in its entirety. Production of the smallest amplicon, from the 2F and 1R primer pair, is inhibited because the stable secondary structure of that product results in low amplification efficiency.

By "overlapping primers" is meant, for example, that two pairs of primers (e.g., 1F and 1R, and 2F and 2R in FIG. 11A) have overlapping target regions on the target nucleic acid (e.g., the 1F and 1R amplification product comprises a portion of sequence that is also comprised in the 2F and 2R amplification product). For example, as shown in FIG. 11A, the 2F primer is located upstream of and adjacent to the 1R primer, while the 2R primer is located downstream of the 1R primer, thereby resulting in overlapping amplification products, wherein the region covered by the 2F and 1R primers, including any sequence between them, is shared between amplicon 1 and amplicon 2.
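The overlapping-primer layout can be made concrete with a short enumeration. This is a sketch under stated assumptions: the primer coordinates are invented for illustration (they are not taken from FIG. 11A), and the code only lists which forward/reverse pairs can bound a product on a linear template; it does not model the secondary-structure effect that suppresses the smallest product.

```python
from itertools import product

# Hypothetical primer footprints on the top strand, as 0-based half-open
# (start, end) intervals. 2F sits immediately upstream of 1R.
FORWARD = {"1F": (0, 20), "2F": (80, 100)}
REVERSE = {"1R": (100, 120), "2R": (180, 200)}


def candidate_amplicons(forward, reverse):
    """Return {(fwd, rev): (start, end)} for every inward-facing primer pair."""
    amplicons = {}
    for (f_name, (f_start, f_end)), (r_name, (r_start, r_end)) in product(
        forward.items(), reverse.items()
    ):
        # A product forms only if the forward primer lies upstream of the reverse.
        if f_end <= r_start:
            amplicons[(f_name, r_name)] = (f_start, r_end)
    return amplicons


amplicons = candidate_amplicons(FORWARD, REVERSE)
lengths = {pair: end - start for pair, (start, end) in amplicons.items()}

# The pair giving the shortest product (2F with 1R) is the one the text
# describes as inhibited.
smallest_pair = min(lengths, key=lengths.get)

# Region shared by amplicon 1 (1F/1R) and amplicon 2 (2F/2R).
a1, a2 = amplicons[("1F", "1R")], amplicons[("2F", "2R")]
shared_region = (max(a1[0], a2[0]), min(a1[1], a2[1]))

for pair, length in sorted(lengths.items(), key=lambda item: item[1]):
    print(pair, length)
print("smallest:", smallest_pair, "shared region:", shared_region)
```

With these assumed coordinates, the shared region coincides with the span from the 2F primer through the 1R primer, which is also the footprint of the suppressed smallest product.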

FIG. 11B shows the expected amplification products when a linear template containing an internal tandem repeat is amplified with the primer pairs of FIG. 11A (e.g., 1F and 1R, and 2F and 2R). The amplification products are identical to those obtained from the non-duplicated template in FIG. 11A (e.g., amplicon 1, amplicon 2, and the largest amplicon), which precludes detection of the tandem repeat event. FIG. 11C shows the expected amplification products when a circularized template containing an internal tandem repeat is amplified with the primer pairs of FIG. 11A. The amplification products now include a duplication-specific amplicon (e.g., the amplification product of the 2R and 1F primer pair). The duplication-specific amplicon is identified by the unique primer pair present in the amplicon and by a circularization junction within the amplicon (represented by dashed lines). In this scenario, an inverse PCR product can be formed that unambiguously identifies the duplication event.

Inward-facing primers: while outward-facing primers are particularly useful for determining novel gene fusion partners, it may also be useful to perform targeted gene sequencing to identify somatic mutations (e.g., SNPs associated with a perturbed cell state). In particular, inward-facing primers (e.g., standard PCR primers) are used that target a region of interest containing known somatic alterations associated with the diseased state. In embodiments, the outward-facing primers targeting the fusion of interest are used in combination with inward-facing primers targeting regions of the human genome or cDNA in which clinically relevant SNVs or SNPs, insertions/deletions, or copy number variants (CNVs) are known to occur, e.g., as part of a multiplex PCR set (see, e.g., FIG. 10). Like the outward-facing primers, the inward-facing primers contain target-specific sequences, and optionally sequences for downstream library preparation and analysis. In embodiments, the inward-facing primers amplify the region of interest in the absence of the fusion gene (e.g., using the inward-facing primers to target regions other than exon breakpoints and/or fusion gene partners with known somatic mutations). In embodiments, the inward-facing primer targets a region of interest in the fusion gene transcript (e.g., the inward-facing primer targets one or more regions of the fusion gene transcript, wherein the one or more regions may be in different or the same genes). In embodiments, the inward-facing primer targets a different gene than the outward-facing primer (e.g., the inward-facing primer targets one gene of the fusion transcript and the outward-facing primer targets another gene of the fusion transcript).
The inward-facing primers and the outward-facing primers may, for example, be contained in the same amplification reaction, or they may be combined into separate reactions (e.g., an amplification reaction consisting of only the inward-facing primers and an amplification reaction consisting of only the outward-facing primers, wherein each amplification reaction uses the same circularized template).

Modification of blocking element: the blocking element selectively binds to the unrearranged template to inhibit extension of the primer sequence by the polymerase. In some embodiments, the blocking element consists of an oligomer with a 3' inverted dT, a 3' dideoxycytidine, a reversibly terminated 3' modification, or another modification of the 3' end that prevents 3' extension by a polymerase ("blocking oligomer"), and is used in conjunction with a non-strand displacing polymerase. In some embodiments, the blocking oligomer contains one or more non-natural bases (e.g., LNA bases) that facilitate hybridization of the blocking oligomer to the target sequence. In some embodiments, the blocking oligomer contains additional modifications to increase resistance to exonuclease digestion (e.g., one or more phosphorothioate linkages). The blocking element need not be an oligomer; in some embodiments, for example, the blocking element is a protein that selectively binds to the target sequence and prevents polymerase extension. In embodiments, the blocking element prevents extension under suitable amplification/extension conditions.

Alternative methods for enriching templates containing fusions: under certain amplification reaction conditions, the blocking elements described herein may provide variable inhibition of unfused templates, such that a small proportion of unfused amplification products is generated. Alternative methods are contemplated herein that may be implemented instead of, or in addition to, blocking elements, and that selectively eliminate any non-fused circular templates, or render them non-amplifiable, prior to amplification.

For example, CRISPR-mediated depletion of unwanted target sequences can be performed, wherein, for example, a CRISPR-Cas9 complex is introduced into a sample containing circularized ssDNA using a guide RNA that specifically targets a non-fusion sequence. The CRISPR-Cas9 complex then targets and cleaves the non-fusion sequences present in any circular ssDNA molecule. After linearizing the non-fused circular ssDNA molecules by CRISPR complexes, an exonuclease digestion can then be performed to digest the linear ssDNA molecules, thereby enriching the circular ssDNA molecules containing the fusion gene (e.g., lacking the non-fused gene sequence targeted by the guide RNA).

Alternatively, biotinylated blocking elements may be employed. After circularization, the biotinylated blocking element is hybridized to a non-fusion gene sequence. The circular ssDNA molecules hybridized to the biotinylated blocking elements are then pulled down using, for example, streptavidin-coated magnetic beads, thereby depleting the sample of any non-fused circular molecules prior to amplification.

As yet another alternative, blocking oligomers may be used as splints to enable restriction enzyme-mediated digestion of non-fused circular ssDNA molecules into non-amplifiable linear fragments. A methylated blocking oligomer can be used in combination with a methylation-sensitive restriction enzyme (e.g., NotI, NaeI, NsbI, SalI, HapII, or HaeII).

Sequencing of the amplified regions of interest is performed on a next generation sequencing instrument. In some embodiments, sequencing is accomplished by a single read greater than about 25 base pairs in length. In other embodiments, sequencing is accomplished by paired-end reads, wherein each read within a pair is greater than about 25 bases. After sequencing, error correction can be performed, which involves creating a consensus read from sequences that share a circularization junction sequence.
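The error-correction idea can be sketched as follows. This is an illustrative simplification, not the disclosed pipeline: it assumes each read has already been reduced to a hypothetical pair of (circularization junction sequence, insert sequence), that reads in a family are already aligned, and that a simple per-position majority vote is an acceptable consensus rule. The helper names are invented for this example.

```python
from collections import Counter, defaultdict


def group_by_junction(reads):
    """Group read sequences into families sharing a circularization junction."""
    families = defaultdict(list)
    for junction, sequence in reads:
        families[junction].append(sequence)
    return dict(families)


def majority_consensus(sequences):
    """Per-position majority vote over pre-aligned reads.

    Positions beyond the shortest read are dropped; ties go to the base
    seen first, following Counter.most_common ordering.
    """
    length = min(len(s) for s in sequences)
    return "".join(
        Counter(s[i] for s in sequences).most_common(1)[0][0]
        for i in range(length)
    )


# Hypothetical reads: (junction key, sequence). The third read carries a
# single-base error relative to the other members of its family.
reads = [
    ("GCTATTGA", "ACGTACGT"),
    ("GCTATTGA", "ACGTACGT"),
    ("GCTATTGA", "ACGAACGT"),
    ("CCATGGTA", "ACGTACGT"),
]

families = group_by_junction(reads)
consensus_reads = {j: majority_consensus(seqs) for j, seqs in families.items()}

# Each distinct junction is taken to represent one original template molecule.
template_copies = len(families)

print(consensus_reads)
print("distinct circularization junctions:", template_copies)
```

Because every circularized fragment has its own pair of ligated ends, the junction sequence acts much like an intrinsic molecular barcode here: the number of distinct junctions approximates the number of input templates, and the family members vote out the isolated error.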

Various suitable sequencing platforms can be used to carry out the methods disclosed herein (e.g., for performing sequencing reactions). Non-limiting examples include SMRT (single molecule real time) sequencing, ion semiconductor sequencing, pyrosequencing, sequencing by synthesis, combinatorial probe anchor synthesis, SOLiD sequencing (sequencing by ligation), and nanopore sequencing. Sequencing platforms include those provided by Illumina (e.g., HiSeq™, MiSeq™, and/or Genome Analyzer™ sequencing systems), Ion Torrent™ (e.g., Ion PGM™ and/or Ion Proton™ sequencing systems), Pacific Biosciences (e.g., PACBIO RS II sequencing system), Life Technologies™ (e.g., SOLiD sequencing system), and Roche (e.g., 454 GS FLX+ and/or GS Junior sequencing systems). See, for example, U.S. Patent 7,211,390, U.S. Patent 7,244,559, U.S. Patent 7,264,929, U.S. Patent 6,255,475, U.S. Patent 6,013,445, U.S. Patent 8,882,980, U.S. Patent 6,664,079, and U.S. Patent 9,416,409.

Next, sequence reads are analyzed to assess the presence of variants of interest. In some embodiments, this may involve using public software to detect gene fusions (e.g., GeneFuse; Chen S et al., Int. J. Biol. Sci. 2018;14(8):843-848). In other embodiments, this may be accomplished by mapping reads to the genome and analyzing where the reads localize (e.g., fig. 5). In still other embodiments, this may include mapping-independent and/or mapping-dependent methods, such as methods involving analysis of k-mer substrings (e.g., FIG. 6). Figs. 7 and 8 provide exemplary bioinformatic workflows for analyzing rearrangements, translocations, and CNVs using the same method.
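As a hedged illustration of a mapping-independent k-mer analysis, the sketch below indexes the k-mers of a few reference sequences and reports the ordered list of references whose k-mers occur in a read. A read whose k-mers switch from one gene to another is a candidate fusion read. The function names, the toy sequences, and the choice to ignore reverse-complement matches are assumptions made for brevity and are not taken from the figures.

```python
def build_kmer_index(references, k):
    """Map each k-mer to the single reference it comes from.

    k-mers found in more than one reference are ambiguous and stored as None.
    """
    index = {}
    for name, seq in references.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in index and index[kmer] != name:
                index[kmer] = None  # shared between references: uninformative
            else:
                index[kmer] = name
    return index


def partners_in_read(read, index, k):
    """Return the ordered, run-collapsed list of references hit by a read."""
    hits = []
    for i in range(len(read) - k + 1):
        name = index.get(read[i:i + k])
        if name is not None and (not hits or hits[-1] != name):
            hits.append(name)
    return hits


# Toy references (hypothetical sequences, far shorter than real genes).
refs = {"geneA": "ACGTACGGTCA", "geneB": "TTGACCATGGC"}
idx = build_kmer_index(refs, k=5)

fusion_read = "ACGTACGG" + "CCATGGC"   # 5' part from geneA, 3' part from geneB
normal_read = "ACGTACGG"               # geneA only

fusion_hits = partners_in_read(fusion_read, idx, k=5)
normal_hits = partners_in_read(normal_read, idx, k=5)
```

Here `fusion_hits` lists both genes in the order they appear along the read, while `normal_hits` lists only one. k-mers that span the junction itself match neither reference, which localizes the breakpoint to the gap between the two runs of hits.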

Additional fusion detection tools known in the art can be used to analyze sequencing reads, such as TRUP (Fernandez-Cuesta L, Sun R, Menon R, et al., Identification of novel fusion genes in lung cancer using breakpoint assembly of transcriptome sequencing data, Genome Biol. 16, 7 (2015)), ChimeraScan (Maher CA, Palanisamy N, Brenner JC, Cao X, Kalyana-Sundaram S, Luo S, et al., Chimeric transcript discovery by paired-end transcriptome sequencing), FusionHunter (Li Y, Chien J, Smith DI, Ma J., FusionHunter: identifying fusion transcripts in cancer using paired-end RNA-seq, Bioinformatics 2011;27:1708-10), FusionMap (Ge H, Liu K, Juan T, Fang F, Newman M, Hoeck W., FusionMap: detecting fusion genes from next-generation sequencing data at base-pair resolution, Bioinformatics 2011;27:1922-8), TopHat-Fusion (Kim D, Salzberg SL., TopHat-Fusion: an algorithm for discovery of novel fusion transcripts, Genome Biol. 2011;12:R72), deFuse (McPherson A, Hormozdiari F, Zayed A, Giuliany R, Ha G, Sun MGF, et al., deFuse: an algorithm for gene fusion discovery in tumor RNA-Seq data, PLoS Comput. Biol. 2011;7:e1001138), SOAPfuse (Jia W, Qiu K, He M, Song P, Zhou Q, Zhou F, et al., SOAPfuse: an algorithm for identifying fusion transcripts from paired-end RNA-Seq data, Genome Biol. 2013;14:R12), FusionSeq (Sboner A, Habegger L, Pflueger D, Terry S, Chen DZ, Rozowsky JS, et al., FusionSeq: a modular framework for finding gene fusions by analyzing paired-end RNA-sequencing data, Genome Biol. 2010;11:R104), and BreakFusion (Chen K, Wallis JW, Kandoth C, Kalicki-Veizer JM, Mungall KL, Mungall AJ, et al., BreakFusion: targeted assembly-based identification of gene fusions in whole transcriptome paired-end sequencing data, Bioinformatics 2012;28:1923-4).

IGH VDJ rearrangement and translocation analysis: as an exemplary use case, a workflow is proposed to achieve targeted amplification of nucleic acids for simultaneous analysis of IGH V(D)J rearrangements and translocations involving the IGH J gene. Unlike conventional multiplex PCR methods for amplifying VDJ rearrangements, the described method: (1) avoids clone dropout due to somatic hypermutation in the variable gene region; (2) enables detection of IGHJ translocations; (3) reduces the number of primers required; and (4) enables error correction and template quantification through analysis of the circularization site (fig. 7). Briefly, the workflow begins with extracting sample gDNA using methods known in the art. The gDNA molecules can optionally be fragmented to an average length of about 200 base pairs, for example, if the gDNA is derived from peripheral blood leukocytes or fresh frozen tumor biopsies. After fragmentation, the templates are circularized using CircLigase™ or a similar approach, and IGH rearrangements are then selectively amplified using IGHJ-targeting primers in combination with blocking oligomers. As an example, a suitable primer design strategy for selectively amplifying IGHJ rearrangements is presented in fig. 8.

Analysis: fig. 9 shows an overview of the bioinformatics workflow for analyzing B cell rearrangements by the described method. Amplification of the IGH, IGK, and IGL loci is followed by next generation sequencing. The resulting reads are filtered to remove short and off-target products, circularization junctions are identified, unique sequences are collapsed, and the sequences are then annotated for the presence of V(D)J rearrangements by IgBLAST (Ye et al., 2013, doi:10.1093/nar/gkt382) or similar tools. Reads with valid V(D)J rearrangements are used to determine the frequency of rearrangements and to estimate the template count as the number of unique circularization junctions associated with a given rearrangement. The set of identified V(D)J rearrangements is evaluated using methods known in the art (e.g., Lay et al., Practical Laboratory Medicine, vol. 22, 2020, e00191) to identify clonal rearrangement markers consistent with the presence of a B cell malignancy. Such markers may be used for longitudinal monitoring of residual disease. Reads lacking identifiable V(D)J rearrangements are assessed for the presence of a translocation using k-mer analysis or methods known in the art (e.g., GeneFuse). Finally, a report is generated indicating the V(D)J clonality and translocation status of the sample, or, in the case of residual disease monitoring, whether a marker rearrangement is detected in the sample.
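The template-count estimate described in this workflow can be sketched as follows. The input is assumed to be a list of (rearrangement, circularization junction) pairs produced by upstream annotation; the function name and the decision to express rearrangement frequency as a share of templates rather than of raw reads are assumptions of this sketch.

```python
from collections import defaultdict


def summarize_rearrangements(annotated_reads):
    """Estimate template counts and frequencies per V(D)J rearrangement.

    `annotated_reads` is an iterable of (rearrangement_id, junction_id)
    tuples, one per read. The template count for a rearrangement is the
    number of distinct circularization junctions observed with it, so PCR
    duplicates of the same circular template are counted once.
    """
    junctions = defaultdict(set)
    for rearrangement, junction in annotated_reads:
        junctions[rearrangement].add(junction)

    template_counts = {r: len(j) for r, j in junctions.items()}
    total = sum(template_counts.values())
    # Frequency here is template-based (an assumption of this sketch).
    return {r: (n, n / total) for r, n in template_counts.items()}


reads = [
    ("cloneA", "j1"),
    ("cloneA", "j1"),  # PCR duplicate of the same template
    ("cloneA", "j2"),
    ("cloneB", "j3"),
]
summary = summarize_rearrangements(reads)
```

With this toy input, `cloneA` is supported by three reads but only two distinct junctions, so its estimated template count is 2.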

Single cell analysis: the compositions and methods described herein are compatible with common single cell barcode approaches, allowing detection of gene fusion events with single cell resolution to potentially reveal clinically relevant tumor heterogeneity. Single cell fusion assays may be part of a broader analysis pipeline to detect and report other cancer variants, such as CNV and SNV.

Single cell nucleic acid preparation: the target polynucleotides are isolated from a population of cells using methods known in the art. For example, a typical workflow comprises the following steps: 1) single cells are individually partitioned into droplets (e.g., sub-nanoliter droplets); 2) barcoded beads and amplification reagents are introduced; 3) cell lysis, protease digestion, cell barcoding, and targeted amplification occur within the droplet; 4) the droplets are then broken and the barcoded DNA is extracted for additional amplification and/or library preparation steps; 5) the final library is purified and ready for sequencing. Other single cell library preparation schemes may also be used, including commercial solutions, for example, those offered by 10X Genomics and/or Mission Bio.

Circularization of nucleic acid from the sample: during circularization, the 5' end of a nucleic acid molecule is linked to the 3' end of the same molecule. In one embodiment, a ligase (e.g., CircLigase™ or T4 DNA ligase) is used to circularize the nucleic acid (DNA or RNA may be circularized). Where RNA (e.g., mRNA) is the target of circularization, the RNA is optionally converted to cDNA by reverse transcription. Optionally, after circularization, residual linear molecules can be removed by exonuclease treatment. In addition, any circularized fragment containing an undesired sequence can be depleted from a library of circularized fragments, for example, by hybridization-based pull down using a probe targeting the undesired sequence, or by CRISPR-mediated linearization of circularized fragments containing the undesired sequence, followed by exonuclease treatment (see, e.g., U.S. patent publication 2019/0161752). The use of circularized template material may facilitate multiplex PCR even when used in combination with only conventional inward-facing PCR primers, as the circularized material lacks free 3' DNA ends that may prime non-specific amplification. Compared to linear DNA, circularized DNA may enable more targeted amplification when used as a template for inward-facing primers and/or outward-facing primers in a PCR method.

Sequencing: the amplified nucleic acids are sequenced to determine the presence of one or more gene fusion events. Sequence reading may be accomplished using any suitable commercial sequencing modality, for example, in a preferred embodiment, using a next generation sequencing instrument. Sequence reading can also be accomplished using Sanger sequencing or other low throughput methods. The frequency of reads supporting the fusion gene can optionally be compared to the frequency of reads supporting unfused (i.e., wild-type or normal) copies of one or more of the donor or acceptor genes to determine the relative abundance of the gene fusion nucleic acid and whether sufficient read support is present to conclude that the sample contains a gene fusion.
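The comparison of fusion-supporting and wild-type-supporting reads can be expressed as a simple ratio, sketched below. The function name and the `min_fusion_reads` threshold are hypothetical; the text does not specify what level of read support is sufficient, so the default is a placeholder only.

```python
def fusion_support(fusion_reads, wildtype_reads, min_fusion_reads=5):
    """Return (fusion_fraction, has_sufficient_support).

    fusion_fraction is the share of reads at the locus that support the
    fusion. `min_fusion_reads` is a hypothetical calling threshold and is
    not taken from the source text.
    """
    total = fusion_reads + wildtype_reads
    fraction = fusion_reads / total if total else 0.0
    supported = fusion_reads >= min_fusion_reads
    return fraction, supported


fraction, supported = fusion_support(fusion_reads=20, wildtype_reads=180)
```

Because blocking elements suppress amplification of unfused templates, the observed fraction reflects post-enrichment read counts and would need calibration before being read as an allele fraction in the original sample.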

Example 2 T cell receptor convergence as a biomarker

An adaptive immune response comprises a selective response of B cells and T cells that recognize an antigen. The genes encoding the antibody (Ab, in B cells) and T cell receptor (TCR, in T cells) antigen receptors comprise complex loci in which extensive receptor diversity arises from recombination of the corresponding variable (V), diversity (D), and joining (J) gene segments, and from subsequent somatic hypermutation events during early lymphoid differentiation. After engagement of the TCR by its cognate antigen, T lymphocytes upregulate many activation markers and develop a variety of effector functions, including proliferation, cytotoxicity, and cytokine production. Knowledge of TCR amino acid sequences enables tracking of specific T cell clones in the circulation and in peripheral tissues, which significantly facilitates monitoring of, e.g., virus-specific T cell immunity, and enables differential diagnosis and targeted treatment of T cell-related disorders. Thus, a comprehensive assessment of the clonal composition of antigen-specific T cells can provide important information about cellular immunity in the context of vaccination, tumor control, or viral disease, and is of great importance for clinical assessment and management (see, e.g., Dziubianau M et al., Am. J. Transplant. 2013;13(11):2842-54).

Existing NGS methods for identifying TCR sequences include those that rely on comparing each sequencing read to, for example, Vβ and Jβ reference sequences. Alternatively, antigen-specific TCR convergence can be determined, which does not require the use of a large database to decode TCRs. This approach relies on the observation of TCRs that are similar or identical at the amino acid level but different at the nucleotide level, indicating that multiple T cell clones independently underwent VDJ recombination and expanded in response to a common antigen. Observed TCR convergence is an indication that a given TCR may respond to an antigen presented over an extended period of time, giving different T cell clones the opportunity to independently proliferate in response to the antigen. In the context of cancer, convergent TCRs may be enriched for those that recognize tumor antigens. For example, in one study of dendritic cell treatment of melanoma, the frequency of TCR convergence at baseline was observed to be highly predictive of therapeutic response (see Storkus WJ et al., J. Immunother. Cancer 2021;9(11):e003675, which is incorporated herein by reference in its entirety). Similar findings have been reported (see Naidus E et al., Cancer Immunol. Immunother. 2021;70(7):2095-2102), wherein peripheral blood TCR convergence after PD-L1 blockade correlated directly with patient outcome in patients with advanced non-small cell lung cancer. Data from these studies indicate that TCR convergence in peripheral blood T cells can represent an actionable biomarker for: (1) identifying patients most likely to respond to an immunotherapeutic intervention that mechanistically requires a T cell response to achieve a preferred clinical outcome; and (2) effective longitudinal monitoring of therapeutically significant T cell responses in a patient receiving the treatment.

As used herein, a "convergent TCR set" is a set of T cell receptors (TCRs) that are similar in amino acid sequence and functionally equivalent, or that are identical or assumed to be identical in amino acid sequence. Because of this amino acid similarity, the members of a convergent TCR set are generally assumed to recognize the same antigen. In some embodiments, the members of a convergent TCR set are identical or assumed to be identical in the variable gene and CDR3 amino acid sequences, despite having different nucleotide sequences. Members of a convergent TCR set may arise from differences in non-templated nucleotide bases at the VDJ junctions that occur during the generation of productive TCR gene rearrangements.

Provided herein are methods for performing a multiplex amplification reaction to amplify target immune receptor nucleic acid template molecules (e.g., TCR molecules) derived from a biological sample, wherein the multiplex amplification reaction comprises a plurality of amplification primer pairs comprising a plurality of joining (J) gene primers directed to a majority of the J genes of the target immune receptor, thereby generating target immune receptor amplicon molecules comprising the target immune receptor repertoire. Using the methods and outward-facing J gene targeting primers described herein and in example 1, TCR development at baseline and in response to antigen can be assessed. To assess TCR convergence, for example, TCR β chains that are identical in amino acid sequence but differ in nucleotide sequence are identified.

Such methods further comprise: sequencing the target immune receptor repertoire amplicons; identifying immune receptor clones from said sequencing; identifying convergent immune receptor clones among said immune receptor clones, wherein said convergent immune receptor clones have similar or identical amino acid sequences and different nucleotide sequences; and determining the frequency of convergent immune receptor clones in the sample. Subsequent clinical decisions may combine the information obtained about TCR convergence with the potential therapeutic approach being considered. Additional methods of TCR convergence analysis are described elsewhere, for example, in U.S. patent publication 2021/0108268, which is incorporated herein by reference in its entirety. These methods provide an efficient way to determine TCR convergence using multiplex primers, e.g., outward-facing primers as described herein, and allow identification of multiple independent T cell clones that have undergone VDJ recombination and expansion in response to a common antigen.
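A minimal sketch of the convergence frequency calculation follows. It assumes clones are supplied as (V gene, CDR3 nucleotide sequence, read count) tuples, defines convergence strictly as identical V gene and CDR3 amino acid sequence with differing nucleotide sequences, and reports the read-weighted share of convergent clones. The function names, the toy CDR3 sequences, and the read-weighted definition are assumptions; similarity-based grouping is not covered.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(_BASES, repeat=3), _AMINO)}


def translate(nt):
    """Translate an in-frame nucleotide sequence to amino acids."""
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, len(nt) - 2, 3))


def convergence_frequency(clones):
    """Fraction of reads belonging to convergent clones.

    `clones` is a list of (v_gene, cdr3_nt, count). Clones sharing a V gene
    and CDR3 amino acid sequence but differing in CDR3 nucleotide sequence
    are treated as convergent (identity-based definition only).
    """
    groups = defaultdict(dict)
    for v_gene, cdr3_nt, count in clones:
        key = (v_gene, translate(cdr3_nt))
        groups[key][cdr3_nt] = groups[key].get(cdr3_nt, 0) + count

    total = sum(sum(g.values()) for g in groups.values())
    convergent = sum(sum(g.values()) for g in groups.values() if len(g) > 1)
    return convergent / total if total else 0.0


# Toy clones with shortened, hypothetical CDR3 sequences.
clones = [
    ("TRBV5-1", "TGTGCCAGC", 10),  # C-A-S
    ("TRBV5-1", "TGCGCCAGC", 5),   # C-A-S via a synonymous codon
    ("TRBV7-2", "TGTGCCAGT", 5),   # same amino acids, different V gene
]
freq = convergence_frequency(clones)
```

In the toy data the two TRBV5-1 clones encode the same amino acids through different codons and are counted as convergent, while the TRBV7-2 clone is not because its V gene differs.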

Example 3 fusion detection for Minimal Residual Disease (MRD) monitoring

The use of standardized multi-agent chemotherapy regimens with risk-adapted intensity has contributed greatly to the gradual increase in survival of children with acute lymphoblastic leukemia (ALL). Initial treatment response, measured by serial quantification of minimal residual disease (MRD), has been shown to be one of the strongest independent prognostic factors in pediatric ALL and has been implemented in most of the treatment protocols currently in use. In the Netherlands, MRD monitoring has formed the main basis of risk group stratification since 2004 and is performed using real-time quantitative polymerase chain reaction (RQ-PCR) analysis of rearranged immunoglobulin (IG) and T cell receptor (TR) genes. The method is highly standardized within an international consortium. However, in about 5% of cases, MRD classification is not feasible because PCR-detectable targets cannot be identified, or because the targets do not reach the required sensitivity (see Pieters R et al., J. Clin. Oncol. 2016;34(22):2591-601). In addition, IG/TR rearrangements may be oligoclonal and thus may be lost during the course of the disease. MRD-based stratification is therefore suboptimal for these patients, with a risk of under- or over-treatment (see Szczepanski T et al., Blood 2002;99(7):2315-23 and van der Velden VHJ et al., Leukemia 2002;16:928-936). Fusion genes and gene deletions often act as primary drivers of leukemogenesis and thus may be very stable during disease progression and suitable as alternative genomic MRD PCR targets. In contrast to fusion transcripts, these genomic fusion breakpoints are independent of gene activity and therefore have quantitative dynamics comparable to those of standard IG/TR targets (see Kuiper RP et al., Br. J. Haematol. 2021;194(5):888-892, which is incorporated herein by reference in its entirety).

The use of gene fusions or deletions for MRD monitoring requires the identification of the genomic breakpoints of these structural variants, which are unique to each patient. These breakpoints can be identified in a straightforward and unbiased manner from whole genome sequencing (WGS) data. As described in example 1, targeted methods for gene fusion detection can simplify analysis and reduce cost, and have thus become the leading methods for clinical application. The compositions and methods described above and herein provide a sequencing-efficient solution for detecting genetic variations such as SNVs, insertions/deletions, and gene fusions, including targeted sequencing of genetic variations involving novel partners and derived from novel breakpoints, particularly for MRD detection. As described in detail herein in various embodiments, the method consists of the steps of: (1) circularizing nucleic acid derived from the sample; (2) amplifying circularized nucleic acid derived from one or more targets of interest; and (3) analyzing the amplified fragments by next generation sequencing (NGS).

A method called the well occupancy method has recently been described for estimating the absolute abundance of individual T cell clones or B cell clones and/or nucleic acids encoding individual TCRs and/or IGs in a large population (see U.S. patent No. 10,246,701, which is incorporated herein by reference in its entirety). Briefly, 10,000 PBMCs are allocated to each well of a 96-well plate. In each well, amplification is performed and a well-specific barcode is assigned (the barcode is incorporated into each amplicon by PCR with tailed primers); the amplified molecules are then sequenced together, and sequence reads are matched back to their starting well based on the barcode. It is then determined whether each unique sequence (with a specific CDR3 sequence) is present in each well, such that each unique CDR3 sequence is assigned a well occupancy pattern. A maximum likelihood estimate of the number of molecules in the original sample is obtained for each individual CDR3 sequence using an occupancy-based method; these estimates are determined based only on the number of wells in which the immune receptor sequence was found. Thus, for each unique adaptive immune receptor sequence observed, the number of wells in which that particular sequence is found is determined.
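One common way to turn a well occupancy pattern into a maximum likelihood estimate is the Poisson occupancy model sketched below. It is offered as an illustration under the assumption that template molecules distribute independently and uniformly across wells; it is not asserted to be the exact estimator of the cited patent. Under this model a well is empty with probability exp(-n/W), so the estimate from k occupied wells out of W is n = -W ln(1 - k/W).

```python
import math


def estimate_template_count(occupied_wells, total_wells=96):
    """Poisson occupancy estimate of molecules carrying a given sequence.

    Assumes molecules are distributed independently and uniformly across
    wells (a modeling assumption of this sketch). Returns infinity when
    every well is occupied, because the estimate is unbounded there.
    """
    if not 0 <= occupied_wells <= total_wells:
        raise ValueError("occupied_wells must be between 0 and total_wells")
    if occupied_wells == total_wells:
        return math.inf
    return -total_wells * math.log(1 - occupied_wells / total_wells)


def clone_frequency(occupied_wells, total_wells=96, cells_per_well=10_000):
    """Estimated clone frequency among all cells plated."""
    n = estimate_template_count(occupied_wells, total_wells)
    return n / (total_wells * cells_per_well)


rare = estimate_template_count(3)        # close to 3: few wells, few molecules
common = estimate_template_count(48)     # 96 * ln(2), about 66.5
rare_freq = clone_frequency(3)
```

For rare sequences the estimate is close to the number of occupied wells, because two molecules seldom land in the same well. As occupancy rises, the correction for wells holding several molecules grows.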

The methods described herein for detecting gene fusions by circularization and inverse PCR primers can be applied using such well occupancy methods. Briefly, 10,000 PBMCs (e.g., PBMCs extracted from patients for MRD detection) are dispensed into each well of a 96-well plate. In each well, amplification is performed using inverse PCR primers as described herein, combined with 5' blocking elements (e.g., non-extendable oligonucleotides) that bind specifically to the sequence of the unrearranged fusion gene partner of interest adjacent to and opposite the expected breakpoint junction, and a well-specific barcode is assigned (the barcode is incorporated into each amplicon by PCR with tailed primers). The amplified molecules are then sequenced together and sequence reads are matched back to their starting well based on the barcode. It is then determined whether each unique sequence (e.g., having a specific gene fusion sequence, such as one involving the IGH locus) is present in each well, such that each unique IGH locus sequence is assigned a well occupancy pattern. MRD may be determined based on the presence and/or absence of the unique gene fusion sequence. Combining the methods described herein with occupancy-based methods can achieve significantly more sensitive MRD detection, e.g., with a lower limit of detection than conventional practice (e.g., most studies define MRD positivity as at least 0.01%, which is the limit of detection of conventional assays, as described in Rocha JMC et al., Mediterr. J. Hematol. Infect. Dis. 2016;8(1):e2016024, which is incorporated herein by reference).

Fig. 12 shows the temporal aspect of MRD testing for acute lymphoblastic leukemia (ALL). Each line represents the level of residual disease over time following therapeutic intervention (e.g., radiation and/or chemotherapy) for a different hypothetical patient monitored at different time points after treatment. The response curves include DP (disease persistence), VER (very early relapse), ER (early relapse), LR (late relapse), VLR (very late relapse), and NR (no relapse). 10^-2 represents the proportion of leukemia cells that is the approximate lower limit of detection for VER. Sub-microscopic disease detection (i.e., MRD) can generally detect VER, ER, and LR, where the proportion of leukemia cells ranges from about 10^-2 to about 10^-5. Prior art methods are largely limited to detecting about 10^-6 leukemia cells in a sample, which may be insufficient for patients who will die from VLR. The methods described herein allow detection at levels as low as 10^-5 to 10^-7, which facilitates detection in all therapeutic scenarios and in all cases.

The methods described herein enable detection, in a sequencing-efficient manner, of malignancy-associated markers at all frequencies (e.g., across the full range of about 10^-2 to about 10^-7), making them suitable for both disease diagnosis and MRD analysis. An additional advantage of the methods described herein relative to existing commercial solutions (i.e., kits provided by Adaptive Biotechnology, Inc. and kits provided by InvivoScribe, Inc.) is that the methods described herein are capable of assessing IGH, IGK, and IGL locus rearrangements simultaneously in a single reaction. Existing solutions require separate multiplex PCR reactions, e.g., one each for IGH, IGK, and IGL. The need for separate PCR reactions increases the test complexity, cost, and time associated with each diagnosis.

Example 4 determination of blocking oligomer efficiency

The efficiency of a blocking oligomer targeting the unrearranged IGHJ6 region was determined according to the methods described herein and in example 1. Fig. 13 shows the blocking element efficiency results, as determined by gel electrophoresis analysis. Synthetic oligomers were generated to represent an IGH rearrangement (fusion, F) and the unrearranged IGHJ6 gene (wild type, W). PCR amplification of each template was performed using inverse PCR primers in the presence or absence (indicated by +/-) of a non-extendable blocking oligomer capable of hybridizing to the W template but not to the F template (blocking oligomer as shown in figure 1). The PCR amplification products were then visualized on agarose gels. In the absence of the blocking oligomer, equivalent amounts of product were observed for the fusion and wild-type templates. As expected, the addition of the blocker selectively reduced the product from the wild-type template.

Example 5 detection of breakpoint regions

Gene fusions are an important type of genetic variation in cancer, are associated with therapy selection, and serve as markers for measurable residual disease (MRD) monitoring. Conventional multiplex PCR (mPCR) cannot detect gene fusions with a novel partner or breakpoint. A novel mPCR technique is described herein for targeted detection of gene fusions, including gene fusions with unknown partners or breakpoints. Using the Singular Genomics G4™ sequencing platform, the methods described herein were employed to simultaneously identify clinically relevant IGH locus translocations and V(D)J rearrangements from highly degraded material.

DNA fragmentation and circularization: the method starts with efficient intramolecular ligation of DNA fragments followed by multiplex inverse PCR that preferentially amplifies fragments containing breakpoint junctions. First, the isolated DNA of variable length is sheared to a length of about 200 bp using enzymatic fragmentation (e.g., NEBNext dsDNA Fragmentase, catalog number M0348) or mechanically sheared using a Covaris ME220, followed by bead cleanup using QuantaBio sparQ PureMag beads. Then, 50 ng of the fragmented and bead-purified DNA was heat denatured into single-stranded DNA and circularized using CircLigase™ ssDNA ligase (Lucigen, catalog number CL4111K/CL4115K) with 10 pmol of ssDNA per reaction according to the manufacturer's protocol. The ssDNA was incubated at 60 °C for 1 hour to circularize the ssDNA, followed by 10 minutes at 80 °C to inactivate the CircLigase™.

After circularization, some uncircularized DNA (both single and double stranded) may remain in each sample. A mixture of exonuclease I (NEB) and exonuclease III (NEB) was used to digest the uncircularized DNA by incubation at 37 °C for 1 hour. The remaining circular ssDNA was then purified using the Zymo Oligo Clean & Concentrator kit.

Inverse PCR: the purified circular ssDNA template was then amplified using inverse PCR as described herein. PCR conditions were adapted from the NEB Q5 polymerase master mix reaction conditions and contained 0.2 mM dNTPs (each), 0.1 µM primers (each; e.g., for a primer pair, 0.1 µM first primer and 0.1 µM second primer), 0.2 U/µL Q5 polymerase, 1 µM blocking oligomer (each), and 500 ng to 2 µg template. A 2-step amplification protocol was performed in which the initial denaturation step was at 96 °C, followed by cycling between a 96 °C denaturation step and an annealing/extension step at 62 °C. Samples were then taken through library preparation. For simplicity, the data in Table 1 were generated using a single pair of gene-binding inverse PCR primers and a single blocker. A complete assay (amplifying IGH, IGK, and IGL locus rearrangements) will have about 22 primers (1F, 6R for the IGH locus; 3F, 6R for the IGK locus; 1F, 5R for the IGL locus) and 18 different blockers.

Sequencing: the amplicon library was sequenced with 2x150 bp reads on the G4™ platform and analyzed to detect translocations. Using the method, IGH V(D)J rearrangements and BCL1-JH and BCL2-JH translocations were simultaneously detected from fragmented IVS-0010 and IVS-0030 reference control gDNA (Invivoscribe cat. nos. 40880550 and 40881750) and healthy donor PBL gDNA.

Results: BCL1-JH and BCL2-JH translocations were detected from 50ng of fragmented gDNA (average template length of 200 bp) from IVS-0010 and IVS-0030 reference controls, respectively. Translocation was also detected from 50ng of samples consisting of fragmented reference control material incorporated into the background of fragmented healthy donor PBLs at a frequency of 1%. Preferential amplification of the translocation-containing template was observed, enabling detection from <1M reads/sample under all test conditions. V (D) J rearrangements were successfully detected from PBL gDNA using the same multiplex inverse PCR reaction (see, e.g., FIG. 14). A summary of pooled sequencing reads can be found in table 1.

Table 1. Limit of detection analysis from fragmented material. For simplicity, the data in Table 1 were generated using a single pair of gene-binding inverse PCR primers and a single blocker. In an example, a complete assay (amplifying IGH, IGK, and IGL locus rearrangements) will have about 22 primers (1F, 6R for the IGH locus; 3F, 6R for the IGK locus; 1F, 5R for the IGL locus) and 18 blockers. Healthy donor PBL gDNA and gDNA from IVS-0030 (catalog number: 40881750) were sheared by sonication to an average length of about 200 bp. 50 ng of fragmented PBL gDNA, or 50 ng of PBL gDNA spiked with 0.5 ng of IVS-0030, was subjected to circularization and amplification by the assay described herein. Amplicons were sequenced on the G4™ using 1x150 bp reads. Reads were aligned to the genome by bwa, and read peaks corresponding to translocation junctions were then identified by MACS2. Unique VDJ rearrangements were identified by IgBLAST. The fraction of on-target reads corresponds to reads that map at least in part to the IGH locus.
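The on-target metric described in the caption can be sketched as an interval-overlap test. The example assumes each read has already been aligned and is represented by one or more (chromosome, start, end) segments using half-open coordinates, so that a split read spanning a translocation contributes two segments. The function name and all coordinates below are placeholders, not real IGH locus positions.

```python
def on_target_fraction(read_segments, locus):
    """Fraction of reads with at least one segment overlapping `locus`.

    `read_segments` maps read IDs to lists of (chrom, start, end) aligned
    segments in half-open coordinates. `locus` is a (chrom, start, end)
    tuple for the target region. A read counts as on-target if any of its
    segments overlaps the locus, matching "maps at least in part".
    """
    chrom, start, end = locus

    def overlaps(segment):
        seg_chrom, seg_start, seg_end = segment
        return seg_chrom == chrom and seg_start < end and seg_end > start

    if not read_segments:
        return 0.0
    on_target = sum(
        1 for segments in read_segments.values()
        if any(overlaps(seg) for seg in segments)
    )
    return on_target / len(read_segments)


# Placeholder coordinates for illustration only.
target = ("chr14", 1000, 2000)
reads = {
    "r1": [("chr14", 900, 1050)],                      # partial overlap
    "r2": [("chr14", 1500, 1580), ("chr18", 300, 370)],  # split read
    "r3": [("chr18", 300, 450)],                       # off-target
    "r4": [("chr14", 2000, 2150)],                     # adjacent, no overlap
}
fraction_on_target = on_target_fraction(reads, target)
```

Read `r2` illustrates why per-segment evaluation matters: a translocation-spanning read maps partly outside the locus but is still counted as on-target.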

Conclusion: the methods described herein enable mPCR-based detection of novel gene fusions from highly degraded material with sequencing efficiency similar to that of traditional mPCR. As a first application, these methods were applied to simultaneously detect B cell V(D)J rearrangements and clinically relevant JH translocations from a limited amount of degraded gDNA. In this regard, these approaches represent a significant advance over current mPCR-based approaches to antigen receptor sequencing. The methods are expected to have wide applicability in molecular diagnostics and MRD monitoring of disease states such as cancer.

Claims

1. A method of differentially amplifying a polynucleotide comprising a fusion gene relative to a polynucleotide not comprising the fusion gene, the method comprising:

i) Circularizing a plurality of linear nucleic acid molecules to form a plurality of circular template polynucleotides, wherein one or more of the linear nucleic acid molecules comprises the fusion gene, thereby forming one or more fusion gene circular template polynucleotides, and wherein one or more of the linear nucleic acid molecules does not comprise the fusion gene, thereby forming one or more non-fusion gene circular template polynucleotides;

ii) binding a blocking element to the one or more non-fused circular template polynucleotides; and

iii) Hybridizing a first primer and a second primer to the one or more non-fusion circular template polynucleotides and the one or more fusion circular template polynucleotides and extending with a polymerase to produce a first amount of non-fusion polynucleotide amplification product and a second amount of fusion polynucleotide amplification product, wherein the first amount is detectably less than the second amount; whereby said polynucleotides comprising said fusion gene are differentially amplified.

2. The method of claim 1, wherein binding the blocking element comprises binding the blocking element upstream of the first primer.

3. The method of claim 1, wherein the second amount is about 1%, about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 40%, about 50%, or about 75% greater than the first amount.

4. The method of claim 1, wherein the second amount is about 2 times, at least about 1.5 times, at least about 2.0 times, at least about 2.5 times, at least about 5 times, at least about 10 times, or more than about 10 times the first amount.

5. The method of claim 1, further comprising detecting the first amount of non-fusion polynucleotide amplification product and the second amount of fusion polynucleotide amplification product.

6. The method of claim 1, wherein the one or more linear nucleic acid molecules comprise DNA, RNA, or cDNA; optionally wherein said DNA or said RNA is a cell-free nucleic acid molecule.

7. The method of claim 1, wherein the one or more linear nucleic acid molecules comprise RNA or cDNA and the fusion gene comprises an exon junction.

8. The method of claim 1, wherein the one or more linear nucleic acid molecules comprise RNA or cDNA and the fusion gene comprises an exon junction formed by alternative splicing.

9. The method of claim 1, wherein the one or more linear nucleic acid molecules comprise RNA or cDNA and the fusion gene comprises an exon junction formed by a splice defect.

10. The method of claim 1, wherein the fusion gene comprises an interchromosomal or intrachromosomal translocation.

11. The method of claim 10, wherein the intrachromosomal translocation comprises a partially or fully rearranged B cell or T cell antigen receptor.

12. The method of claim 1, wherein the blocking element comprises an oligonucleotide, a protein, or a combination thereof.

13. The method of claim 1, wherein the one or more linear nucleic acid molecules are about 20 to about 1000 nucleotides in length, about 100 to about 300 nucleotides in length, about 300 to about 500 nucleotides in length, or about 500 to about 1000 nucleotides in length.

14. The method of claim 1, wherein the blocking element binds about 1 to about 150 nucleotides upstream relative to the first primer.

15. The method of claim 1, wherein the first primer hybridizes to the one or more fusion circular template polynucleotides about 1 to about 100 nucleotides downstream relative to a fusion junction within the fusion gene.

16. The method of claim 1, wherein the first primer and the second primer hybridize to complementary sequences of the one or more fusion circular template polynucleotides and the one or more non-fusion circular template polynucleotides, wherein the first primer and the second primer are about 1 to about 50 nucleotides apart.

17. The method of claim 1, further comprising binding a second blocking element to the one or more non-fusion circular template polynucleotides downstream relative to the second primer.

18. The method of claim 17, wherein the second blocking element binds about 100 to about 300 nucleotides downstream relative to the second primer.

19. The method of claim 1, further comprising repeating steps ii) and iii).

20. The method of claim 1, further comprising:

iv) amplifying the one or more non-fusion circular template polynucleotides to produce a third amount of non-fusion polynucleotide amplification products; and amplifying the one or more fusion circular template polynucleotides to produce a fourth amount of fusion polynucleotide amplification products, wherein the third amount and the fourth amount are substantially the same.

21. The method of claim 20, wherein amplifying the one or more non-fusion circular template polynucleotides comprises hybridizing a third primer and a fourth primer to the one or more non-fusion circular template polynucleotides and extending both primers with a polymerase, and wherein amplifying the one or more fusion circular template polynucleotides comprises hybridizing the third primer and the fourth primer to the one or more fusion circular template polynucleotides and extending both primers with a polymerase.

22. The method of claim 21, wherein the third primer hybridizes upstream of a target sequence and the fourth primer hybridizes downstream of a target sequence, wherein the target sequence comprises a single nucleotide variant, an insertion, a deletion, an internal tandem repeat, or a copy number variant.

23. The method of claim 1, further comprising: detecting the length of the non-fusion polynucleotide amplification product and the length of the fusion polynucleotide amplification product; detecting one or more probes bound to the non-fusion polynucleotide amplification product and the fusion polynucleotide amplification product; or sequencing the non-fusion polynucleotide amplification product and the fusion polynucleotide amplification product.

24. The method of claim 23, wherein sequencing the non-fusion polynucleotide amplification product and the fusion polynucleotide amplification product produces one or more sequencing reads.

25. The method of claim 24, further comprising aligning a substring of the one or more sequencing reads with a reference sequence.

26. The method of claim 24, further comprising comparing a k-mer substring of the one or more sequencing reads to a k-mer table of a fusion gene reference.

27. The method of claim 24, further comprising: grouping the one or more sequencing reads based on a barcode sequence and/or a sequence comprising the fusion gene; and, within each group, aligning the reads and forming a consensus sequence of reads having the same barcode sequence and/or sequence comprising the fusion gene.

28. The method of claim 23, wherein sequencing further comprises: generating one or more sequencing reads comprising a circularized junction formed between the 5' and 3' ends of the linear nucleic acid molecule; and quantifying the number of different circularized junction sequences comprising said fusion gene.

29. A kit, comprising:

a circularizing agent, wherein the circularizing agent is capable of binding the 5' and 3' ends of a linear nucleic acid molecule;

a blocking element capable of binding to one or more circular polynucleotides;

a first primer and a second primer; and

a polymerase.

30. A method of amplifying a polynucleotide comprising a fusion gene, the method comprising:

i) binding a blocking element to a non-fusion circular template polynucleotide, wherein the non-fusion circular template polynucleotide does not comprise the fusion gene;

ii) hybridizing a first primer and a second primer to the non-fusion circular template polynucleotide; and hybridizing the first primer and the second primer to a fusion circular template polynucleotide, wherein the fusion circular template polynucleotide comprises the fusion gene; and

iii) extending the first primer and the second primer with a non-strand-displacing polymerase to produce a fusion polynucleotide amplification product.

31. The method of claim 30, wherein binding the blocking element comprises binding the blocking element upstream of the first primer.

32. The method of claim 31, wherein prior to step i), the method comprises circularizing a plurality of linear nucleic acid molecules to form a plurality of circular template polynucleotides, wherein one or more of the linear nucleic acid molecules comprises the fusion gene, thereby forming one or more fusion gene circular template polynucleotides, and wherein one or more of the linear nucleic acid molecules does not comprise the fusion gene, thereby forming one or more non-fusion gene circular template polynucleotides.

33. The method of claim 32, further comprising binding a second blocking element to the non-fusion circular template polynucleotide downstream relative to the second primer.

34. The method of claim 33, further comprising detecting the fusion polynucleotide amplification product.

35. The method of claim 33, further comprising sequencing the fusion polynucleotide amplification product.
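By way of non-limiting illustration, the k-mer comparison recited in claim 26 could be sketched as follows. This is a minimal sketch and not the applicant's implementation. The function names (`build_kmer_table`, `genes_hit_by_read`, `is_fusion_candidate`), the toy sequences, the choice of k = 6, and the rule that a read matching k-mers from two different reference genes is flagged as a fusion candidate are all assumptions introduced here for illustration.

```python
from collections import defaultdict

def build_kmer_table(references, k):
    """Map every k-mer to the set of reference gene names containing it.

    `references` is a dict of {gene_name: sequence} (hypothetical input format).
    """
    table = defaultdict(set)
    for name, seq in references.items():
        for i in range(len(seq) - k + 1):
            table[seq[i:i + k]].add(name)
    return table

def genes_hit_by_read(read, table, k):
    """Return reference gene names matched by the read's k-mers,
    listed in the order in which each gene is first matched."""
    hits = []
    for i in range(len(read) - k + 1):
        for name in sorted(table.get(read[i:i + k], ())):
            if name not in hits:
                hits.append(name)
    return hits

def is_fusion_candidate(read, table, k):
    """Assumed rule: a read whose k-mers match two or more different
    reference genes is flagged as a candidate fusion read."""
    return len(genes_hit_by_read(read, table, k)) >= 2

# Toy reference sequences (invented for illustration only).
REFERENCES = {
    "GENE_A": "ACGTACGGATCCTTAGC",
    "GENE_B": "TTGACCGGTAAGCTTCA",
}
K = 6
TABLE = build_kmer_table(REFERENCES, K)

# A read joining the start of GENE_A to the end of GENE_B, and a read
# drawn entirely from GENE_A.
fusion_read = REFERENCES["GENE_A"][:10] + REFERENCES["GENE_B"][-10:]
wild_type_read = REFERENCES["GENE_A"][1:16]

print(genes_hit_by_read(fusion_read, TABLE, K))
print(is_fusion_candidate(wild_type_read, TABLE, K))
```

In this sketch the lookup table is built once from the reference and each read is then screened by exact k-mer matching, which avoids a full alignment for reads that match only one gene. A practical implementation would additionally need to handle reverse complements, sequencing errors, and k-mers shared between genes, none of which is addressed here.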