WO2019236644A1 - Encoded libraries and methods of use for screening nucleic acid targets - Google Patents


Info

Publication number
WO2019236644A1
Authority
WO
WIPO (PCT)
Prior art keywords
dna
rna
ligase
nucleic acid
del
Application number
PCT/US2019/035481
Other languages
French (fr)
Inventor
Jonathan Craig BLAIN
James Gregory BARSOUM
Neil KUBICA
Jennifer C. Petter
Alexandra East SELETSKY
Original Assignee
Arrakis Therapeutics, Inc.
Application filed by Arrakis Therapeutics, Inc.
Publication of WO2019236644A1


Classifications

    • C CHEMISTRY; METALLURGY
    • C40 COMBINATORIAL TECHNOLOGY
    • C40B COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B40/00 Libraries per se, e.g. arrays, mixtures
    • C40B40/04 Libraries containing only organic compounds
    • C40B40/06 Libraries containing nucleotides or polynucleotides, or derivatives thereof
    • C40B40/08 Libraries containing RNA or DNA which encodes proteins, e.g. gene libraries
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1065 Preparation or screening of tagged libraries, e.g. tagged microorganisms by STM-mutagenesis, tagged polynucleotides, gene tags
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1068 Template (nucleic acid) mediated chemical library synthesis, e.g. chemical and enzymatical DNA-templated organic molecule synthesis, libraries prepared by non ribosomal polypeptide synthesis [NRPS], DNA/RNA-polymerase mediated polypeptide synthesis
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813 Hybridisation assays
    • C12Q1/6816 Hybridisation assays characterised by the detection means

Definitions

  • the present invention relates to encoded libraries and methods of use thereof for screening and identifying candidate compounds for binding to a nucleic acid target of interest.
  • RNAs Ribonucleic acids
  • DNA deoxyribonucleic acid
  • RNA ribonucleic acid
  • ncRNA non-coding RNA
  • DNA-encoded chemical libraries are a technology that enables the synthesis and screening, on a massive scale, of libraries of small molecules.
  • DEL technology bridges the fields of combinatorial chemistry and molecular biology and represents a well-validated tool for drug discovery against protein targets.
  • the aim of DEL technology is to enable massively parallel screening in early-phase drug discovery efforts such as target validation and hit identification, thereby accelerating the drug discovery process and reducing its costs.
  • DEL technology generally uses DNA “barcodes” to give each library member a unique identifier.
  • the DNA sequences include segments that direct and control chemical synthesis of small molecule library members from building block precursors.
  • the technique enables massively parallel creation and interrogation of libraries via affinity selection, typically on an immobilized protein target.
  • Homogeneous methods for screening DNA-encoded libraries are also available using, for example, water-in-oil emulsion technology to isolate individual ligand-target complexes that are later identified.
  • FIG. 1 shows cartoons of how a YoctoReactor® (yR) is used to prepare a DEL.
  • BBs chemical building blocks
  • the conserved DNA is designed to self-assemble into a three-way junction (3WJ) or a four-way DNA junction (4WJ), so that three or four BBs can be brought into close proximity and allowed to react.
  • 3WJ three-way junction
  • 4WJ four-way DNA junction
  • BBs are attached via cleavable or non-cleavable linkers to bispecific DNA oligonucleotides (oligo-BBs) designed to contain a DNA barcode for the attached BB at the distal end of the oligo and an area of conserved DNA sequence that self-assembles the DNA into a 3WJ or 4WJ.
  • oligo-BBs bispecific DNA oligonucleotides
  • Two reactants are brought into close proximity in the cavity at the center of a yR DNA junction, which has a volume of about one yoctoliter (10⁻²⁴ L).
  • BB and acceptor Representative member of a DEL library prepared from a yR comprising a 3WJ.
  • the size of a DEL library is determined by the number of different BB-oligos as well as yR geometry.
  • FIG. 2 shows a cartoon scheme for preparing a small molecule DEL library member (display product) by the yR approach.
  • Each DNA strand contains a codon region which encodes for the particular BB.
  • repertoires of two DNA strands with individual codons and BB conjugates are mixed together with a complementary DNA strand that assembles the yR. Because of sequence complementarity, these DNA strands self-assemble combinatorially into a stable three-way junction forming the stable double-stranded framework of the yR.
  • the BBs are then coupled in a chemical reaction. Repeating this with a third BB-oligo, cleaving all but one of the BB linkers, and then purifying and performing primer extension leads to the library member.
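The combinatorial logic described above (one codon-bearing BB-oligo chosen per junction arm, with the library size set by the number of different BB-oligos and the yR geometry) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the codons, BB names, and the `library_size` and `enumerate_members` helpers are hypothetical, and "geometry" is reduced to the number of arms (three for a 3WJ).

```python
from itertools import product
from math import prod

# Hypothetical repertoires for a three-way junction (3WJ): one dict per arm,
# mapping each codon (barcode) to the building block (BB) it encodes.
repertoires = [
    {"AAC": "BB1a", "AGT": "BB1b"},
    {"CCA": "BB2a", "CGT": "BB2b", "CTA": "BB2c"},
    {"GAT": "BB3a", "GTC": "BB3b"},
]


def library_size(reps):
    """Nominal library size: product of the repertoire sizes of all arms."""
    return prod(len(arm) for arm in reps)


def enumerate_members(reps):
    """Yield (codon combination, BB combination) for every self-assembled yR."""
    for combo in product(*(arm.items() for arm in reps)):
        codons = tuple(codon for codon, _ in combo)
        bbs = tuple(bb for _, bb in combo)
        yield codons, bbs


members = list(enumerate_members(repertoires))
print(library_size(repertoires), len(members))  # 12 12
print(members[0])  # (('AAC', 'CCA', 'GAT'), ('BB1a', 'BB2a', 'BB3a'))
```

Adding a fourth arm (a 4WJ) would multiply the size by that arm's repertoire, which is one way to read the statement that library size depends on both the number of BB-oligos and the yR geometry.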
  • FIG. 3 shows a scheme of the Binder Trap Enrichment® (BTE) method of library screening.
  • BTE Binder Trap Enrichment®
  • FIG. 4 shows cartoons of some model nucleic acid sequences used to determine exemplary conditions for an RNA-DNA ligation.
  • the base of each arrow is the 5'-end of the nucleic acid, while the head of the arrow is the 3'-end.
  • the top cartoon shows a setup in which a DNA Ligation Partner and Splint are hybridized with a short overhang that is complementary to the 3'-end of a target RNA.
  • the 5'-end of the RNA includes a Cy5 label for facilitating gel analysis.
  • An optional Helper oligo may be included, e.g.
  • the Ligation Partner and Splint may be dsDNA and may include a primer (Primer 1); the RNA may include another primer (Primer 2).
  • the overhang in this example is a TC dinucleotide on the Splint that pairs with an AG on the RNA.
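The pairing rule in this model (the splint overhang must be complementary and antiparallel to the nucleotides at the RNA's 3'-end) can be checked with a small helper. This is a sketch under assumptions: sequences are written 5'→3', the RNA sequence and the `splint_overhang` name are hypothetical, and the "TC" in the text is taken to be the overhang read 3'→5' beneath the RNA, which is the same as 5'-CT-3'.

```python
# Watson-Crick partner of each base, expressed as a DNA base
# (RNA U pairs with DNA A; T is accepted for DNA input).
_DNA_PARTNER = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}


def splint_overhang(rna_5to3: str, k: int) -> str:
    """Return the DNA overhang, written 5'->3', that pairs with the last k
    nucleotides at the 3'-end of an RNA written 5'->3'."""
    tail = rna_5to3.upper()[-k:]
    # Pairing is antiparallel: walk the RNA tail backwards and complement it.
    return "".join(_DNA_PARTNER[base] for base in reversed(tail))


rna = "GGCAUCAG"  # hypothetical target RNA ending in ...AG-3'
overhang = splint_overhang(rna, 2)
print(overhang)        # CT  (5'->3')
print(overhang[::-1])  # TC  (same overhang read 3'->5', aligned under the RNA)
```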
  • FIG. 5 shows PAGE results for a ligation experiment. Successful ligation was observed with the 5 bp splint (DirLig_Splint5bp-l), but not the 2 bp splint (DirLig_Splint2bp-l). The negative control splints (DirLig_CtlSplint2bp-l and DirLig_CtlSplint5bp-l) did not enable ligation.
  • FIG. 6 shows PAGE results for a set of modified ligation conditions for a 2 bp overhang that included 5% PEG4000 or a helper oligo with incubation at 22 °C.
  • FIG. 7 shows PAGE results for a set of modified ligation conditions featuring SplintR or T4 RNA Ligase 2, optionally in the presence of PEG4000, at different temperatures.
  • FIG. 8 shows PAGE results for modified ligation conditions featuring T4 DNA ligase and/or SplintR.
  • the use of DNA ligases and a ligatable helper oligo enabled ligation of the RNA to DNA across a 2 bp splint.
  • FIG. 9 shows (panel A) 10% PAGE results of a ligation reaction with Cy5-labeled RNA in red (appearing as dark grey spots in the picture of the PAGE gel as shown) and SYBR Gold-stained RNA and DNA in green (appearing as light grey spots in the gel picture).
  • Panel B shows 2% agarose E-gel analysis of the RT-PCR reactions, with the Exactgene mini-DNA ladder in lane M and the desired product indicated with an arrow.
  • Panel C provides a description of samples 1-6 from panels A and B.
  • FIG. 10 shows a cartoon of an exemplary DEL screening strategy that uses one or two ligations to capture small molecule-target binding information for later analysis, e.g. by sequencing.
  • FIG. 11 shows a cartoon of an exemplary DEL screening strategy that uses a ligation to capture small molecule-target binding information for later analysis, e.g. by sequencing.
  • FIG. 12 shows a cartoon of an exemplary DEL screening strategy that uses reverse transcription of a partially double-stranded nucleic acid target-DEL sequence to capture small molecule-target binding information for later analysis, e.g. by sequencing of the fully double-stranded product, where the sequence denoted as “RT product” is added by a reverse transcriptase.
  • FIG. 13 shows a cartoon of an exemplary library screen.
  • An encoded small molecule attached to a nucleic acid barcode is allowed to bind to a nucleic acid target, here a stem-loop RNA structure (for example, an miRNA-mRNA featuring a 3WJ and stem-loop structure). After binding, the complex is rapidly diluted and emulsified, then ligated in the emulsion. RT-PCR and sequencing allows counting of hits from the screen.
  • FIG. 14 shows overviews of various methods of assembling DNA-encoded libraries
  • DNA-recorded libraries are constructed through iterative steps of splitting, building block coupling, tag ligation and pooling
  • DTS DNA-templated synthesis
  • (c) Related methods based on DNA junctions, such as the YoctoReactor®, similarly rely on proximity-based reactions but do not necessarily require a pre-existing DNA template.
  • (e) Encoded self-assembling chemical (ESAC) libraries are assembled from sub-libraries by hybridizing oligonucleotides. When highly repetitive cycles are used for library assembly, only the first cycles and the final products are illustrated.
  • DEL technology was developed as an alternative to combichem/HTS (combinatorial chemistry/high-throughput screening) to permit the rapid synthesis of millions to billions of drug-like compounds and to provide a resource-efficient method of screening this diversity.
  • combichem/HTS combinatorial chemistry/high-throughput screening
  • By linking small molecules to a DNA code, large combinatorial libraries of drug-like compounds can be synthesized and screened against biological targets in a single-pot format. Hits are then deconvoluted by next-generation DNA sequencing. Because DNA both encodes and directs the chemistry, a much greater chemical diversity can now be generated efficiently and interrogated robustly.
  • DELs can be subjected to evolutionary selection and become enriched for the small subset of the library which binds the target of interest. This subset is subsequently tested in functional assays. This straightforward method for discovering ligands that modulate targets works because it does not rely on enzymatic turnover. Thus, DELs are particularly versatile tools for discovery in a post-gen
  • MW Molecular weight
  • cLogP calculated logarithm of the octanol-water partition coefficient (a measure of lipophilicity)
  • BBs building blocks
  • the DEL is assembled by bioorthogonal chemical reactions such as amide formation, reductive amination, Pd-catalyzed cross-couplings, nucleophilic aromatic substitution, cycloaddition, urea formation, and protecting group manipulations.
  • a high-fidelity DEL having a functional size comparable to the nominal or theoretical library size is a library in which all DNA codes are attached to the correct molecule anticipated by the sequence such that no compounds are incorrectly encoded.
  • One advantage of the yR approach is that all truncated and unreacted products are eliminated and each compound in the high fidelity library is adequately represented.
  • a high-fidelity library allows a higher level of information to be extracted directly from a library screen without further data processing and is thus less resource intensive. Examination of related structures identified in a primary screen, for example, immediately provides information on key pharmacophores and on regions where chemical modifications are allowed, providing a data set that accelerates the hit-to-lead process and a better starting point for further lead optimization.
  • in DEL screens, the attachment of a small molecule to a unique DNA barcode allows straightforward identification of “hits,” i.e. molecules that bind to the target.
  • DEL libraries are put through affinity selection on a selected, immobilized target protein, after which non-binders are removed by washing steps, and binders may be amplified by polymerase chain reaction (PCR) and identified by reference to their DNA code, for example, by DNA sequencing.
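Identifying hits from sequencing data amounts to counting barcode reads after selection and comparing them with the unselected library. The sketch below shows one common way to score this (normalized read frequency after selection divided by the frequency in the naive library, with a pseudocount so that barcodes absent from one set do not cause division by zero). It is an illustrative assumption rather than the scoring method of this application, and the barcodes and read counts are made up.

```python
from collections import Counter


def enrichment(selected_reads, naive_reads, pseudocount=1.0):
    """Return {barcode: fold enrichment} for every barcode seen in either set.

    Each frequency is (count + pseudocount) divided by the pseudocount-adjusted
    total, so the frequencies within each set sum to 1.
    """
    sel, naive = Counter(selected_reads), Counter(naive_reads)
    barcodes = set(sel) | set(naive)
    sel_total = sum(sel.values()) + pseudocount * len(barcodes)
    naive_total = sum(naive.values()) + pseudocount * len(barcodes)
    return {
        bc: ((sel[bc] + pseudocount) / sel_total)
        / ((naive[bc] + pseudocount) / naive_total)
        for bc in barcodes
    }


naive = ["AAC", "AGT", "CCA", "GTC"] * 25             # unselected library, 100 reads
selected = ["AAC"] * 40 + ["AGT"] * 5 + ["CCA"] * 5   # after affinity selection
scores = enrichment(selected, naive)
for bc in sorted(scores, key=scores.get, reverse=True):
    print(bc, round(scores[bc], 2))
```

Here the barcode "AAC" scores about 3, while "GTC", which was washed away during selection, falls well below 1.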
  • PCR polymerase chain reaction
  • hits can be further enriched by performing rounds of selection, PCR amplification and translation in analogy to biological display systems such as antibody phage display. This makes it possible to work with larger compound libraries than previously possible.
  • an “enriched library” refers to either a subset of an encoded library whose members have been selected (enriched) for binding to a particular target of interest, or a library that has been selected (enriched), through one, two, three, four, or more rounds of evolution-based selection, for binding to a target of interest.
  • ESAC technology is notable as being a combinatorial, self-assembling approach that has some similarities to fragment-based drug discovery.
  • DNA annealing enables discrete building block combinations to be sampled, but no chemical reaction takes place between them.
  • evolution-based DEL technologies include DNA-routing developed by Prof. D. R. Halpin and Prof. P. B. Harbury (Stanford University, Stanford, CA), DNA-templated synthesis developed by Prof. D.
  • the DNA tagged BBs enable the generation of a genetic code for synthesized compounds and artificial translation of the genetic code is possible.
  • Artificial translation is possible because the small molecule candidate compounds (which are reaction products from multiple BBs) can be recalled by the PCR-amplified genetic code, and the library compounds can be regenerated (decoded). This, in turn, enables the principle of Darwinian natural selection and evolution to be applied to small molecule selection in direct analogy to biological display systems through rounds of selection, amplification and translation.
  • DELs are a subset of so-called in vitro display libraries, of which other examples are known in the art and may be used in accordance with the present invention.
  • the term “in vitro display library” as used herein refers to a library comprising numerous different binding entities (small molecules) wherein each binding entity is attached to a nucleic acid molecule and the nucleic acid molecule comprises specific nucleic acid sequence information allowing one to identify the binding entity. More specifically, once one knows the specific nucleic acid sequence information of the nucleic acid molecule one can derive the structure of the specific binding entity attached to the nucleic acid molecule.
  • the DEL is assembled by one of the methods described herein or, e.g., as shown in FIG. 14.
  • the present invention provides DELs of small molecules prepared by a non-evolution-based split-and-pool method.
  • Split-and-pool methods are known in the art and include those described herein. For example, a set of unique DNA oligonucleotides (n), each containing a specific coding sequence, is first chemically conjugated to a corresponding set of small organic molecules. The oligonucleotide-conjugated compounds are then mixed (“Pool”) and divided (“Split”) into a number of groups (m).
  • a second set of building blocks (m) is coupled to the first, and a further oligonucleotide that codes for the second modification is enzymatically introduced before mixing again.
  • These “split-and-pool” steps can be iterated a number of times (r), increasing the library size combinatorially with each round. Performing r rounds of split-and-pool synthesis with n alternative chemical groups per round yields a diversity of n^r compounds.
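The split-and-pool procedure can be simulated to show why r rounds with n building blocks each give n^r distinct compounds, each carrying a unique tag history. This is an idealized sketch: it assumes one copy of each species, quantitative coupling and ligation, and the same n in every round, and the building-block and tag names are hypothetical.

```python
def split_and_pool(rounds):
    """Simulate split-and-pool synthesis.

    `rounds` is a list with one dict per round, mapping each building block
    to the DNA tag that encodes it. In every round the pool is split into one
    aliquot per building block, each aliquot is coupled with its building
    block and ligated to the matching tag, and the aliquots are pooled again.
    """
    pool = [((), ())]  # start: one headpiece with no building blocks or tags
    for round_bbs in rounds:
        next_pool = []
        for bb, tag in round_bbs.items():            # split: one aliquot per BB
            for bbs, tags in pool:
                next_pool.append((bbs + (bb,), tags + (tag,)))  # couple + ligate
        pool = next_pool                             # pool the aliquots
    return pool


n, r = 3, 3  # hypothetical: 3 building blocks per round, 3 rounds
rounds = [{f"BB{i}.{j}": f"T{i}{j}" for j in range(n)} for i in range(1, r + 1)]
library = split_and_pool(rounds)
print(len(library), n ** r)  # 27 27
```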
  • a promising strategy for the construction of DNA-encoded libraries is the use of multifunctional building blocks covalently conjugated to an oligonucleotide serving as a “core structure” for library synthesis.
  • in a “pool-and-split” fashion, a set of multifunctional scaffolds undergoes orthogonal reactions with a series of suitable reactive partners.
  • the identity of the modification is encoded by a chemical or enzymatic addition (e.g., by ligation) of a DNA segment to the original DNA “core structure.” See, e.g., Mannocci, L., et al., “High-throughput sequencing allows the identification of binding molecules isolated from DNA-encoded chemical libraries,” Proc. Natl. Acad. Sci.
  • diene carboxylic acids used as scaffolds for library construction at the 5'-end of an amino-modified oligonucleotide can be subjected to a Diels-Alder reaction with a variety of maleimide derivatives.
  • many other bioorthogonal reactions are known and are being developed to further extend the possible chemical diversity of DELs. See, e.g., Arico-Muendel, Med. Chem. Commun. 2016, 7, 1898-1909 and Goodnow, R. A. Jr. et al., Nat. Rev. Drug Discov. 2017, Feb; 16(2): 131-147, hereby incorporated by reference.
  • the present invention provides DELs of small molecules prepared as a combinatorial self-assembling library.
  • Combinatorial self-assembling libraries include encoded self-assembling chemical libraries (ESAC libraries).
  • Encoded Self-Assembling Chemical (ESAC) libraries rely on the principle that two sublibraries of x members each (e.g. 10³), containing a constant complementary hybridization domain, can yield after hybridization a combinatorial DNA-duplex library of x² uniformly represented members (e.g. 10⁶).
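The x² complexity follows because every member of one sub-library can pair with every member of the other through the shared constant domain. The sketch below assumes ideal, uniform hybridization; the codes and compound names are hypothetical, and a sub-library size of 100 is used instead of 10³ only to keep the example light.

```python
def esac_duplexes(sublibrary_a, sublibrary_b):
    """Pair every member of sub-library A with every member of sub-library B,
    as happens when both share a constant complementary hybridization domain.
    Each duplex is ((code_a, code_b), (compound_a, compound_b))."""
    return [
        ((code_a, code_b), (cpd_a, cpd_b))
        for code_a, cpd_a in sublibrary_a.items()
        for code_b, cpd_b in sublibrary_b.items()
    ]


x = 100  # hypothetical sub-library size (the text's example uses 10**3)
sub_a = {f"A{i:03d}": f"cpdA{i}" for i in range(x)}
sub_b = {f"B{i:03d}": f"cpdB{i}" for i in range(x)}
duplexes = esac_duplexes(sub_a, sub_b)
print(len(duplexes), x ** 2)  # 10000 10000
```

Because each duplex carries both codes, a selected duplex identifies the pair of compounds, which is what allows bidentate binders to be found de novo.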
  • Each sub-library member generally consists of an oligonucleotide containing a variable, coding region flanked by a constant DNA sequence, carrying a suitable chemical modification at the oligonucleotide extremity.
  • the ESAC sublibraries can be used in at least four different embodiments. See, e.g., Melkko, S., et al., “Encoded self-assembling chemical libraries,” Nat. Biotechnol. 2004, 22(5), 568-74; hereby incorporated by reference.
  • a sub-library is paired with a complementary oligonucleotide and used as a DNA-encoded library displaying a single covalently linked compound for affinity-based selection experiments.
  • a sub-library is paired with an oligonucleotide displaying a known binder to the target, thus enabling affinity maturation strategies.
  • two individual sublibraries are assembled combinatorially and used for the de novo identification of bidentate binding molecules.
  • three different sublibraries are assembled to form a combinatorial triplex library.
  • preferential binders isolated from an affinity-based selection are PCR-amplified and decoded on complementary oligonucleotide microarrays or by concatenation of the codes, subcloning and sequencing.
  • Such methods are described, for example, in Lovrinovic, M., et al., “DNA microarrays as decoding tools in combinatorial chemistry and chemical biology,” Angew. Chem. Int. Ed. Engl. 2005, 44(21), 3179-83 and Melkko, S., et al., “Encoded self-assembling chemical libraries,” Nat. Biotechnol. 2004, 22(5), 568-74; hereby incorporated by reference.
  • the individual building blocks can eventually be conjugated using suitable linkers to yield a drug-like, high-affinity compound.
  • the characteristics of the linker e.g. length, flexibility, geometry, chemical nature and solubility
  • bio-panning experiments on HSA with a 600-member ESAC library allowed the isolation of the 4-(p-iodophenyl)butanoic moiety.
  • This compound represents the core structure of a series of portable albumin-binding molecules and of Albufluor™, a recently developed fluorescein angiographic contrast agent currently under clinical evaluation.
  • ESAC technology has been used for the isolation of potent inhibitors of bovine trypsin and for the identification of novel inhibitors of stromelysin-1 (MMP-3), a matrix metalloproteinase involved in disease processes such as arthritis and metastasis.
  • MMP-3 matrix metalloproteinase involved in disease processes such as arthritis and metastasis.
  • ESAC libraries may be prepared in several variations. Generally, in ESAC libraries small organic molecules are coupled to 5'-amino-modified oligonucleotides containing a hybridization domain and a unique coding sequence, which allows identification of the coupled molecule.
  • the ESAC library is used (1) in a single-pharmacophore format, (2) in affinity maturation of known binders, (3) in de novo selections of binding molecules by self-assembly of sublibraries in DNA double-strand format, or (4) in DNA triplexes.
  • the ESAC library in the selected format is used in a selection and read-out procedure.
  • the oligonucleotide codes of the binding compounds are PCR-amplified and compared with those of the unselected library on oligonucleotide microarrays. Identified binders/binding pairs are validated after conjugation (if appropriate) to suitable scaffolds.
  • the DNA-routing machinery consists of a series of connected columns bearing resin-bound anticodons, which can sequence-specifically separate a population of DNA templates into spatially distinct locations by hybridization. Using this split-and-pool protocol, a DNA-encoded combinatorial peptide library of 10⁶ members was generated. Halpin, D.R. and Harbury, P.B., “DNA display II. Genetic manipulation of combinatorial chemistry libraries for small-molecule evolution,” PLoS Biol. 2004, 2(7), E174, hereby incorporated by reference.
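One routing step can be modeled as partitioning templates by the codon at the current position: a template is retained on the column whose anticodon is its reverse complement. This sketch idealizes hybridization as perfect and exclusive Watson-Crick binding, and the codons, column names, and `route` helper are hypothetical.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def route(templates, position, columns):
    """Send each template to the column whose resin-bound anticodon hybridizes
    to the template's codon at `position` (idealized: perfect, exclusive
    Watson-Crick binding). `columns` maps column name -> anticodon."""
    column_for = {anticodon: name for name, anticodon in columns.items()}
    routed = defaultdict(list)
    for template in templates:
        name = column_for.get(reverse_complement(template[position]))
        if name is not None:
            routed[name].append(template)
    return dict(routed)


# Hypothetical templates, each a tuple of codons (one per synthesis step).
templates = [("AAG", "CCT"), ("GTC", "CCT"), ("AAG", "GGA")]
columns = {"col1": reverse_complement("AAG"), "col2": reverse_complement("GTC")}
print(route(templates, 0, columns))
```

After each routing step, the templates in every column would be reacted with that column's building block and pooled again before routing on the next codon position.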
  • FIG. 14 (a) The most widely applied method for library synthesis is a combinatorial split-and-pool approach.
  • individual steps of chemical synthesis are encoded by segregating aliquots of the nascent library and conducting one specific chemical step.
  • the ligation of one specific oligonucleotide is then performed within each segregated compartment. Ligation and building block installation occur on a bifunctional oligonucleotide that supports both processes.
  • the resultant libraries may be double-stranded or single-stranded.
  • Multiple chemical and encoding steps are then conducted using a split-and-pool methodology.
  • Most reported work using DNA-recorded chemistry uses enzymatic ligation to catenate oligonucleotide tags. Chemical ligation may also be used.
  • DNA-recording grew out of work at Praecis Pharmaceuticals and GlaxoSmithKline (GSK). It has been used, for example, to generate lead compounds at GSK targeting the proteins soluble epoxide hydrolase (sEH) and receptor-interacting protein 1 kinase (RIP1). See, e.g., Arico-Muendel, Med. Chem. Commun. 2016, 7, 1898-1909 and Goodnow, R. A. Jr. et al., Nat. Rev. Drug Discov. 2017, Feb; 16(2): 131-147, each of which is hereby incorporated by reference. Accordingly, in some embodiments, the DEL is prepared by the use of DNA-recording.
  • DNA-recording begins with a chimeric DNA-linker library starting material termed a headpiece.
  • in a headpiece, two short, complementary DNA sequences are stabilized as a duplex by a PEG-based reverse turn that displays an amino-PEG linker.
  • Double-stranded tags can then be ligated to the headpiece DNA (whose ends contain a 2-base overhang) while the amino group attached to the PEG-based portion is derivatized as a small molecule warhead. Using multiple cycles of split-and-pool synthesis, large and diverse libraries are generated.
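After selection and sequencing, each read of a DNA-recorded library can be decoded by reading the tags ligated in each cycle. The sketch assumes a simplified, hypothetical read layout (headpiece followed directly by fixed-length tags in cycle order, ignoring overhangs, primer sites, and sequencing errors); the headpiece, tags, and building-block names are made up.

```python
def decode_read(read, headpiece, tag_len, tag_tables):
    """Decode a read laid out as headpiece + tag(cycle 1) + tag(cycle 2) + ...

    `tag_tables` holds one dict per split-and-pool cycle, mapping each tag
    sequence to the building block it records. Returns a tuple of building
    blocks, or None if the headpiece or any tag is not recognized.
    """
    if not read.startswith(headpiece):
        return None
    pos = len(headpiece)
    building_blocks = []
    for table in tag_tables:
        tag = read[pos:pos + tag_len]
        if tag not in table:
            return None
        building_blocks.append(table[tag])
        pos += tag_len
    return tuple(building_blocks)


headpiece = "GCTA"  # hypothetical, shortened for illustration
tag_tables = [
    {"AACC": "amine_1", "AAGG": "amine_2"},  # cycle 1 tags
    {"TTCC": "acid_1", "TTGG": "acid_2"},    # cycle 2 tags
]
print(decode_read("GCTA" + "AAGG" + "TTCC", headpiece, 4, tag_tables))
# ('amine_2', 'acid_1')
```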
  • the DEL is prepared by DNA-templated synthesis.
  • DNA-templated synthesis was first reported in 2001 by David Liu and co-workers. According to this method, complementary DNA oligonucleotides are used to assist certain synthetic reactions that do not efficiently take place in solution at low concentration.
  • Gartner, Z.J. et al., “The generality of DNA-templated synthesis as a basis for evolving non-natural small molecules,” J. Am. Chem. Soc. 2001, 123(28), 6961-3; and Calderone, C.T. et al., “Directing otherwise incompatible reactions in a single solution by using DNA-templated organic synthesis,” Angew. Chem. Int. Ed. Engl. 2002, 41(21), 4104-8; hereby incorporated by reference.
  • DNA-templated synthesis is based on codon specific recognition of DNA sequences where a library of encoded DNA templates is used to direct the synthesis of library members using BBs conjugated to complementary codon specific DNA sequences.
  • a DNA heteroduplex is used to accelerate the reaction between BBs displayed at the extremities of the two DNA strands.
  • the “proximity effect,” which accelerates the bimolecular reaction, was shown to be distance-independent (at least within a distance of 30 nucleotides).
  • oligonucleotides carrying one chemical reactant were hybridized to complementary oligonucleotide derivatives carrying a different reactive chemical group.
  • DNA-templated synthesis has been used in preparing libraries of macrocyclic compounds.
  • the YoctoReactor® (yR) is a combinatorial synthetic approach that exploits the self-assembly of DNA oligonucleotides into 3-, 4-, or 5-way junctions to direct small molecule synthesis at the center of the junction by bringing reactants into close proximity. Synthesis of encoded libraries using the yR approach and variations thereof is described in, for example, U.S. Patent 8,202,823, U.S. 7,928,211, U.S. Patent Application Publication No. US 2017/0233726, and Hansen, M. H., et al., “A yoctoliter-scale DNA reactor for small molecule evolution,” J. Am. Chem. Soc. 2009, 131, 1322-1327, hereby incorporated by reference.
  • the cavity at the center of the yR DNA junction has a volume of about one yoctoliter (10⁻²⁴ L). Such a minute volume is on the order of that required for a chemical reaction between two single molecules.
  • the effective concentration of the reactants is in the high-mM range, resulting in high reaction rates.
  • the high reaction rate facilitated by the DNA junction effects chemical reactions that otherwise would not take place at practically feasible rates at the actual concentrations of the reactants in solution, which would be multiple orders of magnitude lower.
  • BBs small-molecule chemical building blocks
  • each yR arm is an 18 bp stem with a 4 nt loop, and the whole oligo used in the combinatorial synthesis is 40 nt. In some embodiments, each yR arm is about 10-30, 12-28, 14-26, 16-24, or 18-22 bp stem with a loop of about 2-14, 3-13, 4-12, 5-11, 6-10, or 8-9 nt.
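The figures above are internally consistent if the 40-nt oligonucleotide is read as a single strand folding back on itself: both strands of an 18 bp stem plus a 4-nt loop give 2 × 18 + 4 = 40 nt. That reading is an assumption not stated in the text; the hypothetical helper below simply applies the same arithmetic to the stated ranges.

```python
def hairpin_oligo_length(stem_bp: int, loop_nt: int) -> int:
    """Length of one oligo that forms a stem of `stem_bp` base pairs (two
    strands of stem_bp nucleotides each) closed by a loop of `loop_nt`."""
    return 2 * stem_bp + loop_nt


print(hairpin_oligo_length(18, 4))  # 40
# Extremes of the broadest stated ranges (10-30 bp stem, 2-14 nt loop):
print(hairpin_oligo_length(10, 2), hairpin_oligo_length(30, 14))  # 22 74
```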
  • the BBs are attached to three different, bispecific DNA oligonucleotides (oligo-BBs), which then interact so as to form a YoctoReactor® (yR) comprising a three-way junction (3WJ).
  • yR YoctoReactor®
  • the BBs are attached to four different, bispecific DNA oligonucleotides, which then interact so as to form a yR comprising a four-way junction (4WJ).
  • the oligo-BBs are designed such that the oligo contains (a) the code for an attached BB at the distal end of the oligo and (b) areas of constant DNA sequence that self-assemble the DNA into a 3WJ or 4WJ regardless of the identity of the BB and the subsequent chemical reaction.
  • library preparation is carried out in a stepwise combinatorial fashion, for example as shown in FIG. 2.
  • an area of constant DNA sequence capable of self-assembly with another such area is about 5 to about 200 nucleotides in length. In some embodiments, an area of constant DNA sequence is about 10 to about 150 nucleotides in length, or about 10-100, 15-100, 20-100, 10-80, 10-60, 15-50, or 20-40 nucleotides in length. In some embodiments, each yR arm comprises two hybridization regions, each of about 10 nt. In some embodiments, each yR arm comprises one, two, three, or four hybridization regions.
  • the length of the DNA barcodes may also vary.
  • the DNA barcodes may be at least 4 nucleotides in length, at least 5 nucleotides in length, at least 6 nucleotides in length, or at least 7, 8, 9, 10, 11, or 12 nucleotides in length.
  • each barcode sequence is independently from about 4 to about 20 nucleotides in length.
  • Barcodes typically consist of a relatively short sequence of nucleotides attached to a sample sequence, where the barcode sequence is either known or identifiable by its location or sequence elements. In some embodiments, a unique identifier is useful for sample indexing and/or identification of the small molecule library member.
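Barcode length bounds the number of distinct identifiers: with four bases per position, a barcode of length k can take 4^k values. The sketch computes this upper bound and the shortest length that could, in principle, label a given number of items; the helper names are hypothetical. Practical designs commonly use longer barcodes than this minimum so that sequences stay distinguishable despite synthesis or sequencing errors.

```python
def barcode_capacity(length: int) -> int:
    """Number of distinct DNA sequences of the given length (4 bases per position)."""
    return 4 ** length


def min_barcode_length(n_items: int) -> int:
    """Smallest length k with 4**k >= n_items (pure integer arithmetic)."""
    k = 0
    while 4 ** k < n_items:
        k += 1
    return k


for length in (4, 12, 20):
    print(length, barcode_capacity(length))
# 4 256
# 12 16777216
# 20 1099511627776
print(min_barcode_length(1000))  # 5, since 4**4 = 256 < 1000 <= 4**5 = 1024
```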
  • barcodes may also be useful in other contexts.
  • a barcode may serve to track samples throughout processing (e.g., location of sample in a lab, location of sample in plurality of reaction vessels, etc.); provide manufacturing information; track barcode performance over time (e.g., from barcode manufacturing to use) and in the field; track barcode lot performance over time in the field; provide product information during sequencing and perhaps trigger automated protocols (e.g., automated protocols initiated and executed with the aid of a computer) when a barcode associated with the product is read during sequencing; track and troubleshoot problematic barcode sequences or product lots; serve as a molecular trigger in a reaction involving the barcode, and combinations thereof.
  • barcode sequence segments as described herein are used to provide linkage information as between two discrete determined nucleic acid sequences.
  • This linkage information may include, for example, linkage to a common sample, a common reaction vessel, e.g., a well or partition, or even a common starting nucleic acid molecule.
  • the barcode can be PNA, LNA, RNA, DNA or combinations thereof.
  • oligonucleotides incorporating barcode sequence segments may also include additional sequence segments.
  • additional sequence segments may include functional sequences, such as primer sequences, primer annealing site sequences, immobilization sequences, or other recognition or binding sequences useful for subsequent processing, e.g., a sequencing primer or primer binding site for use in sequencing of samples to which the barcode containing oligonucleotide is attached.
  • the reference to specific functional sequences as being included within the barcode-containing sequences also envisions the inclusion of the complements to any such sequences, such that complementary replication will yield the specific described sequence.
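A reverse-complement helper makes concrete the point that a functional element may be present either as itself or as its complement. The minimal Python sketch below is illustrative. The function names and example sequences are hypothetical, and only the standard DNA bases (plus N) are handled.

```python
# Watson-Crick complements for DNA bases (N = unspecified base).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def contains_element(oligo: str, element: str) -> bool:
    """True if a functional element (e.g. a primer binding site) is present
    in the oligo either as written or as its reverse complement."""
    return element in oligo or reverse_complement(element) in oligo
```

For example, `reverse_complement("ACGTTG")` returns `"CAACGT"`. An oligo that carries CAACGT therefore yields ACGTTG after one round of complementary replication.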
  • barcodes or partial barcodes may be generated from oligonucleotides obtained from or suitable for use in an oligonucleotide array, such as a microarray or bead array.
  • oligonucleotides of a microarray may be cleaved (e.g., using cleavable linkages or moieties that anchor the oligonucleotides to the array, such as photocleavable, chemically cleavable, or otherwise cleavable linkages) such that the free oligonucleotides are capable of serving as barcodes or partial barcodes.
  • barcodes or partial barcodes obtained from arrays are of known sequence.
  • a microarray may provide at least about 10,000,000, at least about 1,000,000, at least about 900,000, at least about 800,000, at least about 700,000, at least about 600,000, at least about 500,000, at least about 400,000, at least about 300,000, at least about 200,000, at least about 100,000, at least about 50,000, at least about 10,000, at least about 1,000, at least about 100, or at least about 10 different sequences that may be used as barcodes or partial barcodes.
  • the length of a barcode sequence may be any suitable length, depending on the application (e.g., for homogeneous screening methods vs. bound to beads). In some embodiments, a barcode sequence is about 2 to about 500 nucleotides in length, about 2 to about 100 nucleotides in length, about 2 to about 50 nucleotides in length, about 2 to about 20 nucleotides in length, about 6 to about 20 nucleotides in length, or about 4 to 16 nucleotides in length.
  • a barcode sequence is about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 85, 90, 95, 100, 150, 200, 250, 300, 400, or 500 nucleotides in length. In some embodiments, a barcode sequence is greater than about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
  • a barcode sequence is less than about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30,
  • barcodes with different sequences are assembled or, e.g., attached to beads, in separate steps.
  • barcodes with unique sequences are attached to beads such that each bead has multiple copies of a first barcode sequence on it.
  • the beads can be further functionalized with a second sequence.
  • the combination of first and second sequences may serve as a unique barcode, or unique identifier, attached to a bead.
  • the process may be continued to add additional sequences that behave as barcode sequences (in some cases, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more barcode sequences are sequentially added to each bead).
  • the additional sequences that behave as barcode sequences are ligated together in solution to assemble a complete barcode.
  • the barcode is assembled from two or more shorter barcodes (single-BB barcodes), each of which encodes a particular BB.
  • the barcode results from ligation of the two or more shorter barcodes.
  • the barcode encodes two or more BBs and the identity of a small molecule library member may be determined from the combination of the shorter barcodes.
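The sketch below shows assembling a full barcode from per-BB codes and recovering the BB combination from it. It assumes the following, none of which comes from the source:

- All codes at a given position share one fixed length.
- Codes are simply concatenated, with no constant or primer regions in between.
- The codebooks are hypothetical.

Real constructs usually interleave constant regions, which this sketch omits.

```python
from itertools import product

# Hypothetical per-position codebooks: building-block ID -> short barcode.
# All codes within a position share one length, so a concatenated
# barcode can be split back into its parts by position.
CODEBOOKS = [
    {"BB1-A": "ACGT", "BB1-B": "TGCA"},
    {"BB2-A": "GGAC", "BB2-B": "CTTG", "BB2-C": "AATC"},
]


def assemble(bb_ids):
    """Concatenate one short barcode per position (mimicking ligation)."""
    return "".join(book[bb] for book, bb in zip(CODEBOOKS, bb_ids))


def decode(barcode):
    """Recover the building-block combination from a full barcode."""
    bbs, pos = [], 0
    for book in CODEBOOKS:
        code_len = len(next(iter(book.values())))
        segment = barcode[pos:pos + code_len]
        lookup = {code: bb for bb, code in book.items()}
        if segment not in lookup:
            raise KeyError(f"unknown code {segment!r} at offset {pos}")
        bbs.append(lookup[segment])
        pos += code_len
    return tuple(bbs)


if __name__ == "__main__":
    # Every combination encodes uniquely and decodes back to itself.
    for combo in product(*(book.keys() for book in CODEBOOKS)):
        assert decode(assemble(combo)) == combo
    print(assemble(("BB1-B", "BB2-C")))  # TGCAAATC
```
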
  • bifunctional BBs are used that comprise one functionality for linking to the DNA barcode and one functionality capable of undergoing a chemical reaction in the yR.
  • the BBs are linked covalently to their DNA barcodes via cleavable or non-cleavable linkers.
  • a DNA code unique to each BB is located at the distal end of the oligo. This ultimately enables the synthetic route of each assembled compound to be determined by its unique DNA barcode, which is a combination of the barcodes of its constituent BBs.
  • the use of bifunctional BBs and the ability to link them directly to their encoding DNA before the library synthesis provides several advantages. First, since the code is intimately attached to the BB, there is no chance of mismatch during the library synthesis, resulting in a high-fidelity library. Second, the DNA provides an excellent purification handle, enabling incomplete reactions and truncated products to be eliminated from the yR library through systematic purification steps, resulting in an ultra-high-purity library and significantly facilitating the interpretation of screening results.
  • the BBs are then allowed to react.
  • the library of compounds is prepared by contacting the BBs under appropriate conditions to facilitate chemical reactions between BBs, thus increasing the number of compounds in the library and their chemical diversity.
  • chemical reactions are performed one step at a time.
  • the BBs are allowed to participate in multicomponent reactions in which three or four BBs participate.
  • the DNA is ligated and the product purified by suitable means, such as by polyacrylamide gel electrophoresis or the like.
  • cleavable linkers are used for all but one oligo-BB.
  • the cleavable linker is an amino-thiol linker such as those described in Hoejfeldt, J. W., et al., “A cleavable amino-thiol linker for reversible linking of amines to DNA,” J. Org. Chem. 2006, 71, 9556-9559, hereby incorporated by reference.
  • the products are generally purified before proceeding further. Because of the increase in size of the DNA after the chemical reaction between the BBs, the product is easily purified by polyacrylamide gel electrophoresis (PAGE) under denaturing conditions, thus permitting only the desired reaction product to be recovered.
  • the information regarding which BBs have reacted is now stored permanently by ligating the two DNA strands containing the codons. One of the BBs (linked to the DNA with a cleavable linker) is then cleaved.
  • a repertoire of BBs linked to a third DNA strand which encodes for each individual BB is now added and the sequence is repeated, resulting in the transfer of the third BB.
  • the DNA contains a priming site so the yR is dismantled by forming the complementary DNA strand in a single round polymerase chain reaction (PCR).
  • YoctoReactor® library size is a function of the number of different functionalized oligos used in each position and the number of positions in the
  • the yR design approach provides an unvarying reaction site with regard to both (a) distance between reactants and (b) sequence environment surrounding the reaction site. Furthermore, the intimate connection between the code and the BB on the oligo-BB moieties which are mixed combinatorially in a single pot confers a high fidelity to the encoding of the library.
  • furthermore, the code of the synthesized products is not preset, but rather is assembled combinatorially and synthesized in synchrony with the innate product.
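The oligo-BBs are combined combinatorially. Under the simplifying assumption that every BB at each position is combined with every BB at every other position, and that all reactions succeed, the library size is therefore the product of the number of BBs per position. A minimal Python sketch follows; the function name and the example counts are hypothetical.

```python
from math import prod


def library_size(bbs_per_position):
    """Number of distinct members when every BB at each position can be
    combined with every BB at every other position (full combinatorics)."""
    if not bbs_per_position or any(n < 1 for n in bbs_per_position):
        raise ValueError("need at least one position, each with >= 1 BB")
    return prod(bbs_per_position)


if __name__ == "__main__":
    # e.g. three positions with 100, 200 and 500 BBs
    print(library_size([100, 200, 500]))  # 10000000
```
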
  • McGregor et al. developed an advanced selection method called interaction-dependent PCR (IDPCR) relying on a proximity-dependent binding signal.
  • Binder Trap Enrichment® (BTE) and Other Emulsion-Based Screening Methods
  • a homogeneous method for screening YoctoReactor® libraries (yR) (and which is applicable to libraries generated by other means, such as those described herein) has been developed which uses water-in-oil emulsion technology to isolate individual ligand-target complexes.
  • Called Binder Trap Enrichment® (BTE), it identifies ligands to a protein target by trapping binding pairs (DNA-labeled protein target and yR ligand) in emulsion droplets during dissociation-dominated kinetics. Hansen, N. J. V., et al., “Fidelity by design: Yoctoreactor and binder trap enrichment for small-molecule DNA-encoded libraries and drug discovery,” Curr. Opin. Chem.
  • the present invention provides a method of screening a DEL, comprising screening the DEL by an emulsion-based screening method such as BTE and wherein the target is a nucleic acid such as an RNA or fragment thereof.
  • BTE is performed as described in US 2017/0233726, hereby incorporated by reference.
  • both the DEL and the target include DNA barcodes.
  • BTE has thus far been limited to soluble proteins or fragments thereof conjugated to a DNA barcode.
  • Several methods of conjugating the target to the DNA barcode have been developed which offer target-dependent versatility, but typically the well-established chemical process used for biotinylation of proteins can be applied. Biotinylation is efficient and tolerated by most proteins.
  • NHS-ester conjugation to lysine and maleimide conjugation to cysteine, or variations thereof are used to conjugate the small molecule or target to its barcode.
  • BTE screening: the steps of BTE screening are shown in FIG. 3.
  • a DEL mixed with the DNA labeled target is allowed to reach equilibrium in solution where the target concentration can be controlled.
  • a rapid dilution is then performed, during which the binding kinetics become dominated by dissociation.
  • This is then followed by a rapid emulsion formation which traps the bound ligands with the target within aqueous emulsion droplets.
  • the partitions typically contain on average at most one oligonucleotide sequence per partition. This frequency of distribution at a given sequence dilution follows a Poisson distribution. Thus, in some embodiments, about 6%, 10%, 18%, 20%, 30%, 36%, 40%, or 50% of the droplets or partitions comprise one or fewer oligonucleotide sequences.
  • the number of droplets is at least 2, 3, 4, 5, 10, 100, 1,000, or 10,000 times greater than the number of DEL library members. In some embodiments, the number of droplets is at least 10², 10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, or 10⁹ times greater than the number of DEL library members.
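Under the Poisson model named above, and treating each library member (or complex) as a single particle that partitions independently, the mean occupancy is λ = members / droplets. The fraction of droplets holding k particles is then λ^k e^(-λ) / k!. The Python sketch below turns the droplet-to-member ratio into empty, single and multiple occupancy fractions. It is illustrative only, and the function names are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a droplet holds exactly k particles at mean occupancy lam."""
    return lam ** k * exp(-lam) / factorial(k)


def occupancy(droplets_per_member: float) -> dict:
    """Fractions of droplets that are empty, singly occupied, or multiply
    occupied when droplets outnumber library members by the given factor."""
    lam = 1.0 / droplets_per_member  # mean particles per droplet
    p0 = poisson_pmf(0, lam)
    p1 = poisson_pmf(1, lam)
    return {"empty": p0, "single": p1, "multiple": 1.0 - p0 - p1}


if __name__ == "__main__":
    for ratio in (1, 10, 100):
        fractions = occupancy(ratio)
        print(ratio, {k: round(v, 4) for k, v in fractions.items()})
```

As the ratio grows, multiply occupied droplets become rare. This is why random co-trapping stays low when droplets greatly outnumber library members.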
  • the compartment volume distribution is modeled as a log-normal distribution, also called a Galton distribution. By assuming a log-normal distribution and performing measurements of the actual droplet sizes the expected value (mean) and the standard deviation can be calculated for a specific experiment. According to this distribution, 95% of the compartment volumes will be within L logarithmic units from the mean (log) volume, where L is 1.96 times the standard deviation of the log-volumes.
  • the average compartment size, the variation, and the standard deviation are taken into account when analyzing the data.
  • compartments with a volume larger than 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 times the average compartment size are removed from the experiment.
  • compartments with a volume smaller than 1/100, 1/90, 1/80, 1/70, 1/60, 1/50, 1/40, 1/30, 1/20, 1/10, 1/9, 1/8, 1/7, 1/6, 1/5, 1/4, 1/3, or 1/2 times the average compartment size are removed from the experiment.
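The log-normal treatment and the size cut-offs above can be sketched in Python as follows. The sketch makes several assumptions:

- Natural logarithms are used. The back-transformed 95% range does not depend on the base, as long as one base is used consistently.
- "Average compartment size" is taken as the arithmetic mean volume.
- The default cut-offs (10× and 1/10×) are simply two of the values listed in the text.

The function names are hypothetical.

```python
import math
import statistics


def lognormal_summary(volumes):
    """Mean and standard deviation of log-volumes, plus the volume range
    expected to hold ~95% of compartments under a log-normal model
    (mean +/- 1.96 SD on the log scale, back-transformed to volumes)."""
    logs = [math.log(v) for v in volumes]
    mu = statistics.fmean(logs)
    sigma = statistics.stdev(logs)
    half_width = 1.96 * sigma  # L in the text
    return mu, sigma, (math.exp(mu - half_width), math.exp(mu + half_width))


def filter_by_size(volumes, max_fold=10.0, min_fraction=0.1):
    """Drop compartments larger than max_fold x the average volume or
    smaller than min_fraction x the average volume."""
    avg = statistics.fmean(volumes)
    return [v for v in volumes if min_fraction * avg <= v <= max_fold * avg]
```

A single very large droplet raises the arithmetic mean. That is one reason to inspect the log-scale summary before choosing cut-offs.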
  • the target and the DEL DNA barcode are ligated inside the droplets, thus preserving the information of co-trapping.
  • the emulsion is then disrupted, the material recovered and the DNA amplified by PCR.
  • Methods of breaking emulsions of this type are known in the art; for example, centrifuging at 13,000 x g for 5 min at 25 °C.
  • the oil phase is discarded, and residual mineral oil and surfactants are removed from the emulsion by performing the following extraction twice: adding 1 mL of water-saturated diethyl ether, vortexing, and disposing of the upper (solvent) phase.
  • Amplification of DNA codes for co-trapped species is assured because only ligated DNA is exponentially amplified in the PCR: each DNA fragment (the target DNA tag and the library-member DNA) contributes one PCR priming site, so only the ligated product carries both.
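An idealized calculation shows why requiring both priming sites matters. A template carrying both sites is copied exponentially, start·(1+E)^n after n cycles at per-cycle efficiency E. A template carrying only one site gains one new copy per original strand per cycle, start·(1+E·n). The Python sketch below ignores plateau effects, mispriming and reagent depletion, and its names are hypothetical.

```python
def copies_after_pcr(cycles: int, efficiency: float = 1.0,
                     both_primers: bool = True, start: float = 1.0) -> float:
    """Idealized copy number of one template species after PCR.

    both_primers=True: the template (ligated target tag + library-member
    DNA) carries both priming sites, so products are themselves templates
    and amplification is exponential.
    both_primers=False: only one primer can anneal, so each cycle only
    adds copies of the original strand and amplification is linear.
    """
    if cycles < 0 or not 0.0 <= efficiency <= 1.0:
        raise ValueError("cycles must be >= 0 and efficiency in [0, 1]")
    if both_primers:
        return start * (1.0 + efficiency) ** cycles
    return start * (1.0 + efficiency * cycles)


if __name__ == "__main__":
    n = 25
    ratio = copies_after_pcr(n) / copies_after_pcr(n, both_primers=False)
    print(f"after {n} cycles, ligated/unligated copy ratio ~ {ratio:.0f}")
```
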
  • the amplified DNA is then subjected to DNA sequencing and the DNA codes translated into compounds and counted.
  • Identification of hits after screening and ligation is essentially a counting exercise: information on binding events is deciphered by sequencing and counting the ligated DNA. Selective binders are counted with a much higher frequency than random binders. This is possible because random trapping of target and ligand is “diluted” by the high number of water droplets in the emulsion. Aqueous drops dispersed in oil act as compartments to trap binding complexes on a single-target-molecule basis, a method that allows screening of tens of millions of compounds with very low assay noise.
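The counting step can be sketched in Python as a tally of decoded reads per compound. The sketch scores each compound by the ratio of observed to expected read counts, with a pseudocount to stabilize low counts. The enrichment metric, the background model (for example, each compound's share of the unselected library) and the function name are all assumptions for illustration. The text above says only that ligated DNA is sequenced and counted.

```python
from collections import Counter


def count_hits(decoded_reads, background_freq, pseudocount=1.0):
    """Tally sequencing reads per compound and score enrichment.

    decoded_reads: iterable of compound IDs, one per read, after the DNA
        codes have been translated into compounds.
    background_freq: assumed fraction of reads each compound would get from
        random co-trapping alone (e.g. its share of the naive library).
    Returns (compound, count, enrichment) tuples sorted by enrichment, where
    enrichment = (observed + pseudocount) / (expected + pseudocount).
    """
    counts = Counter(decoded_reads)
    total = sum(counts.values())
    scored = []
    for compound, n in counts.items():
        expected = background_freq.get(compound, 0.0) * total
        scored.append((compound, n, (n + pseudocount) / (expected + pseudocount)))
    return sorted(scored, key=lambda row: row[2], reverse=True)
```
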
  • BTE mimics the non-equilibrium nature of in vivo ligand-target interactions and offers the unique possibility to screen for target specific ligands based on ligand-target residence time because the emulsion, which traps the binding complex, is formed during a dynamic dissociation phase.
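The residence-time argument can be made quantitative under a simple assumption. After the dilution, rebinding is assumed negligible, so complexes decay by first-order dissociation. The fraction still bound after time t is then exp(-k_off·t), and the mean residence time is 1/k_off. The Python sketch below is illustrative only. The rate constants and the 60 s dilution-to-emulsion delay are hypothetical values, not figures from this document.

```python
import math


def fraction_still_bound(k_off_per_s: float, t_s: float) -> float:
    """First-order dissociation with negligible rebinding (as after a
    large dilution): fraction of complexes intact after t seconds."""
    return math.exp(-k_off_per_s * t_s)


def residence_time_s(k_off_per_s: float) -> float:
    """Mean lifetime of the complex, 1 / k_off."""
    return 1.0 / k_off_per_s


if __name__ == "__main__":
    t_delay = 60.0  # hypothetical dilution-to-emulsion interval, seconds
    for k_off in (1e-1, 1e-2, 1e-3, 1e-4):
        print(f"k_off={k_off:g}/s  tau={residence_time_s(k_off):g}s  "
              f"bound after {t_delay:g}s: {fraction_still_bound(k_off, t_delay):.3f}")
```

Ligands with long residence times (small k_off) survive the interval largely intact, whereas fast-dissociating ligands are mostly lost before the emulsion traps them.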
  • yR and BTE technologies allow vast drug-like small molecule libraries to be efficiently synthesized in a combinatorial fashion and screened for target binding in a single tube method. As described below, these technologies are compatible with an assay readout enabled by advances in next-generation sequencing technology. This approach has increasingly been applied as a viable technology for the identification of small-molecule modulators to protein targets and as precursors to drugs in the past decade.
  • yR and emulsion-based (e.g., BTE) screening technology has been limited to screening against protein targets to date.
  • BTE followed by RT-PCR and next-generation sequencing will provide small molecule hits in the DEL that bind to the nucleic acid target.
  • screening of the DEL is performed using an emulsion-less proximity-enabled screening method.
  • the emulsion-less proximity-enabled screen is performed by the steps of: contacting a DEL with a target nucleic acid for a sufficient period of time and under conditions to allow the DEL and target nucleic acid to equilibrate and bind; performing a dilution; encoding or trapping information about binding of DEL library members to the target nucleic acid by, e.g., ligation or reverse transcription; and, optionally, decoding the results of the screen, for example by PCR.
  • the target nucleic acid is an RNA.
  • a decoding strategy for the fast and efficient identification of the specific binding compounds is crucial for the further development of DEL technology.
  • a variety of decoding methods may be used in accordance with the present invention, including microarray-based methodology and high-throughput sequencing techniques. Alternatively, Sanger-based sequencing methods may be used.
  • a sample such as a nucleic acid sample is processed prior to introduction to a sequencing machine.
  • a sample may be processed, for example, by amplification or by attaching a unique identifier.
  • a method of sequencing is used that does not rely on reverse transcription.
  • sensitive and highly multiplex methods to directly measure RNA sequence abundance without requiring reverse transcription are available for a number of biomedical applications, including high-throughput small molecule screening, pathogen transcript detection, and quantification of short/degraded RNAs.
  • These methods include RNA Annealing, Selection and Ligation (RASL) assays, which are based on RNA template-dependent oligonucleotide probe ligation. See, e.g., Larman, H. B., et al., Nucleic Acids Research, 2014, 42(1), 9146-9157; and Li, H.
  • RASL assays can use a DNA or RNA ligase, such as Rnl2, which can join a fully DNA donor probe to a 3'-diribonucleotide-terminated acceptor probe with high efficiency on an RNA template strand.
  • Rnl2-based RASL exhibits sub-femtomolar transcript detection sensitivity and permits the rational tuning of probe signals for optimal analysis by massively parallel DNA sequencing (RASL-seq).
  • RT-PCR is used.
  • the PCR reagents may include any suitable PCR reagents.
  • dUTPs may be substituted for dTTPs during the primer extension or other amplification reactions, such that oligonucleotide products comprise uracil containing nucleotides rather than thymine containing nucleotides. This uracil-containing section of the universal sequence may later be used together with a polymerase that will not accept or process uracil-containing templates to mitigate undesired amplification products.
  • Amplification reagents may include a universal primer, universal primer binding site, sequencing primer, sequencing primer binding site, universal read primer, universal read binding site, or other primers compatible with a sequencing device, e.g., an Illumina sequencer, Ion Torrent sequencer, etc.
  • the amplification reagents may include P5, non-cleavable 5' acrydite-P5, a cleavable 5' acrydite-SS-P5, Rlc, Biotin Rlc, sequencing primer, read primer, P5-Universal, P5-U, 52-BioRl-rc, a random N-mer sequence, a universal read primer, etc.
  • a primer comprises a modified nucleotide, a locked nucleic acid (LNA), an LNA nucleotide, a uracil-containing nucleotide, a nucleotide containing a non-native base, a blocker oligonucleotide, a blocked 3' end, or 3' ddCTP.
  • a DNA microarray is a device for high-throughput investigations widely used in molecular biology and in medicine. It consists of an arrayed series of microscopic spots (“features” or “locations”), each containing a few picomoles of oligonucleotides carrying a specific DNA sequence. These sequences can be short sections of a gene or other DNA elements and are used as probes to hybridize a DNA or RNA sample under suitable conditions. Probe-target hybridization is usually detected and quantified by fluorescence-based detection of fluorophore-labeled targets to determine the relative abundance of the nucleic acid target sequences. Microarrays have been used successfully to decode ESAC DNA-encoded libraries.
  • the coding oligonucleotides representing the individual chemical compounds in the library are spotted and chemically linked onto the microarray slides, for example by using a BioChip Arrayer robot. Subsequently, the oligonucleotide tags of the binding compounds isolated from the selection are PCR amplified using a fluorescent primer and hybridized onto the DNA-microarray slide. Afterwards, microarrays are analyzed using a laser scan and spot intensities detected and quantified. The enrichment of the preferential binding compounds is revealed by comparing the intensity of the spots on the DNA-microarray slide before and after selection.
  • Sequencing may involve basic methods including Maxam-Gilbert sequencing and chain-termination methods, or de novo sequencing methods including shotgun sequencing and bridge PCR, or next-generation methods including polony sequencing, 454 pyrosequencing, Illumina (Solexa) sequencing, SOLiD sequencing, Ion Torrent semiconductor sequencing, HeliScope single molecule sequencing, SMRT® sequencing, and others.
  • the present invention provides methods and kits for screening an encoded library against a nucleic acid target, such as a target RNA.
  • the present invention further provides methods of producing enriched encoded libraries and processing samples from such libraries, as well as compositions comprising such enriched encoded libraries.
  • the encoded library is a DNA-encoded library (DEL) of small molecules.
  • the target RNA is selected from a naturally occurring RNA or chimera, homolog, isoform, mutant, fragment, or analog thereof such as those described in detail herein.
  • the target RNA is associated with or implicated in a disease, such as those diseases described herein.
  • the target RNA is one of those listed in Table 1, 2, 3, or 4 herein.
  • the DEL is prepared using a split-and-pool, DNA-recording, or YoctoReactor® (yR) method of combinatorial synthesis.
  • the DEL is screened using Binder Trap Enrichment® (BTE) or emulsion-free, proximity-enhanced ligation conditions.
  • the present invention allows screening of the target without the need to conjugate the target to a DNA label (barcode).
  • the target is a nucleic acid that is not conjugated to, or does not comprise as part of its sequence, a non-natural barcode for use in identifying which DEL library members bind to the target during a screen.
  • the nucleic acid target acts as its own barcode.
  • the nucleic acid target comprises an identifier sequence that allows the nucleic acid to be identified during library decoding, for example by sequencing.
  • the present invention provides a proximity-driven (ligand/target interaction-driven) method to ligate the nucleic acid target to a bound small molecule DEL member.
  • the identifier sequence is an RNA nucleotide sequence of a naturally occurring RNA, or a chimera of two or more naturally occurring RNAs, or a homolog, isoform, fragment, or analog thereof.
  • the identifier sequence is an RNA nucleotide sequence associated with or implicated in a disease, such as those diseases and associated RNAs described herein. In some embodiments, the identifier sequence is at least a portion of one of those listed in Table 1, 2, 3, or 4 herein.
  • the present invention provides an enriched DNA-encoded library (DEL) comprising library members comprising:
  • the present invention provides a partially double-stranded, hybrid RNA-DNA ligation product comprising:
  • RNA strand comprising at least a portion of a biologically relevant target RNA
  • an at least partially double-stranded synthetic DNA molecule comprising a DNA strand that is at least partially hybridized to a single-stranded DNA oligonucleic acid and wherein the DNA oligonucleic acid comprises a 3'- or 5'-overhang of 1-20 nucleotides; wherein the RNA strand and the DNA strand have been ligated between the 3'-end of the
  • the biologically relevant target RNA is an RNA implicated in or a cause of a disease or disorder, such as one of those recited in Table 1, 2, 3, or 4.
  • the present invention provides a method of ligating an RNA strand to a DNA strand, comprising:
  • the partially double-stranded DNA molecule is produced by ligating together shorter fragments of dsDNA; or is produced by the steps of:
  • the partially double-stranded DNA molecule is produced by hybridizing the DNA strand to a single-stranded DNA splint having at least partial sequence complementarity to the DNA strand.
  • step (ii) further comprises contacting the RNA strand with a helper oligonucleic acid, wherein the helper oligonucleic acid has at least partial sequence complementarity to a region of the RNA strand adjacent to the portion of the RNA strand that has sequence complementarity to the 3'-overhang.
  • the helper oligonucleic acid is an oligonucleotide.
  • the helper oligonucleotide is DNA, RNA, or LNA.
  • the helper oligonucleotide further comprises one or more nucleotide modifications selected from: (i) a sugar modification selected from 2'-OMe, 2'-F, 2',2'-difluoro, 2'-Me, 2'-methoxyethyl, 2'-propyl, or replacement of a ribose or deoxyribose with an arabinose sugar, a BNA (Bridged Nucleic Acid) sugar, an LNA (Locked Nucleic Acid) sugar, or an ENA (2'-O,4'-C-ethylene-bridged nucleic acid) sugar;
  • a base modification selected from pseudouracil, 2-methyladenine, 2,6-diaminopurine, 2-Cl adenine, 2-F adenine, 5-azauracil, 5-azacytidine, N2-methylguanine, N7-methylguanine, N6-methyladenine, or a C-nucleobase (7-deazapurine or 1-deazapyrimidine); or
  • nucleoside modification selected from 2'-Deoxypseudouridine, 2'-Deoxyuridine,
  • the helper oligonucleic acid further comprises a modification to the phosphate backbone selected from boranophosphate, methylphosphonate, P-ethoxy, phosphonoacetate, phosphorothioate, or phosphorodithioate.
  • the helper oligonucleic acid is PNA or a morpholino oligomer.
  • the at least one ligase catalyzes ligation of the 3'-overhang to the 5'-end of the helper oligonucleic acid.
  • a first ligase and a second ligase are used in step (ii); the first ligase catalyzes ligation of the 3'-overhang and the 5'-end of the helper oligonucleic acid; and the second ligase catalyzes ligation of the 3'-end of the RNA strand and the 5'-end of the DNA strand.
  • the helper oligonucleic acid hybridizes to the RNA strand and facilitates the ligation of the 3'-end of the RNA strand and the 5'-end of the DNA strand.
  • the method further comprises, before or after step (i), hybridizing the helper oligonucleic acid to the RNA strand under appropriate conditions to effect the hybridization.
  • the DNA strand comprises at the 5'-end a phosphate or analog thereof capable of participating in ligation.
  • the RNA strand comprises at the 3'-end a phosphate or analog thereof capable of participating in ligation.
  • the single-stranded DNA splint comprises at the 3'-end a phosphate or analog thereof capable of participating in ligation.
  • the helper oligonucleotide comprises at the 5'-end a phosphate or analog thereof capable of participating in ligation.
  • the phosphate or analog thereof capable of participating in ligation has been added by chemical synthesis.
  • the phosphate or analog thereof capable of participating in ligation is a phosphate group.
  • the phosphate group has been added from phosphorylation by a kinase.
  • the phosphorylation is performed before step (ii) is performed.
  • the kinase is allowed to contact the DNA strand during step (ii).
  • the phosphate or analog thereof capable of participating in ligation is a 5'-adenosine diphosphate group.
  • the ligated product is a template for reverse transcription.
  • the ligated product comprises at least one binding site for a primer sequence for a reverse transcriptase.
  • the ligated product is a template for PCR.
  • the ligated product comprises at least one binding site for a primer for PCR.
  • the method further comprises the step of ligating a single-stranded oligonucleic acid to the 5'-end of the RNA strand, or to the DNA strand, thereby producing an extended ligated product.
  • the extended ligated product is a template for reverse transcription.
  • the extended ligated product comprises at least one binding site for a primer sequence for a reverse transcriptase.
  • the extended ligated product is a template for PCR.
  • the extended ligated product comprises at least one binding site for a primer for PCR.
  • the partially double-stranded DNA molecule is a member of a DNA-encoded library (DEL).
  • the partially double-stranded DNA molecule comprises a sequence that encodes the identity of a small molecule member of a DNA-encoded library (DEL).
  • the RNA strand is 30-1,000 nucleotides in length.
  • the 3'-overhang is 1-20 nucleotides.
  • the 3'-overhang is 2-10 nucleotides.
  • the 3'-overhang is 2, 3, 4, or 5 nucleotides.
  • the 3'-overhang is 2 or 3 nucleotides.
  • the 3'-overhang has a guanine-cytosine content (GC-content) that is 50% or greater, 60% or greater, 70% or greater, 80% or greater, 90% or greater, or 100%.
  • the 3'-overhang has at least 2, at least 3, at least 4, or at least 5 guanosine and/or cytosine nucleotides.
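The GC-content and length criteria above can be checked programmatically. The Python sketch below uses defaults taken from the lowest thresholds in the text: at least 50% GC, at least 2 G/C bases, and 1-20 nucleotides. The same function could be applied to the helper oligonucleic acid with its own limits. The function names are hypothetical, and the sketch handles standard bases only.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0-1.0)."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def overhang_ok(overhang: str, min_gc_fraction: float = 0.5,
                min_gc_count: int = 2, min_len: int = 1,
                max_len: int = 20) -> bool:
    """True if the overhang meets the length, GC-fraction and GC-count limits."""
    if not min_len <= len(overhang) <= max_len:
        return False
    gc_count = sum(base in "GC" for base in overhang.upper())
    return gc_count >= min_gc_count and gc_content(overhang) >= min_gc_fraction
```
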
  • the helper oligonucleic acid is at least 5 nucleotides in length.
  • the helper oligonucleic acid is about 10 to about 75 nucleotides in length. In some embodiments, the helper oligonucleic acid is 10-50, 12-30, 14-25, 16-22, 17-20, 18-19, or 18 nucleotides in length.
  • the helper oligonucleic acid is about 10 to about 50, about 12 to about 30, about 14 to about 25, about 16 to about 22, about 17 to about 20, about 18 to about 19, or about 18 nucleotides in length.
  • the helper oligonucleic acid has a guanine-cytosine content (GC-content) that is 50% or greater, 60% or greater, 70% or greater, 80% or greater, 90% or greater, or 100%.
  • the at least one ligase is selected from T4 RNA ligase 2, SplintR, ElectroLigase®, T4 DNA ligase, T3 DNA ligase, T4 RNA ligase 1, PBCV-1 ligase, RtcB Ligase, bacteriophage TS2126 ligase, Taq DNA Ligase, HiFi Taq DNA Ligase, 9°N™ DNA Ligase, CircLigase RNA ligase, Ampligase, 5' App DNA/RNA ligase, T7 DNA ligase, E. coli DNA ligase, CircLigase I ssDNA ligase, or CircLigase II ssDNA ligase; or a truncated version thereof.
  • the at least one ligase is selected from T4 RNA ligase 2, SplintR, T4 DNA ligase, or T3 DNA ligase.
  • the at least one ligase is T4 RNA ligase 2.
  • the at least one ligase is SplintR.
  • the at least one ligase is T4 DNA ligase or T3 DNA ligase.
  • the first ligase is selected from T4 RNA ligase 2, SplintR, T4 DNA ligase, T3 DNA ligase, T4 RNA ligase 1, Ampligase, 5' App DNA/RNA ligase, T7 DNA ligase, E. coli DNA ligase, CircLigase I ssDNA ligase, or CircLigase II ssDNA ligase, or a truncated version thereof; and the second ligase is selected from T4 RNA ligase 2, SplintR, T4 DNA ligase, T4 RNA ligase 1, RtcB Ligase, PBCV-1 ligase, CircLigase RNA ligase, 5' App DNA/RNA ligase, or a truncated version thereof.
  • the at least one ligase is a combination of T4 DNA ligase and SplintR.
  • step (ii) further comprises adding a crowding agent such as a polyethylene glycol (PEG) (e.g., PEG4000), Ficoll, dextran, or albumin.
  • step (ii) is performed at about 2-50 °C.
  • step (ii) is performed at about 4, 12, 16, 22, or 37 °C.
  • step (ii) is performed in a reaction buffer comprising about 25-300 mM salt.
  • the present invention provides a ligation product prepared by any one of the foregoing methods.
  • the present invention provides a composition comprising:
  • RNA strand comprising at least a portion of a biologically relevant target RNA
  • an at least partially double-stranded synthetic DNA molecule comprising a DNA strand at least partially hybridized to a single-stranded oligonucleic acid and comprising a 3'-overhang of 2-5 nucleotides.
  • the 3'-overhang has sequence complementarity to the 3'-end of the RNA strand.
  • the composition further comprises one or more ligases capable of ligating the RNA strand to the DNA molecule.
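Because strands pair antiparallel, a DNA 3'-overhang that is complementary to the 3'-end of the RNA strand is the reverse complement of the RNA's 3'-terminal k-mer, with U pairing to A. The Python sketch below derives that overhang and checks a candidate against a given RNA, assuming both sequences are written 5'->3'. The function names and the example RNA are hypothetical.

```python
# DNA base that pairs with each RNA (or DNA) base.
_DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}


def overhang_for_rna(rna: str, k: int) -> str:
    """DNA 3'-overhang (written 5'->3') complementary to the last k
    nucleotides at the 3'-end of an RNA (written 5'->3').

    Antiparallel pairing means the overhang is the reverse complement
    of the RNA's 3'-terminal k-mer.
    """
    if not 1 <= k <= len(rna):
        raise ValueError("k must be between 1 and len(rna)")
    tail = rna.upper()[-k:]
    return "".join(_DNA_COMPLEMENT[base] for base in reversed(tail))


def pairs_with_rna_end(overhang: str, rna: str) -> bool:
    """True if the overhang is fully complementary to the RNA's 3'-end."""
    return overhang.upper() == overhang_for_rna(rna, len(overhang))
```

For example, `overhang_for_rna("GGACUAGC", 3)` returns `"GCT"`. Read 3'->5' (TCG), the overhang pairs base by base with the RNA's terminal 5'-AGC-3'. This places the RNA 3'-end next to the 5'-end of the DNA strand for ligation.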
  • the present invention provides a partially double-stranded RNA-DNA ligation product comprising:
  • RNA strand comprising at least a portion of a biologically relevant target RNA, or a homolog, isoform, or analog thereof;
  • an at least partially double-stranded synthetic DNA molecule comprising a DNA strand at least partially hybridized to a single-stranded oligonucleic acid and comprising a 3'-overhang of 2-5 nucleotides;
  • the present invention provides an enriched DNA-encoded library (DEL) comprising library members comprising:
  • the present invention provides a method of producing an enriched DNA-encoded library (DEL), comprising:
  • step (iii) performing a high factor dilution of the mixture of step (ii) and incubating for a time period and at a temperature sufficient for the dissociation of at least one weakly-associated complex in the mixture, thereby producing an enriched mixture;
  • step (iv) contacting the enriched mixture of step (iii) with an aqueous emulsion preparation under conditions selected to allow compartmentalization of at least one complex into at least one aqueous emulsion droplet;
  • step (v) ligating the DEL and the nucleic acid target of the at least one complex in the at least one aqueous emulsion droplet of step (iv) to form at least one ligated product;
  • step (vi) optionally, disrupting the at least one aqueous emulsion droplet of step (v);
  • the present invention provides a method of producing an enriched product
  • step (iii) performing a high factor dilution of the mixture of step (ii) and incubating for a time period and at a temperature sufficient for the dissociation of at least one weakly-associated complex in the mixture, thereby producing an enriched mixture;
  • the method is performed using an emulsion-free screen that employs proximity-based ligation.
  • the high factor dilution of step (iii) is 1:2 to 1:10,000.
  • the time period of step (iii) is 1 minute to 48 hours.
  • the temperature of step (iii) is 4 °C to 65 °C.
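The enrichment logic of the high factor dilution in step (iii) can be illustrated with a simple kinetic model. This is a minimal sketch, assuming first-order dissociation of each complex after dilution and negligible rebinding (the diluted partners are assumed to be too dilute to reassociate during the incubation); the function names, the 100 nM starting concentration, and the k_off values are hypothetical and are not taken from this disclosure.

```python
import math


def diluted_concentration(c0: float, dilution_factor: float) -> float:
    """Concentration after a 1:dilution_factor dilution (1:1,000 -> 1000)."""
    if dilution_factor < 1:
        raise ValueError("dilution factor must be >= 1")
    return c0 / dilution_factor


def fraction_remaining(koff_per_s: float, t_s: float) -> float:
    """Fraction of pre-formed complexes still intact after t_s seconds,
    assuming first-order dissociation and no rebinding."""
    return math.exp(-koff_per_s * t_s)


def half_life_s(koff_per_s: float) -> float:
    """Complex half-life, ln(2) / k_off, in seconds."""
    return math.log(2) / koff_per_s


t = 30 * 60  # hypothetical 30-minute incubation (step (iii) allows 1 minute to 48 hours)
weak, tight = 1e-1, 1e-4  # hypothetical k_off values, 1/s
print(f"target after 1:1,000 dilution: {diluted_concentration(100.0, 1000):.2f} nM")
print(f"weak binder remaining:  {fraction_remaining(weak, t):.2e}")
print(f"tight binder remaining: {fraction_remaining(tight, t):.3f}")
```

Under these assumptions a complex with a slow off-rate survives the incubation largely intact while a fast-dissociating one is essentially lost, which is the sense in which the diluted mixture becomes enriched for tightly bound complexes.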
  • the ligation is performed according to a method described above.
  • step (vi) is performed by contacting with at least one reagent selected from a surfactant, an alcohol, or a halogenated hydrocarbon solvent.
  • the aqueous emulsion preparation in step (iv) is a water-in-oil emulsion.
  • the surfactant is an anionic surfactant, a cationic surfactant, a zwitterionic surfactant, or a nonionic surfactant.
  • the surfactant is selected from Triton X-100 or Tween 80.
  • the nucleic acid target is an RNA, or a homolog, isoform, chimera, fragment, mutant, or analog thereof.
  • step (iv) is performed using Binder Trap Enrichment® (BTE).
  • the compartmentalization in step (iv) creates more compartments than there are members of the DEL in the sample.
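Why creating more compartments than library members matters can be seen from a Poisson loading model. The sketch below assumes that complexes distribute randomly among droplets (Poisson statistics with mean λ = members / compartments), a common idealization for emulsions that is not stated in this disclosure; the member and droplet counts in the usage lines are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(exactly k items in one compartment) for Poisson mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def occupancy(n_members: float, n_compartments: float) -> dict:
    """Summarize droplet occupancy under a Poisson loading assumption."""
    lam = n_members / n_compartments
    p_empty = poisson_pmf(0, lam)
    p_single = poisson_pmf(1, lam)
    p_multi = 1.0 - p_empty - p_single
    return {
        "lambda": lam,
        "empty": p_empty,
        "single": p_single,
        "multi": p_multi,
        # among droplets holding anything, the share holding two or more members
        "multi_among_occupied": p_multi / (1.0 - p_empty),
    }


stats = occupancy(n_members=1e8, n_compartments=1e9)  # hypothetical counts
for key, value in stats.items():
    print(f"{key:>22}: {value:.4g}")
```

With λ = 0.1, roughly 5% of occupied droplets contain more than one library member; increasing the excess of compartments over members lowers this fraction further.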
  • the small molecules are covalently bound to their DNA barcodes by an amino-thiol linkage.
  • the present invention provides a method of processing a sample from an enriched DEL, comprising:
  • step (iii) sequencing the amplified products from step (ii) to produce a DEL library screen result.
  • the method further comprises, before step (ii), ligating a single-stranded oligonucleic acid to the 5'-end of the RNA strand of the enriched DEL or to the DNA strand of the enriched DEL.
  • the method further comprises, before step (ii), contacting the enriched DEL with a reverse transcriptase (RT) to form an enriched DEL cDNA of the nucleic acid target.
  • the present invention provides a composition comprising a plurality of enriched DEL cDNAs produced according to the foregoing method.
  • the sequencing in step (iii) is selected from microarray-based sequencing or high-throughput sequencing.
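One way to turn the sequencing counts of step (iii) into a screen result is to compare each barcode's frequency after selection with its frequency in an unselected (naive) library sample. The disclosure does not specify this calculation; the sketch below is a hypothetical enrichment ratio with a pseudocount, and the barcode names and read counts are invented for illustration.

```python
from collections import Counter


def enrichment(selected: Counter, naive: Counter, pseudocount: float = 1.0) -> dict:
    """Ratio of normalized barcode frequency, selected / naive.

    A pseudocount is added to every barcode so that barcodes absent from
    one sample do not cause division by zero or infinite ratios.
    """
    barcodes = set(selected) | set(naive)
    n = len(barcodes)
    sel_total = sum(selected.values()) + pseudocount * n
    naive_total = sum(naive.values()) + pseudocount * n
    return {
        bc: ((selected[bc] + pseudocount) / sel_total)
        / ((naive[bc] + pseudocount) / naive_total)
        for bc in barcodes
    }


selected_counts = Counter({"BC001": 90, "BC002": 10})  # hypothetical reads
naive_counts = Counter({"BC001": 50, "BC002": 50})
scores = enrichment(selected_counts, naive_counts)
for bc in sorted(scores, key=scores.get, reverse=True):
    print(bc, round(scores[bc], 3))
```

Barcodes with a ratio well above 1 would be candidate hits; the pseudocount keeps rare barcodes from dominating the ranking on the strength of a single read.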
  • the present invention provides a method of screening an encoded library against a nucleic acid target, comprising:
  • step (v) optionally, contacting the ligated nucleic acid with a reverse transcriptase (RT) under conditions selected such that the RT synthesizes a complementary DNA strand to the nucleic acid target to produce a double-stranded, ligated nucleic acid.
  • the method further comprises the step of:
  • the nucleic acid target is an RNA, or a homolog, isoform, mutant, chimera, fragment, or analog thereof.
  • the in vitro compartmentalization is an emulsion technique.
  • the in vitro compartmentalization is Binder Trap Enrichment® (BTE).
  • the in vitro compartmentalization creates more compartments than there are members of the DEL in the sample.
  • a high-factor dilution is performed instead of an in vitro compartmentalization technique, and the dilution enables proximity-based ligation.
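The statement that a high-factor dilution enables proximity-based ligation can be made concrete with a back-of-the-envelope rate comparison. The sketch assumes (i) a bound complex holds the target end near the barcode at a fixed effective molarity, so its ligation rate does not depend on bulk concentration, and (ii) ligation between unbound partners is bimolecular, so its per-molecule rate scales with partner concentration. The rate constant, effective molarity, and concentrations are hypothetical illustration values.

```python
def proximity_to_background(k2: float, effective_molarity: float,
                            partner_conc: float, dilution_factor: float) -> float:
    """Ratio of per-complex proximity ligation rate to per-molecule random
    (bimolecular) ligation rate after a 1:dilution_factor dilution.

    k2 is a second-order rate constant (1/(M*s)) shared by both pathways,
    so it cancels; it is kept only to make the two rates explicit.
    """
    proximity_rate = k2 * effective_molarity               # unaffected by dilution
    background_rate = k2 * partner_conc / dilution_factor  # falls with dilution
    return proximity_rate / background_rate


# undiluted (factor 1) versus two factors from the 1:2 to 1:10,000 range
for d in (1, 100, 10_000):
    ratio = proximity_to_background(1e5, 1e-3, 1e-7, d)
    print(f"1:{d:<6} proximity/background = {ratio:.1e}")
```

Under this model each 10-fold dilution raises the proximity-to-background ratio about 10-fold, consistent with the idea that dilution can stand in for physical compartmentalization.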
  • the DNA barcode comprises a nucleotide overhang complementary to the 3'-end of the nucleic acid target.
  • the overhang is 1-20 nucleotides.
  • the overhang is about 2-10, 2-7, 2-5, 2-4, 3-5, 3-4, 2-3, about 2, or 2 nucleotides.
  • the overhang has a guanosine-cytosine content (GC-content) that is 50% or greater, 60% or greater, 70% or greater, 80% or greater, 90% or greater, or 100%.
  • the overhang has at least 2, at least 3, at least 4, or at least 5 guanosine and/or cytosine nucleotides.
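The complementarity and GC-content criteria above can be checked programmatically when designing barcodes for a given target. This is a minimal sketch, assuming the overhang pairs antiparallel with the terminal nucleotides of the target, so that the overhang read 5'→3' is the DNA reverse complement of the target's terminal segment; the function names and the example target sequence are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def reverse_complement_dna(seq: str) -> str:
    """DNA reverse complement of a DNA or RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def overhang_for_target(target: str, length: int, end: str = "3") -> str:
    """DNA overhang (5'->3') complementary to the 3'- or 5'-terminal
    `length` nucleotides of `target` (written 5'->3')."""
    if not 1 <= length <= len(target):
        raise ValueError("length must be between 1 and len(target)")
    if end == "3":
        terminal = target[-length:]
    elif end == "5":
        terminal = target[:length]
    else:
        raise ValueError("end must be '3' or '5'")
    return reverse_complement_dna(terminal)


def gc_count(seq: str) -> int:
    """Number of G and C nucleotides."""
    return sum(base in "GC" for base in seq.upper())


def gc_content(seq: str) -> float:
    """Fraction of G and C nucleotides (0.0-1.0)."""
    return gc_count(seq) / len(seq)


target_rna = "AUGCUAGCCG"  # hypothetical target, written 5'->3'
oh = overhang_for_target(target_rna, 3)
print(oh, gc_count(oh), f"{gc_content(oh):.0%}")
```

The `end` parameter covers both arrangements described in this document: an overhang complementary to the 3'-end of the target, and one complementary to its 5'-end.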
  • the small molecules are covalently bound to their DNA barcodes by an amino-thiol linkage.
  • the method further comprises phosphorylating the 3' end of the nucleic acid target prior to the ligation of step (iv).
  • step (iv) further comprises breaking up the compartments created by an in vitro compartmentalization.
  • step (ii) is performed in an aqueous solution.
  • the method further comprises purifying the ligated nucleic acid and/or double-stranded, ligated nucleic acid.
  • the present invention provides an enriched DEL cDNA produced by the methods described above.
  • the present invention provides a composition comprising a plurality of enriched DEL cDNA molecules that encode an enriched DEL, wherein the enriched DEL is produced according to the methods described above.
  • the present invention provides a method of performing a multiplexed DEL screen, comprising:
  • At least 10 nucleic acid targets of different sequence are screened in parallel.
  • At least 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 25, 30, 35, 40, 45, 50, 75, 100, 150, 200, 250, 350, 500, 1,000, 5,000, 10,000, 100,000, or 1,000,000 nucleic acid targets of different sequence are screened in parallel.
  • about 10-100, 10-1,000, 10-100,000, or 10-1,000,000 nucleic acid targets of different sequence are screened in parallel.
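In a multiplexed screen, sequencing reads must be assigned both to a target and to a library member. The disclosure does not specify a read layout or analysis procedure; the sketch below assumes a hypothetical layout in which each read begins with a target-specific sequence followed immediately by a fixed-length DEL barcode, and tallies reads per (target, barcode) pair. Target identifiers, sequences, and the barcode length are invented.

```python
from collections import Counter, defaultdict


def demultiplex(reads, target_prefixes, barcode_len):
    """Count DEL barcodes per target.

    Assumes no target prefix is itself a prefix of another (otherwise the
    first match in dict order wins). Reads that match no target, or are too
    short to hold a full barcode after the prefix, are counted as unassigned.
    """
    counts = defaultdict(Counter)
    unassigned = 0
    for read in reads:
        for target_id, prefix in target_prefixes.items():
            barcode = read[len(prefix):len(prefix) + barcode_len]
            if read.startswith(prefix) and len(barcode) == barcode_len:
                counts[target_id][barcode] += 1
                break
        else:  # no target matched with a complete barcode
            unassigned += 1
    return counts, unassigned


prefixes = {"target_A": "ACGT", "target_B": "TTGA"}  # hypothetical
reads = ["ACGTAAAA", "ACGTAAAA", "TTGACCCC", "GGGGGGGG", "ACGTAA"]
counts, unassigned = demultiplex(reads, prefixes, barcode_len=4)
for target_id, tally in counts.items():
    print(target_id, dict(tally))
print("unassigned:", unassigned)
```

Keeping a per-target tally means each target in the pool yields its own barcode count table, which can then be analyzed as if it came from a separate single-target screen.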
  • the ligation is performed according to an RNA-DNA ligation disclosed herein.
  • the present invention provides a kit for producing a ligated product comprising an RNA strand ligated to a DNA strand, comprising:
  • a buffer comprising a buffering molecule, a chloride salt of a divalent cation, and ATP;
  • the present invention provides a method of producing an enriched DEL cDNA of a nucleic acid target, comprising:
  • the primer is about 10-30 nucleotides in length.
  • the primer is about 15-25 nucleotides in length.
  • the enriched DEL cDNA comprises at least one PCR primer binding site.
  • the DNA molecule is a member of a DNA-encoded library (DEL).
  • the DNA molecule comprises a sequence that encodes the identity of a small molecule member of a DNA-encoded library (DEL).
  • the nucleic acid target is an RNA molecule that is about 30-1,000 ribonucleotides in length.
  • the 5'-overhang of step (i) is 2-5 nucleotides.
  • the 5'-overhang of step (i) is 2, 3, 4, or 5 nucleotides.
  • the 5'-overhang has a guanosine-cytosine content (GC-content) that is 50% or greater, 60% or greater, 70% or greater, 80% or greater, 90% or greater, or 100%.
  • the 5'-overhang has at least 2, at least 3, at least 4, or at least 5 guanosine and/or cytosine nucleotides.
  • the DNA strand comprises at the 3'-end a phosphate or analog thereof capable of participating in ligation.
  • the RNA strand comprises at the 5'-end a phosphate or analog thereof capable of participating in ligation.
  • the phosphate or analog thereof capable of participating in ligation has been added by chemical synthesis.
  • the phosphate or analog thereof capable of participating in ligation is a phosphate group.
  • the phosphate group has been added from phosphorylation by a kinase.
  • the phosphorylation is performed before step (ii) is performed.
  • the kinase is allowed to contact the DNA strand during step (ii).
  • the phosphate or analog thereof capable of participating in ligation is a 5'-adenosine diphosphate group.
  • the at least one ligase is selected from T4 RNA ligase 2, SplintR, ElectroLigase®, T4 DNA ligase, T3 DNA ligase, T4 RNA ligase 1, PBCV-1 ligase, RtcB Ligase, bacteriophage TS2126 ligase, Taq DNA Ligase, HiFi Taq DNA Ligase, 9°N™ DNA Ligase, CircLigase RNA ligase, Ampligase, 5' App DNA/RNA ligase, T7 DNA ligase, E. coli DNA ligase, CircLigase I ssDNA ligase, or CircLigase II ssDNA ligase; or a truncated version thereof.
  • the at least one ligase is selected from T4 DNA ligase, T3 DNA ligase, T7 DNA ligase, CircLigase I ssDNA ligase, or CircLigase II ssDNA ligase; or T4 DNA ligase, T4 RNA ligase 2, or SplintR.
  • step (ii) further comprises adding a crowding agent such as polyethylene glycol (PEG) (e.g., PEG4000), Ficoll, dextran, or albumin.
  • step (ii) is performed at 2-50 °C.
  • step (ii) is performed at 4, 12, 16, 22, or 37 °C.
  • step (ii) is performed in a reaction buffer comprising 25-300 mM of a dissolved salt.
  • the present invention provides an RNA/DNA hybrid, prepared by any one of the methods described above.
  • the temperature in step (iv) is 4-60°C.
  • the temperature in step (iv) is 4, 10, 15, 20, 25, 30, 35, 40, 42,
  • the time period in step (iv) is at least about 5 minutes.
  • the time period in step (iv) is at least 10 minutes, at least 30 minutes, at least 1 hour, at least 2 hours, at least 10 hours, at least 24 hours, or at least 48 hours.
  • the reverse transcriptase is selected from Superscript II, Superscript III, Superscript IV, AMV, MMLV (M-MuLV), ProtoScript II, WarmStart RTx, HIV RT, HTLV, EpiScript, MonsterScript, GoScript, TGIRT, MarathonRT, or PyroScript.
  • the reverse transcriptase is Superscript III.
  • step (iii) further comprises contacting with an RNase inhibitor.
  • the RNase inhibitor is selected from SUPERase-In, RNaseOUT, or RNAsecure.
  • the RNase inhibitor is SUPERase-In.
  • the method further comprises:
  • the temperature in step (v) is at least 60°C.
  • the temperature in step (v) is about 75°C.
  • the time period in step (v) is at least 5 minutes.
  • the time period in step (v) is at least 15 minutes.
  • the present invention provides an enriched DNA-encoded library (DEL) comprising library members comprising:
  • the present invention provides a partially double-stranded RNA-DNA ligation product comprising:
  • an RNA strand comprising at least a portion of a biologically relevant target RNA, or a homolog, isoform, mutant, or analog thereof;
  • an at least partially double-stranded synthetic DNA molecule comprising a DNA strand at least partially hybridized to a single-stranded oligonucleic acid and comprising a 5'-overhang of 2-5 nucleotides; wherein the 5'-end of the RNA strand and the 3'-end of the DNA strand have been ligated to form a contiguous sequence.
  • the present invention provides a method of producing an enriched DNA-encoded library (DEL), comprising: (i) providing a DEL of small molecules covalently conjugated to DNA barcodes, wherein each DNA barcode has a 5'-overhang comprising at least one nucleotide;
  • step (ii) contacting the DEL of step (i) with a nucleic acid target under conditions selected to allow binding between the small molecules and the nucleic acid target to form a mixture of complexes, and wherein the 5'-overhang has sequence complementarity to the 5'-end of the nucleic acid target;
  • step (iii) performing a high factor dilution of the mixture of step (ii) and incubating for a time period and at a temperature sufficient for the dissociation of at least one weakly-associated complex in the mixture, thereby producing an enriched mixture of bound complexes;
  • step (iv) optionally, contacting the enriched mixture of step (iii) with an aqueous emulsion preparation under conditions selected to allow compartmentalization of at least one complex into at least one aqueous emulsion droplet;
  • step (v) ligating the DEL and the nucleic acid target of step (iii) to form a ligated complex;
  • step (vii) optionally, disrupting the at least one aqueous emulsion droplet of step (vi);
  • steps (iv) and (vii) are omitted.
  • reverse transcription takes place at a higher rate for the bound complexes in comparison with the weakly-bound complexes.
  • the present invention provides a method of performing a multiplexed DEL screen, comprising:
  • each DNA barcode has a 5'-overhang comprising at least one nucleotide
  • step (ii) contacting the DEL of step (i) with a plurality of nucleic acid targets of different sequences under conditions selected to allow binding between the small molecules and the nucleic acid targets to form a mixture of complexes, and wherein the 5'-overhang has sequence complementarity to the 5'-end of the nucleic acid targets;
  • step (iii) performing a high factor dilution of the mixture of step (ii) and incubating for a time period and at a temperature sufficient for the dissociation of at least one weakly-associated complex in the mixture, thereby producing an enriched mixture of bound complexes;
  • step (iv) optionally, contacting the enriched mixture of step (iii) with an aqueous emulsion preparation under conditions selected to allow compartmentalization of at least one complex into at least one aqueous emulsion droplet;
  • step (vii) optionally, disrupting the at least one aqueous emulsion droplet of step (vi);
  • steps (iv) and (vii) are omitted.
  • the ligation and/or reverse transcription takes place at a higher rate for the bound complexes in comparison with the weakly-bound complexes.
  • At least 10 nucleic acid targets of different sequence are screened in parallel.
  • the reverse transcriptase is selected from Superscript II, Superscript III, Superscript IV, AMV, MMLV (M-MuLV), ProtoScript II, WarmStart RTx, HIV RT, HTLV, EpiScript, MonsterScript, GoScript, TGIRT, MarathonRT, or PyroScript.
  • At least 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 25, 30, 35, 40, 45, 50, 75, 100, 150, 200, 250, 350, 500, 1,000, 5,000, 10,000, 100,000, or 1,000,000 nucleic acid targets of different sequence are screened in parallel.
  • about 10-100, 10-1,000, 10-100,000, or 10-1,000,000 nucleic acid targets of different sequence are screened in parallel.
  • the DEL comprises at least 1 × 10³ library members. In some embodiments, the DEL comprises at least 1 × 10⁴ library members. In some embodiments, the DEL comprises at least 1 × 10⁵ library members. In some embodiments, the DEL comprises at least 1 × 10⁶ library members. In some embodiments, the DEL comprises at least 1 × 10⁷, 1 × 10⁸, 1 × 10⁹, 1 × 10¹⁰, 1 × 10¹¹, or 1 × 10¹² library members. In some embodiments, the DEL comprises from about 1 × 10³ to about 1 × 10¹² library members. In some embodiments, the DEL comprises from about 1 × 10⁴ to about 1 × 10¹¹ library members.
  • the DEL comprises from about 1 × 10⁵ to about 1 × 10¹⁰ library members. In some embodiments, the DEL comprises from about 1 × 10⁶ to about 1 × 10⁹ library members. In some embodiments, the DEL comprises about 1 × 10³, about 1 × 10⁴, about 1 × 10⁵, about 1 × 10⁶, about 1 × 10⁷, about 1 × 10⁸, about 1 × 10⁹, about 1 × 10¹⁰, about 1 × 10¹¹, or about 1 × 10¹² library members. In some embodiments, the DEL comprises approximately a number of library members shown in Table A above or in FIG. 3.
  • the DEL comprises about 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1,000 BB-oligos per position; or about 50-1,000, 50-900, 50-800, 50-700, 50-600, 50-500, 50-400, 100-800, 100-600, 100-500, 100-300, 150-250, 100-200, 200-300, 300-500, or 150-450 BB-oligos per position, for example per position in a YoctoReactor® used to prepare the DEL.
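For a combinatorial DEL built with one building block (BB-oligo) per position, the number of library members is the product of the BB-oligo counts at each position. The sketch below assumes full combinatorial coverage (every combination is made) and ignores synthesis failures; the per-position counts in the usage lines are hypothetical values chosen from the ranges above. `math.prod` requires Python 3.8 or later.

```python
import math


def library_size(bbs_per_position):
    """Number of members if every position's BB-oligos are fully combined."""
    if any(n < 1 for n in bbs_per_position):
        raise ValueError("each position needs at least one BB-oligo")
    return math.prod(bbs_per_position)


for design in ([500, 500, 500], [1000, 1000, 1000, 1000], [50, 200, 300]):
    print(design, f"{library_size(design):,} members")
```

Three positions of 500 BB-oligos each already give 1.25 × 10⁸ members, and four positions of 1,000 reach the 1 × 10¹² upper bound mentioned above.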
  • an enriched library or partially double stranded, ligated nucleic acid prepared according to the presently described methods lacks a primer binding site.
  • a nucleic acid comprising a primer binding site is optionally ligated onto the nucleic acid target, such as at the 5'-end.
  • the primer binding site is about 10-40, 10-30, 10-20, 20-40, 20-30, or 15-30 nucleotides.
  • the nucleic acid comprising a primer binding site is optionally ligated to cDNA produced after a binding screen and reverse transcription.
  • the target nucleic acid comprises a primer binding site.
  • the primer binding site is present at or near the 3'-end or the 5'-end of the nucleic acid.
  • the present invention provides enriched libraries, compositions comprising such libraries, as well as methods of preparing such enriched libraries and processing samples of such enriched libraries, wherein reverse transcription of a bound-together library member-nucleic acid target (bound complex) is used to capture information about the binding event.
  • a ligation of the nucleic acid target to the DEL library member’s barcode is not used. In some embodiments, such a ligation is optionally included.
  • the present invention provides methods and kits for screening an encoded library against a nucleic acid target, such as a target RNA, wherein reverse transcription is used to capture information about the binding event.
  • the present invention further provides methods of producing enriched encoded libraries and processing samples from such libraries, as well as compositions comprising such enriched encoded libraries, wherein reverse transcription is used to capture information about the binding event, e.g. capture binding information about which encoded compounds are hits (bind to the nucleic acid target).
  • the encoded library is a DNA-encoded library (DEL) of small molecules.
  • the present invention provides a method of producing an enriched DEL cDNA of a nucleic acid target, comprising:
  • the DEL encodes a small molecule candidate compound and the nucleic acid target is a target to which the small molecule binds.
  • the nucleic acid target is a target RNA.
  • step (ii) is performed after a high-factor dilution that optionally comprises in vitro compartmentalization.
  • the in vitro compartmentalization is an aqueous emulsion based technique.
  • the in vitro compartmentalization is Binder Trap Enrichment® (BTE).
  • the in vitro compartmentalization creates more compartments than there are members of the DEL in the sample.
  • the 3'-overhang is 1-10 nucleotides.
  • the 3'-overhang is 2, 3, 4, or 5 nucleotides.
  • the 3'-overhang has a guanosine-cytosine content (GC-content) that is 50% or greater, 60% or greater, 70% or greater, 80% or greater, 90% or greater, or 100%.
  • the 3'-overhang has at least 2, at least 3, at least 4, or at least 5 guanosine and/or cytosine nucleotides.
  • the temperature is about 4-60 °C.
  • the temperature is about 4, 10, 15, 20, 25, 30, 35, or 40 °C.
  • the time period is at least 5 minutes.
  • the time period is at least 10 minutes, at least 30 minutes, at least 1 hour, about 10 minutes to about 1 hour, about 5 minutes to 2 hours, about 5 minutes to 10 hours, about 10 minutes to 24 hours, or about 15 minutes to 48 hours.
  • the reverse transcriptase is selected from Superscript II, Superscript III, Superscript IV, AMV, MMLV (M-MuLV), ProtoScript II, WarmStart RTx, HIV RT, HTLV, EpiScript, MonsterScript, GoScript, TGIRT, MarathonRT, or PyroScript.
  • the reverse transcriptase is Superscript III.
  • the method further comprises carrying out step (i) or (ii), or both, in the presence of an RNase inhibitor.
  • the RNase inhibitor is selected from SUPERase-In, RNaseOUT, or RNAsecure.
  • the RNase inhibitor is SUPERase-In.
  • the method further comprises:
  • the temperature in step (iii) is at least about 60 °C.
  • the temperature in step (iii) is about 75 °C.
  • the time period in step (iii) is at least about 5 minutes.
  • the time period in step (iii) is at least about 15 minutes.
  • the present invention provides an enriched DNA-encoded library (DEL) comprising library members comprising:
  • the present invention provides a method of producing an enriched DNA-encoded library (DEL), comprising:
  • step (ii) contacting the DEL of step (i) with a nucleic acid target, wherein the 3'-overhang has sequence complementarity to the 3'-end of the nucleic acid target, under conditions selected to allow binding between the small molecules and the nucleic acid target to form a mixture of complexes;
  • step (iii) performing a high factor dilution of the mixture of step (ii) and incubating for a time period and at a temperature sufficient for the dissociation of at least one weakly-associated complex in the mixture, thereby producing an enriched mixture of bound complexes;
  • step (iv) optionally, contacting the enriched mixture of step (iii) with an aqueous emulsion preparation under conditions selected to allow compartmentalization of at least one complex into at least one aqueous emulsion droplet;
  • step (vi) optionally, disrupting the at least one aqueous emulsion droplet of step (v);
  • steps (iv) and (vi) are omitted.
  • reverse transcription takes place at a higher rate for the bound complexes in comparison with the weakly-bound complexes.
  • the present invention provides a method of performing a multiplexed DEL screen, comprising: (i) providing a DEL of small molecules covalently conjugated to DNA barcodes, wherein the DEL has a 3'-overhang comprising at least two nucleotides;
  • step (ii) contacting the DEL of step (i) with a plurality of nucleic acid targets that have different sequences, wherein the 3'-overhang has sequence complementarity to the 3'-end of each nucleic acid target, under conditions selected to allow binding between the small molecules and the nucleic acid targets to form a mixture of complexes;
  • step (iii) performing a high factor dilution of the mixture of step (ii) and incubating for a time period and at a temperature sufficient for the dissociation of at least one weakly-associated complex in the mixture, thereby producing an enriched mixture of bound complexes;
  • step (iv) optionally, contacting the enriched mixture of step (iii) with an aqueous emulsion preparation under conditions selected to allow compartmentalization of at least one complex into at least one aqueous emulsion droplet;
  • step (vi) optionally, disrupting the at least one aqueous emulsion droplet of step (v);
  • steps (iv) and (vi) are omitted.
  • reverse transcription takes place at a higher rate for the bound complexes in comparison with the weakly-bound complexes.
  • At least 10 nucleic acid targets of different sequence are screened in parallel.
  • the reverse transcriptase is selected from Superscript II, Superscript III, Superscript IV, AMV, MMLV (M-MuLV), ProtoScript II, WarmStart RTx, HIV RT, HTLV, EpiScript, MonsterScript, GoScript, TGIRT, MarathonRT, or PyroScript.
  • At least 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 25, 30, 35, 40, 45, 50, 75, 100, 150, 200, 250, 350, 500, 1,000, 5,000, 10,000, 100,000, or 1,000,000 nucleic acid targets of different sequence are screened in parallel.
  • about 10-100, 10-1,000, 10-100,000, or 10-1,000,000 nucleic acid targets of different sequence are screened in parallel.
  • the nucleic acid target is a target RNA, e.g. an RNA selected from a naturally occurring RNA or chimera, homolog, isoform, mutant, fragment, or analog thereof such as those described in detail herein.
  • the target RNA is associated with or implicated in a disease, such as those diseases described herein.
  • the target RNA is one of those listed in Table 1, 2, 3, or 4 herein.
  • the DEL is prepared using a split-and-pool, DNA-recording, or YoctoReactor® (yR) method of combinatorial synthesis.
  • the DEL is screened using Binder Trap Enrichment® (BTE) or emulsion-free, proximity-enhanced ligation conditions.
  • the DEL comprises at least 1 × 10³ library members. In some embodiments, the DEL comprises at least 1 × 10⁴ library members. In some embodiments, the DEL comprises at least 1 × 10⁵ library members. In some embodiments, the DEL comprises at least 1 × 10⁶ library members. In some embodiments, the DEL comprises at least 1 × 10⁷, 1 × 10⁸, 1 × 10⁹, 1 × 10¹⁰, 1 × 10¹¹, or 1 × 10¹² library members. In some embodiments, the DEL comprises from about 1 × 10³ to about 1 × 10¹² library members. In some embodiments, the DEL comprises from about 1 × 10⁴ to about 1 × 10¹¹ library members.
  • the DEL comprises from about 1 × 10⁵ to about 1 × 10¹⁰ library members. In some embodiments, the DEL comprises from about 1 × 10⁶ to about 1 × 10⁹ library members. In some embodiments, the DEL comprises about 1 × 10³, about 1 × 10⁴, about 1 × 10⁵, about 1 × 10⁶, about 1 × 10⁷, about 1 × 10⁸, about 1 × 10⁹, about 1 × 10¹⁰, about 1 × 10¹¹, or about 1 × 10¹² library members. In some embodiments, the DEL comprises approximately a number of library members shown in Table A above or in FIG. 3.
  • the DEL comprises about 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1,000 BB-oligos per position; or about 50-1,000, 50-900, 50-800, 50-700, 50-600, 50-500, 50-400, 100-800, 100-600, 100-500, 100-300, 150-250, 100-200, 200-300, 300-500, or 150-450 BB-oligos per position in a YoctoReactor® used to prepare the DEL.
  • an enriched library or partially double stranded, ligated nucleic acid prepared according to the presently described methods lacks a primer binding site.
  • a nucleic acid comprising a primer binding site is optionally ligated onto the nucleic acid target, such as at the 5'-end.
  • the primer binding site is about 10-40, 10-30, 10-20, 20-40, 20-30, or 15-30 nucleotides.
  • the nucleic acid comprising a primer binding site is optionally ligated to cDNA produced after a binding screen and reverse transcription.
  • the target nucleic acid comprises a primer binding site.
  • the primer binding site is present at or near the 3'-end or the 5'-end of the nucleic acid.
  • the ligase is an RNA ligase capable of ligating an RNA strand to a DNA strand.
  • a combination of two or more ligases is used.
  • a DNA ligase is used.
  • an RNA ligase is used.
  • an RNA ligase is used in combination with a DNA ligase.
  • the RNA ligase ligates the RNA strand to the DNA strand, and the DNA ligase ligates the DNA splint to the helper oligonucleotide.
  • the ligase is generally added or included in the assay mixture before the emulsion forming step or concurrently with it. In some embodiments, the ligase is added during the dilution step.
  • RNA Ligase 1 from bacteriophage T4-infected E. coli catalyzes the adenosine triphosphate (ATP)-dependent formation of a 3' to 5' phosphodiester bond between an RNA molecule with a 3'-hydroxyl group (the acceptor molecule) and another molecule bearing a 5'-phosphoryl group (the donor molecule).
  • the reaction occurs in three steps, involving covalent intermediates (see, e.g., Silverman, S. “Practical and general synthesis of 5'-adenylated RNA (5'-AppRNA),” RNA 2004, 10, 731-746; England, T. E. et al., “Dinucleoside pyrophosphates are substrates for T4-induced RNA ligase,” Proc. Natl. Acad. Sci. USA 1977, 74, 4839-4842):
  • T4 RNA Ligase 1 reacts with ATP to form a covalent enzyme-AMP intermediate (“adenylated enzyme”).
  • the adenyl group is transferred from the adenylated enzyme to the 5'-phosphoryl end of an RNA molecule to form a 5',5'-phosphoanhydride bond (5'-App-RNA).
  • the 5'-App-RNA donor reacts with the 3'-hydroxyl group of another acceptor RNA molecule, in the absence of ATP, to form a standard 3' to 5' phosphodiester bond between the acceptor and donor RNA molecules, releasing adenosine monophosphate (AMP).
  • an adenosine 5'-monophosphate (AMP) group is transferred from the cofactor NAD+ or ATP to a lysine residue in the adenylation motif KXDG (single-letter amino-acid code, where X denotes any amino acid) through a phosphoamide linkage.
  • the AMP group is transferred to the 5'-phosphate at the nick through a pyrophosphate linkage to form a DNA- or RNA-adenylate intermediate (AppDNA or AppRNA, respectively).
  • a phosphodiester bond is formed to seal the nick and release AMP.
  • DNA ligase is an essential component involved in various DNA transactions, including replication, repair and recombination.
  • DNA ligases can be classified into two families based on adenylation cofactor dependence.
  • ATP-dependent ligases are found in bacterial and eukaryotic viruses, Archaea, yeast, mammals and eubacteria.
  • NAD+-dependent ligases are found almost exclusively in eubacteria, the exception being the sequenced genomes of the entomopoxviruses of Melanoplus sanguinipes and Amsacta moorei.
  • Some simple eubacterial genomes encode both NAD+- and ATP-dependent ligases, whereas many eukaryotic organisms encode multiple ATP-dependent ligases to fulfil diverse biological functions.
  • a chemically adenylated or pre-5'-adenylated DNA or RNA molecule may also be used as the donor molecule in step (iii) above.
  • This approach has proven useful in 3'-ligation-tagging of RNA molecules.
  • a 5'-adenylated donor oligonucleotide (5'-App-DNA) is ligated to the 3' end of a miRNA acceptor using T4 RNA Ligase 1 in the absence of ATP (Ebhardt, H. A. et al., “Extensive 3' modification of plant small RNAs is modulated by helper component-proteinase expression,” Proc. Natl. Acad. Sci.
  • the 5'-adenylated donor oligonucleotide additionally contains a blocking group at its 3' end (5'-App-DNA-X), thereby preventing self-ligation of the donor oligonucleotide (Hafner, M. et al., “Identification of microRNAs and other small regulatory RNAs using cDNA library sequencing,” Methods 2008, 44, 3-12); the reaction is catalyzed by T4 RNA Ligase 2.
  • Such 5'-adenylated, 3'-blocked oligonucleotides are available commercially (US Patent Application 2009/0011422 and Vigneault, F.
  • the ligase is selected from T7 DNA Ligase, Thermostable 5' AppDNA/RNA Ligase, T3 DNA Ligase, ElectroLigase®, T4 RNA Ligase 1, T4 RNA Ligase 2, truncated T4 RNA Ligase 2, a mixture comprising T4 RNA Ligase 1 and truncated T4 RNA Ligase 2, T4 RNA Ligase 2 (Truncated K227Q), T4 RNA Ligase 2 (Truncated KQ), RtcB Ligase, SplintR, bacteriophage TS2126 ligase, CircLigase I ssDNA Ligase, CircLigase II ssDNA Ligase (EPICENTRE), E. coli DNA Ligase, Taq DNA Ligase, HiFi Taq DNA Ligase, or 9°N™ DNA Ligase.
  • the ligase is a truncated ligase, such as a truncated version of any of the foregoing full length ligases.
  • Truncated T4 RNA Ligase 2 (T4 Rnl2tr) specifically ligates the pre-adenylated 5'-end of DNA or RNA to the 3'-end of RNA.
  • the enzyme does not require ATP for ligation but does need the pre-adenylated substrate.
  • T4 Rnl2tr is expressed from a plasmid in E. coli which encodes the first 249 amino acids of the full length T4 RNA Ligase 2.
  • T4 Rnl2tr cannot ligate the phosphorylated 5'-end of RNA or DNA to the 3'-end of RNA.
  • This enzyme, also known as Rnl2(1-249), has been used for optimized linker ligation in the cloning of microRNAs. It reduces background ligation because it can only use pre-adenylated linkers.
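The substrate requirements described above for T4 RNA Ligase 1 and truncated T4 RNA Ligase 2 can be summarized as a small rule table, which is handy when checking which donor chemistry a planned ligation needs. This is a simplified sketch that encodes only the rules stated in this section (acceptor: RNA with a 3'-hydroxyl; donor: 5'-phosphate with ATP, or a pre-adenylated 5'-end, which the text describes being used without ATP); it is not a complete model of either enzyme, and the labels "T4 Rnl1" and "T4 Rnl2tr" and the `Junction` type are hypothetical.

```python
from dataclasses import dataclass

KNOWN_LIGASES = {"T4 Rnl1", "T4 Rnl2tr"}


@dataclass(frozen=True)
class Junction:
    donor_5p: str          # "OH", "P" (5'-phosphate) or "App" (5'-adenylated)
    acceptor_is_rna: bool  # acceptor ends in an RNA nucleotide with a free 3'-OH
    atp_present: bool


def can_ligate(ligase: str, j: Junction) -> bool:
    """Simplified substrate rules for the two ligases described in the text."""
    if ligase not in KNOWN_LIGASES:
        raise KeyError(f"no rule encoded for {ligase!r}")
    if not j.acceptor_is_rna:
        return False  # both enzymes are described here with RNA acceptors only
    if ligase == "T4 Rnl1":
        if j.donor_5p == "App":
            return True  # pre-adenylated donor: the final sealing step needs no ATP
        return j.donor_5p == "P" and j.atp_present  # enzyme adenylates itself from ATP
    # T4 Rnl2tr: requires a pre-adenylated donor, cannot use a 5'-phosphate
    return j.donor_5p == "App"


for ligase in ("T4 Rnl1", "T4 Rnl2tr"):
    for donor in ("P", "App"):
        ok = can_ligate(ligase, Junction(donor, acceptor_is_rna=True, atp_present=True))
        print(f"{ligase:<10} donor 5'-{donor:<3} -> {ok}")
```

The table makes the practical difference explicit: only the pre-adenylated donor works with both enzymes, which is why the truncated ligase suppresses background ligation of ordinary 5'-phosphorylated species.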
  • Crowding agents may be included to increase ligation efficiency or obtain other desired results.
  • a crowding agent such as a polymer is present during the ligation reaction at a concentration of about 1%, 2%, 5%, 8%, 10%, 12%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, or 75% w/w.
  • the crowding agent is present during the ligation reaction at a concentration of more than about 6%, 10%, 18%, 20%, 25%, 30%, 36%, 40%, 50% w/w or more.
  • the crowding agent is present during the ligation reaction at a concentration of less than about 6%, 10%, 18%, 20%, 30%, 36%, 40%, or 50% w/w.
  • the crowding agent is a polymer or protein.
  • the crowding agent is a water-soluble or hydrophilic polymer, such as polyethylene glycol (PEG).
  • the crowding agent is selected from PEG4000, Ficoll, an albumin protein such as bovine serum albumin (BSA) or ovalbumin, hemoglobin, or dextran.
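The crowding-agent concentrations above are given as % w/w. This means the mass of crowding agent divided by the total mass of the ligation mixture. The Python below is a minimal sketch of the arithmetic under that definition. It covers two cases: a known final reaction mass, and a fixed mass of all other components to which the crowder is added. The function names and the milligram units are illustrative assumptions.

```python
def _check_percent(percent_ww: float) -> None:
    """Reject percentages outside [0, 100)."""
    if not 0.0 <= percent_ww < 100.0:
        raise ValueError("percent_ww must be in [0, 100)")


def crowder_mass_in_total(total_mass_mg: float, percent_ww: float) -> float:
    """Mass (mg) of crowding agent in a reaction of known final mass.

    w/w percent = 100 * m_crowder / m_total, so m_crowder = m_total * p / 100.
    """
    _check_percent(percent_ww)
    return total_mass_mg * percent_ww / 100.0


def crowder_mass_to_add(other_mass_mg: float, percent_ww: float) -> float:
    """Mass (mg) of crowding agent to add to `other_mass_mg` of all other
    components so the final mixture is `percent_ww` % w/w crowding agent.

    Solving m / (m + other) = p / 100 for m gives m = p * other / (100 - p).
    """
    _check_percent(percent_ww)
    return percent_ww * other_mass_mg / (100.0 - percent_ww)


if __name__ == "__main__":
    # A 20 mg reaction at 15% w/w PEG contains 3 mg of PEG.
    print(crowder_mass_in_total(20.0, 15.0))
    # Bringing 85 mg of buffer, enzyme and nucleic acid to 15% w/w PEG.
    print(crowder_mass_to_add(85.0, 15.0))
```

The second form is the one to use when the crowder is added on top of an already assembled reaction, because adding it also increases the total mass.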
  • Noncoding RNAs such as microRNA (miRNA) and long noncoding RNA (lncRNA) regulate transcription, splicing, mRNA stability/decay, and translation.
  • the noncoding regions of mRNA, such as the 5' untranslated region (5' UTR), the 3' UTR, and introns, can play regulatory roles in affecting mRNA expression levels, alternative splicing, translational efficiency, and mRNA and protein subcellular localization.
  • RNA secondary and tertiary structures are critical for these regulatory activities.
  • the target RNA is a non-coding RNA or non-coding region of an RNA that includes both non-coding and coding regions.
  • the target RNA is a coding RNA, such as an mRNA or coding region of an RNA that includes both non-coding and coding regions.
  • Targeting mRNA allows modulation of downstream production of proteins. This provides a new approach to modulating the function of otherwise intractable protein targets as well as proteins that are capable of being targeted by conventional drug discovery methods (e.g., by small molecules or biologics).
  • the target RNA is an mRNA or the coding or non-coding region of an mRNA.
  • GWAS studies have shown that there are far more single nucleotide polymorphisms (SNPs) associated with human disease in the noncoding transcriptome than in the coding transcripts (Maurano et al., Science 337: 1190-1195; 2012). Therefore, the therapeutic targeting of noncoding RNAs and noncoding regions of mRNA can yield novel agents to treat previously intractable human diseases.
  • limitations of oligonucleotides as therapeutics include unfavorable pharmacokinetics, lack of oral bioavailability, and lack of blood-brain barrier penetration, the latter precluding delivery to the brain or spinal cord after parenteral drug administration for the treatment of neurological diseases.
  • oligonucleotides are not taken up effectively into solid tumors without a complex delivery system such as lipid nanoparticles.
  • oligonucleotides that are taken up into cells and tissues remain in a non-functional compartment such as endosomes, and only a small fraction of the material escapes to gain access to the cytosol and/or nucleus where the target is located.
  • Small molecules can be optimized to exhibit excellent absorption from the gut, excellent distribution to target organs, and excellent cell penetration.
  • “conventional” (e.g., “Lipinski-compliant”; Lipinski et al., Adv. Drug Deliv. Rev. 2001, 46, 3-26) small molecules with favorable drug properties that bind and modulate the activity of a target RNA would solve many of the problems noted above.
  • the present invention provides a method of identifying the identity or structure of a binding or active site to which a small molecule binds in a target RNA, comprising the steps of (i) contacting the target RNA with a disclosed small molecule DEL library member and (ii) capturing information about binding of the DEL library member to the target RNA by a method disclosed herein, optionally in combination with sequencing and/or a computational method to process the information about binding and thus identify hits.
  • the target RNA is selected from an mRNA or a noncoding RNA.
  • the target RNA is an aptamer or riboswitch.
  • the RNA is the FMN riboswitch, PreQ1, or Aptamer 21.
  • the assay identifies the location in the primary sequence of the binding site(s) on the target RNA.
  • the target RNA is a full-length transcript or may be a truncated version thereof.
  • the polyA tail present in the full-length mRNA is optionally omitted from the target to simplify or streamline the encoded library screen.
  • where the target RNA contributes to or causes a triplet repeat expansion disease (TRED), such as one involving a CAG repeat, the number of repeats may be reduced to make the screen simpler, streamlined, or more tractable, while still yielding useful binding data.
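The simplifications described above, omitting the polyA tail and reducing a CAG repeat count, amount to simple sequence edits when designing a screening construct. The Python below is a minimal sketch. The helper names are hypothetical. The 10-nt minimum used to call a 3' polyA tail and the default of 20 retained repeats are arbitrary placeholders, not values from this disclosure.

```python
import re


def trim_poly_a(seq: str, min_run: int = 10) -> str:
    """Remove a 3'-terminal run of at least `min_run` adenosines (polyA tail)."""
    seq = seq.upper()
    match = re.search(r"A{%d,}$" % min_run, seq)
    return seq[: match.start()] if match else seq


def shorten_repeat(seq: str, unit: str = "CAG", keep: int = 20) -> str:
    """Collapse every uninterrupted tandem run of `unit` longer than `keep`
    copies down to exactly `keep` copies; shorter runs are left unchanged."""
    pattern = re.compile("(?:%s){%d,}" % (re.escape(unit), keep + 1))
    return pattern.sub(unit * keep, seq.upper())


def screening_construct(transcript: str, unit: str = "CAG", keep: int = 20) -> str:
    """Hypothetical helper: drop the polyA tail, then shorten the repeat."""
    return shorten_repeat(trim_poly_a(transcript), unit, keep)


if __name__ == "__main__":
    full = "GGC" + "CAG" * 50 + "UUC" + "A" * 30
    short = screening_construct(full)
    print(len(full), len(short))  # 186 66
```

A regular-expression scan like this reads the repeat in whatever frame it first matches. A real construct design would also check that the shortened region still folds as intended.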
  • the nucleic acid target is an analog of the corresponding naturally occurring nucleic acid target, e.g., a target RNA.
  • An “analog,” as used herein, includes a nucleic acid modified at one or more positions. Such modifications include, but are not limited to, replacing a nucleotide with a nucleotide analog, replacing a sugar with a modified sugar, replacing a nucleobase with a modified nucleobase, conjugating a fluorophore or reporter group, or the like.
  • the nucleic acid target is a chimera, for example a chimeric sequence that combines portions of the sequences of two or more nucleic acid targets, such as two target RNAs.
  • the nucleic acid target is a homolog or isoform of a naturally occurring nucleic acid target, such as a bacterial or murine homolog or isoform of a corresponding human RNA. This may be advantageous where the target of interest is too long, of unknown sequence, or not amenable to study in a model system or assay.
  • the nucleic acid target may be modified by appending primer binding sites, a fluorophore, a radioactive isotope, a pull-down group such as a hapten (e.g. fluorescein, biotin, digoxigenin, or dinitrophenol), or an artificial sequence.
  • a primer binding sequence is appended to the 3'-end or 5'-end of the nucleic acid target.
  • an oligonucleic acid region is appended to the 3'-end or 5'-end of the nucleic acid target that is at least partially complementary to a 3'-overhang (or 5'-overhang, respectively) present in a DEL library member and/or at least partially complementary to the helper oligonucleic acid used in certain embodiments of the present invention.
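To make a region appended to an RNA target complementary to a DEL member's overhang, the appended sequence is the reverse complement of the overhang, written in the RNA alphabet, because paired strands run antiparallel. The Python below is a minimal sketch under stated assumptions. The overhang sequence, the helper names, and the restriction to Watson-Crick pairs (G-U wobble is not counted) are all illustrative. The `fraction_paired` score is one simple way to quantify "at least partially complementary".

```python
# Watson-Crick partner, written in the RNA alphabet, for each DNA or RNA base.
RNA_COMPLEMENT = {"A": "U", "T": "A", "U": "A", "G": "C", "C": "G"}
WATSON_CRICK = {("A", "U"), ("U", "A"), ("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def rna_reverse_complement(seq: str) -> str:
    """RNA sequence (5'->3') that pairs antiparallel with `seq` (5'->3')."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq.upper()))


def fraction_paired(a: str, b: str) -> float:
    """Fraction of positions at which `a` and `b` (both 5'->3', equal length)
    form Watson-Crick pairs when aligned antiparallel. G-U wobble pairs are
    deliberately not counted in this simplified check."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    a, b = a.upper(), b.upper()
    matches = sum((x, y) in WATSON_CRICK for x, y in zip(a, reversed(b)))
    return matches / len(a)


def append_overhang_partner(target: str, overhang: str) -> str:
    """Append to the target's 3' end a region fully complementary to a
    hypothetical DEL-member 3' overhang (given 5'->3')."""
    return target.upper() + rna_reverse_complement(overhang)


if __name__ == "__main__":
    overhang = "ACGTTG"  # hypothetical DNA overhang on a DEL member
    partner = rna_reverse_complement(overhang)
    print(partner, fraction_paired(partner, overhang))  # CAACGU 1.0
```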
  • the nucleic acid target such as a target RNA
  • the nucleic acid target is single-stranded.
  • the nucleic acid target is double-stranded or partially double-stranded.
  • the nucleic acid target is a pair of nucleic acids engaged in an interaction, such as an miRNA-mRNA hybridized (or partially hybridized) pair.
  • the nucleic acid target comprises one, two, or more miRNAs bound to an mRNA.
  • the nucleic acid target is an mRNA, miRNA, premiRNA, or a viral or fungal RNA.
  • the nucleic acid target includes structural features such as at least some intramolecular base pairing, a junction (e.g., cis or trans three-way junctions (3WJ)), quadruplex, hairpin, triplex, bulge loop, pseudoknot, or internal loop, etc., and any transient forms or structures adopted by the nucleic acid.
  • the nucleic acid target includes a bound protein, such as a chaperone, RNA-binding protein (RBP), or other nucleic acid-binding protein.
  • the assay conditions are selected such that the structure and/or structural dynamics of the nucleic acid target in the assay conditions match, as closely as possible, the native (in vivo) structure and/or structural dynamics of the nucleic acid target, at least during the step in which the small molecule DEL library member is allowed to bind to the target.
  • Nucleic acid targets such as target RNAs, of various lengths are compatible with the present invention.
  • the target may be from 20-10,000 nucleotides in length.
  • the target is a relatively short sequence of, e.g., less than 250, less than 100, or less than 50 nucleotides in length.
  • the target is 100 or more nucleotides in length.
  • the target is 250 or more nucleotides in length.
  • the target is up to about 350, 450, 500, 600, 750, or 1,000, 2,000, 3,000, 4,000, 5,000, 7,500, 10,000, 15,000, 25,000, 50,000, or more than 50,000 nucleotides in length.
  • the target is between about 30 and about 500 nucleotides in length. In some embodiments, the target is between about 250 and about 1,000 nucleotides in length. In some embodiments, the target is between about 20-50, 30-60, 40-70, 50-80, 20-100, 30-100, 40-100, 50-100, 20-200, 30-200, 40-200, 50-200, 20-300, 50-300, 75-300, 100-300, 20-400, 50-400, 100-400, 200-400, 20-500, 50-500, 100-500, 250-500, 20-750, 50-750, 100-750, 250-750, 500-750, 20-1,000, 100-1,000, 250-1,000, 500-1,000, 20-2,000, 100-2,000, 500-2,000, 1,000-2,000, 20-5,000, 100-5,000, 1,000-5,000, 20-10,000, 100-10,000, 1,000-10,000, or 20-25,000 nucleotides in length.
  • the target or other referenced nucleic acid is an RNA
  • “nucleotides” refers to ribonucleotides
  • the target or other referenced nucleic acid is DNA
  • “nucleotides” refers to 2'-deoxyribonucleotides.
  • the target is an RNA such as a pre-mRNA, pre-miRNA, or pretranscript.
  • the RNA is a non-coding RNA (ncRNA), messenger RNA (mRNA), micro-RNA (miRNA), a ribozyme, riboswitch, lncRNA, lincRNA, snoRNA, snRNA, scaRNA, piRNA, rRNA, ceRNA, or pseudo-gene, wherein each of the foregoing may be selected from a human or non-human RNA, such as viral RNA, fungal RNA, or bacterial RNA.
  • one nucleic acid target is screened in a disclosed method. However, screening of more than one target is also contemplated. For example, in some embodiments 2, 3, 4, 5, 6, 7, 8, 9, 10, or more targets are screened at one time. In some embodiments, different or partially identical portions of a single nucleic acid target are screened at once. For example, nucleotides 1-50 of a hypothetical target comprising 300 nucleotides may form one target, nucleotides 10-60 may form a second target, nucleotides 40-100 may form a third target, and so on. Without wishing to be bound by theory, this might yield information about the influence of different portions of the sequence on a putative or known binding site on the full-length target. In some embodiments, the targets are different nucleic acids, e.g., they have little or no sequence homology and/or are from distinct portions of the genome and/or have unrelated biological roles.
  • a screen of a small molecule DEL is performed against 2-10, 2-100, 2-1,000, 2-10,000, 2-100,000, or 2-1,000,000 different nucleic acid targets, which have only partial, minimal, or no sequence homology.
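Splitting one long target into overlapping sub-targets, as in the 1-50 / 10-60 / 40-100 example above, can be done systematically with a sliding window. The Python below is a minimal sketch. It assumes a fixed window length and step, whereas the example in the text uses irregular boundaries. The function names are hypothetical, and coordinates are 1-based and inclusive to match the nucleotide numbering used above.

```python
from typing import List, Tuple


def tile_windows(length: int, window: int, step: int) -> List[Tuple[int, int]]:
    """Return overlapping 1-based, inclusive (start, end) windows covering a
    target of `length` nucleotides. If the fixed step does not land exactly
    on the 3' end, one final window flush with the end is added."""
    if length <= 0 or window <= 0 or step <= 0:
        raise ValueError("length, window and step must be positive")
    if window >= length:
        return [(1, length)]
    windows = []
    start = 1
    while start + window - 1 <= length:
        windows.append((start, start + window - 1))
        start += step
    if windows[-1][1] < length:
        windows.append((length - window + 1, length))
    return windows


def sub_targets(seq: str, window: int, step: int) -> List[str]:
    """Slice a target sequence into the overlapping sub-targets."""
    return [seq[s - 1 : e] for s, e in tile_windows(len(seq), window, step)]


if __name__ == "__main__":
    # A 300-nt target tiled with 50-nt windows every 25 nt gives 11 sub-targets.
    print(tile_windows(300, 50, 25)[:3])  # [(1, 50), (26, 75), (51, 100)]
```

A smaller step gives more overlap between neighbouring sub-targets. More overlap makes it easier to see which stretch of sequence a binding signal depends on, at the cost of more targets per screen.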
  • the different nucleic acid targets have some sequence homology, for example if they are nucleic acids of a similar function or group of functions. In other embodiments, the different nucleic acid targets have little or no sequence homology, for example if they are nucleic acids that have no particular relationship to one another.
  • the nucleic acid target is selected from a panel of natural or artificial RNAs such as those derived from in vitro transcription (IVT) or cell lysates, or may be an artificially generated library of nucleic acid targets, such as aptamers.
  • “sample” and “biological sample” are used in their broadest sense and encompass samples or specimens obtained from any source, including biological and environmental sources.
  • “sample,” when used to refer to biological samples obtained from organisms, includes bodily fluids, isolated cells, fixed cells, cell lysates, and the like. The organisms include bacteria, viruses, fungi, plants, animals, and humans.
  • “sample” refers to a mixture of encoded library compounds or other test compounds being screened for activity against a target RNA.
  • the sample may be taken from any step along the process of screening the compounds, including the final step comprising isolated, PCR-amplified fragments encoding hits from a screen.
  • these examples are not to be construed as limiting the types of samples or organisms that find use with the present invention.
  • the term “incubating” and variants thereof mean contacting one or more components of a reaction with another component or components, under conditions and for sufficient time such that a desired reaction product is formed.
  • a “nucleoside” refers to a molecule consisting of a guanine (G), adenine (A), thymine (T), uracil (U), or cytosine (C) base covalently linked to a pentose sugar.
  • “nucleotide” or “mononucleotide” refers to a nucleoside phosphorylated at one of the hydroxyl groups of the pentose sugar.
  • Nucleoside also encompasses analogs of G, A, T, C, or U and natural or non-natural nucleic acid components wherein the base, sugar, and/or phosphate backbone have been modified or replaced. Nucleoside analogs are known in the art and include those described herein. Also included are endogenous, post-transcriptionally modified nucleosides, such as methylated nucleosides.
  • Linear nucleic acid molecules are said to have a “5' terminus” (5'-end) and a “3' terminus” (3'-end) because, except with respect to adenylation (as described elsewhere herein), mononucleotides are joined in one direction via a phosphodiester linkage (or analog thereof) to make oligonucleotides, in a manner such that a phosphate (or analog thereof) on the 5' carbon of one mononucleotide sugar is joined to an oxygen on the 3' carbon of the sugar of its neighboring mononucleotide.
  • an end of an oligonucleotide is referred to as the “5' end” if its 5' phosphate (or analog thereof) is not linked to the oxygen of the 3' carbon of a mononucleotide sugar, and as the “3' end” if its 3' oxygen is not linked to a 5' phosphate (or analog thereof) of a subsequent mononucleotide sugar.
  • A “terminal nucleotide,” as used herein, is the nucleotide at the end position of the 3' or 5' terminus. The 3' or 5' terminus may alternatively end in a 3'-OH or 5'-OH if the terminal nucleotide is not phosphorylated.
  • nucleic acid refers to a covalently linked sequence of nucleotides in which the 3' position of the sugar of one nucleotide is joined by a phosphodiester bond to the 5' position of the sugar of the next nucleotide (i.e., a 3' to 5' phosphodiester bond), and in which the nucleotides are linked in specific sequence; i.e., a linear order of nucleotides.
  • Nucleic acid includes analogs of the foregoing wherein one or more nucleotides are modified at the base, sugar, or phosphodiester. Such analogs are known in the art and include those described elsewhere herein.
  • “polynucleotide” or “polynucleic acid” refers to a long nucleic acid sequence (or analog thereof) of many nucleotides.
  • a polynucleotide or polynucleic acid
  • an “oligonucleotide” or “oligonucleic acid” is a short polynucleotide or a portion of a polynucleotide.
  • an oligonucleotide may be between 5-10, 10-60, or 10-200 nucleotides in length.
  • a nucleic acid, oligonucleotide, or polynucleotide consists of, consists primarily of, or is mostly 2'-deoxyribonucleotides (DNA) or ribonucleotides (RNA).
  • an oligonucleotide consists of or comprises 2'-deoxyribonucleotides (DNA).
  • the oligonucleotide consists of or comprises ribonucleotides (RNA).
  • the oligonucleotide is a DNA-RNA hybrid, such as a DNA sequence of contiguous nucleotides linked to an RNA sequence of contiguous nucleotides, or with some regions of RNA and some regions of DNA.
  • biological context e.g., the RNA may be in the nucleus, circulating in the blood, in vitro , cell lysate, or isolated or pure form
  • physical form e.g., the RNA may be in single-, double-, or triple-stranded form (including RNA-DNA hybrids)
  • the RNA is 100 or more nucleotides in length. In some embodiments, the RNA is 250 or more nucleotides in length. In some embodiments, the RNA is 350, 450, 500, 600, 750, or 1,000, 2,000, 3,000, 4,000, 5,000, 7,500, 10,000, 15,000, 25,000, 50,000, or more ribonucleotides in length. In some embodiments, the RNA is between 250 and 1,000 ribonucleotides in length. In some embodiments, the RNA is a pre-RNA, pre-miRNA, or pretranscript.
  • the RNA is a non-coding RNA (ncRNA), messenger RNA (mRNA), micro-RNA (miRNA), a ribozyme, riboswitch, lncRNA, lincRNA, snoRNA, snRNA, scaRNA, piRNA, ceRNA, or pseudo-gene, wherein each of the foregoing may be selected from a human or non-human RNA, such as viral RNA, fungal RNA, or bacterial RNA.
  • “target RNA” or “RNA target” as used herein means any type of RNA having a secondary or tertiary structure capable of binding a small-molecule ligand described herein.
  • the target RNA may be inside a cell, in a cell lysate, or in isolated form prior to contacting the compound.
  • “RNA ligase” means an enzyme that is capable of catalyzing the joining or ligating of an RNA acceptor molecule, which has a hydroxyl group on its 3' or 5' terminus, to an RNA or DNA donor molecule.
  • “DNA ligase” means an enzyme that is capable of catalyzing the joining or ligating of a DNA acceptor molecule, which has a hydroxyl group on its 3' or 5' terminus, to an RNA or DNA donor molecule.
  • the donor molecule has a 5' phosphate group on its 5' terminus and/or a 3' phosphate on its 3' terminus.
  • the invention is not limited with respect to the RNA ligase, and any RNA ligase from any source can be used in an embodiment of the methods and kits of the present invention that is capable of effecting the required ligation reaction.
  • the RNA ligase is a polypeptide (gp63) encoded by bacteriophage T4 gene 63; this enzyme, which is commonly referred to simply as “T4 RNA Ligase,” is more correctly now called “T4 RNA Ligase 1” since a second RNA ligase (gp24.1) that is encoded by bacteriophage T4 gene 24.1 is known, which is now called “T4 RNA Ligase 2” (Ho, C. K.
  • a “single-strand ligase” is a DNA or RNA ligase enzyme that is active on single-stranded DNA or RNA molecules.
  • the ligase is a single-strand ligase.
  • the ligase is a double-stranded ligase.
  • the ligase is T4 RNA ligase 2 (non-truncated), a dsRNA ligase, or SplintR, also a double-stranded ligase.
  • the terms “buffer” or “buffering agents” refer to materials that, when added to a solution, cause the solution to resist changes in pH.
  • “reaction buffer” refers to a buffering solution in which an enzymatic or chemical reaction is performed.
  • “isolated RNA” or “purified RNA” refers to a nucleic acid that is identified and separated from at least one contaminant with which it is ordinarily associated in its source.
  • an isolated or purified nucleic acid e.g., DNA and RNA
  • a given DNA sequence e.g., a gene
  • a specific RNA e.g., a specific mRNA encoding a specific protein
  • the isolated or purified polynucleotide or nucleic acid may be present in single-stranded or double-stranded form.
  • RNA-mediated disorders, diseases, and/or conditions means any disease or other deleterious condition in which RNA, such as an overexpressed, underexpressed, mutant, misfolded, expanded, pathogenic, or oncogenic RNA, is known to play a role.
  • an inhibitor is defined as a compound that binds to and/or modulates or inhibits a target RNA with measurable affinity.
  • an inhibitor has an IC50 and/or binding constant of less than about 100 µM, less than about 50 µM, less than about 1 µM, less than about 500 nM, less than about 100 nM, less than about 10 nM, or less than about 1 nM.
  • “measurable affinity” and “measurably inhibit,” as used herein, mean a measurable change in a downstream biological effect between a sample comprising a compound of the present invention, or composition thereof, and a target RNA, and an equivalent sample comprising the target RNA in the absence of said compound, or composition thereof.
  • Modulating the function of a target RNA includes enhancing or increasing the function of the RNA and decreasing or antagonizing the function of the RNA.
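The IC50 and binding-constant cut-offs listed above span several concentration units, so comparing a measured value against them means first converting to a common unit. The Python below is a minimal sketch. It reads the cut-offs as 100 µM, 50 µM, 1 µM, 500 nM, 100 nM, 10 nM, and 1 nM, the descending order the list implies, and converts everything to nM. It treats "less than about" as a strict less-than. The table and function names are hypothetical.

```python
from typing import Optional

# Conversion factors to nanomolar; "uM" and "µM" are both accepted for micromolar.
TO_NM = {"pM": 1e-3, "nM": 1.0, "uM": 1e3, "µM": 1e3, "mM": 1e6}

# Cut-offs read as 100 µM, 50 µM, 1 µM, 500 nM, 100 nM, 10 nM and 1 nM, in nM.
THRESHOLDS_NM = (100_000.0, 50_000.0, 1_000.0, 500.0, 100.0, 10.0, 1.0)


def to_nanomolar(value: float, unit: str) -> float:
    """Convert a concentration to nM."""
    if unit not in TO_NM:
        raise ValueError(f"unsupported unit: {unit!r}")
    return value * TO_NM[unit]


def tightest_threshold_nm(value: float, unit: str) -> Optional[float]:
    """Smallest listed cut-off (nM) that the IC50 or binding constant is
    below, or None if the value exceeds every cut-off."""
    v = to_nanomolar(value, unit)
    below = [t for t in THRESHOLDS_NM if v < t]
    return min(below) if below else None


if __name__ == "__main__":
    print(tightest_threshold_nm(250, "nM"))  # 500.0
    print(tightest_threshold_nm(2.5, "µM"))  # 50000.0
```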
  • the target RNA is an mRNA.
  • a provided small molecule binds to a coding region of the mRNA.
  • a provided small molecule binds to a non-coding region of the mRNA.
  • noncoding regions can affect the level of mRNA and protein expression. Briefly, these include IRES and upstream open reading frames (uORF) that affect translation efficiency, intronic sequences that affect splicing efficiency and alternative splicing patterns, 3' UTR sequences that affect mRNA and protein localization, and elements that control mRNA decay and half-life. Therapeutic modulation of these RNA elements can have beneficial effects.
  • the target RNA is the 3' UTR or 5' UTR of an mRNA.
  • mRNAs may contain expansions of simple repeat sequences such as trinucleotide repeats. These repeat expansion containing RNAs can be toxic and have been observed to drive disease pathology, particularly in certain neurological and musculoskeletal diseases (see Gatchel & Zoghbi, Nature Rev. Gen. 2005, 6, 743-755, hereby incorporated by reference).
  • splicing can be modulated to skip exons having mutations that introduce stop codons in order to relieve premature termination during translation.
  • Small molecules can be used to modulate splicing of pre-mRNA for therapeutic benefit in a variety of settings.
  • One example is spinal muscular atrophy (SMA).
  • SMA is a consequence of insufficient amounts of the survival of motor neuron (SMN) protein.
  • Humans have two versions of the SMN gene, SMN1 and SMN2.
  • SMA patients have a mutated SMN1 gene and thus rely solely on SMN2 for their SMN protein.
  • the SMN2 gene has a silent mutation in exon 7 that causes inefficient splicing such that exon 7 is skipped in the majority of mature SMN2 transcripts, leading to the generation of a defective protein that is rapidly degraded in cells, thus limiting the amount of SMN protein produced from this locus.
  • a small molecule that promotes the efficient inclusion of exon 7 during the splicing of SMN2 transcripts would be an effective treatment for SMA (Palacino et al, Nature Chem. Biol., 2015, 11, 511-517, hereby incorporated by reference).
  • the present invention provides a method of identifying a small molecule that modulates the splicing of a target pre-mRNA to treat a disease or disorder, comprising the steps of: screening one or more encoded small molecules for binding to the target pre-mRNA (with or without any RBPs that may normally bind to the target); and analyzing the results by an RNA binding assay disclosed herein.
  • the pre-mRNA is an SMN2 transcript.
  • the disease or disorder is spinal muscular atrophy (SMA).
  • Nonsense mutations leading to premature translational termination can be eliminated by exon skipping if the exon sequences are in-frame. This can create a protein that is at least partially functional.
  • One example of therapeutic exon skipping is the dystrophin gene in Duchenne muscular dystrophy (DMD).
  • a variety of different mutations leading to premature termination codons in DMD patients can be eliminated by exon skipping promoted by oligonucleotides (reviewed in Fairclough et al., Nature Rev. Gen., 2013, 14, 373-378, hereby incorporated by reference).
  • the present invention provides a method of identifying a small molecule that modulates (up or down) the splicing pattern of a target pre-mRNA to treat a disease or disorder, comprising the steps of: screening one or more encoded small molecules for binding to the target pre-mRNA; and analyzing the results by an RNA binding assay disclosed herein.
  • the pre-mRNA is a dystrophin gene transcript.
  • the small molecule promotes exon skipping to eliminate premature translational termination.
  • the disease or disorder is Duchenne muscular dystrophy (DMD).
  • RNA structures in the 5' UTR can affect translational efficiency.
  • RNA structures such as hairpins in the 5' UTR have been shown to affect translation.
  • RNA structures are believed to play a critical role in translation of mRNA. Two examples of these are internal ribosome entry sites (IRES) and upstream open reading frames (uORF) that can affect the level of translation of the main open reading frame (Komar and Hatzoglou, Frontiers Oncol.
  • the present invention provides a method of producing a small molecule that modulates the expression or translation efficiency of a target pre-mRNA or mRNA to treat a disease or disorder, comprising the steps of: screening one or more encoded small molecules for binding to the target pre-mRNA or mRNA; and analyzing the results by an RNA binding assay disclosed herein.
  • the small molecule binding site is a 5' UTR, internal ribosome entry site, or upstream open reading frame.
  • The largest set of RNA targets is RNA that is transcribed but not translated into protein, termed “non-coding RNA.”
  • Non-coding RNA is highly conserved, and the many varieties of non-coding RNA play a wide range of regulatory functions.
  • Sub-categories of non-coding RNA include micro-RNA (miRNA), long non-coding RNA (lncRNA), long intergenic non-coding RNA (lincRNA), Piwi-interacting RNA (piRNA), competing endogenous RNA (ceRNA), and pseudo-genes. Each of these sub-categories of non-coding RNA offers a large number of RNA targets with significant therapeutic potential.
  • the target RNA is a non-coding RNA.
  • miRNAs are short double-stranded RNAs that regulate gene expression (see Elliott & Ladomery, Molecular Biology of RNA, 2nd Ed., hereby incorporated by reference). Each miRNA can affect the expression of many human genes. There are nearly 2,000 miRNAs in humans. These RNAs regulate many biological processes, including cell differentiation, cell fate, motility, survival, and function. miRNA expression levels vary between different tissues, cell types, and disease settings. They are frequently aberrantly expressed in tumors versus normal tissue, and their activity may play significant roles in cancer (for reviews, see Croce, Nature Rev. Genet. 10:704-714, 2009; Dykxhoorn, Cancer Res. 70:6401-6406, 2010, hereby incorporated by reference).
  • miRNAs have been shown to regulate oncogenes and tumor suppressors and themselves can act as oncogenes or tumor suppressors. Some have been shown to promote epithelial-mesenchymal transition (EMT) and cancer cell invasiveness and metastasis. In the case of oncogenic miRNAs, their inhibition could be an effective anti-cancer treatment.
  • the target miRNA regulates an oncogene or tumor suppressor, or acts as an oncogene or tumor suppressor.
  • the disease is cancer. In some embodiments, the cancer is a solid tumor.
  • miR-155 plays pathological roles in inflammation, hypertension, heart failure, and cancer.
  • miR-155 triggers oncogenic cascades and apoptosis resistance, as well as increasing cancer cell invasiveness.
  • Altered expression of miR-155 has been described in multiple cancers, reflecting staging, progression, and treatment outcomes. Cancers in which miR-155 overexpression has been reported include breast cancer, thyroid carcinoma, colon cancer, cervical cancer, and lung cancer.
  • miR-17~92 (also called Oncomir-1) is a polycistronic 1 kb primary transcript comprising miR-17, 20a, 18a, 19a, 92-1, and 19b-1. It is activated by MYC. miR-19 alters gene expression and signal transduction pathways in multiple hematopoietic cells, and it triggers leukemogenesis and lymphomagenesis. It is implicated in a wide variety of human solid tumors and hematological cancers. miR-21 is an oncogenic miRNA that reduces the expression of multiple tumor suppressors.
  • the target miRNA is selected from miR-155, miR-17~92, miR-19, miR-21, or miR-10b.
  • the target miRNA mediates or is implicated in a cancer selected from breast cancer, ovarian cancer, cervical cancer, thyroid carcinoma, colon cancer, liver cancer, brain cancer, esophageal cancer, prostate cancer, lung cancer, leukemia, or lymph node cancer.
  • the cancer is a solid tumor.
  • miRNAs play roles in many other diseases, including cardiovascular and metabolic diseases (Quiat and Olson, J. Clin. Invest. 123:11-18, 2013; Olson, Science Trans. Med. 6:239ps3, 2014; Baffy, J. Clin. Med. 4:1977-1988, 2015, hereby incorporated by reference).
  • the target miRNA is a primary transcript or pre-miRNA.
  • lncRNAs are RNAs of over 200 nucleotides (nt) that do not encode proteins (see Rinn & Chang, Ann. Rev. Biochem. 2012, 81, 145-166; for reviews, see Morris and Mattick, Nature Reviews Genetics 15:423-437, 2014; Mattick and Rinn, Nature Structural & Mol. Biol. 22:5-7, 2015; Iyer et al., Nature Genetics 47:199-208, 2015; each hereby incorporated by reference). They can affect the expression of protein-encoding mRNAs at the level of transcription, splicing, and mRNA decay.
  • lncRNA can regulate transcription by recruiting epigenetic regulators that increase or decrease transcription by altering chromatin structure (e.g., Holoch and Moazed, Nature Reviews Genetics 16:71-84, 2015, hereby incorporated by reference).
  • lncRNAs are associated with human diseases including cancer, inflammatory diseases, neurological diseases, and cardiovascular disease (for instance, Prensner and Chinnaiyan, Cancer Discovery 1:391-407, 2011; Johnson, Neurobiology of Disease 46:245-254, 2012; Gutschner and Diederichs, RNA Biology 9:703-719, 2012; Kumar et al., PLOS Genetics 9:e1003201, 2013; van de Vondervoort et al., Frontiers in Molecular Neuroscience, 2013; Li et al., Int. J. Mol. Sci. 14:18790-18808, 2013, hereby incorporated by reference).
  • The targeting of lncRNA could be done to up-regulate or down-regulate the expression of specific genes and proteins for therapeutic benefit (e.g., Wahlestedt, Nature Reviews Drug Discovery 12:433-446, 2013; Guil and Esteller, Nature Structural & Mol. Biol. 19:1068-1075, 2012, hereby incorporated by reference).
  • lncRNAs are expressed at lower levels than mRNAs. Many lncRNAs are physically associated with chromatin (Werner et al., Cell Reports 12, 1-10, 2015, hereby incorporated by reference) and are transcribed in close proximity to protein-encoding genes.
  • the target non-coding RNA is a lncRNA.
  • the lncRNA is associated with a cancer, inflammatory disease, neurological disease, or cardiovascular disease.
  • lncRNAs regulate the expression of protein-encoding genes, acting at multiple different levels to affect transcription, alternative splicing and mRNA decay.
  • lncRNA has been shown to bind to the epigenetic regulator PRC2 to promote its recruitment to genes whose transcription is then repressed via chromatin modification.
  • lncRNA may form complex structures that mediate their association with various regulatory proteins. A small molecule that binds to these lncRNA structures could be used to modulate the expression of genes that are normally regulated by an individual lncRNA.
  • HOTAIR is a lncRNA expressed from the HoxC locus on human chromosome 12. Its expression level is low (~100 RNA copies per cell). Unlike many lncRNAs, HOTAIR can act in trans to affect the expression of distant genes. It binds the epigenetic repressor PRC2 as well as the LSD1/CoREST/REST complex, another repressive epigenetic regulator (Tsai et al., Science 329, 689-693, 2010, hereby incorporated by reference). HOTAIR is a highly structured RNA, with over 50% of its nucleotides involved in base pairing.
  • HOTAIR has been reported to be involved in the control of apoptosis, proliferation, metastasis, angiogenesis, DNA repair, chemoresistance, and tumor cell metabolism. It is highly expressed in metastatic breast cancers. High levels of expression in primary breast tumors are a significant predictor of subsequent metastasis and death.
  • HOTAIR also has been reported to be associated with esophageal squamous cell carcinoma, and it is a prognostic factor in colorectal cancer, cervical cancer, gastric cancer and endometrial carcinoma. Therefore, HOTAIR-binding small molecules are novel anti-cancer drug candidates. Accordingly, in some embodiments of the methods described above, the target non-coding RNA is HOTAIR.
  • the disease or disorder is breast cancer, esophageal squamous cell carcinoma, colorectal cancer, cervical cancer, gastric cancer, or endometrial carcinoma.
  • MALAT-1 (metastasis-associated lung adenocarcinoma transcript 1), also known as NEAT2 (nuclear-enriched abundant transcript 2), is a lncRNA.
  • MALAT-l is a predictive marker for metastasis development in multiple cancers including lung cancer.
  • MALAT-l knockout mice have no phenotype, indicating that it has limited normal function. However, MALAT-l -deficient cancer cells are impaired in migration and form fewer tumors in a mouse xenograft tumor models. Antisense oligonucleotides (ASO) blocking MALAT-l prevent metastasis formation after tumor implantation in mice. Some mouse xenograft tumor model data indicates that MALAT-l knockdown by ASOs may inhibit both primary tumor growth and metastasis. Thus, a small molecule targeting MALAT-l is exptected to be effective in inhibiting tumor growth and metastasis. Accordingly, in some embodiments of the methods described above, the target non coding RNA is MALAT-l or a fragment thereof. In some embodiments, the disease or disorder is a cancer in which MALAT-l is upregulated, such as lung cancer.
  • Triplet repeats are abundant in the human genome, and they tend to undergo expansion over generations. Approximately 40 human diseases are associated with the expansion of repeat sequences. Diseases caused by triplet expansions are known as Triplet Repeat Expansion Diseases (TRED).
  • TRED Triplet Repeat Expansion Diseases
  • Healthy individuals have a variable number of triplet repeats, but there is a threshold beyond which a higher repeat number causes disease.
  • the threshold varies in different disorders.
  • the triplet repeat can be unstable. As the gene is inherited, the number of repeats may increase, and the condition may be more severe or have an earlier onset from generation to generation. When an individual has a number of repeats in the normal range, it is not expected to expand when passed to the next generation. When the repeat number is in the premutation range (a normal, but unstable repeat number), then the repeats may or may not expand upon transmission to the next generation. Normal individuals who carry a premutation do not have the condition, but are at risk of having a child who has inherited a triplet repeat in the full mutation range and who will be affected.
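The repeat-count logic above (a threshold beyond which the repeat number causes disease, with a premutation range in between) can be sketched in Python. This is an illustrative sketch only: the function names are hypothetical, the repeat unit is a parameter so the same code also covers a hexanucleotide such as GGGGCC, and the numeric cut-offs in the usage line are placeholders, since the threshold varies between disorders.

```python
import re


def longest_repeat_run(seq: str, unit: str) -> int:
    """Return the largest number of consecutive tandem copies of `unit` in `seq`."""
    seq, unit = seq.upper(), unit.upper()
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    # Each match is a maximal tandem run found by a left-to-right scan.
    return max((len(m.group(0)) // len(unit) for m in pattern.finditer(seq)), default=0)


def classify_repeat_count(count: int, normal_max: int, premutation_max: int) -> str:
    """Place a repeat count in the normal, premutation, or full-mutation range.

    The cut-offs are disorder-specific and must be supplied by the caller.
    """
    if count <= normal_max:
        return "normal"
    if count <= premutation_max:
        return "premutation"
    return "full mutation"


# Usage with placeholder thresholds (not clinical values for any specific disorder).
allele = "ATG" + "CAG" * 40 + "TAA"
n = longest_repeat_run(allele, "CAG")
print(n, classify_repeat_count(n, normal_max=30, premutation_max=50))  # 40 premutation
```

Keeping the thresholds as arguments rather than constants reflects the point that each disorder has its own normal, premutation, and full-mutation ranges.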
  • TREDs can be autosomal dominant, autosomal recessive or X-linked. The more common triplet repeat disorders are autosomal dominant.
  • the repeats can be in the coding or noncoding portions of the mRNA. In the case of repeats within noncoding regions, the repeats may lie in the 5' UTR, introns, or 3' UTR sequences. Some examples of diseases caused by repeat sequences within coding regions are shown in Table 1.
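Whether an expansion lies in the 5' UTR, an intron, the coding sequence, or the 3' UTR can be read off a transcript annotation. The sketch below is a hypothetical, simplified model: it assumes a plus-strand transcript, half-open genomic coordinates, sorted exon intervals, and a CDS that begins and ends inside exons. The class and function names are illustrative and do not come from any annotation library.

```python
from dataclasses import dataclass


@dataclass
class TranscriptModel:
    """Simplified plus-strand transcript; all intervals are half-open [start, end)."""
    exons: list  # sorted list of (start, end) tuples
    cds_start: int  # first coding base (start codon)
    cds_end: int  # one past the last coding base


def classify_position(model: TranscriptModel, pos: int) -> str:
    """Return the transcript region containing genomic position `pos`."""
    first_base, last_base = model.exons[0][0], model.exons[-1][1]
    if not first_base <= pos < last_base:
        return "outside transcript"
    if not any(start <= pos < end for start, end in model.exons):
        return "intron"  # inside the transcript span but not in any exon
    if pos < model.cds_start:
        return "5' UTR"
    if pos >= model.cds_end:
        return "3' UTR"
    return "coding"


# Hypothetical three-exon transcript with the CDS spanning exons 1-3.
tx = TranscriptModel(exons=[(0, 100), (200, 300), (400, 500)], cds_start=50, cds_end=450)
for p in (10, 60, 150, 460):
    print(p, classify_position(tx, p))
```

A real pipeline would also need to handle minus-strand genes and alternative transcripts; those are left out to keep the coding versus noncoding distinction visible.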
  • the target RNA is one of those listed in Table 1, or a fragment or analog thereof.
  • the target RNA is one of those listed in Table 2, or a fragment or analog thereof.
  • the toxicity that results from the repeat sequence can be a direct consequence of the action of the toxic RNA itself, or, in cases in which the repeat expansion is in the coding sequence, due to the toxicity of the RNA and/or the aberrant protein.
  • the repeat expansion RNA can act by sequestering critical RNA-binding proteins (RBP) into foci.
  • RBP critical RNA-binding proteins
  • One example of a sequestered RBP is the Muscleblind family protein MBNL1. Sequestration of RBPs leads to defects in splicing as well as defects in nuclear-cytoplasmic transport of RNA and proteins. Sequestration of RBPs also can affect miRNA biogenesis. These perturbations in RNA biology can profoundly affect neuronal function and survival, leading to a variety of neurological diseases.
  • The repeat RNAs form secondary and tertiary structures that bind RBPs and affect normal RNA biology.
  • myotonic dystrophy DM1; dystrophia myotonica
  • DMPK dystrophia myotonica protein kinase
  • This repeat-containing RNA causes the misregulation of alternative splicing of several developmentally regulated transcripts through effects on the splicing regulators MBNL1 and CUG repeat binding protein (CELF1) (Wheeler et al., Science 325:336-339, 2009, hereby incorporated by reference).
  • Small molecules that bind the CUG repeat within the DMPK transcript would alter the RNA structure, prevent focus formation, and alleviate the effects on these splicing regulators.
  • Fragile X Syndrome FXS
  • FXS Fragile X Syndrome
  • FMRP is critical for the regulation of translation of many mRNAs and for protein trafficking, and it is an essential protein for synaptic development and neural plasticity. Thus, its deficiency leads to neuropathology.
  • a small molecule targeting this CGG repeat RNA may alleviate the suppression of FMR1 mRNA and FMRP protein expression.
  • Another TRED having a very high unmet medical need is Huntington’s disease (HD).
  • HD is a progressive neurological disorder with motor, cognitive, and psychiatric changes (Zuccato et al., Physiol. Rev. 90:905-981, 2010, hereby incorporated by reference).
  • the HTT CAG repeat RNA itself also demonstrates toxicity, including the sequestration of MBNL1 protein into nuclear inclusions.
  • GGGGCC repeat expansion in the C9orf72 (chromosome 9 open reading frame 72) gene, which is prevalent in both familial frontotemporal dementia (FTD) and amyotrophic lateral sclerosis (ALS)
  • FTD familial frontotemporal dementia
  • ALS amyotrophic lateral sclerosis
  • the repeat RNA structures form nuclear foci that sequester critical RNA binding proteins.
  • the GGGGCC repeat RNA also binds and sequesters RanGAP1 to impair nucleocytoplasmic transport of RNA and proteins (Zhang et al., Nature 525:56-61, 2015, hereby incorporated by reference). Selectively targeting any of these repeat expansion RNAs could add therapeutic benefit in these neurological diseases.
  • the present invention contemplates a method of treating a disease or disorder wherein aberrant RNAs themselves cause pathogenic effects, rather than acting through the agency of protein expression or regulation of protein expression.
  • the target RNA is a repeat RNA, such as those described herein or in Tables 1 or 2.
  • the repeat RNA mediates or is implicated in a repeat expansion disease in which the repeat resides in the coding regions of mRNA.
  • the disease or disorder is a repeat expansion disease in which the repeat resides in the noncoding regions of mRNA.
  • the disease or disorder is selected from Huntington’s disease (HD), dentatorubral-pallidoluysian atrophy (DRPLA), spinal-bulbar muscular atrophy (SBMA), or a spinocerebellar ataxia (SCA) selected from SCA1, SCA2, SCA3, SCA6, SCA7, or SCA17.
  • the disease or disorder is selected from Fragile X Syndrome, myotonic dystrophy (DM1 or dystrophia myotonica), Friedreich’s Ataxia (FRDA), a spinocerebellar ataxia (SCA) selected from SCA8, SCA10, or SCA12, or C9FTD (amyotrophic lateral sclerosis or ALS).
  • the disease is amyotrophic lateral sclerosis (ALS), Huntington’s disease (HD), frontotemporal dementia (FTD), myotonic dystrophy (DM1 or dystrophia myotonica), or Fragile X Syndrome.
  • HD Huntington’s disease
  • FTD frontotemporal dementia
  • DM1 or dystrophia myotonica myotonic dystrophy
  • Fragile X Syndrome Fragile X Syndrome.
  • RNA binding assay disclosed herein.
  • the repeat expansion RNA causes a disease or disorder selected from HD, DRPLA, SBMA, SCA1, SCA2, SCA3, SCA6, SCA7, or SCA17.
  • the disease or disorder is selected from Fragile X Syndrome, DM1, FRDA, SCA8, SCA10, SCA12, or C9FTD.
  • the target RNA is selected from those in Table 3.
  • the target RNA mediates or is implicated in a disease or disorder selected from one of those in Table 3.
  • Encoded compounds of the present invention are capable of binding to an active site or allosteric site(s) and/or the tertiary structure of a nucleic acid target, such as a target RNA.
  • Libraries of compounds, which may be produced as described herein or using other methods known in the art, are similarly useful in drug discovery, probing RNA structure, and discovering new RNA targets for treatment of disease. In some embodiments, such libraries are used to generate lead compound structures for further optimization. In some embodiments, hit compounds from a first compound library are used to generate further libraries.
  • encoded compounds are synthesized in a combinatorial manner using randomly or semi-randomly selected building blocks as starting points.
  • building blocks may be selected according to principles of combinatorial library construction as are known in the art.
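One common way to connect combinatorial construction with DNA encoding is to pair each building block in each synthesis cycle with a short DNA tag and to concatenate the tags into the member's barcode. The Python sketch below illustrates that idea under stated assumptions: a hypothetical two-cycle library, fixed-length 4-nt tags that are unique within each cycle, and plain string concatenation standing in for enzymatic ligation. The building-block names and tag sequences are placeholders, not part of the disclosed libraries.

```python
from itertools import product

# Hypothetical building blocks for two synthesis cycles, each mapped to a unique 4-nt tag.
CYCLE_1 = {"BB1-A": "ACGT", "BB1-B": "TGCA"}
CYCLE_2 = {"BB2-A": "GGAA", "BB2-B": "CCTT", "BB2-C": "AATT"}
TAG_LEN = 4


def enumerate_library(*cycles):
    """Yield (building-block names, barcode) for every combination across cycles."""
    for combo in product(*(cycle.items() for cycle in cycles)):
        names = tuple(name for name, _ in combo)
        barcode = "".join(tag for _, tag in combo)
        yield names, barcode


def decode(barcode, *cycles):
    """Recover the building blocks from a barcode by reading one tag per cycle."""
    lookups = [{tag: name for name, tag in cycle.items()} for cycle in cycles]
    return tuple(
        lookup[barcode[i * TAG_LEN:(i + 1) * TAG_LEN]] for i, lookup in enumerate(lookups)
    )


library = dict(enumerate_library(CYCLE_1, CYCLE_2))
print(len(library))  # 6
print(decode("TGCAAATT", CYCLE_1, CYCLE_2))  # ('BB1-B', 'BB2-C')
```

The library size is the product of the number of building blocks used in each cycle (here 2 x 3 = 6), and decoding a barcode recovered from a screening hit identifies the building blocks that made up that member.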
  • a building block is selected because it is a known binder to nucleic acids, or a fragment of a known binder.
  • Exemplary known nucleic acid binders include those described herein.
  • the small molecule is selected from a compound known to bind to RNA, such as a heteroaryldihydropyrimidine (HAP), a macrolide (e.g., erythromycin, azithromycin), alkaloid (e.g., berberine, palmatine), aminoglycoside (e.g., paromomycin, neomycin B, kanamycin A), tetracycline (e.g., doxycycline, oxytetracycline), a theophylline, ribocil, clindamycin, chloramphenicol, LMI070, a triptycene-based scaffold, an oxazolidinone (e.g., linezolid, tedizolid), or CPNQ.
  • HAP heteroaryldihydropyrimidine
  • macrolide e.g., erythromycin, azithromycin
  • alkaloid e.g., berberine, palmatine
  • the small molecule is ribocil, which has the following structure:
  • Ribocil is a drug-like ligand that binds to the FMN riboswitch (PDB 5KX9) and inhibits riboswitch function (Nature 2015, 526, 672-677, hereby incorporated by reference).
  • the small-molecule ligand is an oxazolidinone such as linezolid, tedizolid, eperezolid, or PNU 176798.
  • CPNQ has the following structure:
  • the small molecule is selected from CPNQ or a pharmaceutically acceptable salt thereof.
  • the small molecule is selected from a quinoline compound related to CPNQ; or a pharmaceutically acceptable salt thereof.
  • Organic dyes, amino acids, biological cofactors, metal complexes, and peptides also show RNA binding ability.
  • the terms “small molecule that binds a target,” “small molecule RNA binder,” “affinity moiety,” “ligand,” “small-molecule RNA ligand,” or “ligand moiety,” as used herein, include all compounds generally classified as small molecules that are capable of binding to a nucleic acid target such as an RNA with sufficient affinity and specificity for use in a disclosed method.
  • Small molecules that bind a nucleic acid for use in the present invention may bind to one or more secondary or tertiary structure elements of a nucleic acid target. These sites include triplexes, hairpins, bulge loops, pseudoknots, internal loops, junctions, G-quadruplexes, and other higher-order structural motifs described or referred to herein.
  • the small molecule is selected from a heteroaryldihydropyrimidine (HAP), a macrolide, alkaloid, aminoglycoside, a member of the tetracycline family, an oxazolidinone, a SMN2 pre-mRNA ligand such as LMI070 (NVS-SM1), ribocil or an analog thereof, clindamycin, chloramphenicol, an anthracene, a triptycene, theophylline or an analog thereof, or CPNQ or an analog thereof.
  • the small molecule is selected from paromomycin, a neomycin (such as neomycin B), a kanamycin (such as kanamycin A), linezolid, tedizolid, pleuromutilin, ribocil, anthracene, triptycene, or CPNQ or an analog thereof; wherein each small molecule may be optionally substituted with one or more “optional substituents” as defined below, such as 1, 2, 3, or 4, for example 1 or 2, optional substituents.
  • the compound or DNA-encoded library member thereof binds to a junction, stem-loop, or bulge in a target RNA. In some embodiments, the compound or DNA-encoded library member thereof binds to a nucleic acid three-way junction (3WJ) or four-way junction (4WJ). In some embodiments, the 3WJ is a trans 3WJ that is capable of forming between a miRNA and mRNA in vivo or in vitro.
  • the term “aliphatic” or “aliphatic group,” as used herein, means a straight-chain (i.e., unbranched) or branched, substituted or unsubstituted hydrocarbon chain that is completely saturated or that contains one or more units of unsaturation, or a monocyclic hydrocarbon or bicyclic hydrocarbon that is completely saturated or that contains one or more units of unsaturation, but which is not aromatic (also referred to herein as “carbocycle,” “cycloaliphatic,” or “cycloalkyl”), that has a single point of attachment to the rest of the molecule.
  • aliphatic groups contain 1-6 aliphatic carbon atoms.
  • aliphatic groups contain 1-5 aliphatic carbon atoms. In other embodiments, aliphatic groups contain 1-4 aliphatic carbon atoms. In still other embodiments, aliphatic groups contain 1-3 aliphatic carbon atoms, and in yet other embodiments, aliphatic groups contain 1-2 aliphatic carbon atoms.
  • “cycloaliphatic” (or “carbocycle” or “cycloalkyl”) refers to a monocyclic C3-C6 hydrocarbon that is completely saturated or that contains one or more units of unsaturation, but which is not aromatic, that has a single point of attachment to the rest of the molecule.
  • Suitable aliphatic groups include, but are not limited to, linear or branched, substituted or unsubstituted alkyl, alkenyl, alkynyl groups and hybrids thereof such as (cycloalkyl)alkyl, (cycloalkenyl)alkyl or (cycloalkyl)alkenyl.
  • the term “bridged bicyclic” refers to any bicyclic ring system, i.e., carbocyclic or heterocyclic, saturated or partially unsaturated, having at least one bridge.
  • a “bridge” is an unbranched chain of atoms or an atom or a valence bond connecting two bridgeheads, where a “bridgehead” is any skeletal atom of the ring system which is bonded to three or more skeletal atoms (excluding hydrogen).
  • a bridged bicyclic group has 7-12 ring members and 0-4 heteroatoms independently selected from nitrogen, oxygen, or sulfur.
  • bridged bicyclic groups are well known in the art and include those groups set forth below where each group is attached to the rest of the molecule at any substitutable carbon or nitrogen atom. Unless otherwise specified, a bridged bicyclic group is optionally substituted with one or more substituents as set forth for aliphatic groups. Additionally or alternatively, any substitutable nitrogen of a bridged bicyclic group is optionally substituted. Exemplary bridged bicyclics include:
  • lower alkyl refers to a C1-4 straight or branched alkyl group.
  • exemplary lower alkyl groups are methyl, ethyl, propyl, isopropyl, butyl, isobutyl, and tert-butyl.
  • lower haloalkyl refers to a C1-4 straight or branched alkyl group that is substituted with one or more halogen atoms.
  • heteroatom means one or more of oxygen, sulfur, nitrogen, phosphorus, or silicon (including any oxidized form of nitrogen, sulfur, phosphorus, or silicon; the quaternized form of any basic nitrogen; or a substitutable nitrogen of a heterocyclic ring, for example N (as in 3,4-dihydro-2H-pyrrolyl), NH (as in pyrrolidinyl), or NR+ (as in N-substituted pyrrolidinyl)).
  • the term “unsaturated,” as used herein, means that a moiety has one or more units of unsaturation.
  • the term “C1-8 (or C1-6) saturated or unsaturated, straight or branched, hydrocarbon chain” refers to bivalent alkylene, alkenylene, and alkynylene chains that are straight or branched as defined herein.
  • alkylene refers to a bivalent alkyl group.
  • An “alkylene chain” is a polymethylene group, i.e., -(CH2)n-, wherein n is a positive integer, preferably from 1 to 6, from 1 to 4, from 1 to 3, from 1 to 2, or from 2 to 3.
  • a substituted alkylene chain is a polymethylene group in which one or more methylene hydrogen atoms are replaced with a substituent. Suitable substituents include those described below for a substituted aliphatic group.
  • alkenylene refers to a bivalent alkenyl group.
  • a substituted alkenylene chain is a polymethylene group containing at least one double bond in which one or more hydrogen atoms are replaced with a substituent. Suitable substituents include those described below for a substituted aliphatic group.
  • halogen means F, Cl, Br, or I.
  • aryl used alone or as part of a larger moiety as in “aralkyl,” “aralkoxy,” or “aryloxyalkyl,” refers to monocyclic or bicyclic ring systems having a total of 6 to 14 ring members, wherein at least one ring in the system is aromatic and wherein each ring in the system contains 3 to 7 ring members.
  • the term “aryl” may be used interchangeably with the term “aryl ring.”
  • “aryl” refers to an aromatic ring system which includes, but is not limited to, phenyl, biphenyl, naphthyl, anthracyl and the like, which may bear one or more substituents.
  • also included within the scope of the term “aryl” is a group in which an aromatic ring is fused to one or more non-aromatic rings, such as indanyl, phthalimidyl, naphthimidyl, phenanthridinyl, or tetrahydronaphthyl, and the like.
  • the terms “heteroaryl” and “heteroar-,” used alone or as part of a larger moiety, e.g., “heteroaralkyl” or “heteroaralkoxy,” refer to groups having 5 to 10 ring atoms, preferably 5, 6, or 9 ring atoms; having 6, 10, or 14 π electrons shared in a cyclic array; and having, in addition to carbon atoms, from one to five heteroatoms.
  • heteroatom refers to nitrogen, oxygen, or sulfur, and includes any oxidized form of nitrogen or sulfur, and any quaternized form of a basic nitrogen.
  • Heteroaryl groups include, without limitation, thienyl, furanyl, pyrrolyl, imidazolyl, pyrazolyl, triazolyl, tetrazolyl, oxazolyl, isoxazolyl, oxadiazolyl, thiazolyl, isothiazolyl, thiadiazolyl, pyridyl, pyridazinyl, pyrimidinyl, pyrazinyl, indolizinyl, purinyl, naphthyridinyl, and pteridinyl.
  • the terms “heteroaryl” and “heteroar-,” as used herein, also include groups in which a heteroaromatic ring is fused to one or more aryl, cycloaliphatic, or heterocyclyl rings, where the radical or point of attachment is on the heteroaromatic ring.
  • Nonlimiting examples include indolyl, isoindolyl, benzothienyl, benzofuranyl, dibenzofuranyl, indazolyl, benzimidazolyl, benzthiazolyl, quinolyl, isoquinolyl, cinnolinyl, phthalazinyl, quinazolinyl, quinoxalinyl, 4H-quinolizinyl, carbazolyl, acridinyl, phenazinyl, phenothiazinyl, phenoxazinyl, tetrahydroquinolinyl, tetrahydroisoquinolinyl, and pyrido[2,3-b]-1,4-oxazin-3(4H)-one.
  • a heteroaryl group may be mono- or bicyclic.
  • the term “heteroaryl” may be used interchangeably with the terms “heteroaryl ring,” “heteroaryl group,” or “heteroaromatic,” any of which terms include rings that are optionally substituted.
  • the term “heteroaralkyl” refers to an alkyl group substituted with a heteroaryl, wherein the alkyl and heteroaryl portions independently are optionally substituted.
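The π-electron counts named in the heteroaryl definition (6, 10, and 14) are the values of the Hückel 4n + 2 rule for n = 1, 2, and 3. The small Python helper below is a hypothetical illustration, not part of the definition, that checks a count against that rule. Hückel's rule is strictly derived for monocyclic systems and is only a heuristic for fused rings.

```python
def huckel_n(pi_electrons: int):
    """Return n if pi_electrons == 4n + 2 for some integer n >= 0, else None."""
    if pi_electrons < 2 or (pi_electrons - 2) % 4 != 0:
        return None
    return (pi_electrons - 2) // 4


for count in (6, 8, 10, 14):
    print(count, huckel_n(count))
```

For example, 6 π electrons (as in pyridine or thiophene) gives n = 1, while 8 does not satisfy the rule.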
  • As used herein, the terms “heterocycle,” “heterocyclyl,” “heterocyclic radical,” and “heterocyclic ring” are used interchangeably and refer to a stable 5- to 7-membered monocyclic or 7- to 10-membered bicyclic heterocyclic moiety that is either saturated or partially unsaturated, and having, in addition to carbon atoms, one or more, preferably one to four, heteroatoms, as defined above.
  • nitrogen includes a substituted nitrogen.
  • the nitrogen may be N (as in 3,4-dihydro-2H-pyrrolyl), NH (as in pyrrolidinyl), or +NR (as in N-substituted pyrrolidinyl).
  • a heterocyclic ring can be attached to its pendant group at any heteroatom or carbon atom that results in a stable structure and any of the ring atoms can be optionally substituted.
  • saturated or partially unsaturated heterocyclic radicals include, without limitation, tetrahydrofuranyl, tetrahydrothiophenyl, pyrrolidinyl, piperidinyl, pyrrolinyl, tetrahydroquinolinyl, tetrahydroisoquinolinyl, decahydroquinolinyl, oxazolidinyl, piperazinyl, dioxanyl, dioxolanyl, diazepinyl, oxazepinyl, thiazepinyl, morpholinyl, and quinuclidinyl.
  • the term “heterocyclylalkyl” refers to an alkyl group substituted with a heterocyclyl, wherein the alkyl and heterocyclyl portions independently are optionally substituted.
  • the term “partially unsaturated” refers to a ring that includes at least one double or triple bond.
  • the term “partially unsaturated” is intended to encompass rings having multiple sites of unsaturation, but is not intended to include aryl or heteroaryl moieties, as herein defined.
  • compounds of the invention may contain “optionally substituted” moieties.
  • the term “substituted,” whether preceded by the term “optionally” or not, means that one or more hydrogens of the designated moiety are replaced with a suitable substituent.
  • an “optionally substituted” group may have a suitable substituent (“optional substituent”) at each substitutable position of the group, and when more than one position in any given structure may be substituted with more than one substituent selected from a specified group, the substituent may be either the same or different at every position.
  • Combinations of substituents envisioned by this invention are preferably those that result in the formation of stable or chemically feasible compounds.
  • the term “stable,” as used herein, refers to compounds that are not substantially altered when subjected to conditions to allow for their production, detection, and, in certain embodiments, their recovery, purification, and use for one or more of the purposes disclosed herein.
  • Suitable monovalent substituents on R° are independently halogen, -(CH2)0-2R•, -(haloR•), -(CH2)0-2OH, -(CH2)0-2OR•, -(CH2)0-2CH(OR•)2; -O(haloR•), -CN, -N3, -(CH2)0-2C(O)R•, -(CH2)0-2C(O)OH, -(CH2)0-2C(O)OR•, -(CH2)0-2SR•, -(CH2)0-2SH, -(CH2)0-2NH2, -(CH2)0-2NHR•, -(CH2)0-2NR•2, -NO2, -SiR•3, -OSi…
  • Suitable divalent substituents that are bound to vicinal substitutable carbons of an “optionally substituted” group include: -O(CR*2)2-3O-, wherein each independent occurrence of R* is selected from hydrogen, C1-6 aliphatic which may be substituted as defined below, or an unsubstituted 5-6-membered saturated, partially unsaturated, or aryl ring having 0-4 heteroatoms independently selected from nitrogen, oxygen, or sulfur.
  • Suitable substituents on the aliphatic group of R* include halogen, -R•, -(haloR•), -OH, -OR•, -O(haloR•), -CN, -C(O)OH, -C(O)OR•, -NH2, -NHR•, -NR•2, or -NO2, wherein each R• is unsubstituted or, where preceded by “halo,” is substituted only with one or more halogens, and is independently C1-4 aliphatic, -CH2Ph, -O(CH2)0-1Ph, or a 5-6-membered saturated, partially unsaturated, or aryl ring having 0-4 heteroatoms independently selected from nitrogen, oxygen, or sulfur.
  • Suitable substituents on a substitutable nitrogen of an “optionally substituted” group include -R†, -NR†2, -C(O)R†, -C(O)OR†, -C(O)C(O)R†, …
  • each R† is independently hydrogen, C1-6 aliphatic which may be substituted as defined below, unsubstituted -OPh, or an unsubstituted 5-6-membered saturated, partially unsaturated, or aryl ring having 0-4 heteroatoms independently selected from nitrogen, oxygen, or sulfur, or, notwithstanding the definition above, two independent occurrences of R†, taken together with their intervening atom(s), form an unsubstituted 3- to 12-membered saturated, partially unsaturated, or aryl mono- or bicyclic ring having 0-4 heteroatoms independently selected from nitrogen, oxygen, or sulfur.
  • Suitable substituents on the aliphatic group of R† are independently halogen, -R•, -(haloR•), -OH, -OR•, -O(haloR•), -CN, -C(O)OH, -C(O)OR•, -NH2, -NHR•, -NR•2, or -NO2, wherein each R• is unsubstituted or, where preceded by “halo,” is substituted only with one or more halogens, and is independently C1-4 aliphatic, -CH2Ph, -O(CH2)0-1Ph, or a 5-6-membered saturated, partially unsaturated, or aryl ring having 0-4 heteroatoms independently selected from nitrogen, oxygen, or sulfur.
  • the term“pharmaceutically acceptable salt” refers to those salts which are, within the scope of sound medical judgment, suitable for use in contact with the tissues of humans and lower animals without undue toxicity, irritation, allergic response and the like, and are commensurate with a reasonable benefit/risk ratio.
  • Pharmaceutically acceptable salts are well known in the art. For example, S. M. Berge et al. describe pharmaceutically acceptable salts in detail in J. Pharmaceutical Sciences, 1977, 66, 1-19, incorporated herein by reference.
  • Pharmaceutically acceptable salts of the compounds of this invention include those derived from suitable inorganic and organic acids and bases.
  • Examples of pharmaceutically acceptable, nontoxic acid addition salts are salts of an amino group formed with inorganic acids such as hydrochloric acid, hydrobromic acid, phosphoric acid, sulfuric acid and perchloric acid or with organic acids such as acetic acid, oxalic acid, maleic acid, tartaric acid, citric acid, succinic acid or malonic acid or by using other methods used in the art such as ion exchange.
  • salts include adipate, alginate, ascorbate, aspartate, benzenesulfonate, benzoate, bisulfate, borate, butyrate, camphorate, camphorsulfonate, citrate, cyclopentanepropionate, digluconate, dodecyl sulfate, ethanesulfonate, formate, fumarate, glucoheptonate, glycerophosphate, gluconate, hemisulfate, heptanoate, hexanoate, hydroiodide, 2-hydroxy-ethanesulfonate, lactobionate, lactate, laurate, lauryl sulfate, malate, maleate, malonate, methanesulfonate, 2-naphthalenesulfonate, nicotinate, nitrate, oleate, oxalate, palmitate, pamoate, pec
  • Salts derived from appropriate bases include alkali metal, alkaline earth metal, ammonium and N+(C1-4alkyl)4 salts.
  • Representative alkali or alkaline earth metal salts include sodium, lithium, potassium, calcium, magnesium, and the like.
  • Further pharmaceutically acceptable salts include, when appropriate, nontoxic ammonium, quaternary ammonium, and amine cations formed using counterions such as halide, hydroxide, carboxylate, sulfate, phosphate, nitrate, loweralkyl sulfonate and aryl sulfonate.
  • structures depicted herein are also meant to include all isomeric (e.g., enantiomeric, diastereomeric, and geometric (or conformational)) forms of the structure; for example, the R and S configurations for each asymmetric center, Z and E double bond isomers, and Z and E conformational isomers. Therefore, single stereochemical isomers as well as enantiomeric, diastereomeric, and geometric (or conformational) mixtures of the present compounds are within the scope of the invention. Unless otherwise stated, all tautomeric forms of the compounds of the invention are within the scope of the invention.
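Allowing each asymmetric center to take the R or S configuration, and each stereogenic double bond the Z or E geometry, means a structure with k independent stereo elements has at most 2^k stereoisomers. The minimal Python sketch below, with hypothetical element labels, enumerates those combinations. The count is an upper bound, because meso forms or molecular symmetry can make some combinations identical.

```python
from itertools import product


def enumerate_stereo_combinations(elements: dict) -> list:
    """Return every assignment of one descriptor per stereo element.

    `elements` maps a label to its allowed descriptors, e.g. {"C2": ("R", "S")}.
    """
    labels = list(elements)
    choices = (elements[label] for label in labels)
    return [dict(zip(labels, combo)) for combo in product(*choices)]


# Hypothetical structure: two asymmetric carbons and one stereogenic C=C bond.
combos = enumerate_stereo_combinations({"C2": ("R", "S"), "C3": ("R", "S"), "C5=C6": ("E", "Z")})
print(len(combos))  # 8
print(combos[0])  # {'C2': 'R', 'C3': 'R', 'C5=C6': 'E'}
```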
  • structures depicted herein are also meant to include compounds that differ only in the presence of one or more isotopically enriched atoms.
  • compounds having the present structures including the replacement of hydrogen by deuterium or tritium, or the replacement of a carbon by a 13C- or 14C-enriched carbon are within the scope of this invention.
  • Such compounds are useful, for example, as analytical tools, as probes in biological assays, or as therapeutic agents in accordance with the present invention.
  • Tethering Groups (Linkers)
  • the present invention contemplates the use of a wide variety of tethering groups (“tethers” or “linkers”) for covalent conjugation of the small molecule compound or building block to the DNA barcode.
  • tethers Either a cleavable or non-cleavable tether is used depending on the context.
  • the tether is a non-cleavable group such as a polyethylene glycol (PEG) group of, e.g., 1-10 ethylene glycol subunits.
  • the tether is a non-cleavable group such as an optionally substituted C1-12 aliphatic group or a peptide comprising 1-8 amino acids.
  • the tether comprises a click reaction product resulting from the so-called “click” reaction between two click-ready groups.
  • a small molecule or building block is conjugated to a DNA barcode by a single covalent linker.
  • a small molecule assembled from two or more building blocks will be attached to the resulting DNA barcode by two or more linkers (i.e., the number of linkers will be the same as the number of building blocks used to prepare the small molecule).
  • all of the linkers but one are cleavable, so that the small molecule is attached to the DNA barcode by a single, non-cleavable linker.
  • the remaining, cleavable linkers are cleaved before the small molecule is screened against a target.
  • the linker is selected so as to avoid interfering with amplification (such as RT-PCR) of screening hits, for example by selecting a sufficiently long linker or by conjugating it to a position on the DNA barcode to avoid interference.
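The linker bookkeeping described above (one linker per building block, all but one cleavable, and the cleavable linkers removed before screening) can be modeled with a small data structure. This Python sketch is a hypothetical model of that rule only; it does not represent any chemistry, and the class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Linker:
    building_block: str  # building block this linker attaches to the barcode
    cleavable: bool


@dataclass(frozen=True)
class EncodedMember:
    barcode: str
    linkers: tuple  # one Linker per building block

    def validate(self) -> None:
        """Enforce the rule that exactly one linker is non-cleavable."""
        permanent = [link for link in self.linkers if not link.cleavable]
        if len(permanent) != 1:
            raise ValueError(f"expected 1 non-cleavable linker, found {len(permanent)}")

    def cleave(self) -> "EncodedMember":
        """Return the member as screened: only the non-cleavable linker remains."""
        self.validate()
        kept = tuple(link for link in self.linkers if not link.cleavable)
        return EncodedMember(self.barcode, kept)


member = EncodedMember(
    barcode="ACGTCCTT",
    linkers=(Linker("BB1", True), Linker("BB2", True), Linker("BB3", False)),
)
screened = member.cleave()
print(len(screened.linkers), screened.linkers[0].building_block)  # 1 BB3
```

Validating before cleavage catches members that would otherwise end up tethered by zero or several permanent linkers.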
  • bioorthogonal reaction partners may be used in the present invention to tether a chemical building block or compound to its DNA barcode.
  • a “bioorthogonal reaction partner” is a chemical group capable of undergoing a bioorthogonal reaction with an appropriate reaction partner to couple a compound described herein to its DNA barcode.
  • the bioorthogonal reaction partner is selected from a click-ready group or a group capable of undergoing a nitrone/cyclooctyne reaction, oxime/hydrazone formation, a tetrazine ligation, an isocyanide-based click reaction, or a quadricyclane ligation.
  • the bioorthogonal reaction partner is a click-ready group.
  • click-ready group refers to a chemical moiety capable of undergoing a click reaction, such as an azide or alkyne.
  • Click reactions tend to involve high-energy (“spring-loaded”) reagents with well-defined reaction coordinates, that give rise to selective bond-forming events of wide scope.
  • Examples include nucleophilic trapping of strained-ring electrophiles (epoxide, aziridines, aziridinium ions, episulfonium ions), certain carbonyl reactivity (e.g., the reaction between aldehydes and hydrazines or hydroxylamines), and several cycloaddition reactions.
  • the azide-alkyne 1,3-dipolar cycloaddition and the Diels-Alder cycloaddition are two such reactions.
  • Such click reactions i.e., dipolar cycloadditions
  • a copper catalyst is routinely employed in click reactions.
  • the presence of copper can be detrimental (See Wolbers, F. et al.; Electrophoresis 2006, 27, 5073, hereby incorporated by reference).
  • methods of performing dipolar cycloaddition reactions were developed without the use of metal catalysis.
  • Such “metal-free” click reactions utilize activated moieties in order to facilitate cycloaddition. Therefore, the present invention provides click-ready groups suitable for metal-free click chemistry.
  • Certain metal-free click moieties are known in the literature. Examples include 4-dibenzocyclooctynol (DIBO) (from Ning et al.; Angew. Chem. Int. Ed. 2008, 47, 2253); gem-difluorinated cyclooctynes (DIFO or DFO) (from Codelli et al.; J. Am. Chem. Soc. 2008, 130, 11486-11493); biarylazacyclooctynone (BARAC) (from Jewett et al.; J. Am. Chem. Soc. 2010, 132, 3688); or bicyclononyne (BCN) (from Dommerholt et al.; Angew. Chem. Int. Ed. 2010, 49, 9422-9425); each of which is hereby incorporated by reference.
  • DIBO 4-dibenzocyclooctynol
  • the phrase “a moiety suitable for metal-free click chemistry” refers to a functional group capable of dipolar cycloaddition without use of a metal catalyst.
  • moieties include an activated alkyne (such as a strained cyclooctyne), an oxime (such as a nitrile oxide precursor), or oxanorbornadiene, for coupling to an azide to form a cycloaddition product (e.g., triazole or isoxazole).
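For a direct azide-alkyne cycloaddition, the two partners are joined without releasing a by-product, so the adduct's molecular formula is simply the sum of the reactants' formulas. This is a convenient check when confirming a conjugate by mass spectrometry. Routes that expel a fragment, such as the retro-Diels-Alder loss of furan after an azide adds to an oxanorbornadiene, would need that fragment subtracted. The Python sketch below assumes simple formulas without parentheses or charges and uses standard monoisotopic masses for C, H, N, O, and S. Benzyl azide and phenylacetylene are used only as a familiar example pair; the helper names are hypothetical.

```python
import re
from collections import Counter

# Monoisotopic masses (Da) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
                "O": 15.99491461956, "S": 31.97207100}


def parse_formula(formula: str) -> Counter:
    """Parse a simple formula such as 'C7H7N3' into element counts."""
    counts = Counter()
    for element, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(digits) if digits else 1
    return counts


def monoisotopic_mass(counts: Counter) -> float:
    return sum(MONOISOTOPIC[element] * n for element, n in counts.items())


def cycloadduct(formula_a: str, formula_b: str) -> Counter:
    """Atom-economical cycloaddition: product formula = sum of both partners."""
    return parse_formula(formula_a) + parse_formula(formula_b)


# Benzyl azide (C7H7N3) + phenylacetylene (C8H6) -> 1-benzyl-4-phenyl-1,2,3-triazole.
triazole = cycloadduct("C7H7N3", "C8H6")
print(dict(triazole), round(monoisotopic_mass(triazole), 4))  # {'C': 15, 'H': 13, 'N': 3} 235.1109
```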
  • the click-ready group is selected from an azide, an alkyne, 4-dibenzocyclooctynol (DIBO) gem- difluorinated cyclooctynes (DIFO or DFO), biarylazacyclooctynone (BARAC), bicyclononyne (BCN), a strained cyclooctyne, an oxime, or oxanorb ornadi ene .
  • the compounds of this invention may be prepared or isolated in general by synthetic and/or semi-synthetic methods known to those skilled in the art for analogous compounds and by methods described in detail in the Examples and Figures, herein.
  • LG includes, but is not limited to, halogens (e.g. fluoride, chloride, bromide, iodide), sulfonates (e.g. mesylate, tosylate, benzenesulfonate, brosylate, nosylate, triflate), diazonium, and the like.
  • The phrase “oxygen protecting group” includes, for example, carbonyl protecting groups, hydroxyl protecting groups, etc. Hydroxyl protecting groups are well known in the art and include those described in detail in Protective Groups in Organic Synthesis, T. W. Greene and P. G. M. Wuts, 3rd edition, John Wiley & Sons, 1999, the entirety of which is incorporated herein by reference. Examples of suitable hydroxyl protecting groups include, but are not limited to, esters, allyl ethers, ethers, silyl ethers, alkyl ethers, arylalkyl ethers, and alkoxyalkyl ethers.
  • Examples of esters include formates, acetates, carbonates, and sulfonates.
  • Specific examples include formate, benzoyl formate, chloroacetate, trifluoroacetate, methoxyacetate, triphenylmethoxyacetate, p-chlorophenoxyacetate, 3-phenylpropionate, 4-oxopentanoate, 4,4-(ethylenedithio)pentanoate, pivaloate (trimethylacetyl), crotonate, 4-methoxycrotonate, benzoate, p-benzylbenzoate, 2,4,6-trimethylbenzoate, and carbonates such as methyl, 9-fluorenylmethyl, ethyl, 2,2,2-trichloroethyl, 2-(trimethylsilyl)ethyl, 2-(phenylsulfonyl)ethyl, vinyl, allyl, and p-nitrobenzyl.
  • Examples of silyl ethers include trimethylsilyl, triethylsilyl, t-butyldimethylsilyl, t-butyldiphenylsilyl, triisopropylsilyl, and other trialkylsilyl ethers.
  • Alkyl ethers include methyl, benzyl, p-methoxybenzyl, 3,4-dimethoxybenzyl, trityl, t-butyl, allyl, and allyloxycarbonyl ethers or derivatives.
  • Alkoxyalkyl ethers include acetals such as methoxymethyl, methylthiomethyl, (2-methoxyethoxy)methyl, benzyloxymethyl, beta-
  • Arylalkyl ethers include benzyl, p-methoxybenzyl (MPM), 3,4-dimethoxybenzyl, o-nitrobenzyl, p-nitrobenzyl, p-halobenzyl, 2,6-dichlorobenzyl, p-cyanobenzyl, and 2- and 4-picolyl.
  • Amino protecting groups are well known in the art and include those described in detail in Protective Groups in Organic Synthesis, T. W. Greene and P. G. M. Wuts, 3rd edition, John Wiley & Sons, 1999, the entirety of which is incorporated herein by reference.
  • Suitable amino protecting groups include, but are not limited to, aralkylamines, carbamates, cyclic imides, allyl amines, amides, and the like.
  • Examples of such groups include t-butyloxycarbonyl (BOC), ethyloxycarbonyl, methyloxycarbonyl, trichloroethyloxycarbonyl, allyloxycarbonyl (Alloc), benzyloxycarbonyl (CBZ), allyl, phthalimide, benzyl (Bn), fluorenylmethyloxycarbonyl (Fmoc), formyl, acetyl, chloroacetyl, dichloroacetyl, trichloroacetyl, phenylacetyl, trifluoroacetyl, benzoyl, and the like.
  • RNA species to ligate directly to dsDNA with a short overhang of, e.g., 2 bp, thus enabling RT-PCR of the whole product.
  • The dsDNA mimics the DNA barcode on DNA-encoded libraries such as those used by Vipergen, and the RNA mimics a target RNA that is being screened for binding to a small-molecule ligand.
  • Direct ligation of the RNA to the DNA followed by RT-PCR would create a DNA molecule containing both the sequence encoding the small-molecule ligand in the DNA-encoded library and the target RNA to which it bound, enabling convenient multiplexed screening.
  • Example 5 described below demonstrates feasibility of this approach using T4 DNA ligase and a helper oligo.
  • The “splint” and “ligation partner” form a DNA duplex that is meant to mimic the DNA tag encoding a compound structure in a DNA-encoded library (DEL).
  • The splint oligo forms a 3'-overhang that is designed to pair with the RNA.
  • The overhang is either 2 bp (DirLig_Splint2bp-1) or 5 bp (DirLig_Splint5bp-1).
  • The 2 bp overhang is designed to mimic the DNA tags in the Vipergen library design.
  • The 5 bp overhang is designed as a positive control, since RNA ligases are reported to ligate across 5 bp splints.
  • The control splints (DirLig_CtlSplint2bp-1 and DirLig_CtlSplint5bp-1) have overhang sequences that do not match the RNA.
  • The RNA has a 3'-end that is complementary to the splints.
  • The 5'-end of the RNA is tagged with a Cy5 fluorescent dye for easy imaging by gel electrophoresis.
  • The RNA has a binding site for a helper oligo next to the region that binds the splint.
  • The helper oligo is designed to enhance stacking interactions with the splint. If the helper oligo is phosphorylated (DirLig_pHelp2bp-1 and DirLig_pHelp5bp-1), then it can be ligated to the splint to extend the splint and increase the affinity to the RNA.
  • the splint (or splint + helper) acts as a primer for reverse transcription to copy the RNA sequence.
  • At the 5'-end of the splint and the 5'-end of the RNA there are primer binding sites for PCR.
  • T4 RNA ligase 2 (T4 Rnl2) can ligate the RNA to the dsDNA complex.
  • T4 Rnl2 has been reported to ligate the 5'-phosphate of DNA to the 3'-hydroxyl of RNA across from a DNA splint (Nandakumar and Shuman, 2004, Mol Cell, 16: 211-221), but such ligation has not been reported across a 2 bp splint.
  • The ligation partner oligo (DirLig_LigPartner-1) was mixed with the splint oligo (DirLig_Splint2bp-1, DirLig_Splint5bp-1, DirLig_CtlSplint2bp-1, or DirLig_CtlSplint5bp-1) at a concentration of 2 mM of each oligo in water.
  • the solution was heated to 60 °C for 5 min and then cooled on ice for 5 min to anneal the two oligos.
  • The annealed DNA was ligated to the RNA oligo (DirLig_RNA-1) using T4 Rnl2 from New England Biolabs (catalog # M0239) with 2.67 µM DNA, 13.3 µM RNA, 1X T4 Rnl2 buffer, 0.1% RNA, and 1 U/µL T4 Rnl2 in a total volume of 7.5 µL.
  • The reaction was incubated at 37 °C for 30 min and then quenched by adding 7.5 µL of 2X TBE-urea sample loading buffer (Bio-Rad catalog # 1610768) and heating to 95 °C for 5 min.
  • The ligation partner oligo (DirLig_LigPartner-1) was mixed with the splint oligo (DirLig_Splint2bp-1, DirLig_Splint5bp-1, or DirLig_CtlSplint2bp-1) at a concentration of 4 µM each oligo in water. The solution was heated to 70 °C for 5 min and then cooled on ice for 5 min to anneal the two oligos.
  • The annealed DNA was ligated to the RNA oligo (DirLig_RNA-1) using T4 Rnl2 from New England Biolabs (catalog # M0239) with 1 µM DNA, 2 µM RNA, 1X T4 Rnl2 buffer, and 1 U/µL T4 Rnl2 in a total volume of 10 µL.
  • some reactions contained PEG or a helper oligo according to the table below.
  • The reactions were incubated at 22 or 37 °C, according to the table below, for 2 h and then quenched by adding 10 µL of 2X TBE-urea sample loading buffer (Bio-Rad catalog # 1610768) and heating to 95 °C for 5 min.
  • The ligation partner oligo (DirLig_LigPartner-1) was mixed with the splint oligo (DirLig_Splint2bp-1, DirLig_Splint5bp-1, or DirLig_CtlSplint2bp-1) at a concentration of 4 µM each oligo in water. The solution was heated to 70 °C for 5 min and then cooled on ice for 5 min to anneal the two oligos.
  • The pre-annealed mock DNA barcode, at a final concentration of 1 µM, was mixed with 2 µM RNA oligo (DirLig_RNA-1) and 2 µM helper oligo (DirLig_pHelp2bp-1) in 1X T4 DNA ligase buffer with 20 U/µL T4 DNA ligase from New England Biolabs (catalog # M0202) in a total volume of 10 µL.
  • the reaction was incubated at 22 °C for 2 h.
  • A sample for gel analysis was prepared by taking 7.5 µL of the reaction, adding 7.5 µL of 2X TBE-urea sample loading buffer (Bio-Rad catalog # 1610768), and then heating to 95 °C for 5 min.
  • Negative control reactions were prepared as above, except with no helper oligo, the mispaired splint DirLig_CtlSplint2bp-1, or no ligase.
  • The ligation product was reverse transcribed using SuperScript III according to the standard protocol (Thermo catalog # 18080093). Briefly, the reaction mixture was composed of 1X first strand buffer, 0.5 mM dNTPs, 5 mM DTT, 1 U/µL Superase-In RNase Inhibitor (Thermo catalog # AM2694), and 10 U/µL SuperScript III with 0.6 µL ligation reaction in a total volume of 20 µL. The reaction was incubated at 55 °C for 30 min and then heat inactivated at 75 °C for 15 min. No primer was added, since the splint or splint-helper ligation product acted as the primer.
  • Each reaction was PCR amplified using standard Taq DNA polymerase (Thermo catalog # 10342020) according to the standard protocol. Briefly, the reaction mixture was composed of 1X PCR buffer, 1.5 mM MgCl2, 0.2 mM dNTPs, 0.2 µM DirLig_For-1, 0.2 µM DirLig_Rev-1, and 0.04 U/µL Taq DNA polymerase with 2 µL reverse transcription reaction in a total volume of 50 µL.
  • The PCR method was 94 °C for 3 min; 35 cycles of 94 °C for 45 s, 55 °C for 30 s, and 72 °C for 30 s; and finally 72 °C for 3 min.
  • A 2 µL sample of each PCR reaction was run on a 2% agarose E-gel (Thermo catalog # G402002) and then imaged by epi-fluorescence on an Azure c600 gel imager.
  • FIG. 9 shows (panel A) 10% PAGE results of the ligation reaction with Cy5-labeled RNA in red (appearing as dark grey spots in the picture of the PAGE gel as shown) and SYBR-gold-stained RNA and DNA in green (appearing as light grey spots in the gel picture).
  • Panel B shows 2% agarose E-gel analysis of the RT-PCR reactions, with the Exactgene mini-DNA ladder in lane M and the desired product indicated with an arrow.
  • Panel C provides a description of samples 1-6 from panels A and B.
  • Ligation of the RNA to the DNA barcode across a 2 bp splint using T4 DNA ligase and a helper DNA oligo was confirmed.
  • The ligation product gave the desired PCR product after RT-PCR.
  • the ligation and RT-PCR reactions worked best with the helper oligo, correct splint sequence, and enzyme for the 2 bp splint.
  • The splint was able to act as a primer for RT-PCR even without ligation, as indicated by sample 6, which lacked ligase.
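The reaction setups above follow ordinary C1V1 = C2V2 dilution arithmetic. The sketch below computes how much of each stock to pipette, reading the concentrations and volumes above as µM and µL. The function names are illustrative, and the RNA and helper stock concentrations in the usage line are hypothetical, since the source gives only their final concentrations.

```python
def stock_volume_ul(stock_um: float, final_um: float, total_ul: float) -> float:
    """Volume of stock (µL) giving final_um in a reaction of total_ul (C1*V1 = C2*V2)."""
    if final_um > stock_um:
        raise ValueError("final concentration cannot exceed the stock concentration")
    return final_um * total_ul / stock_um


def plan_reaction(components: dict[str, tuple[float, float]],
                  total_ul: float) -> dict[str, float]:
    """Map each component name -> µL of stock to add.

    components maps name -> (stock_uM, final_uM). The key "remaining" holds the
    volume left over for buffer, enzyme, and water.
    """
    plan = {name: stock_volume_ul(stock, final, total_ul)
            for name, (stock, final) in components.items()}
    remaining = total_ul - sum(plan.values())
    if remaining < 0:
        raise ValueError("oligo stocks alone exceed the total reaction volume")
    plan["remaining"] = remaining
    return plan


# Annealed mock barcode at 4 µM -> 1 µM final; RNA and helper at 2 µM final
# from hypothetical 10 µM stocks; 10 µL total reaction.
plan = plan_reaction({"duplex": (4.0, 1.0), "rna": (10.0, 2.0), "helper": (10.0, 2.0)}, 10.0)
print(plan)  # {'duplex': 2.5, 'rna': 2.0, 'helper': 2.0, 'remaining': 3.5}
```

The "remaining" volume must still accommodate the 10X buffer (1 µL for 1X) and the enzyme, so a quick check like this flags setups where the oligo stocks are too dilute.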

Abstract

The present invention provides encoded libraries of small molecules that may be used to screen for binding to a nucleic acid target, such as an RNA or relevant fragment thereof implicated in a disease, disorder, or condition. The present invention also provides enriched encoded libraries and methods for preparing the same. The present invention further provides methods and kits for ligation, such as proximity-based ligation, of the nucleic acid target to a nucleic acid that encodes a small molecule library member, thus enabling methods of screening, preparation of enriched libraries, identification of screening hits, and sample processing.

Description

ENCODED LIBRARIES AND METHODS OF USE FOR SCREENING NUCLEIC ACID TARGETS
TECHNICAL FIELD OF THE INVENTION
[0001] The present invention relates to encoded libraries and methods of use thereof for screening and identifying candidate compounds for binding to a nucleic acid target of interest.
CROSS-REFERENCE TO RELATED APPLICATIONS
[0002] This application claims the benefit of U.S. Provisional Application No. 62/680,946, filed on June 5, 2018, the entirety of which is hereby incorporated by reference.
SEQUENCE LISTING
[0003] The instant application contains a Sequence Listing which has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on June 3, 2019, is named 394457-005WO_l67379_SL_ST25.TXT and is 3,642 bytes in size.
BACKGROUND OF THE INVENTION
[0004] Ribonucleic acids (RNAs) historically have been considered mere transient intermediaries between genes and proteins, whereby a protein-coding section of deoxyribonucleic acid (DNA) is transcribed into ribonucleic acid (RNA) that is then translated into a protein. Most RNA was thought to lack defined tertiary structure, and even where tertiary structure was present it was believed to be largely irrelevant to the RNA’s function as a transient messenger. This understanding has been challenged by the recognition that RNA, including non-coding RNA (ncRNA), plays a multitude of critical regulatory roles in the cell and that RNA can have complex, defined, and functionally essential tertiary structure.
[0005] Intervention in the transcriptome has the potential to treat a vast array of diseases, but has mostly been investigated with nucleic acid-based therapeutic modalities such as antisense RNA or siRNA. Unfortunately, in most cases such approaches have yet to overcome significant challenges including drug delivery, absorption, distribution to target organs, pharmacokinetics, and cell penetration. In contrast, small molecules have a long history of successfully surmounting these barriers, which makes them suitable as drugs, and they are readily optimized. However, there are no validated, general methods of screening small molecules for binding to RNA or other nucleic acid targets. Consequently, there exists an urgent need for new methods to screen libraries of small molecules against a nucleic acid target of interest or even a library of nucleic acid targets.
[0006] DNA-encoded chemical libraries (DEL) are a technology that enables the synthesis and screening, on a massive scale, of libraries of small molecules. DEL technology bridges the fields of combinatorial chemistry and molecular biology and represents a well-validated tool for drug discovery against protein targets. The aim of DEL technology is to enable massively parallel screening in early-phase drug discovery efforts such as target validation and hit identification, thereby accelerating the drug discovery process and decreasing its cost.
[0007] DEL technology generally uses DNA “barcodes” to give each library member a unique identifier. In some cases the DNA sequences include segments that direct and control chemical synthesis of small molecule library members from building block precursors. The technique enables massively parallel creation and interrogation of libraries via affinity selection, typically on an immobilized protein target. Homogeneous methods for screening DNA-encoded libraries are also available using, for example, water-in-oil emulsion technology to isolate individual ligand-target complexes that are later identified.
[0008] Current DEL technologies evaluate test compound binding against a protein target that is in solution, immobilized in a matrix, or is conjugated to a DNA barcode. However, a validated approach to DEL library screening against a nucleic acid target does not exist. Thus, current DEL technologies have not been shown to allow screening libraries of small molecules against targets that are themselves nucleic acids. The ability to screen compound libraries against a nucleic acid target would enable medicinal intervention in the transcriptome. Thus, there is a need in the art for improved DEL technologies capable of being applied to one or more nucleic acid targets.
BRIEF DESCRIPTION OF THE FIGURES
[0009] FIG. 1 shows cartoons of how a YoctoReactor® (yR) is used to prepare a DEL. (a) Self-assembly of chemical building blocks (BBs) conjugated to conserved DNA sequences serves to direct the synthesis and simultaneously record the synthetic route via a distal BB coding region. The conserved DNA is designed to self-assemble into a three-way junction (3WJ) or four-way DNA junction (4WJ), so that three or four BBs can be brought into close proximity and allowed to react. (b) Representation of an arm of the yR. BBs are attached via cleavable or non-cleavable linkers to bispecific DNA oligonucleotides (oligo-BBs) designed to contain a DNA barcode for the attached BB at the distal end of the oligo and an area of conserved DNA sequence that self-assembles the DNA into a 3WJ or 4WJ. (c) Two reactants are brought into close proximity at the cavity at the center of a yR DNA junction, which has a volume of about one yoctoliter (10⁻²⁴ L). The close proximity of the two reactants at the cavity results in a high effective concentration of the reactants and facilitates high reaction rates between the BB and acceptor. (d) Representative member of a DEL library prepared from a yR comprising a 3WJ. The size of a DEL library is determined by the number of different BB-oligos as well as yR geometry.
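The “high effective concentration” in the yR cavity follows from simple arithmetic: one molecule in a volume V corresponds to 1/(N_A·V) mol/L. The sketch below works this out for a one-yoctoliter cavity; the function name is illustrative, and the 1 fL comparison volume is an arbitrary reference point, not a value from the source.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI definition)


def single_molecule_molarity(volume_l: float) -> float:
    """Concentration in mol/L of exactly one molecule confined to volume_l liters."""
    return 1.0 / (AVOGADRO * volume_l)


yoctoliter = 1e-24  # L, the approximate yR cavity volume
femtoliter = 1e-15  # L, arbitrary reference volume for comparison

print(f"1 yL: {single_molecule_molarity(yoctoliter):.2f} M")        # 1 yL: 1.66 M
print(f"1 fL: {single_molecule_molarity(femtoliter) * 1e9:.2f} nM")  # 1 fL: 1.66 nM
```

A single reactant partner held in the cavity therefore behaves as if it were present at roughly molar concentration, which is the basis for the high reaction rates described in panel (c).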
[0010] FIG. 2 shows a cartoon scheme for preparing a small molecule DEL library member (display product) by the yR approach. Each DNA strand contains a codon region which encodes the particular BB. In the first step, repertoires of two DNA strands with individual codons and BB conjugates are mixed together with a complementary DNA strand that assembles the yR. Because of sequence complementarity, these DNA strands self-assemble combinatorially into a stable three-way junction forming the stable double-stranded framework of the yR. The BBs are then coupled in a chemical reaction. Repetition with a third BB-oligo and cleavage of all but one of the BB linkers, followed by purification and primer extension, leads to the library member.
[0011] FIG. 3 shows a scheme of the Binder Trap Enrichment® (BTE) method of library screening. A DEL mixed with a DNA-labeled target is allowed to reach equilibrium in solution where the target concentration can be controlled. A rapid dilution during which the binding kinetics become dominated by dissociation is followed by a rapid emulsion formation which traps the bound ligands with the target within aqueous emulsion droplets. Once trapped, the target and the library DNA are ligated inside the droplets. PCR amplification and sequencing allow for determining hits.
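The rapid-dilution step works because, once rebinding becomes negligible, loss of ligand-target complex is approximately first-order in the dissociation rate constant k_off, so the fraction still bound after time t is exp(-k_off * t). The sketch below illustrates this under that simplifying assumption; the function names are illustrative, and the k_off values and the 60 s dilution-to-emulsion interval are hypothetical.

```python
import math


def fraction_still_bound(k_off_per_s: float, t_s: float) -> float:
    """Fraction of complex remaining after t_s seconds of pure first-order dissociation."""
    return math.exp(-k_off_per_s * t_s)


def complex_half_life_s(k_off_per_s: float) -> float:
    """Half-life of the complex, ln(2) / k_off."""
    return math.log(2) / k_off_per_s


interval_s = 60.0  # hypothetical time between dilution and emulsion formation
for k_off in (1e-3, 1e-1):  # hypothetical slow and fast off-rates, s^-1
    print(f"k_off={k_off:g}/s  t1/2={complex_half_life_s(k_off):.1f} s  "
          f"bound after {interval_s:.0f} s: {fraction_still_bound(k_off, interval_s):.3f}")
```

Under these assumptions a slow off-rate (10⁻³ s⁻¹) leaves about 94% of complexes intact when the droplets form, while a fast one (10⁻¹ s⁻¹) leaves well under 1%, which is why the trapping step favors tight binders.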
[0012] FIG. 4 shows cartoons of some model nucleic acid sequences used to determine exemplary conditions for an RNA-DNA ligation. As shown by the directionality of the “Forward” and “Reverse” arrows in the top cartoon, the base of each arrow is the 5'-end of the nucleic acid, while the head of the arrow is the 3'-end. The top cartoon shows a setup in which a DNA Ligation Partner and Splint are hybridized with a short overhang that is complementary to the 3'-end of a target RNA. The 5'-end of the RNA includes a Cy5 label for facilitating gel analysis. An optional Helper oligo may be included, e.g., to facilitate a ligation between the RNA and the Ligation Partner or to facilitate reverse transcription. In some cases, a ligase or mixture of ligases was surprisingly found to ligate either the RNA to the Ligation Partner or both the RNA to the Ligation Partner and the Splint to the Helper. In the bottom cartoon, the Ligation Partner and Splint may be dsDNA and may include a primer (Primer 1); the RNA may include another primer (Primer 2). The overhang in this example is a TC dinucleotide on the Splint that pairs with an AG on the RNA.
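Whether a splint overhang can template the RNA reduces to antiparallel Watson-Crick pairing: the RNA's 3'-terminal bases, read 5'->3', must be the reverse complement (in RNA letters) of the overhang, read 5'->3'. The sketch below assumes the splint's TC overhang is written 5'->3' and the AG on the RNA is written 3'->5' beneath it, so the RNA ends in 5'-...GA-3'. The function names and the RNA sequences in the usage lines are hypothetical.

```python
# DNA base -> the RNA base it pairs with (Watson-Crick only; G-U wobble ignored).
DNA_TO_RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}


def rna_partner(overhang_5to3: str) -> str:
    """RNA sequence (5'->3') that pairs antiparallel with a DNA overhang (5'->3')."""
    return "".join(DNA_TO_RNA_PARTNER[base] for base in reversed(overhang_5to3.upper()))


def overhang_matches_rna(overhang_5to3: str, rna_5to3: str) -> bool:
    """True if the RNA's 3'-terminal bases pair perfectly with the splint overhang."""
    return rna_5to3.upper().endswith(rna_partner(overhang_5to3))


print(rna_partner("TC"))                     # GA
print(overhang_matches_rna("TC", "CCUUGA"))  # True: 3'-end pairs with the overhang
print(overhang_matches_rna("TC", "CCUUAG"))  # False: 3'-end does not pair
```

The same check distinguishes the matched splints from the control splints, whose overhangs are designed not to pair with the RNA.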
[0013] FIG. 5 shows PAGE results for a ligation experiment. Successful ligation was observed with the 5 bp splint (DirLig_Splint5bp-1), but not the 2 bp splint (DirLig_Splint2bp-1). The negative control splints (DirLig_CtlSplint2bp-1 and DirLig_CtlSplint5bp-1) did not enable ligation.
[0014] FIG. 6 shows PAGE results for a set of modified ligation conditions for a 2 bp overhang that included 5% PEG4000 or a helper oligo with incubation at 22 °C.
[0015] FIG. 7 shows PAGE results for a set of modified ligation conditions featuring SplintR or T4 RNA Ligase 2 optionally in the presence of PEG4000 and different temperatures.
[0016] FIG. 8 shows PAGE results for modified ligation conditions featuring T4 DNA ligase and/or SplintR. The use of DNA ligases and a ligatable helper oligo enabled ligation of the RNA to DNA across a 2 bp splint.
[0017] FIG. 9 shows (panel A) 10% PAGE results of a ligation reaction with Cy5-labeled RNA in red (appearing as dark grey spots in the picture of the PAGE gel as shown) and SYBR-gold-stained RNA and DNA in green (appearing as light grey spots in the gel picture). Panel B shows 2% agarose E-gel analysis of the RT-PCR reactions, with the Exactgene mini-DNA ladder in lane M and the desired product indicated with an arrow. Panel C provides a description of samples 1-6 from panels A and B.
[0018] FIG. 10 shows a cartoon of an exemplary DEL screening strategy that uses one or two ligations to capture small molecule-target binding information for later analysis, e.g. by sequencing.
[0019] FIG. 11 shows a cartoon of an exemplary DEL screening strategy that uses a ligation to capture small molecule-target binding information for later analysis, e.g. by sequencing.
[0020] FIG. 12 shows a cartoon of an exemplary DEL screening strategy that uses reverse transcription of a partially double-stranded nucleic acid target-DEL sequence to capture small molecule-target binding information for later analysis, e.g., by sequencing of the fully double-stranded product, where the sequence denoted as “RT product” is added by a reverse transcriptase.
[0021] FIG. 13 shows a cartoon of an exemplary library screen. An encoded small molecule attached to a nucleic acid barcode is allowed to bind to a nucleic acid target, here a stem-loop RNA structure (for example, an miRNA-mRNA featuring a 3WJ and stem-loop structure). After binding, the complex is rapidly diluted and emulsified, then ligated in the emulsion. RT-PCR and sequencing allows counting of hits from the screen.
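Because the ligated, RT-PCR-amplified product carries both the DEL barcode and target-derived sequence, counting hits reduces to tallying (compound, target) pairs across sequencing reads. The sketch below assumes a hypothetical read layout in which the barcode occupies a fixed-length prefix and each target is recognized by a short signature subsequence; the function name, barcodes, signatures, and reads are invented for illustration.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_hits(reads: Iterable[str],
               barcode_to_compound: Mapping[str, str],
               target_signatures: Mapping[str, str],
               barcode_len: int) -> Counter:
    """Tally (compound, target) pairs from ligated-product reads.

    Reads whose prefix is not a known barcode, or that contain no known target
    signature after the barcode, are skipped.
    """
    counts: Counter = Counter()
    for read in reads:
        compound = barcode_to_compound.get(read[:barcode_len])
        if compound is None:
            continue
        tail = read[barcode_len:]
        target = next((name for sig, name in target_signatures.items() if sig in tail), None)
        if target is not None:
            counts[(compound, target)] += 1
    return counts


barcodes = {"ACGT": "cmpd-1", "TTGA": "cmpd-2"}         # hypothetical 4-nt codes
targets = {"GGATCC": "target-A", "CCTTAA": "target-B"}  # hypothetical signatures
reads = ["ACGTAAGGATCCTT", "ACGTAAGGATCCAA", "TTGACCCCTTAAGG",
         "GGGGAAGGATCCTT",   # unknown barcode -> skipped
         "TTGAAAAAAAAAAA"]   # no target signature -> skipped
print(count_hits(reads, barcodes, targets, barcode_len=4))
```

Real pipelines would also tolerate sequencing errors and use primer sites to locate the barcode, but the core output, a table of counts per compound-target pair, is the same.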
[0022] FIG. 14 shows overviews of various methods of assembling DNA-encoded libraries. (a) DNA-recorded libraries are constructed through iterative steps of splitting, building block coupling, tag ligation and pooling. (b) Libraries assembled by DNA-templated synthesis (DTS) harness the increased molarity obtained through the hybridization of complementary oligonucleotides for stepwise synthesis of DNA-encoded macrocycles. (c) Related methods based on DNA junctions such as the YoctoReactor® similarly rely on proximity-based reactions but do not necessarily require a pre-existing DNA template. (d) In DNA routing, the library is synthesized through iterative segregation using immobilized complementary oligonucleotides into separate compartments, in which coupling of the corresponding building block occurs. (e) Encoded self-assembling chemical (ESAC) libraries are assembled from sub-libraries by hybridizing oligonucleotides. When highly repetitive cycles are utilized for library assembly, only the first cycles and the final products are illustrated.
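The split, couple, ligate, and pool cycle in panel (a) can be modeled as a Cartesian product: each cycle appends one codon to the tag, so the concatenated tag identifies the full building-block combination. The sketch below assumes fixed-length, unique codons within each cycle; the function name, building-block names, and codons are hypothetical.

```python
from itertools import product


def split_and_pool(cycles: list[dict[str, str]]) -> dict[str, tuple[str, ...]]:
    """Return barcode -> building-block tuple for every member of the library.

    cycles[i] maps each building block used in cycle i to its DNA codon.
    Each member's barcode is the concatenation of the codons from every cycle.
    """
    library: dict[str, tuple[str, ...]] = {}
    for combo in product(*(cycle.items() for cycle in cycles)):
        building_blocks = tuple(bb for bb, _ in combo)
        barcode = "".join(codon for _, codon in combo)
        library[barcode] = building_blocks
    return library


cycle_1 = {"A1": "AAC", "A2": "AGT"}               # hypothetical
cycle_2 = {"B1": "CCA", "B2": "CTG", "B3": "GAT"}  # hypothetical
library = split_and_pool([cycle_1, cycle_2])
print(len(library))       # 6
print(library["AGTCTG"])  # ('A2', 'B2')
```

Decoding a sequenced tag is then a dictionary lookup; with fixed-length codons, the tag could equally be split position by position and each codon decoded separately.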
DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS
1. General Description of Certain Embodiments of the Invention; Definitions
DNA-Encoded Libraries (DELs) and Methods of Preparation and Processing
[0023] DEL technology was developed as an alternative to combichem/HTS (combinatorial chemistry/high-throughput screening) to permit the rapid synthesis of millions to billions of drug-like compounds and provide a resource-efficient method to screen the diversity. By linking small molecules to a DNA code, large combinatorial libraries of drug-like compounds can be synthesized and screened against biological targets in a single-pot format. Hits are then deconvoluted by next-generation DNA sequencing. The encoding and directing of chemistry by DNA now make it possible to efficiently generate a much greater chemical diversity and interrogate the diversity robustly. In some cases, DELs can be subjected to evolutionary selection and become enriched for the small subset of the library which binds the target of interest. This subset is subsequently tested in functional assays. This straightforward method for discovering ligands that modulate targets works because it does not rely on enzymatic turnover. Thus, DELs are particularly versatile tools for discovery in a post-genomic era.
[0024] The screening of DNA-encoded small-molecule libraries is increasingly being applied in industrial drug discovery programs, and indeed compounds discovered in such screens, such as GSK2256294A, an inhibitor of soluble epoxide hydrolase, are now entering clinical trials. A key consideration is the rate of false positive hits. A viable success rate in drug discovery programs employing DELs depends both on the quality of the library and the screening process. Key lessons learned from traditional combinatorial chemistry and HTS are the importance of library purity and the reliability of the screening assay. For DELs that feature exceptionally large compound diversity, a low false positive rate is especially crucial to limit extensive follow-up. A low false positive rate is also crucial for the predictive value of the data set generated to guide the hit-to-lead process. Optimization of DEL technology enables the screening of tens of millions of compounds to be conducted efficiently to yield potent hits and binding data rapidly.
[0025] To increase the probability of finding valuable hits to a wide range of targets, compound libraries should in theory be as diverse as possible, without compromising library quality and drug-like properties of the library members. Molecular weight (MW) and cLogP are generally accepted as being the two most important drug-like parameters to consider. For libraries with a proposed average MW < 500 Da, a maximum of three diversity points (three building blocks (BBs)) are allowed, and this will in practice generate library sizes in the millions. By contrast, if average MW requirements are relaxed, four diversity points can be employed and library sizes in the billions can be achieved.
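The trade-off in this paragraph is multiplicative: with full combinatorial coupling, library size is the product of the building-block counts at each diversity point, while a fixed MW ceiling must be shared among more building blocks as positions are added. The sketch below makes this concrete; the function names are illustrative, and the figure of 200 building blocks per position is a hypothetical example.

```python
from math import prod


def library_size(bb_counts: list[int]) -> int:
    """Number of members when every building block at each position is combined with all others."""
    return prod(bb_counts)


def mean_bb_mass_budget(max_mw_da: float, n_positions: int) -> float:
    """Average mass per building block that keeps the product under max_mw_da.

    Atoms lost on coupling (e.g. water in amide formation) are ignored, so this
    slightly underestimates the allowable building-block mass.
    """
    return max_mw_da / n_positions


for n in (3, 4):
    print(n, library_size([200] * n), round(mean_bb_mass_budget(500, n), 1))
```

With 200 building blocks per position, three positions give 8 million members and four give 1.6 billion, matching the "millions" versus "billions" contrast above, while the average mass available per building block under a 500 Da ceiling drops from about 167 Da to 125 Da.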
[0026] Since DELs are combinatorial and normally synthesized in just a few steps, the characteristics of the final compounds are defined mainly by the nature of the BBs used to assemble the small molecule. The chemistries used in the library assembly to join the BBs contribute little to the diversity but instead define the range of BBs which are readily accessible for library assembly. Several reactions compatible with DNA have been developed, since in general the chemical reactions used to join the BBs must be bioorthogonal, i.e., not react with the DNA barcodes. Accordingly, in some embodiments, the DEL is assembled by bioorthogonal chemical reactions such as amide formation, reductive amination, Pd-catalyzed cross-couplings, nucleophilic aromatic substitution, cycloaddition, urea formation, and protecting group manipulations.
[0027] In all combinatorial approaches, library purity greatly influences the outcome of the screening process and the efficiency of interpreting screening data by influencing the signal-to-noise of screening and the false positive rate. A high-fidelity DEL, having a functional size comparable to the nominal or theoretical library size, is a library in which all DNA codes are attached to the correct molecule anticipated by the sequence, such that no compounds are incorrectly encoded. One advantage of the yR approach is that all truncated and unreacted products are eliminated and each compound in the high-fidelity library is adequately represented. A high-fidelity library allows a higher level of information to be extracted directly from a library screen without further data processing and is thus less resource intensive. Examination of related structures identified in a primary screen, for example, immediately provides information on key pharmacophores and regions where chemical modifications are allowed, providing a data set to accelerate the hit-to-lead process and a better starting point for further lead optimization.
[0028] In DEL screens, the attachment of a small molecule to a unique DNA barcode allows straightforward identification of “hits,” i.e., molecules that bind to the target. Conventionally, DEL libraries are put through affinity selection on a selected, immobilized target protein, after which non-binders are removed by washing steps, and binders may be amplified by polymerase chain reaction (PCR) and identified by reference to their DNA code, for example, by DNA sequencing. In evolution-based DEL technologies, as described below, hits can be further enriched by performing rounds of selection, PCR amplification and translation in analogy to biological display systems such as antibody phage display. This makes it possible to work with larger compound libraries than previously possible. In some embodiments of the present invention, an “enriched library” refers to either a subset of an encoded library whose members have been selected (enriched) for binding to a particular target of interest, or a library that has been selected (enriched), through one, two, three, four, or more rounds of evolution-based selection, for binding to a target of interest.
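One common way to rank binders after affinity selection is fold enrichment: each barcode's read frequency after selection divided by its frequency in the unselected library. The sketch below is a generic illustration of that normalization, not a method stated in this document; the function name, the pseudocount (used to handle barcodes absent from one population), and the counts are assumptions.

```python
def fold_enrichment(selected: dict[str, int], naive: dict[str, int],
                    pseudocount: float = 0.5) -> dict[str, float]:
    """Per-barcode ratio of post-selection frequency to pre-selection frequency."""
    barcodes = set(selected) | set(naive)
    n = len(barcodes)
    total_sel = sum(selected.values()) + pseudocount * n
    total_naive = sum(naive.values()) + pseudocount * n
    scores = {}
    for bc in barcodes:
        freq_sel = (selected.get(bc, 0) + pseudocount) / total_sel
        freq_naive = (naive.get(bc, 0) + pseudocount) / total_naive
        scores[bc] = freq_sel / freq_naive
    return scores


naive = {"bc1": 100, "bc2": 100, "bc3": 100}   # hypothetical pre-selection counts
selected = {"bc1": 280, "bc2": 10, "bc3": 10}  # hypothetical post-selection counts
for bc, score in sorted(fold_enrichment(selected, naive).items(), key=lambda kv: -kv[1]):
    print(bc, round(score, 2))
```

Barcodes scoring well above 1 are candidate binders; repeating the calculation after each round of an evolution-based selection shows whether enrichment is increasing.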
[0029] The last two decades have seen the introduction of several independent developments in DEL technology. These recent DEL technology types can be classified as either non-evolution-based DEL technology platforms or evolution-based DEL technologies capable of molecular evolution. The first type benefits from library generation using commercially available, off-the-shelf reagents (for example, generation of small molecule library members from commercially available building blocks), so library generation is straightforward. More sophisticated approaches, involving custom preparation of chemical building blocks, are also known and are advantageous in some applications, for example because they can explore more unique areas of chemical space than ever before. One can identify hits by DNA sequencing, but DNA translation and therefore molecular evolution is not feasible by these methods. The split-and-pool approaches developed at Praecis Pharmaceuticals (now owned by GlaxoSmithKline) and Nuevolution (Copenhagen, Denmark), and ESAC technology developed in the laboratory of Prof. D. Neri (Institute of Pharmaceutical Science, Zurich, Switzerland), fall under this category. ESAC technology is notable as being a combinatorial, self-assembling approach that has some similarities to fragment-based drug discovery. Here DNA annealing enables discrete building block combinations to be sampled, but no chemical reaction takes place between them. Examples of evolution-based DEL technologies include DNA-routing developed by Prof. D. R. Halpin and Prof. P. B. Harbury (Stanford University, Stanford, CA), DNA-templated synthesis developed by Prof. D. Liu (Harvard University, Cambridge, MA), and YoctoReactor® technology, developed and commercialized by Vipergen (Copenhagen, Denmark). See, e.g., Heitner, T. R., et al., “Streamlining hit discovery and optimization with a yoctoliter scale DNA reactor,” Expert Opinion on Drug Discovery 2009, 4(11), 1201-1213, hereby incorporated by reference. These technologies are described in further detail below. DNA-templated synthesis and YoctoReactor® technology require the prior conjugation of chemical building blocks (BBs) to a DNA oligonucleotide tag before library assembly, making it somewhat more labor-intensive upfront to assemble the library. Furthermore, the DNA-tagged BBs enable the generation of a genetic code for synthesized compounds, and artificial translation of the genetic code is possible. Artificial translation is possible because the small molecule candidate compounds (which are reaction products from multiple BBs) can be recalled by the PCR-amplified genetic code, and the library compounds can be regenerated (decoded). This, in turn, enables the principle of Darwinian natural selection and evolution to be applied to small molecule selection in direct analogy to biological display systems through rounds of selection, amplification and translation.
[0030] DELs are a subset of so-called in vitro display libraries, of which other examples are known in the art and may be used in accordance with the present invention. The term “in vitro display library” as used herein refers to a library comprising numerous different binding entities (small molecules) wherein each binding entity is attached to a nucleic acid molecule and the nucleic acid molecule comprises specific nucleic acid sequence information allowing one to identify the binding entity. More specifically, once one knows the specific nucleic acid sequence information of the nucleic acid molecule, one can derive the structure of the specific binding entity attached to the nucleic acid molecule.
[0031] A number of different methods to make such in vitro display libraries are known in the art, such as described in, e.g., EP1809743 (Vipergen), EP1402024 (Nuevolution), EP1423400 (David Liu), Nature Chem. Biol. (2009), 5:647-654 (Clark), WO 00/23458 (Harbury), Nature Methods (2006), 3(7), 561-570 (Miller), Nat. Biotechnol. 2004; 22, 568-574 (Melkko), Nature (1990); 346(6287), 818-822 (Ellington), Proc Natl Acad Sci USA 1997, 94 (23): 12297-302 (Roberts), WO2006/053571 (Rasmussen), Shi, B., et al., Bioorganic & Medicinal Chemistry Letters, 2016, 27(3), 361-369, and Brenner, S. and Lerner, R. Proc. Natl. Acad. Sci. USA, 1992, 89, 5381-5383; all of which are hereby incorporated by reference.
[0032] As described in the above documents, it is possible to prepare in vitro display libraries comprising a large number (e.g., 10^12) of specific binding entities (such as 10^12 different chemical compounds). In some embodiments, the DEL is assembled by one of the methods described herein or, e.g., as shown in FIG. 14.
Non-Evolution Based Technologies
Split-and-Pool DNA Encoding
[0033] In some embodiments, the present invention provides DELs of small molecules prepared by a non-evolution-based split-and-pool method. Split-and-pool methods are known in the art and include those described herein. For example, initially a set of unique DNA oligonucleotides (n), each containing a specific coding sequence, is chemically conjugated to a corresponding set of small organic molecules. Subsequently, the oligonucleotide-conjugated compounds are mixed (“Pool”) and divided (“Split”) into a number of groups (m). Using appropriate conditions, a second set of building blocks (m) is coupled to the first one, and a further oligonucleotide which codes for the second modification is enzymatically introduced before mixing again. These “split-and-pool” steps can be iterated a number of times (r), and each round increases the library size in a combinatorial manner. By performing r rounds of split-and-pool synthesis with n alternative chemical groups per round, one achieves a diversity of n^r compounds.
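The n^r diversity follows from each round multiplying the number of encoded compounds by n. The following Python sketch enumerates a toy split-and-pool library; the two-letter DNA codes and building-block names are hypothetical placeholders, and coupling and ligation are modeled simply as appending a building block and its code.

```python
def split_and_pool(building_blocks_per_round):
    """Enumerate an idealized split-and-pool DNA-encoded library.

    building_blocks_per_round: list of dicts, one per round, each mapping a
    (hypothetical) DNA code to a building-block name.
    Returns a dict mapping each concatenated code to the tuple of building
    blocks that the code records.
    """
    library = {"": ()}  # start: one empty code, no building blocks
    for round_bbs in building_blocks_per_round:
        # Split: one aliquot per building block; couple the BB and append its code.
        # Pool: the dict comprehension merges all aliquots for the next round.
        library = {
            code + bb_code: compound + (bb,)
            for code, compound in library.items()
            for bb_code, bb in round_bbs.items()
        }
    return library


rounds = [
    {"AA": "A1", "CC": "A2", "GG": "A3"},
    {"AA": "B1", "CC": "B2", "GG": "B3"},
    {"AA": "C1", "CC": "C2", "GG": "C3"},
]
library = split_and_pool(rounds)
print(len(library), library["AACCGG"])
```

With three building blocks per round over three rounds, the sketch yields 3^3 = 27 compounds, each retrievable from its concatenated code; with a hypothetical 1,000 building blocks per round, the same three rounds would give 10^9 compounds.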
[0034] A promising strategy for the construction of DNA-encoded libraries is represented by the use of multifunctional building blocks covalently conjugated to an oligonucleotide serving as a “core structure” for library synthesis. In a “pool-and-split” fashion, a set of multifunctional scaffolds undergoes orthogonal reactions with a series of suitable reactive partners. Following each reaction step, the identity of the modification is encoded by a chemical or enzymatic addition (e.g., by ligation) of a DNA segment to the original DNA “core structure.” See, e.g., Mannocci, L., et al., “High-throughput sequencing allows the identification of binding molecules isolated from DNA-encoded chemical libraries,” Proc. Natl. Acad. Sci. U.S.A. 2008, 105(46), 17670-5; Buller, F., et al., “Design and synthesis of a novel DNA-encoded chemical library using Diels-Alder cycloadditions,” Bioorg. Med. Chem. Lett. 2008, 18(22), 5926-31. The use of N-protected amino acids covalently attached to a DNA fragment allows, after a suitable deprotection step, a further amide bond formation with a series of carboxylic acids or a reductive amination with aldehydes. Similarly, diene carboxylic acids used as scaffolds for library construction at the 5'-end of an amino-modified oligonucleotide can be subjected to a Diels-Alder reaction with a variety of maleimide derivatives. As mentioned above, many other bioorthogonal reactions are known and are being developed to further extend the possible chemical diversity of DELs. See, e.g., Arico-Muendel, Med. Chem. Commun. 2016, 7, 1898-1909 and Goodnow, R. A. Jr. et al., Nat. Rev. Drug Discov. 2017, Feb; 16(2):131-147, hereby incorporated by reference.
Combinatorial Self-Assembling Libraries
[0035] In some embodiments, the present invention provides DELs of small molecules prepared as a combinatorial self-assembling library. Combinatorial self-assembling libraries include encoded self-assembling chemical libraries (ESAC libraries). ESAC libraries rely on the principle that two sublibraries of x members each (e.g., 10^3), containing a constant complementary hybridization domain, can yield after hybridization a combinatorial DNA-duplex library with a complexity of x^2 uniformly represented library members (e.g., 10^6). Each sub-library member generally consists of an oligonucleotide containing a variable coding region flanked by a constant DNA sequence, carrying a suitable chemical modification at the oligonucleotide extremity. The ESAC sublibraries can be used in at least four different embodiments. See, e.g., Melkko, S., et al., “Encoded self-assembling chemical libraries,” Nat. Biotechnol. 2004, 22(5), 568-74; hereby incorporated by reference.
[0036] In some embodiments, a sub-library is paired with a complementary oligonucleotide and used as a DNA-encoded library displaying a single covalently linked compound for affinity-based selection experiments.
[0037] In some embodiments, a sub-library is paired with an oligonucleotide displaying a known binder to the target, thus enabling affinity maturation strategies.
[0038] In some embodiments, two individual sublibraries are assembled combinatorially and used for the de novo identification of bidentate binding molecules.
[0039] In some embodiments, three different sublibraries are assembled to form a combinatorial triplex library.
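The x^2 (duplex) and x^3 (triplex) complexities above follow from every member of one sub-library being able to hybridize with every member of the others through the shared constant domain. A minimal Python sketch, assuming idealized, uniform pairing with no mispairing (the codes and compound names are hypothetical):

```python
from itertools import product


def esac_library(*sublibraries):
    """Combine ESAC sub-libraries by idealized hybridization.

    Each sub-library is a list of (code, compound) pairs sharing a constant
    hybridization domain, so any member can pair with any member of the
    other sub-libraries. Returns one tuple per assembled complex.
    """
    return [tuple(members) for members in product(*sublibraries)]


a = [(f"A{i}", f"a{i}") for i in range(10)]
b = [(f"B{i}", f"b{i}") for i in range(10)]
c = [(f"C{i}", f"c{i}") for i in range(10)]
print(len(esac_library(a, b)), len(esac_library(a, b, c)))
```

Two sub-libraries of 10 members give 100 duplexes and three give 1,000 triplexes; at the 10^3-member scale mentioned above, the duplex library reaches 10^6 members.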
[0040] In some embodiments, preferential binders isolated from an affinity-based selection are PCR-amplified and decoded on complementary oligonucleotide microarrays or by concatenation of the codes, subcloning and sequencing. Such methods are described, for example, in Lovrinovic, M., et al., “DNA microarrays as decoding tools in combinatorial chemistry and chemical biology,” Angew. Chem. Int. Ed. Engl. 2005, 44(21), 3179-83 and Melkko, S., et al., “Encoded self-assembling chemical libraries,” Nat. Biotechnol. 2004, 22(5), 568-74; hereby incorporated by reference. The individual building blocks can eventually be conjugated using suitable linkers to yield a drug-like, high-affinity compound. The characteristics of the linker (e.g., length, flexibility, geometry, chemical nature and solubility) influence the binding affinity and the chemical properties of the resulting binder. For example, bio-panning experiments on HSA with a 600-member ESAC library allowed the isolation of the 4-(p-iodophenyl)butanoic acid moiety. This compound represents the core structure of a series of portable albumin-binding molecules and of Albufluor™, a recently developed fluorescein angiographic contrast agent currently under clinical evaluation. As a further example, ESAC technology has been used for the isolation of potent inhibitors of bovine trypsin and for the identification of novel inhibitors of stromelysin-1 (MMP-3), a matrix metalloproteinase involved in disease processes such as arthritis and metastasis.
[0041] ESAC libraries may be prepared in several variations. Generally, in ESAC libraries, small organic molecules are coupled to 5'-amino-modified oligonucleotides containing a hybridization domain and a unique coding sequence, which allows identification of the coupled molecule. In some embodiments, the ESAC library is used (1) in a single pharmacophore format, (2) in affinity maturation of known binders, (3) in de novo selections of binding molecules by self-assembly of sublibraries in DNA double-strand format, or (4) in DNA triplexes. The ESAC library in the selected format is used in a selection and read-out procedure. Generally, following (i) incubation of the library with the target protein of choice and (ii) washing away of unbound molecules, the oligonucleotide codes of the binding compounds are PCR-amplified and compared with the library without selection on oligonucleotide microarrays. Identified binders/binding pairs are validated after conjugation (if appropriate) to suitable scaffolds.
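The read-out compares the abundance of each code after selection with its abundance in the unselected library. As a minimal sketch, assuming the read-out has been converted to per-code counts (for example, sequencing reads; microarray intensities could be treated similarly), a fold-enrichment score can be computed as the ratio of normalized frequencies. The pseudocount is an assumption added here to avoid division by zero and is not part of the described method.

```python
def enrichment(selected_counts, unselected_counts, pseudocount=1.0):
    """Fold enrichment of each code after selection versus the naive library.

    Counts are converted to frequencies; a pseudocount (an assumption) keeps
    codes missing from either sample from causing division by zero.
    Returns a dict ordered from most to least enriched.
    """
    codes = set(selected_counts) | set(unselected_counts)
    sel_total = sum(selected_counts.values()) + pseudocount * len(codes)
    uns_total = sum(unselected_counts.values()) + pseudocount * len(codes)
    scores = {}
    for code in codes:
        sel = (selected_counts.get(code, 0) + pseudocount) / sel_total
        uns = (unselected_counts.get(code, 0) + pseudocount) / uns_total
        scores[code] = sel / uns
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical counts for three codes before and after selection.
print(enrichment({"X": 90, "Y": 5, "Z": 5}, {"X": 10, "Y": 45, "Z": 45}))
```

Codes with scores well above 1 are candidate binders; scores near or below 1 indicate members that were not retained preferentially.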
Evolution-Based Technologies
DNA-Routing
[0042] In 2004, D.R. Halpin and P.B. Harbury introduced a method for the construction of DNA-encoded libraries in which the DNA-conjugated templates served both to encode the library and to program the split-and-pool synthesis of the library components. Halpin, D.R. and Harbury, P.B., “DNA display I. Sequence-encoded routing of DNA populations,” PLoS Biol. 2004, 2(7), E173, hereby incorporated by reference. This design enabled alternating rounds of selection, PCR amplification and diversification with small organic molecules, in complete analogy to phage display technology. The DNA-routing machinery consists of a series of connected columns bearing resin-bound anticodons, which can sequence-specifically separate a population of DNA templates into spatially distinct locations by hybridization. Using this split-and-pool protocol, a DNA-encoded combinatorial peptide library of 10^6 members was generated. Halpin, D.R. and Harbury, P.B., “DNA display II. Genetic manipulation of combinatorial chemistry libraries for small-molecule evolution,” PLoS Biol. 2004, 2(7), E174, hereby incorporated by reference.
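The routing principle can be sketched in Python as follows. This is an idealized model, not the published implementation: each template is a tuple of codons, a dictionary lookup stands in for hybridization to the column bearing the matching anticodon, and the codon-to-building-block `chemistry` mapping is hypothetical.

```python
from collections import defaultdict


def route_and_react(templates, position, chemistry):
    """One round of idealized sequence-encoded routing.

    templates: dict mapping a template id to (codons, compound), where codons
    is a tuple of codon strings and compound is a list of building blocks.
    position: index of the codon read by the anticodon columns this round.
    chemistry: hypothetical mapping from codon to the building block coupled
    in the column that captures that codon.
    """
    # Split: each template is captured by the column matching its codon.
    columns = defaultdict(list)
    for tid, (codons, _) in templates.items():
        columns[codons[position]].append(tid)
    # React within each column, then pool all products for the next round.
    pooled = {}
    for codon, tids in columns.items():
        for tid in tids:
            codons, compound = templates[tid]
            pooled[tid] = (codons, compound + [chemistry[codon]])
    return pooled


templates = {"t1": (("AAA", "CCC"), []), "t2": (("GGG", "CCC"), [])}
round1 = route_and_react(templates, 0, {"AAA": "Gly", "GGG": "Ala"})
round2 = route_and_react(round1, 1, {"CCC": "Leu"})
print(round2)
```

Running two rounds on two templates that share the second codon but differ in the first yields two distinct products, each still attached to the template that encodes it.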
DNA-Recording
[0043] The most widely applied method for library synthesis is a combinatorial split-and-pool approach (FIG. 14 (a)). Here, individual steps of chemical synthesis are encoded by segregating aliquots of the nascent library and conducting one specific chemical step in each aliquot. The ligation of one specific oligonucleotide is then performed within each segregated compartment. Ligation and building block installation occur on a bifunctional oligonucleotide that supports both processes. The resultant libraries may be double-stranded or single-stranded. Multiple chemical and encoding steps are then conducted using a split-and-pool methodology. Most reported work using DNA-recorded chemistry uses enzymatic ligation to catenate oligonucleotide tags. Chemical ligation may also be used.
[0044] DNA-recording grew out of work at Praecis Pharmaceuticals and GlaxoSmithKline (GSK). It has been used, for example, to generate lead compounds at GSK targeting the proteins soluble epoxide hydrolase (sEH) and receptor-interacting protein 1 kinase (RIP1). See, e.g., Arico-Muendel, Med. Chem. Commun. 2016, 7, 1898-1909 and Goodnow, R. A. Jr. et al., Nat. Rev. Drug Discov. 2017, Feb; 16(2): 131-147, each of which is hereby incorporated by reference. Accordingly, in some embodiments, the DEL is prepared by the use of DNA-recording.
[0045] According to a typical approach, DNA-recording begins with a chimeric DNA-linker library starting material termed a headpiece. Two short, complementary DNA sequences are stabilized as a duplex by a PEG-based reverse turn that displays an amino-PEG linker. Double-stranded tags can then be ligated to the headpiece DNA (whose ends contain a 2-base overhang), while the amino group attached to the PEG-based portion is derivatized as a small molecule warhead. Using multiple cycles of split-and-pool synthesis, large and diverse libraries are generated.
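The 2-base overhang on the headpiece restricts which double-stranded tags can be ligated: two sticky ends anneal when the 5' overhang of one is the reverse complement of the 5' overhang of the other. A minimal Python check of that pairing rule, with hypothetical overhang sequences; ligase requirements such as 5'-phosphorylation are not modeled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def can_ligate(acceptor_overhang, tag_overhang):
    """True if two single-stranded 5' overhangs (each written 5'->3') can anneal.

    Facing sticky ends pair antiparallel, so the tag overhang must equal the
    reverse complement of the acceptor overhang.
    """
    return tag_overhang == revcomp(acceptor_overhang)


# Hypothetical 2-base overhangs: "AC" on the headpiece end pairs with "GT".
print(can_ligate("AC", "GT"), can_ligate("AC", "AC"))
```

A palindromic overhang such as "AATT" is its own reverse complement and therefore pairs with an identical end.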
DNA-Templated Synthesis
[0046] In some embodiments, the DEL is prepared by DNA-templated synthesis. DNA-templated synthesis was first reported in 2001 by David Liu and co-workers. According to this method, complementary DNA oligonucleotides are used to assist certain synthetic reactions that do not efficiently take place in solution at low concentration. Gartner, Z.J. et al., “The generality of DNA-templated synthesis as a basis for evolving non-natural small molecules,” J. Am. Chem. Soc. 2001, 123(28), 6961-3; and Calderone, C.T. et al., “Directing otherwise incompatible reactions in a single solution by using DNA-templated organic synthesis,” Angew. Chem. Int. Ed. Engl. 2002, 41(21), 4104-8; hereby incorporated by reference.
[0047] DNA-templated synthesis is based on codon-specific recognition of DNA sequences, where a library of encoded DNA templates is used to direct the synthesis of library members using BBs conjugated to complementary codon-specific DNA sequences. Generally, a DNA heteroduplex is used to accelerate the reaction between BBs displayed at the extremities of the two DNA strands. Furthermore, the “proximity effect,” which accelerates the bimolecular reaction, was shown to be distance-independent (at least within a distance of 30 nucleotides). In a sequence-programmed fashion, oligonucleotides carrying one chemical reactant were hybridized to complementary oligonucleotide derivatives carrying a different reactive chemical group. The proximity conferred by DNA hybridization drastically increases the effective molarity of the reactants attached to the oligonucleotides, enabling the desired reaction to occur even in an aqueous environment at concentrations several orders of magnitude lower than those needed for the corresponding conventional (non-templated) organic reaction. DNA-templated synthesis has been used in preparing libraries of macrocyclic compounds.
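Codon-specific recognition amounts to reading the template in fixed-length codons and pairing each codon with the reagent whose DNA is its reverse complement. The Python sketch below assumes non-overlapping codons of equal length and a hypothetical mapping from anticodon sequences to building blocks; it models only the sequence logic, not the chemistry.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def translate_template(template, codon_length, reagents_by_anticodon):
    """Return, codon by codon, the building blocks whose anticodon-linked
    reagents would hybridize to the template (idealized, no mispairing)."""
    if len(template) % codon_length:
        raise ValueError("template length is not a multiple of the codon length")
    building_blocks = []
    for i in range(0, len(template), codon_length):
        codon = template[i:i + codon_length]
        building_blocks.append(reagents_by_anticodon[reverse_complement(codon)])
    return building_blocks


# Hypothetical reagents keyed by their anticodon sequence.
reagents = {"TTT": "BB1", "GGG": "BB2", "CCC": "BB3"}
print(translate_template("AAACCCGGG", 3, reagents))
```

Because the template rather than the operator selects each reagent, a PCR-amplified template population can be re-translated into the corresponding compounds, which is what makes rounds of evolution possible.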
YoctoReactor® (yR) and Other Approaches to Compound Library Assembly
[0048] The YoctoReactor® (yR) is a combinatorial synthetic approach that exploits the self-assembling nature of DNA oligonucleotides into 3-, 4-, or 5-way junctions to direct small molecule synthesis at the center of the junction by bringing reactants into close proximity. Synthesis of encoded libraries using the yR approach and variations thereof is described in, for example, U.S. Patent 8,202,823, U.S. 7,928,211, U.S. Patent Application Publication No. US 2017/0233726, and Hansen, M. H., et al., “A yoctoliter-scale DNA reactor for small molecule evolution,” J. Am. Chem. Soc. 2009, 131, 1322-1327, hereby incorporated by reference.
[0049] The cavity at the center of the yR DNA junction has a volume of about one yoctoliter (10^-24 L). Such a minute volume is on the order of that required for a chemical reaction between two single molecules. When two reactants are brought into such close proximity, the effective concentration of the reactants is in the high-mM range, resulting in high reaction rates. The DNA junction thereby enables chemical reactions that would otherwise not take place at practically feasible rates at the actual concentrations of the reactants in solution, which are multiple orders of magnitude lower.
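The link between a yoctoliter volume and a high effective concentration can be checked with back-of-the-envelope arithmetic: one molecule in a volume V corresponds to a concentration of 1/(N_A * V). The sketch below treats the cavity as a box holding one free molecule, which is an idealization (tethered reactants in a real junction are not free), so the resulting figure of about 1.7 M is only an order-of-magnitude estimate, broadly consistent with the high-mM effective concentration stated above.

```python
AVOGADRO = 6.02214076e23  # mol^-1 (exact SI value)


def single_molecule_concentration(volume_litres):
    """Molar concentration of one molecule confined to the given volume."""
    return 1.0 / (AVOGADRO * volume_litres)


print(f"{single_molecule_concentration(1e-24):.2f} M")  # prints 1.66 M
```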
[0050] To prepare chemical libraries via the yR method, small-molecule chemical building blocks (BBs) are attached via cleavable or non-cleavable linkers to one of three designed, bispecific DNA oligonucleotides (oligo-BBs) that are capable of forming a three-way junction (3WJ) representing each arm of the yR, as shown in FIG. 1. See, e.g., Hansen, N. J. V. et al., “Fidelity by design: Yoctoreactor and binder trap enrichment for small-molecule DNA-encoded libraries and drug discovery,” Curr. Opin. Chem. Biol. 2015, 26, 62-71, hereby incorporated by reference. In some embodiments, each yR arm is an 18 bp stem with a 4 nt loop, and the whole oligo used in the combinatorial synthesis is 40 nt. In some embodiments, each yR arm is an about 10-30, 12-28, 14-26, 16-24, or 18-22 bp stem with a loop of about 2-14, 3-13, 4-12, 5-11, 6-10, or 8-9 nt.
[0051] Accordingly, in some embodiments, the BBs are attached to three different, bispecific DNA oligonucleotides (oligo-BBs), which then interact so as to form a YoctoReactor® (yR) comprising a three-way junction (3WJ). In some embodiments, the BBs are attached to four different, bispecific DNA oligonucleotides, which then interact so as to form a yR comprising a four-way junction (4WJ).
[0052] In order to produce test compound libraries in a combinatorial manner, the oligo-BBs are designed such that the oligo contains (a) the code for an attached BB at the distal end of the oligo and (b) areas of constant DNA sequence that self-assemble the DNA into a 3WJ or 4WJ regardless of the identity of the BB and the subsequent chemical reaction. Generally, library preparation is carried out in a stepwise combinatorial fashion, for example as shown in FIG. 2.
[0053] In some embodiments, an area of constant DNA sequence capable of self-assembly with another such area (a hybridization region) is about 5 to about 200 nucleotides in length. In some embodiments, an area of constant DNA sequence is about 10 to about 150 nucleotides in length, or about 10-100, 15-100, 20-100, 10-80, 10-60, 15-50, or 20-40 nucleotides in length. In some embodiments, each yR arm comprises two hybridization regions, each about 10 nt in length. In some embodiments, each yR arm comprises one, two, three, or four hybridization regions.
[0054] The length of the DNA barcodes may also vary. For example, the DNA barcodes may be at least 4 nucleotides in length, at least 5 nucleotides in length, at least 6 nucleotides in length, or at least 7, 8, 9, 10, 11, or 12 nucleotides in length. In some cases, each barcode sequence is independently from about 4 nucleotides to about 20 nucleotides in length. Barcodes typically consist of a relatively short sequence of nucleotides attached to a sample sequence, where the barcode sequence is either known or identifiable by its location or sequence elements. In some embodiments, a unique identifier is useful for sample indexing and/or identification of the small molecule library member. In some cases, though, barcodes may also be useful in other contexts. For example, a barcode may serve to track samples throughout processing (e.g., location of sample in a lab, location of sample in a plurality of reaction vessels, etc.); provide manufacturing information; track barcode performance over time (e.g., from barcode manufacturing to use) and in the field; track barcode lot performance over time in the field; provide product information during sequencing and perhaps trigger automated protocols (e.g., automated protocols initiated and executed with the aid of a computer) when a barcode associated with the product is read during sequencing; track and troubleshoot problematic barcode sequences or product lots; serve as a molecular trigger in a reaction involving the barcode; and combinations thereof. In some embodiments, and as alluded to above, barcode sequence segments as described herein are used to provide linkage information between two discrete determined nucleic acid sequences. This linkage information may include, for example, linkage to a common sample, a common reaction vessel (e.g., a well or partition), or even a common starting nucleic acid molecule.
In particular, by attaching common barcodes to a specific sample component, one can attribute the resulting sequences bearing that barcode to that sample component, e.g. small molecule.
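Barcode length sets an upper limit on how many building blocks a single code position can distinguish: a DNA barcode of L nucleotides has at most 4^L distinct sequences. The Python sketch below finds the shortest length meeting that bound. It is a lower bound only; practical barcode sets are typically longer so that codes differ at several positions and tolerate synthesis or sequencing errors, which is not modeled here.

```python
def min_barcode_length(n_codes, alphabet_size=4):
    """Shortest code length L with alphabet_size ** L >= n_codes.

    Uses integer arithmetic to avoid floating-point logarithm rounding.
    """
    length = 1
    while alphabet_size ** length < n_codes:
        length += 1
    return length


# Hypothetical building-block set sizes.
for n in (4, 5, 1000, 10**6):
    print(n, min_barcode_length(n))
```

For instance, 1,000 building blocks need at least 5 nt (4^5 = 1,024), and 10^6 need at least 10 nt, which sits within the 4-20 nt range given above.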
[0055] As described in, e.g., EP1809743, which is hereby incorporated by reference, the barcode can be PNA, LNA, RNA, DNA or combinations thereof.
[0056] As described above, oligonucleotides incorporating barcode sequence segments, which function as a unique identifier, may also include additional sequence segments. Such additional sequence segments may include functional sequences, such as primer sequences, primer annealing site sequences, immobilization sequences, or other recognition or binding sequences useful for subsequent processing, e.g., a sequencing primer or primer binding site for use in sequencing of samples to which the barcode-containing oligonucleotide is attached. Further, as used herein, reference to specific functional sequences as being included within the barcode-containing sequences also envisions the inclusion of the complements of any such sequences, such that complementary replication will yield the specific described sequence.
[0057] In some embodiments, barcodes or partial barcodes may be generated from oligonucleotides obtained from or suitable for use in an oligonucleotide array, such as a microarray or bead array. In such cases, oligonucleotides of a microarray may be cleaved (e.g., using cleavable linkages or moieties that anchor the oligonucleotides to the array, such as photocleavable, chemically cleavable, or otherwise cleavable linkages) such that the free oligonucleotides are capable of serving as barcodes or partial barcodes. In some embodiments, barcodes or partial barcodes obtained from arrays are of known sequence. A microarray may provide at least about 10,000,000, at least about 1,000,000, at least about 900,000, at least about 800,000, at least about 700,000, at least about 600,000, at least about 500,000, at least about 400,000, at least about 300,000, at least about 200,000, at least about 100,000, at least about 50,000, at least about 10,000, at least about 1,000, at least about 100, or at least about 10 different sequences that may be used as barcodes or partial barcodes.
[0058] The length of a barcode sequence may be any suitable length, depending on the application (e.g., for homogeneous screening methods vs. bound to beads). In some embodiments, a barcode sequence is about 2 to about 500 nucleotides in length, about 2 to about 100 nucleotides in length, about 2 to about 50 nucleotides in length, about 2 to about 20 nucleotides in length, about 6 to about 20 nucleotides in length, or about 4 to 16 nucleotides in length. In some embodiments, a barcode sequence is about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 85, 90, 95, 100, 150, 200, 250, 300, 400, or 500 nucleotides in length. In some embodiments, a barcode sequence is greater than about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 85, 90, 95, 100, 150, 200, 250, 300, 400, 500, 750, 1,000, 5,000, or 10,000 nucleotides in length. In some embodiments, a barcode sequence is less than about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 85, 90, 95, 100, 150, 200, 250, 300, 400, 500, 750, or 1,000 nucleotides in length.
[0059] In some embodiments, barcodes with different sequences (or the same sequences) are assembled or, e.g., attached to beads, in separate steps. For example, in some embodiments, barcodes with unique sequences are attached to beads such that each bead has multiple copies of a first barcode sequence on it. In a second step, the beads can be further functionalized with a second sequence. The combination of first and second sequences may serve as a unique barcode, or unique identifier, attached to a bead. The process may be continued to add additional sequences that behave as barcode sequences (in some cases, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more barcode sequences are sequentially added to each bead). In some embodiments, the additional sequences that behave as barcode sequences (in some cases, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more barcode sequences) are ligated together in solution to assemble a complete barcode.
[0060] In some embodiments, the barcode is assembled from two or more shorter barcodes (single BB barcodes), each of which encode a particular BB. The barcode, in some embodiments, results from ligation of the two or more shorter barcodes. In such cases, the barcode encodes two or more BBs and the identity of a small molecule library member may be determined from the combination of the shorter barcodes.
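Decoding a barcode assembled from single-BB barcodes reduces to splitting it into its segments and looking up each segment. A minimal Python sketch, assuming equal-length segments ligated in a fixed order and hypothetical per-position codebooks; error-tolerant matching is not attempted.

```python
def decode_barcode(barcode, codebooks, segment_length):
    """Split a ligated barcode into single-BB barcodes and look up each BB.

    codebooks: one dict per position mapping a single-BB barcode to a BB name.
    Raises ValueError on a length mismatch and KeyError for an unknown segment
    (for example, one containing a sequencing error).
    """
    expected = segment_length * len(codebooks)
    if len(barcode) != expected:
        raise ValueError(f"expected {expected} nt, got {len(barcode)}")
    return tuple(
        codebook[barcode[i * segment_length:(i + 1) * segment_length]]
        for i, codebook in enumerate(codebooks)
    )


codebooks = [
    {"ACGT": "BB-A1", "TTGA": "BB-A2"},  # position 1 (hypothetical)
    {"GGCA": "BB-B1", "CATG": "BB-B2"},  # position 2 (hypothetical)
]
print(decode_barcode("TTGACATG", codebooks, 4))
```

The compound identity is the tuple of building blocks, which is why the combination of shorter barcodes, not any single segment, identifies the library member.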
[0061] In some embodiments, bifunctional BBs are used that comprise one functionality for linking to the DNA barcode and one functionality capable of undergoing a chemical reaction in the yR. In some embodiments, the BBs are linked covalently to their DNA barcodes via cleavable or non-cleavable linkers. In some embodiments, a DNA code unique to each BB is located at the distal end of the oligo. This ultimately enables the synthetic route of each assembled compound to be determined by its unique DNA barcode, which is a combination of the barcodes of its constituent BBs.
[0062] The use of bifunctional BBs and the ability to link them directly to their encoding DNA before the library synthesis provides several advantages. Firstly, since the code is intimately attached to the BB there is no chance of mismatch during the library synthesis resulting in a high fidelity library. Secondly, the DNA provides an excellent purification handle enabling incomplete reactions and truncated products to be eliminated from the yR library through systematic purification steps, resulting in an ultra-high purity library and significantly facilitating the interpretation of screening results.
[0063] The BBs are then allowed to react. In some embodiments, the library of compounds is prepared by contacting the BBs under appropriate conditions to facilitate chemical reactions between BBs, thus increasing the number of compounds in the library and their chemical diversity. Generally, chemical reactions are performed one step at a time. In some embodiments, the BBs are allowed to participate in multicomponent reactions in which three or four BBs participate. After each chemical reaction step, the DNA is ligated and the product purified by suitable means, such as polyacrylamide gel electrophoresis or the like. Naturally, library size and chemical diversity increase with the number of chemical reaction steps performed, since the BBs can participate in numerous chemical reactions and the products from each round go on to participate in further chemical reactions, so that the number of library members is multiplied at each step.
[0064] Generally, cleavable linkers (BB-oligo) are used for all but one oligo-BB. In some embodiments, the cleavable linker is an amino-thiol linker such as those described in Hoejfeldt, J. W., et al., “A cleavable amino-thiol linker for reversible linking of amines to DNA,” J. Org. Chem. 2006, 71, 9556-9559, hereby incorporated by reference.
[0065] After the chemical reaction, the products are generally purified before proceeding further. Because the DNA increases in size after the chemical reaction between the BBs, the product is easily purified by polyacrylamide gel electrophoresis (PAGE) under denaturing conditions, thus permitting only the desired reaction product to be recovered. The information regarding which BBs have reacted is then stored permanently by ligating the two DNA strands containing the codons. One of the BBs (linked to the DNA with a cleavable linker) is then cleaved.
[0066] A repertoire of BBs linked to a third DNA strand which encodes for each individual BB is now added and the sequence is repeated, resulting in the transfer of the third BB. The DNA contains a priming site so the yR is dismantled by forming the complementary DNA strand in a single round polymerase chain reaction (PCR). Thus a library of small molecules is produced in which each small molecule is attached covalently to a genetic code and can be readily identified by DNA sequencing. The combinatorial library is now ready to be affinity screened against the desired target.
[0067] After product generation and cleavage of the linkers, the small molecules in the library are all attached by a single covalent link to their respective DNA codes. Table A outlines how libraries of different sizes can be generated using yR technology. Library size is determined by the number of different BB-oligos as well as the number of arms in the DNA junction (i.e., 3WJ vs. 4WJ vs. 5WJ).
Table A: YoctoReactor® library size. yR library size is a function of the number of different functionalized oligos used in each position and the number of positions in the DNA junction.
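The rule behind Table A can be expressed as a product: with k junction positions and m_i distinct BB-oligos at position i, an idealized yR library contains the product of all m_i members. The per-position counts in the sketch below are hypothetical and are not the values of Table A.

```python
from math import prod


def yr_library_size(oligos_per_position):
    """Number of distinct products when every combination of BB-oligos,
    one per junction position, assembles and reacts (idealized)."""
    return prod(oligos_per_position)


# Hypothetical counts: 100 BB-oligos at every position.
print(yr_library_size([100, 100, 100]))       # 3WJ
print(yr_library_size([100, 100, 100, 100]))  # 4WJ
```

With 100 oligos per position, moving from a 3WJ to a 4WJ multiplies the library size by another factor of 100, from 10^6 to 10^8 members.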
[0068] The yR design approach provides an unvarying reaction site with regard to both (a) distance between reactants and (b) sequence environment surrounding the reaction site. Furthermore, the intimate connection between the code and the BB on the oligo-BB moieties which are mixed combinatorially in a single pot confers a high fidelity to the encoding of the library. The code of the synthesized products, furthermore, is not preset, but rather is assembled combinatorially and synthesized in synchronicity with the innate product.
[0069] Recently, another assembly approach inspired by the yR and demonstrated on a model system was reported with an alternative linear DNA design. Cao, C. et al., “DNA-templated synthesis of encoded small molecules by DNA self-assembly,” Chem. Commun. 2014, 50, 10997-10999, hereby incorporated by reference.
Methods of Screening Compound Libraries
Homogeneous Screening of Split-and-Pool, YoctoReactor®, and Other DEL Libraries
[0070] In a conventional affinity-based screen, the library of potentially functional compounds is exposed to the target protein immobilized on a surface matrix such as a cellulose bead. In theory, active compounds bind to the target whereas non-active compounds remain in solution. After association, the non-binding library members are chromatographically separated from the binding library members by a series of washes. After the ligand-target interaction is disrupted, the DNA codes of the active compounds are amplified by PCR to enhance the signal, and the identity of the compounds is determined by DNA sequencing. The method, although proven successful on many target classes, has limitations related to non-specific binding of compounds to the matrix, multivalent binding, imprecise control of target concentration and the possibility of target denaturation. Many of these limitations can be dealt with by extensive data processing and deconvolution.
[0071] To address these limitations, McGregor et al. developed an advanced selection method called interaction-dependent PCR (IDPCR) relying on a proximity-dependent binding signal. McGregor, L. M., et al., “Interaction-dependent PCR: identification of ligand-target pairs from libraries of ligands and libraries of targets in a single solution-phase experiment,” J. Am. Chem. Soc. 2010, 132, 15522-15524, hereby incorporated by reference. Selections are run in solution to avoid matrix artifacts, and information on binding events is stored by combining the DNA sequence on the ligand with a DNA sequence on the target.
Binder Trap Enrichment® (BTE) and Other Emulsion-Based Screening Methods
[0072] A homogeneous method for screening YoctoReactor® (yR) libraries (which is also applicable to libraries generated by other means, such as those described herein) has been developed that uses water-in-oil emulsion technology to isolate individual ligand-target complexes. Called Binder Trap Enrichment® (BTE), it identifies ligands to a protein target by trapping binding pairs (DNA-labeled protein target and yR ligand) in emulsion droplets under dissociation-dominated kinetics. Hansen, N. J. V. et al., “Fidelity by design: Yoctoreactor and binder trap enrichment for small-molecule DNA-encoded libraries and drug discovery,” Curr. Opin. Chem. Biol. 2015, 26, 62-71, hereby incorporated by reference. Once trapped, the target DNA and ligand DNA are joined by ligation, thus preserving the binding information. The rate of ligation is increased by the proximity of the target DNA and the ligand DNA.
[0073] In some embodiments, the present invention provides a method of screening a DEL, comprising screening the DEL by an emulsion-based screening method such as BTE, wherein the target is a nucleic acid such as an RNA or fragment thereof. In some embodiments, BTE is performed as described in US 2017/0233726, hereby incorporated by reference.
[0074] In conventional emulsion-based (e.g., BTE) approaches, both the DEL and the target include DNA barcodes. BTE has thus far been limited to soluble proteins or fragments thereof conjugated to a DNA barcode. Several methods of conjugating the target to the DNA barcode have been developed which offer target-dependent versatility, but typically the well-established chemical process used for biotinylation of proteins can be applied. Biotinylation is efficient and tolerated by most proteins. In some embodiments, NHS-ester conjugation to lysine and maleimide conjugation to cysteine, or variations thereof, are used to conjugate the small molecule or target to its barcode.
[0075] The steps of BTE screening are shown in FIG. 3. During BTE screening, a DEL mixed with the DNA-labeled target is allowed to reach equilibrium in solution, where the target concentration can be controlled. A rapid dilution is then performed, during which the binding kinetics become dominated by dissociation. This is followed by rapid emulsion formation, which traps the bound ligands with the target inside aqueous emulsion droplets. By ensuring a large excess of droplets over target molecules, bound ligands are consistently trapped with the target, whereas non-bound molecules are trapped with the target only by chance. In the emulsion, the partitions (e.g., droplets) typically contain on average at most one oligonucleotide sequence per partition. At a given sequence dilution, the number of sequences per partition follows a Poisson distribution. Thus, in some embodiments, about 6%, 10%, 18%, 20%, 30%, 36%, 40%, or 50% of the droplets or partitions comprise one or fewer oligonucleotide sequences. In some embodiments, more than about 6%, 10%, 18%, 20%, 30%, 36%, 40%, 50%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or more of the droplets comprise one or fewer oligonucleotide sequences. In some embodiments, less than about 6%, 10%, 18%, 20%, 30%, 36%, 40%, or 50% of the droplets may comprise one or fewer oligonucleotide sequences. In some embodiments, the number of droplets (compartments) is at least 2, 3, 4, 5, 10, 100, 1,000, or 10,000 times greater than the number of DEL library members. In some embodiments, the number of droplets is at least 10², 10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, or 10⁹ times greater than the number of DEL library members.
[0076] When emulsions are used as the compartmentalization system, and by analogy with similar size distributions, the compartment volume distribution is modeled as a log-normal distribution, also called a Galton distribution.
By assuming a log-normal distribution and measuring the actual droplet sizes, the expected value (mean) and the standard deviation can be calculated for a specific experiment. According to this distribution, 95% of the compartment volumes will lie within L logarithmic units of the mean log-volume, where L is 1.96 times the standard deviation of the log-volumes.
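The Poisson loading of droplets described in [0075] can be checked with a short calculation. The Python sketch below is illustrative only: it assumes molecules partition independently at random, so each droplet's occupancy is Poisson-distributed with mean λ equal to the number of molecules divided by the number of droplets. The function names and the 10-fold droplet excess in the usage example are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a droplet holds exactly k molecules when the mean occupancy is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def fraction_at_most_one(lam: float) -> float:
    """Fraction of droplets with zero or one molecule: P(0) + P(1) = exp(-lam) * (1 + lam)."""
    return poisson_pmf(0, lam) + poisson_pmf(1, lam)


def mean_occupancy(n_molecules: float, n_droplets: float) -> float:
    """Mean number of molecules per droplet (lambda) for a given droplet excess."""
    return n_molecules / n_droplets


if __name__ == "__main__":
    # Hypothetical example: droplets outnumber molecules 10-fold (lambda = 0.1).
    lam = mean_occupancy(1e8, 1e9)
    print(f"lambda         = {lam:.2f}")
    print(f"P(empty)       = {poisson_pmf(0, lam):.4f}")
    print(f"P(<= 1 copy)   = {fraction_at_most_one(lam):.4f}")
    print(f"P(>= 2 copies) = {1 - fraction_at_most_one(lam):.4f}")
```

Increasing the droplet excess lowers λ and drives the fraction of multiply occupied droplets toward zero, which is the condition under which co-trapping of unbound molecules becomes a rare, chance event.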
[0077] In some embodiments, the average compartment size, the variation, and the standard deviation are taken into account when analyzing the data.
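The log-normal treatment in [0076] can be sketched as follows. This is a minimal example that assumes natural logarithms (the text does not specify a base) and positive measured droplet volumes; the function name and the example volumes are hypothetical. It returns the mean and sample standard deviation of the log-volumes and the corresponding interval exp(mean ± 1.96 × SD), which should contain about 95% of compartment volumes.

```python
import math
import statistics


def lognormal_95_interval(volumes):
    """Fit a log-normal model to measured droplet volumes (all > 0).

    Returns (lower, upper, mu, sigma), where mu and sigma are the mean and sample
    standard deviation of the natural-log volumes, and the bounds are
    exp(mu - L) and exp(mu + L) with L = 1.96 * sigma.
    """
    if len(volumes) < 2:
        raise ValueError("need at least two droplet measurements")
    logs = [math.log(v) for v in volumes]
    mu = statistics.mean(logs)
    sigma = statistics.stdev(logs)
    half_width = 1.96 * sigma  # "L" in the text, here in natural-log units
    return math.exp(mu - half_width), math.exp(mu + half_width), mu, sigma


if __name__ == "__main__":
    # Hypothetical droplet volumes in picoliters, e.g. from image analysis.
    measured = [4.1, 5.0, 3.8, 6.2, 4.7, 5.5, 4.4, 7.9, 3.2, 5.1]
    lo, hi, mu, sigma = lognormal_95_interval(measured)
    print(f"geometric mean volume = {math.exp(mu):.2f} pL, sigma(log) = {sigma:.3f}")
    print(f"~95% of droplets expected between {lo:.2f} and {hi:.2f} pL")
```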
[0078] In some embodiments, compartments with a volume larger than 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 times the average compartment size are removed from the experiment.
[0079] In some embodiments, compartments with a volume smaller than 1/100, 1/90, 1/80, 1/70, 1/60, 1/50, 1/40, 1/30, 1/20, 1/10, 1/9, 1/8, 1/7, 1/6, 1/5, 1/4, 1/3, or 1/2 times the average compartment size are removed from the experiment.
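When applied computationally, for example to droplet-size measurements, the volume-based exclusion of [0078] and [0079] amounts to a threshold relative to the average compartment size. The sketch below is hypothetical. It treats the arithmetic mean of the measured volumes as the "average compartment size" and uses default cut-offs of 10× and 1/10× that average, chosen from the recited ranges only for illustration. Physical removal of compartments would use a method such as those listed in [0080].

```python
def filter_by_relative_volume(volumes, max_ratio=10.0, min_ratio=0.1):
    """Keep compartments whose volume lies within [min_ratio, max_ratio] times
    the average (arithmetic mean) compartment volume.

    Returns (kept_volumes, average_volume). Note that the average is computed
    over all compartments, including the outliers that are then removed.
    """
    if not volumes:
        raise ValueError("no compartment volumes supplied")
    avg = sum(volumes) / len(volumes)
    lo, hi = min_ratio * avg, max_ratio * avg
    kept = [v for v in volumes if lo <= v <= hi]
    return kept, avg


if __name__ == "__main__":
    # Hypothetical volumes (arbitrary units) with one oversized, coalesced droplet.
    kept, avg = filter_by_relative_volume([1.0, 1.0, 1.0, 1.0, 100.0],
                                          max_ratio=3.0, min_ratio=0.01)
    print(f"average = {avg:.1f}; kept {len(kept)} of 5 compartments: {kept}")
```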
[0080] Several technologies may be employed to exclude compartments from the experiment based on the volume of the compartments, for example by FACS sorting, equilibrium centrifugation, filtration, or microfluidic systems, etc.
[0081] Once trapped, the target and the DEL DNA barcode are ligated inside the droplets, thus preserving the co-trapping information. The emulsion is then broken, the material recovered, and the DNA amplified by PCR. Methods of breaking emulsions of this type are known in the art, for example centrifuging at 13,000 × g for 5 min at 25 °C. The oil phase is discarded, and residual mineral oil and surfactants are removed by performing the following extraction twice: adding 1 mL of water-saturated diethyl ether, vortexing, and discarding the upper (solvent) phase.
[0082] Amplification of the DNA codes of co-trapped species is assured because only ligated DNA is amplified exponentially in the PCR: each DNA fragment (target DNA tag and library member DNA) contributes one PCR priming site, so only a ligated product carries both. The amplified DNA is then sequenced, and the DNA codes are translated into compounds and counted.
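An idealized copy-number calculation shows why ligated products dominate the PCR. This sketch assumes a fixed per-cycle efficiency E and ignores plateau effects. A template carrying both priming sites grows as (1 + E)^n, while a tag carrying only one site supports only linear primer extension and gains about E copies per cycle. The function name and the 25-cycle example are hypothetical.

```python
def copies_after_pcr(cycles: int, priming_sites: int, efficiency: float = 1.0) -> float:
    """Idealized copies derived from one starting template after `cycles` of PCR.

    priming_sites >= 2: both primers bind (ligated product), so every copy is
        copied again each cycle: (1 + efficiency) ** cycles.
    priming_sites == 1: only one primer binds (unligated tag), so only the original
        template is copied each cycle: 1 + efficiency * cycles.
    priming_sites == 0: no amplification.
    """
    if priming_sites >= 2:
        return (1.0 + efficiency) ** cycles
    if priming_sites == 1:
        return 1.0 + efficiency * cycles
    return 1.0


if __name__ == "__main__":
    n = 25  # hypothetical cycle number
    ligated = copies_after_pcr(n, priming_sites=2)
    unligated = copies_after_pcr(n, priming_sites=1)
    print(f"ligated: {ligated:.3g} copies, unligated: {unligated:.3g} copies, "
          f"ratio ~ {ligated / unligated:.2g}")
```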
[0083] It is also possible to perform more than one round of enrichment, for example as described in U.S. Patent 8,202,823, U.S. 7,928,211, U.S. Patent Application Publication No. US 2017/0233726, and Hansen, M. H., et al., “A yoctoliter-scale DNA reactor for small molecule evolution,” J Am Chem Soc 2009, 131, 1322-1327, each hereby incorporated by reference.
[0084] Identification of hits after screening and ligation is essentially a counting exercise: information on binding events is deciphered by sequencing and counting the ligated DNA. Selective binders are counted at a much higher frequency than random binders. This is possible because random trapping of target and ligand is “diluted” by the high number of water droplets in the emulsion. Aqueous drops dispersed in oil act as compartments that trap binding complexes on a single-target-molecule basis, an approach that allows screening of tens of millions of compounds with very low assay noise. By exploiting the information storage capacity of DNA and the power of next-generation sequencing, information about each binding event is preserved in a unique DNA sequence, which is amplified in batch, sequenced, and decoded to reveal the families of compounds that may bind and modulate the target protein. The low noise and background signal characteristic of BTE are attributed to the “dilution” of the random signal, the lack of surface artifacts, and the high fidelity of the yR library and screening method.
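The counting step in [0084] can be sketched as a tally of decoded reads per library member. This is a minimal illustration, not the method of any specific analysis pipeline. It assumes each read has already been decoded to a library-member identifier, and it uses a simple hypothetical normalization (fold over the mean count across observed members). A practical analysis would more likely compare against a no-target or other control selection. The identifiers in the usage example are hypothetical.

```python
from collections import Counter


def rank_hits(decoded_reads, min_count=1):
    """Count reads per decoded library member and rank members by read count.

    Each entry of `decoded_reads` is the identifier decoded from one sequencing read
    of a ligated product. Returns (member, count, fold_over_mean) tuples sorted by
    count, highest first; fold_over_mean compares each count with the mean count
    across all observed members.
    """
    counts = Counter(decoded_reads)
    if not counts:
        return []
    mean = sum(counts.values()) / len(counts)
    hits = [(member, n, n / mean) for member, n in counts.items() if n >= min_count]
    return sorted(hits, key=lambda t: t[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical decoded reads: building-block codes joined by "-".
    reads = ["A1-B7-C3"] * 50 + ["A2-B1-C9"] * 2 + ["A4-B4-C4"] * 3
    for member, n, fold in rank_hits(reads, min_count=2):
        print(member, n, round(fold, 1))
```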
[0085] BTE mimics the non-equilibrium nature of in vivo ligand-target interactions and offers the unique possibility to screen for target specific ligands based on ligand-target residence time because the emulsion, which traps the binding complex, is formed during a dynamic dissociation phase.
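The link between residence time and trapping can be illustrated with first-order dissociation kinetics. This sketch assumes that, after the rapid dilution, rebinding is negligible, so the fraction of complexes still intact after time t is exp(−k_off·t). The dissociation rate constants and the 30 s delay before emulsion formation are hypothetical values chosen only to contrast a fast and a slow binder.

```python
import math


def residence_time(k_off: float) -> float:
    """Mean lifetime of a complex, tau = 1 / k_off (seconds if k_off is in 1/s)."""
    return 1.0 / k_off


def half_life(k_off: float) -> float:
    """Time for half of the complexes to dissociate: ln(2) / k_off."""
    return math.log(2.0) / k_off


def fraction_still_bound(k_off: float, t: float) -> float:
    """First-order decay of complexes after dilution, assuming no rebinding."""
    return math.exp(-k_off * t)


if __name__ == "__main__":
    delay = 30.0  # hypothetical seconds between dilution and emulsion formation
    for k_off in (0.1, 0.001):  # hypothetical fast- and slow-dissociating binders, 1/s
        print(f"k_off={k_off}/s: tau={residence_time(k_off):.0f} s, "
              f"still bound at {delay:.0f} s: {fraction_still_bound(k_off, delay):.3f}")
```

Under these assumptions, a binder with a long residence time remains almost entirely complexed when the droplets form, whereas a fast-dissociating binder is largely lost. This is the basis for selecting on residence time.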
[0086] Taken together, yR and BTE technologies allow vast drug-like small molecule libraries to be efficiently synthesized in a combinatorial fashion and screened for target binding in a single-tube method. As described below, these technologies are compatible with an assay readout enabled by advances in next-generation sequencing technology. Over the past decade, this approach has increasingly been applied as a viable technology for identifying small-molecule modulators of protein targets and precursors to drugs.
[0087] However, application of yR and emulsion-based (e.g., BTE) screening technology has been limited to screening against protein targets to date. As described in detail below, it has now been surprisingly discovered that it is possible to screen small molecule DELs against nucleic acid targets such as RNA, and to form enriched libraries from which binding information can be derived. As one example, BTE followed by RT-PCR and next-generation sequencing will provide small molecule hits in the DEL that bind to the nucleic acid target.
Non-Emulsion-Based Screening Methods
[0088] In some embodiments, screening of the DEL is performed using an emulsion-less proximity-enabled screening method. In some embodiments, the emulsion-less proximity-enabled screen is performed by the steps of: contacting a DEL with a target nucleic acid for a sufficient period of time and under conditions to allow the DEL and target nucleic acid to equilibrate and bind; performing a dilution; encoding or trapping information about binding of DEL library members to the target nucleic acid by, e.g., ligation or reverse transcription; and, optionally, decoding the results of the screen, for example by PCR. In some embodiments, the target nucleic acid is an RNA.
[0089] The high effective concentration of a DEL member and RNA that are bound together enables them to ligate efficiently, but unbound RNAs and DELs are too dilute to ligate efficiently.
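The proximity argument in [0088] and [0089] can be made semi-quantitative. This is a minimal sketch under stated assumptions: simple 1:1 binding at equilibrium, described by the exact quadratic solution for the complex concentration, and ligation treated as pseudo-first-order in the partner concentration. A bound DEL member then "sees" its RNA at an effective molarity, while an unbound one sees only the diluted bulk concentration. All concentrations, the Kd, and the effective molarity in the usage example are hypothetical.

```python
import math


def complex_concentration(target_total, ligand_total, kd):
    """Equilibrium concentration of a 1:1 target-ligand complex (same units throughout).

    Smaller root of x**2 - (T + L + Kd) * x + T * L = 0, written as
    2TL / (b + sqrt(b**2 - 4TL)) to avoid cancellation error.
    """
    b = target_total + ligand_total + kd
    disc = math.sqrt(b * b - 4.0 * target_total * ligand_total)
    return 2.0 * target_total * ligand_total / (b + disc)


def proximity_advantage(effective_molarity, bulk_partner_conc):
    """Ratio of pseudo-first-order ligation rates for a bound pair (partner held at its
    effective molarity) versus an unbound pair (partner at the bulk concentration)."""
    return effective_molarity / bulk_partner_conc


if __name__ == "__main__":
    T, L, KD = 1e-6, 1e-9, 1e-7  # hypothetical molar concentrations and Kd
    tl = complex_concentration(T, L, KD)
    print(f"fraction of this DEL member bound at equilibrium: {tl / L:.2f}")
    # Hypothetical: after a 1000-fold dilution the free RNA is ~1 nM, while the bound
    # partner is held at an assumed effective molarity of 100 uM.
    print(f"proximity advantage ~ {proximity_advantage(1e-4, T / 1000):.0e}")
```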
Decoding of DNA-encoded chemical libraries
[0090] Following selection from DNA-encoded chemical libraries, the decoding strategy for the fast and efficient identification of the specific binding compounds is crucial for the further development of the DEL technology. A variety of decoding methods may be used in accordance with the present invention, including microarray-based methodology and high-throughput sequencing techniques. Alternatively, Sanger-based sequencing methods may be used.
[0091] For most sequencing applications, a sample such as a nucleic acid sample is processed prior to introduction to a sequencing machine. A sample may be processed, for example, by amplification or by attaching a unique identifier.
[0092] In some embodiments, a method of sequencing is used that does not rely on reverse transcription. For example, sensitive and highly multiplex methods to directly measure RNA sequence abundance without requiring reverse transcription are available for a number of biomedical applications, including high-throughput small molecule screening, pathogen transcript detection, and quantification of short/degraded RNAs. These methods include RNA Annealing, Selection and Ligation (RASL) assays, which are based on RNA template-dependent oligonucleotide probe ligation. See, e.g., Larman, H. B., et al., Nucleic Acids Research, 2014, 42(1), 9146-9157; and Li, H. et al., Current Protocols in Molecular Biology 4.13.1-4.13.9, April 2012, each of which is hereby incorporated by reference. RASL assays can use a DNA or RNA ligase, such as Rnl2, which can join a fully DNA donor probe to a 3'-diribonucleotide-terminated acceptor probe with high efficiency on an RNA template strand. Rnl2-based RASL exhibits sub-femtomolar transcript detection sensitivity and permits the rational tuning of probe signals for optimal analysis by massively parallel DNA sequencing (RASL-seq).
[0093] In some embodiments, RT-PCR is used, with any suitable PCR reagents. In some embodiments, dUTP may be substituted for dTTP during primer extension or other amplification reactions, so that the oligonucleotide products contain uracil-containing nucleotides rather than thymine-containing nucleotides. The resulting uracil-containing section of the universal sequence may later be used together with a polymerase that does not accept or process uracil-containing templates, to suppress undesired amplification products.
[0094] Amplification reagents may include a universal primer, universal primer binding site, sequencing primer, sequencing primer binding site, universal read primer, universal read binding site, or other primers compatible with a sequencing device, e.g., an Illumina sequencer, Ion Torrent sequencer, etc. The amplification reagents may include P5, non-cleavable 5' acrydite-P5, a cleavable 5' acrydite-SS-P5, Rlc, Biotin Rlc, sequencing primer, read primer, P5-Universal, P5-U, 52-BioRl-rc, a random N-mer sequence, a universal read primer, etc. In some embodiments, a primer comprises a modified nucleotide, a locked nucleic acid (LNA), an LNA nucleotide, a uracil-containing nucleotide, a nucleotide containing a non-native base, a blocker oligonucleotide, a blocked 3' end, or 3' ddCTP.
Microarray-based Sequencing
[0095] A DNA microarray is a device for high-throughput investigations widely used in molecular biology and medicine. It consists of an arrayed series of microscopic spots (“features” or “locations”), each containing a few picomoles of oligonucleotides with a specific DNA sequence. These sequences can be short sections of a gene or other DNA elements that are used as probes to hybridize a DNA or RNA sample under suitable conditions. Probe-target hybridization is usually detected and quantified by fluorescence-based detection of fluorophore-labeled targets to determine the relative abundance of the nucleic acid target sequences. Microarrays have been used successfully to decode ESAC DNA-encoded libraries. See, e.g., Melkko, S., et al., “Encoded self-assembling chemical libraries,” Nat. Biotechnol. 2004, 22(5), 568-74, hereby incorporated by reference. The coding oligonucleotides representing the individual chemical compounds in the library are spotted and chemically linked onto the microarray slides, for example by using a BioChip Arrayer robot. Subsequently, the oligonucleotide tags of the binding compounds isolated from the selection are PCR-amplified using a fluorescent primer and hybridized onto the DNA microarray slide. The microarrays are then analyzed with a laser scanner, and spot intensities are detected and quantified. The enrichment of preferentially binding compounds is revealed by comparing spot intensities on the DNA microarray slide before and after selection.
Decoding by high-throughput sequencing
[0096] Because many DELs are large (for example, between 10³ and 10⁶ members, or more), conventional Sanger sequencing-based decoding is unlikely to be practical, given both the high cost per sequenced base and the tedious procedure involved. High-throughput sequencing technologies use strategies that parallelize the sequencing process, producing thousands or millions of sequence reads at once. Continuing advances in high-throughput sequencing make this an attractive approach to decoding DELs.
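The scale argument in [0096] can be quantified with a simple sampling model. This sketch assumes, hypothetically, that all library members are equally represented and that reads are drawn at random. Under those assumptions, the expected fraction of members observed at least once is 1 − exp(−reads/members). Selected libraries are far from uniform, so this is only an order-of-magnitude guide. The 10× depth in the example is an arbitrary illustrative choice.

```python
import math


def expected_fraction_observed(n_reads: float, n_members: int) -> float:
    """Expected fraction of equally abundant library members seen at least once when
    n_reads reads are drawn at random (Poisson approximation)."""
    return 1.0 - math.exp(-n_reads / n_members)


def reads_for_coverage(n_members: int, mean_reads_per_member: float) -> float:
    """Total reads needed for a desired average sampling depth per library member."""
    return n_members * mean_reads_per_member


if __name__ == "__main__":
    members = 10**6                           # upper end of the size range in the text
    reads = reads_for_coverage(members, 10)   # hypothetical 10x average depth
    print(f"{reads:.0e} reads -> {expected_fraction_observed(reads, members):.5f} "
          f"of members observed at least once")
```

Read counts on the order of 10⁷ are routine for high-throughput platforms, but they would be impractical to obtain read by read with Sanger sequencing.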
[0097] Sequencing may involve basic methods including Maxam-Gilbert sequencing and chain-termination methods, or de novo sequencing methods including shotgun sequencing and bridge PCR, or next-generation methods including polony sequencing, 454 pyrosequencing, Illumina (Solexa) sequencing, SOLiD sequencing, Ion Torrent semiconductor sequencing, HeliScope single molecule sequencing, SMRT® sequencing, and others.
Screening of Encoded Libraries and Preparation of Enriched Libraries
[0098] The present invention provides methods and kits for screening an encoded library against a nucleic acid target, such as a target RNA. The present invention further provides methods of producing enriched encoded libraries and processing samples from such libraries, as well as compositions comprising such enriched encoded libraries. In some embodiments, the encoded library is a DNA-encoded library (DEL) of small molecules. In some embodiments, the target RNA is selected from a naturally occurring RNA or chimera, homolog, isoform, mutant, fragment, or analog thereof such as those described in detail herein. In some embodiments, the target RNA is associated with or implicated in a disease, such as those diseases described herein. In some embodiments, the target RNA is one of those listed in Table 1, 2, 3, or 4 herein.
[0099] In some embodiments, the DEL is prepared using a split-and-pool, DNA-recording, or YoctoReactor® (yR) method of combinatorial synthesis. In some embodiments, the DEL is screened using Binder Trap Enrichment® (BTE) or emulsion-free, proximity-enhanced ligation conditions.
[00100] Unlike conventional screening methods, the present invention allows screening of the target without the need to conjugate the target to a DNA label (barcode). Accordingly, in some embodiments, the target is a nucleic acid that is not conjugated to, and does not comprise as part of its sequence, a non-natural barcode for use in identifying which DEL library members bind to the target during a screen. This is because, in certain embodiments of the invention, the nucleic acid target acts as its own barcode. In such cases, the nucleic acid target comprises an identifier sequence that allows the nucleic acid to be identified during library decoding, for example by sequencing. In some embodiments, the present invention provides a proximity-driven (ligand/target interaction-driven) method to ligate the nucleic acid target to a bound small molecule DEL member. In some embodiments, the identifier sequence is an RNA nucleotide sequence of a naturally occurring RNA, or a chimera of two or more naturally occurring RNAs, or a homolog, isoform, fragment, or analog thereof. In some embodiments, the identifier sequence is an RNA nucleotide sequence associated with or implicated in a disease, such as those diseases and associated RNAs described herein. In some embodiments, the identifier sequence is at least a portion of one of those listed in Table 1, 2, 3, or 4 herein.
[00101] In one aspect, the present invention provides an enriched DNA-encoded library (DEL) comprising library members comprising:
(i) a DNA barcode allowing identification of each library member;
(ii) a small molecule covalently conjugated to the DNA barcode; and
(iii) a nucleic acid target ligated to the DNA barcode.
[00102] In one aspect, the present invention provides a partially double-stranded, hybrid RNA-DNA ligation product comprising:
(i) an RNA strand comprising at least a portion of a biologically relevant target RNA; and
(ii) an at least partially double-stranded synthetic DNA molecule comprising a DNA strand that is at least partially hybridized to a single-stranded DNA oligonucleic acid, wherein the DNA oligonucleic acid comprises a 3'- or 5'-overhang of 1-20 nucleotides;
wherein the RNA strand and the DNA strand have been ligated between the 3'-end of the RNA strand and the 5'-end of the DNA strand, or between the 5'-end of the RNA strand and the 3'-end of the DNA strand, to form a contiguous sequence. In some embodiments, the biologically relevant target RNA is an RNA implicated in or a cause of a disease or disorder, such as one of those recited in Table 1, 2, 3, or 4.
[00103] In one aspect, the present invention provides a method of ligating an RNA strand to a DNA strand, comprising:
(i) providing a partially double-stranded DNA molecule having a 3'-overhang comprising at least one nucleotide, wherein the 3'-overhang has at least partial sequence complementarity to the 3'-end of the RNA strand; and
(ii) contacting the RNA strand and the partially double-stranded DNA molecule with at least one ligase, wherein the at least one ligase catalyzes ligation of the 3'-end of the RNA strand and the 5'-end of the DNA strand;
thereby producing a ligated product comprising the RNA strand ligated to the DNA strand.
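The complementarity requirement in step (i) can be expressed as a sequence check. This is a minimal sketch that considers Watson-Crick pairs only and ignores G·U wobble. Because the overhang pairs antiparallel with the RNA, a fully complementary 3'-overhang, written 5'→3', is the DNA reverse complement of the RNA's 3'-terminal nucleotides. The function names and the example RNA sequence are hypothetical.

```python
# Watson-Crick partners written as DNA; RNA U and DNA T both pair with A.
_PAIR = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def reverse_complement_dna(seq: str) -> str:
    """DNA reverse complement (5'->3') of a DNA or RNA sequence given 5'->3'."""
    return "".join(_PAIR[base] for base in reversed(seq.upper()))


def overhang_for_rna(rna: str, length: int) -> str:
    """3'-overhang (5'->3') fully complementary, antiparallel, to the RNA's
    3'-terminal `length` nucleotides."""
    if not 1 <= length <= len(rna):
        raise ValueError("overhang length must be between 1 and len(rna)")
    return reverse_complement_dna(rna[-length:])


def complementary_positions(rna: str, overhang: str) -> int:
    """Number of overhang positions that pair with the RNA 3'-end
    (partial complementarity is allowed)."""
    ideal = overhang_for_rna(rna, len(overhang))
    return sum(a == b for a, b in zip(overhang.upper(), ideal))


if __name__ == "__main__":
    rna = "GGAUCCGCA"  # hypothetical RNA 3'-terminal sequence, 5'->3'
    print(overhang_for_rna(rna, 3))               # fully complementary 3-nt overhang
    print(complementary_positions(rna, "TGA"))    # one mismatch at the overhang's 3'-end
```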
[00104] In some embodiments, the partially double-stranded DNA molecule is produced by ligating together shorter fragments of dsDNA; or is produced by the steps of:
(i) ligating together single-stranded DNA fragments; and
(ii) performing primer extension.
[00105] In some embodiments, the partially double-stranded DNA molecule is produced by ligating together shorter fragments of dsDNA; or is produced by the steps of:
(i) ligating together single-stranded DNA fragments;
(ii) performing primer extension; and
(iii) ligating to the DNA strand, or appending by chemical synthesis, a single-stranded DNA sequence having at least partial sequence complementarity to the 3'-end of the RNA strand.
[00106] In some embodiments, the partially double-stranded DNA molecule is produced by hybridizing the DNA strand to a single-stranded DNA splint having at least partial sequence complementarity to the DNA strand.
[00107] In some embodiments, step (ii) further comprises contacting the RNA strand with a helper oligonucleic acid, wherein the helper oligonucleic acid has at least partial sequence complementarity to a region of the RNA strand adjacent to the portion of the RNA strand that has sequence complementarity to the 3'-overhang.
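The helper design in [00107] can be sketched in the same way. This example assumes Watson-Crick pairing and full complementarity. The region "adjacent" to the overhang-binding segment is taken to lie immediately 5' of it on the RNA, because the DNA strand occupies the 3' side. With that layout, the helper's 5'-end abuts the overhang's 3'-end on the RNA template, which matches the ligation described in [00113]. The function names, lengths, and RNA sequence are hypothetical.

```python
# Watson-Crick partners written as DNA; RNA U and DNA T both pair with A.
_PAIR = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def reverse_complement_dna(seq: str) -> str:
    """DNA reverse complement (5'->3') of a DNA or RNA sequence given 5'->3'."""
    return "".join(_PAIR[base] for base in reversed(seq.upper()))


def helper_for_rna(rna: str, overhang_len: int, helper_len: int) -> str:
    """Hypothetical helper design: DNA (5'->3') complementary to the `helper_len`
    RNA nucleotides immediately 5' of the segment bound by the 3'-overhang."""
    if overhang_len < 1 or helper_len < 1 or overhang_len + helper_len > len(rna):
        raise ValueError("overhang and helper must fit within the RNA sequence")
    region = rna[-(overhang_len + helper_len):-overhang_len]
    return reverse_complement_dna(region)


if __name__ == "__main__":
    rna = "AUGGCUAGCUUAGGAUCCGCA"  # hypothetical RNA, 5'->3'
    print("overhang:", reverse_complement_dna(rna[-3:]))
    print("helper:  ", helper_for_rna(rna, overhang_len=3, helper_len=8))
```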
[00108] In some embodiments, the helper oligonucleic acid is an oligonucleotide.
[00109] In some embodiments, the helper oligonucleotide is DNA, RNA, or LNA.
[00110] In some embodiments, the helper oligonucleotide further comprises one or more nucleotide modifications selected from: (i) a sugar modification selected from 2'-OMe, 2'-F, 2',2'-difluoro, 2'-Me, 2'-methoxyethyl, 2'-propyl, or replacement of a ribose or deoxyribose with an arabinose sugar, a BNA (Bridged Nucleic Acid) sugar, an LNA (Locked Nucleic Acid) sugar, or an ENA (2'-O,4'-C-ethylene-bridged nucleic acid) sugar;
(ii) a base modification selected from pseudouracil, 2-methyladenine, 2,6-diaminopurine, 2-Cl adenine, 2-F adenine, 5-azauracil, 5-azacytidine, N2-methylguanine, N7-methylguanine, N6-methyladenine, or a C-nucleobase (7-deazapurine or 1-deazapyrimidine); or
(iii) a nucleoside modification selected from 2'-Deoxypseudouridine, 2'-Deoxyuridine, 2-Thiothymidine, 4-Thio-2'-deoxyuridine, 4-Thiothymidine, 5'-Aminothymidine, 5-(1-Pyrenylethynyl)-2'-deoxyuridine, 5-(Carboxy)vinyl-2'-deoxyuridine, 5,6-Dihydro-2'-deoxyuridine, 5-Bromo-2'-deoxycytidine, 5-Bromo-2'-deoxyuridine, 5-Carboxy-2'-deoxycytidine, 5-Fluoro-2'-deoxyuridine, 5-Formyl-2'-deoxycytidine, 5-Hydroxy-2'-deoxycytidine, 5-Hydroxy-2'-deoxyuridine, 5-Hydroxymethyl-2'-deoxycytidine, 5-Hydroxymethyl-2'-deoxyuridine, 5-Hydroxybutynl-2'-deoxyuridine, 5-Iodo-2'-deoxycytidine, 5-Iodo-2'-deoxyuridine, 5-Methyl-2'-deoxycytidine, 5-Methyl-2'-deoxyisocytidine, 5-Propynyl-2'-deoxycytidine, 5-Propynyl-2'-deoxyuridine, 2-Aminopurine-2'-deoxyriboside, 6-Thio-2'-deoxyguanosine, 7-Deaza-2'-deoxyguanosine, 7-Deaza-2'-deoxyxanthosine, 7-Deaza-8-aza-2'-deoxyadenosine, 2,6-diaminopurine-riboside, 2-Aminopurine-riboside, Pseudouridine, Puromycin, Pyrrolocytidine, 2,6-diaminopurine-2'-O-methylriboside, N6-diaminopurine-riboside, 2-aminopurine-2'-O-methylriboside, 2'-O-Methylinosine, 3-Deaza-5-Aza-2'-O-methylcytidine, 5-Bromo-2'-O-methyluridine, 5-Fluoro-2'-O-Methyluridine, 5-Fluoro-4-O-TMP-2'-O-Methyluridine, 5-Methyl-2'-O-Methylcytidine, 5-Methyl-2'-O-Methylthymidine, 2'-Deoxynebularine, 2'-Deoxyinosine, 2'-Deoxyisoguanosine, 3-Nitropyrrole-2'-deoxyribose, 5-Nitroindole-2'-deoxyriboside, pyridin-4-one ribonucleoside, 5-aza-uridine, 2-thio-5-aza-uridine, 2-thiouridine, 4-thio-pseudouridine, 2-thio-pseudouridine, 5-hydroxyuridine, 3-methyluridine, 5-carboxymethyl-uridine, 1-carboxymethyl-pseudouridine, 5-propynyl-uridine, 1-propynyl-pseudouridine, 5-taurinomethyluridine, 1-taurinomethyl-pseudouridine, 5-taurinomethyl-2-thio-uridine, 1-taurinomethyl-4-thio-uridine, 5-methyl-uridine, 1-methylpseudouridine, 4-thio-1-methyl-pseudouridine, 2-thio-1-methyl-pseudouridine, 1-methyl-1-deaza-pseudouridine, 2-thio-1-methyl-1-deaza-pseudouridine, dihydrouridine, dihydropseudouridine, 2-thio-dihydrouridine, 2-thio-dihydropseudouridine, 2-methoxyuridine, 2-methoxy-4-thio-uridine, 4-methoxy-pseudouridine, 4-methoxy-2-thio-pseudouridine, 5-aza-cytidine, pseudoisocytidine, 3-methyl-cytidine, N4-acetylcytidine, 5-formylcytidine, N4-methylcytidine, 5-hydroxymethylcytidine, 1-methyl-pseudoisocytidine, pyrrolo-cytidine, pyrrolo-pseudoisocytidine, 2-thio-cytidine, 2-thio-5-methyl-cytidine, 4-thio-pseudoisocytidine, 4-thio-1-methyl-pseudoisocytidine, 4-thio-1-methyl-1-deaza-pseudoisocytidine, 1-methyl-1-deaza-pseudoisocytidine, zebularine, 5-aza-zebularine, 5-methyl-zebularine, 5-aza-2-thio-zebularine, 2-thio-zebularine, 2-methoxy-cytidine, 2-methoxy-5-methyl-cytidine, 4-methoxy-pseudoisocytidine, 4-methoxy-1-methyl-pseudoisocytidine, 2-aminopurine, 2,6-diaminopurine, 7-deaza-adeninosine, 7-deaza-deoxyadenosine, 7-deaza-8-aza-adenine, 7-deaza-2-aminopurine, 7-deaza-8-aza-2-aminopurine, 7-deaza-2,6-diaminopurine, 7-deaza-8-aza-2,6-diaminopurine, 1-methyladenosine, N6-methyladenosine, N6-isopentenyladenosine, N6-(cis-hydroxyisopentenyl)adenosine, 2-methylthio-N6-(cis-hydroxyisopentenyl)adenosine, N6-glycinylcarbamoyladenosine, N6-threonylcarbamoyladenosine, 2-methylthio-N6-threonylcarbamoyladenosine, N6,N6-dimethyladenosine, 7-methyladenine, 2-methylthio-adenine, 2-methoxy-adenine, inosine, 1-methyl-inosine, wyosine, wybutosine, 7-deaza-guanosine, 7-deaza-8-aza-guanosine, 6-thio-guanosine, 6-thio-7-deaza-guanosine, 6-thio-7-deaza-8-aza-guanosine, 7-methyl-guanosine, 6-thio-7-methyl-guanosine, 7-methylinosine, 6-methoxy-guanosine, 1-methylguanosine, N2-methylguanosine, N2,N2-dimethylguanosine, 8-oxo-guanosine, 7-methyl-8-oxo-guanosine, 1-methyl-6-thio-guanosine, N2-methyl-6-thio-guanosine, 2',3'-dideoxyadenine, 2',3'-dideoxycytidine, 2',3'-dideoxyguanosine, 2',3'-dideoxythymidine, inverted thymidine, or N2,N2-dimethyl-6-thio-guanosine.
[00111] In some embodiments, the helper oligonucleic acid further comprises a modification of the phosphate backbone selected from boranophosphate, methylphosphonate, P-ethoxy, phosphonoacetate, phosphorothioate, or phosphorodithioate.
[00112] In some embodiments, the helper oligonucleic acid is PNA or a morpholino oligomer.
[00113] In some embodiments, the at least one ligase catalyzes ligation of the 3'-overhang to the 5'-end of the helper oligonucleic acid.
[00114] In some embodiments, a first ligase and a second ligase are used in step (ii); the first ligase catalyzes ligation of the 3'-overhang and the 5'-end of the helper oligonucleic acid; and the second ligase catalyzes ligation of the 3'-end of the RNA strand and the 5'-end of the DNA strand.
[00115] In some embodiments, the helper oligonucleic acid hybridizes to the RNA strand and facilitates the ligation of the 3'-end of the RNA strand and the 5'-end of the DNA strand.
[00116] In some embodiments, the method further comprises, before or after step (i), hybridizing the helper oligonucleic acid to the RNA strand under appropriate conditions to effect the hybridization.
[00117] In some embodiments, the DNA strand comprises at the 5'-end a phosphate or analog thereof capable of participating in ligation.
[00118] In some embodiments, the RNA strand comprises at the 3'-end a phosphate or analog thereof capable of participating in ligation.
[00119] In some embodiments, the single-stranded DNA splint comprises at the 3'-end a phosphate or analog thereof capable of participating in ligation.
[00120] In some embodiments, the helper oligonucleotide comprises at the 5'-end a phosphate or analog thereof capable of participating in ligation.
[00121] In some embodiments, the phosphate or analog thereof capable of participating in ligation has been added by chemical synthesis.
[00122] In some embodiments, the phosphate or analog thereof capable of participating in ligation is a phosphate group.
[00123] In some embodiments, the phosphate group has been added from phosphorylation by a kinase.
[00124] In some embodiments, the phosphorylation is performed before step (ii) is performed.
[00125] In some embodiments, the kinase is allowed to contact the DNA strand during step (ii).
[00126] In some embodiments, the phosphate or analog thereof capable of participating in ligation is a 5'-adenosine diphosphate group.
[00127] In some embodiments, the ligated product is a template for reverse transcription.
[00128] In some embodiments, the ligated product comprises at least one binding site for a primer sequence for a reverse transcriptase.
[00129] In some embodiments, the ligated product is a template for PCR.
[00130] In some embodiments, the ligated product comprises at least one binding site for a primer for PCR.
[00131] In some embodiments, the method further comprises the step of ligating a single-stranded oligonucleic acid to the 5'-end of the RNA strand, or to the DNA strand, thereby producing an extended ligated product.
[00132] In some embodiments, the extended ligated product is a template for reverse transcription.
[00133] In some embodiments, the extended ligated product comprises at least one binding site for a primer sequence for a reverse transcriptase.
[00134] In some embodiments, the extended ligated product is a template for PCR.
[00135] In some embodiments, the extended ligated product comprises at least one binding site for a primer for PCR.
[00136] In some embodiments, the partially double-stranded DNA molecule is a member of a DNA-encoded library (DEL).
[00137] In some embodiments, the partially double-stranded DNA molecule comprises a sequence that encodes the identity of a small molecule member of a DNA-encoded library (DEL).
[00138] In some embodiments, the RNA strand is 30-1,000 nucleotides in length.
[00139] In some embodiments, the 3'-overhang is 1-20 nucleotides.
[00140] In some embodiments, the 3'-overhang is 2-10 nucleotides.
[00141] In some embodiments, the 3'-overhang is 2, 3, 4, or 5 nucleotides.
[00142] In some embodiments, the 3'-overhang is 2 or 3 nucleotides.
[00143] In some embodiments, the 3'-overhang has a guanosine-cytosine content (GC-content) that is 50% or greater, 60% or greater, 70% or greater, 80% or greater, 90% or greater, or 100%.
[00144] In some embodiments, the 3'-overhang has at least 2, at least 3, at least 4, or at least 5 guanosine and/or cytosine nucleotides.
[00145] In some embodiments, the helper oligonucleic acid is at least 5 nucleotides in length.
[00146] In some embodiments, the helper oligonucleic acid is about 10 to about 75 nucleotides in length.
[00147] In some embodiments, the helper oligonucleic acid is 10-50, 12-30, 14-25, 16-22, 17-20, 18-19, or 18 nucleotides in length.
[00148] In some embodiments, the helper oligonucleic acid is about 10 to about 50, about 12 to about 30, about 14 to about 25, about 16 to about 22, about 17 to about 20, about 18 to about 19, or about 18 nucleotides in length.
[00149] In some embodiments, the helper oligonucleic acid has a guanosine-cytosine content (GC-content) that is 50% or greater, 60% or greater, 70% or greater, 80% or greater, 90% or greater, or 100%.
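The length and GC-content ranges recited in [00139] to [00149] can be applied as a simple screen on candidate sequences. The sketch below checks one illustrative combination of those ranges: an overhang of 2 to 5 nucleotides, a helper of 10 to 75 nucleotides, and a GC content of 50% or greater for each. The function names and the example sequences are hypothetical, and other recited combinations would simply change the thresholds.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C nucleotides in a sequence."""
    s = seq.upper()
    if not s:
        raise ValueError("empty sequence")
    return (s.count("G") + s.count("C")) / len(s)


def meets_example_criteria(overhang: str, helper: str, min_gc: float = 0.5) -> bool:
    """Check an overhang/helper pair against one illustrative combination of the
    recited ranges: overhang 2-5 nt, helper 10-75 nt, each with GC content >= min_gc."""
    return (2 <= len(overhang) <= 5
            and 10 <= len(helper) <= 75
            and gc_content(overhang) >= min_gc
            and gc_content(helper) >= min_gc)


if __name__ == "__main__":
    for overhang, helper in [("TGC", "GGATCCTAGC"), ("TA", "GGATCCTAGC")]:
        print(overhang, helper, f"GC={gc_content(overhang):.2f}/{gc_content(helper):.2f}",
              meets_example_criteria(overhang, helper))
```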
[00150] In some embodiments, the at least one ligase is selected from T4 RNA ligase 2, SplintR, ElectroLigase®, T4 DNA ligase, T3 DNA ligase, T4 RNA ligase 1, PBCV-1 ligase, RtcB Ligase, bacteriophage TS2126 ligase, Taq DNA Ligase, HiFi Taq DNA Ligase, 9°N™ DNA Ligase, CircLigase RNA ligase, Ampligase, 5' App DNA/RNA ligase, T7 DNA ligase, E. coli DNA ligase, CircLigase I ssDNA ligase, or CircLigase II ssDNA ligase; or a truncated version thereof.
[00151] In some embodiments, the at least one ligase is selected from T4 RNA ligase 2, SplintR, T4 DNA ligase, or T3 DNA ligase.
[00152] In some embodiments, the at least one ligase is T4 RNA ligase 2.
[00153] In some embodiments, the at least one ligase is SplintR.
[00154] In some embodiments, the at least one ligase is T4 DNA ligase or T3 DNA ligase.
[00155] In some embodiments, the first ligase is selected from T4 RNA ligase 2, SplintR, T4 DNA ligase, T3 DNA ligase, T4 RNA ligase 1, Ampligase, 5' App DNA/RNA ligase, T7 DNA ligase, E. coli DNA ligase, CircLigase I ssDNA ligase, or CircLigase II ssDNA ligase, or a truncated version thereof; and the second ligase is selected from T4 RNA ligase 2, SplintR, T4 DNA ligase, T4 RNA ligase 1, RtcB Ligase, PBCV-1 ligase, CircLigase RNA ligase, or 5' App DNA/RNA ligase, or a truncated version thereof.
[00156] In some embodiments, the at least one ligase is a combination of T4 DNA ligase and SplintR.
[00157] In some embodiments, step (ii) further comprises adding a crowding agent such as a polyethylene glycol (PEG) (e.g., PEG4000), Ficoll, dextran, or albumin.
[00158] In some embodiments, step (ii) is performed at about 2-50 °C.
[00159] In some embodiments, step (ii) is performed at about 4, 12, 16, 22, or 37 °C.
[00160] In some embodiments, step (ii) is performed in a reaction buffer comprising about 25-300 mM salt.
[00161] In another aspect, the present invention provides a ligation product prepared by any one of the foregoing methods.
[00162] In another aspect, the present invention provides a composition comprising:
(i) an RNA strand comprising at least a portion of a biologically relevant target RNA; and
(ii) an at least partially double-stranded synthetic DNA molecule comprising a DNA strand at least partially hybridized to a single-stranded oligonucleic acid and comprising a 3'-overhang of 2-5 nucleotides.
[00163] In some embodiments, the 3'-overhang has sequence complementarity to the 3'-end of the RNA strand.
[00164] In some embodiments, the composition further comprises one or more ligases capable of ligating the RNA strand to the DNA molecule.
[00165] In another aspect, the present invention provides a partially double-stranded RNA-DNA ligation product comprising:
(i) an RNA strand comprising at least a portion of a biologically relevant target RNA, or a homolog, isoform, or analog thereof; and
(ii) an at least partially double-stranded synthetic DNA molecule comprising a DNA strand at least partially hybridized to a single-stranded oligonucleic acid and comprising a 3'-overhang of 2-5 nucleotides;
wherein the 3'-end of the RNA strand and the 5'-end of the DNA strand have been ligated to form a contiguous sequence.
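In the composition and ligation product above, the 3'-overhang positions the 3'-end of the RNA strand next to the 5'-end of the DNA strand. The following Python sketch models that geometry at the sequence level only. It assumes the overhang is written 5'->3' and must be the reverse complement of the RNA's 3'-terminal nucleotides. The function names and example sequences are hypothetical, and the model ignores phosphorylation state, ligase choice, and reaction conditions.

```python
# Watson-Crick partners, written as DNA (RNA "U" pairs with DNA "A").
_PAIR = {"A": "T", "T": "A", "U": "A", "G": "C", "C": "G"}


def revcomp_dna(seq: str) -> str:
    """Reverse complement of a DNA/RNA sequence, returned as DNA (5'->3')."""
    return "".join(_PAIR[base] for base in reversed(seq.upper()))


def overhang_pairs_with_rna_3prime(overhang: str, rna: str) -> bool:
    """True if a 3'-overhang (written 5'->3') base-pairs with the RNA's 3'-terminal nucleotides."""
    k = len(overhang)
    return 0 < k <= len(rna) and overhang.upper() == revcomp_dna(rna[-k:])


def splinted_product(rna: str, dna_strand: str, overhang: str) -> str:
    """Sequence (5'->3') of the RNA 3'-end joined to the DNA strand 5'-end.

    Sequence-level model only: no chemistry or enzymology is simulated.
    """
    if not overhang_pairs_with_rna_3prime(overhang, rna):
        raise ValueError("overhang is not complementary to the RNA 3'-end")
    return rna.upper() + dna_strand.upper()


print(splinted_product("GGAUCGCA", "ACGTTAG", "TGC"))  # GGAUCGCAACGTTAG
```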
[00166] In another aspect, the present invention provides an enriched DNA-encoded library (DEL) comprising library members comprising:
(i) a DNA barcode allowing identification of each library member;
(ii) a small molecule covalently conjugated to the DNA barcode; and
(iii) a nucleic acid target ligated to the DNA barcode.
[00167] In another aspect, the present invention provides a method of producing an enriched DNA-encoded library (DEL), comprising:
(i) providing a DEL of small molecules covalently conjugated to DNA barcodes;
(ii) contacting the DEL with a nucleic acid target under conditions selected to allow binding between the small molecules and the nucleic acid target to form a mixture of complexes;
(iii) performing a high factor dilution of the mixture of step (ii) and incubating for a time period and at a temperature sufficient for the dissociation of at least one weakly-associated complex in the mixture, thereby producing an enriched mixture;
(iv) contacting the enriched mixture of step (iii) with an aqueous emulsion preparation under conditions selected to allow compartmentalization of at least one complex into at least one aqueous emulsion droplet;
(v) ligating the DEL and the nucleic acid target of the at least one complex in the at least one aqueous emulsion droplet of step (iv) to form at least one ligated product;
(vi) optionally, disrupting the at least one aqueous emulsion droplet of step (v); and
(vii) optionally, isolating the at least one ligated product.
[00168] In another aspect, the present invention provides a method of producing an enriched DNA-encoded library (DEL), comprising:
(i) providing a DEL of small molecules covalently conjugated to DNA barcodes;
(ii) contacting the DEL with a nucleic acid target under conditions selected to allow binding between the small molecules and the nucleic acid target to form a mixture of complexes;
(iii) performing a high factor dilution of the mixture of step (ii) and incubating for a time period and at a temperature sufficient for the dissociation of at least one weakly-associated complex in the mixture, thereby producing an enriched mixture;
(iv) ligating the DEL and the nucleic acid target of the at least one complex to form at least one ligated product;
(v) optionally, isolating the at least one ligated product.
[00169] In some embodiments, the method is performed using an emulsion-free screen that employs proximity-based ligation.
[00170] In some embodiments, the high factor dilution of step (iii) is 1:2 to 1:10,000.
[00171] In some embodiments, the time period of step (iii) is 1 minute to 48 hours.
[00172] In some embodiments, the temperature of step (iii) is 4 °C to 65 °C.
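One way to reason about the dilution window in step (iii) is to assume that, after a large dilution, rebinding becomes negligible, so each complex decays by simple first-order dissociation at its off-rate. Under that assumption, fast-dissociating (weakly associated) complexes are lost during the incubation while slow-dissociating complexes largely survive. The Python sketch below applies that model. The half-lives, incubation time, and concentrations are hypothetical and are not taken from the application.

```python
import math


def k_off_from_half_life(half_life_s: float) -> float:
    """First-order dissociation rate constant (1/s) from a complex half-life."""
    return math.log(2) / half_life_s


def fraction_still_bound(k_off_per_s: float, time_s: float) -> float:
    """Fraction of complexes remaining after time_s, assuming no rebinding."""
    return math.exp(-k_off_per_s * time_s)


def diluted(concentration: float, dilution_factor: float) -> float:
    """Concentration after a 1:dilution_factor dilution (same units as input)."""
    return concentration / dilution_factor


incubation_s = 30 * 60  # hypothetical 30 min post-dilution incubation
for label, half_life_s in [("slow off-rate binder", 10 * 3600), ("fast off-rate binder", 60)]:
    remaining = fraction_still_bound(k_off_from_half_life(half_life_s), incubation_s)
    print(f"{label}: {remaining:.3g} still bound")

print(diluted(1000.0, 10_000))  # hypothetical 1000 nM target at 1:10,000 -> 0.1 (nM)
```

The model deliberately leaves out association kinetics. Real dissociation behavior also depends on temperature and buffer, which is why the embodiments list ranges rather than single values.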
[00173] In some embodiments, the ligation is performed according to a method described above.
[00174] In some embodiments, step (vi) is performed by contacting with at least one reagent selected from a surfactant, an alcohol, or a halogenated hydrocarbon solvent.
[00175] In some embodiments, the aqueous emulsion preparation in step (iv) is a water-in-oil emulsion.
[00176] In some embodiments, the surfactant is an anionic surfactant, a cationic surfactant, a zwitterionic surfactant, or a nonionic surfactant.
[00177] In some embodiments, the surfactant is selected from Triton X-100 or Tween 80.
[00178] In some embodiments, the nucleic acid target is an RNA, or a homolog, isoform, chimera, fragment, mutant, or analog thereof.
[00179] In some embodiments, step (iv) is performed using Binder Trap Enrichment® (BTE).
[00180] In some embodiments, the compartmentalization in step (iv) creates more compartments than there are members of the DEL in the sample.
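Creating more compartments than library members keeps most occupied droplets to a single complex. Assuming random (Poisson) loading, the Python sketch below estimates what fraction of occupied compartments still contain two or more molecules. The molecule and droplet counts are hypothetical.

```python
import math


def occupancy_stats(n_molecules: float, n_compartments: float) -> dict:
    """Poisson loading statistics for random partitioning into compartments."""
    lam = n_molecules / n_compartments  # mean molecules per compartment
    p_empty = math.exp(-lam)
    p_single = lam * p_empty
    p_multiple = 1.0 - p_empty - p_single
    return {
        "mean": lam,
        "empty": p_empty,
        "single": p_single,
        "multiple": p_multiple,
        # fraction of occupied compartments that hold two or more molecules
        "multiple_among_occupied": p_multiple / (1.0 - p_empty),
    }


stats = occupancy_stats(1e6, 1e7)  # hypothetical: 10x more droplets than molecules
print(f"{stats['multiple_among_occupied']:.3f}")
```

With a tenfold excess of compartments, roughly 5% of occupied compartments hold more than one molecule under this assumption. Raising the excess lowers that fraction further.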
[00181] In some embodiments, the small molecules are covalently bound to their DNA barcodes by an amino-thiol linkage.
[00182] In another aspect, the present invention provides a method of processing a sample from an enriched DEL, comprising:
(i) providing an enriched DEL, wherein the DEL is enriched to bind a nucleic acid target;
(ii) performing a PCR amplification of the enriched DEL to form amplified products; and
(iii) sequencing the amplified products from step (ii) to produce a DEL library screen result.
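The final step turns sequencing output into a DEL library screen result. The Python sketch below covers only the counting stage, assuming a hypothetical read layout in which every read carries the compound barcode at a fixed offset and a hypothetical barcode-to-compound table. Real pipelines typically add quality filtering and error-tolerant barcode matching, which are not shown.

```python
from collections import Counter


def tally_hits(reads, barcode_to_compound, bc_start, bc_len):
    """Count decoded compounds in sequencing reads.

    Assumes (hypothetically) that the compound barcode sits at a fixed
    offset in every read; real read layouts depend on the library design.
    """
    counts = Counter()
    unassigned = 0
    for read in reads:
        compound = barcode_to_compound.get(read[bc_start:bc_start + bc_len])
        if compound is None:
            unassigned += 1
        else:
            counts[compound] += 1
    return counts, unassigned


codes = {"ACGT": "cpd-1", "TTGC": "cpd-2"}  # hypothetical barcode table
reads = ["GGACGTAA", "GGACGTCC", "GGTTGCAA", "GGGGGGAA"]
counts, unassigned = tally_hits(reads, codes, bc_start=2, bc_len=4)
print(counts.most_common(), unassigned)  # [('cpd-1', 2), ('cpd-2', 1)] 1
```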
[00183] In some embodiments, the method further comprises, before step (ii), ligating a single-stranded oligonucleic acid to the 5'-end of the RNA strand of the enriched DEL or to the DNA strand of the enriched DEL.
[00184] In some embodiments, the method further comprises, before step (ii), contacting the enriched DEL with a reverse transcriptase (RT) to form an enriched DEL cDNA of the nucleic acid target.
[00185] In another aspect, the present invention provides a composition comprising a plurality of enriched DEL cDNAs produced according to the foregoing method.
[00186] In some embodiments, the sequencing in step (iii) is selected from microarray-based sequencing or high-throughput sequencing.
[00187] In another aspect, the present invention provides a method of screening an encoded library against a nucleic acid target, comprising:
(i) providing a DNA-encoded library (DEL) of small molecules covalently conjugated to DNA barcodes;
(ii) contacting the DEL with a nucleic acid target under conditions selected to allow the small molecules to bind to the nucleic acid target;
(iii) performing an in vitro compartmentalization and/or high-factor dilution on at least a portion of the mixture from (ii);
(iv) performing a ligation of the nucleic acid target to a DNA barcode whose covalently conjugated small molecule has bound to the nucleic acid target, thus producing a ligated nucleic acid; and
(v) optionally, contacting the ligated nucleic acid with a reverse transcriptase (RT) under conditions selected such that the RT synthesizes a complementary DNA strand to the nucleic acid target to produce a double-stranded, ligated nucleic acid.
[00188] In some embodiments, the method further comprises the step of:
(vi) performing a polymerase chain reaction (PCR) amplification of the double-stranded, ligated nucleic acid.
[00189] In some embodiments, the method further comprises the step of:
(vii) sequencing the mixture of products obtained from the PCR amplification in (vi) to produce a DEL library screen result.
[00190] In some embodiments, the nucleic acid target is an RNA, or a homolog, isoform, mutant, chimera, fragment, or analog thereof.
[00191] In some embodiments, the in vitro compartmentalization is an emulsion technique.
[00192] In some embodiments, the in vitro compartmentalization is Binder Trap Enrichment® (BTE).
[00193] In some embodiments, the in vitro compartmentalization creates more compartments than there are members of the DEL in the sample.
[00194] In some embodiments, a high-factor dilution is performed instead of an in vitro compartmentalization technique, and the dilution enables proximity-based ligation.
[00195] In some embodiments, the DNA barcode comprises a nucleotide overhang complementary to the 3'-end of the nucleic acid target.
[00196] In some embodiments, the overhang is 1-20 nucleotides.
[00197] In some embodiments, the overhang is about 2-10, 2-7, 2-5, 2-4, 3-5, 3-4, 2-3, about 2, or 2 nucleotides.
[00198] In some embodiments, the overhang has a guanosine-cytosine content (GC-content) that is 50% or greater, 60% or greater, 70% or greater, 80% or greater, 90% or greater, or 100%.
[00199] In some embodiments, the overhang has at least 2, at least 3, at least 4, or at least 5 guanosine and/or cytosine nucleotides.
[00200] In some embodiments, the small molecules are covalently bound to their DNA barcodes by an amino-thiol linkage.
[00201] In some embodiments, the method further comprises phosphorylating the 3' end of the nucleic acid target prior to the ligation of step (iv).
[00202] In some embodiments, step (iv) further comprises breaking up the compartments created by an in vitro compartmentalization.
[00203] In some embodiments, step (ii) is performed in an aqueous solution.
[00204] In some embodiments, the method further comprises purifying the ligated nucleic acid and/or double-stranded, ligated nucleic acid.
[00205] In another aspect, the present invention provides an enriched DEL cDNA produced by the methods described above.
[00206] In another aspect, the present invention provides a composition comprising a plurality of enriched DEL cDNA molecules that encode an enriched DEL, wherein the enriched DEL is produced according to the methods described above.
[00207] In another aspect, the present invention provides a method of performing a multiplexed DEL screen, comprising:
(i) providing a DNA-encoded library (DEL) of small molecules covalently conjugated to DNA barcodes;
(ii) contacting the DEL with a plurality of nucleic acid targets that have different sequences under conditions selected to allow the small molecules to bind to the nucleic acid targets;
(iii) performing an in vitro compartmentalization and/or high-factor dilution on at least a portion of the mixture from (ii);
(iv) performing a ligation of at least one of the nucleic acid targets to a DNA barcode whose covalently conjugated small molecule has bound to the nucleic acid target, thus producing a ligated nucleic acid; and
(v) optionally, contacting the ligated nucleic acid with a reverse transcriptase (RT) under conditions selected such that the RT synthesizes a complementary DNA strand to the nucleic acid target to produce a double-stranded, ligated nucleic acid.
[00208] In some embodiments, at least 10 nucleic acid targets of different sequence are screened in parallel.
[00209] In some embodiments, at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 25, 30, 35, 40, 45, 50, 75, 100, 150, 200, 150, 350, 500, 1,000, 5,000, 10,000, 100,000, or 1,000,000 nucleic acid targets of different sequence are screened in parallel. In some embodiments, about 10-100, 10-1,000, 10-100,000, or 10-1,000,000 nucleic acid targets of different sequence are screened in parallel.
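In a multiplexed screen, each ligated product joins target-derived sequence to a compound barcode, so a read can in principle be assigned to a (target, compound) pair. The Python sketch below assumes a hypothetical read layout in which the target-derived region precedes the barcode, and a hypothetical short signature that distinguishes each target. Reads that match no target, or more than one, are skipped. None of these names or sequences come from the application.

```python
from collections import Counter


def tally_multiplexed(reads, target_signatures, barcode_to_compound, bc_start, bc_len):
    """Count (target, compound) pairs from reads of ligated target-barcode products.

    Hypothetical read layout: target-derived sequence occupies read[:bc_start]
    and the compound barcode follows it. Reads whose target region matches
    zero or several signatures, or whose barcode is unknown, are skipped.
    """
    counts = Counter()
    for read in reads:
        compound = barcode_to_compound.get(read[bc_start:bc_start + bc_len])
        target_region = read[:bc_start]
        matches = [name for name, sig in target_signatures.items() if sig in target_region]
        if compound is not None and len(matches) == 1:
            counts[(matches[0], compound)] += 1
    return counts


signatures = {"target-A": "GGATC", "target-B": "CCTAG"}  # hypothetical tags, as DNA
codes = {"ACGT": "cpd-1", "TTGC": "cpd-2"}  # hypothetical barcode table
reads = ["GGATCAAACGT", "GGATCAAACGT", "CCTAGAATTGC", "TTTTTTTACGT"]
print(tally_multiplexed(reads, signatures, codes, bc_start=7, bc_len=4))
```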
[00210] In some embodiments, the ligation is performed according to an RNA-DNA ligation disclosed herein.
[00211] In another aspect, the present invention provides a kit for producing a ligated product comprising an RNA strand ligated to a DNA strand, comprising:
(i) an oligonucleic acid splint molecule;
(ii) at least one ligase;
(iii) a buffer comprising a buffering molecule, a chloride salt of a divalent cation, and ATP;
(iv) optionally, a helper oligonucleotide; and
(v) directions for producing a ligated product.
[00212] In another aspect, the present invention provides a method of producing an enriched DEL cDNA of a nucleic acid target, comprising:
(i) contacting, with at least one ligase, a mixture of a single-stranded nucleic acid target and a partially double-stranded DNA molecule of a DEL having a 5'-overhang comprising at least two nucleotides, wherein the 5'-overhang has sequence complementarity to the 5'-end of the nucleic acid target, wherein the at least one ligase catalyzes ligation of the 5'-end of the nucleic acid target and the DNA molecule to form a template for reverse transcription;
(ii) contacting the template with a reverse transcriptase and oligonucleic acid primer having sequence complementarity to the 3 '-end of the nucleic acid target; and
(iii) incubating at a temperature for a time period sufficient to enable reverse transcription of the nucleic acid target into complementary DNA (cDNA);
(iv) thereby forming an enriched DEL cDNA of a nucleic acid target.
[00213] In some embodiments, the primer is about 10-30 nucleotides in length.
[00214] In some embodiments, the primer is about 15-25 nucleotides in length.
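Step (ii) uses a primer with sequence complementarity to the 3'-end of the nucleic acid target. A minimal Python sketch of deriving such a primer from an RNA target sequence is shown below, taking the reverse complement (as DNA) of the target's 3'-terminal nucleotides. The function name is hypothetical. The 10-30 nt guard mirrors the broader length range above. No melting-temperature or secondary-structure checks are included.

```python
_PAIR = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}


def rt_primer(rna_target: str, length: int = 20) -> str:
    """DNA primer (5'->3') complementary to the 3'-terminal `length` nt of an RNA target."""
    if not 10 <= length <= 30:
        raise ValueError("length outside the illustrative 10-30 nt window")
    if length > len(rna_target):
        raise ValueError("target is shorter than the requested primer")
    three_prime_end = rna_target.upper()[-length:]
    return "".join(_PAIR[base] for base in reversed(three_prime_end))


print(rt_primer("AAAAAAAAAAGGGGGCCCCCUUUUU", length=15))  # AAAAAGGGGGCCCCC
```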
[00215] In some embodiments, the enriched DEL cDNA comprises at least one PCR primer binding site.
[00216] In some embodiments, the DNA molecule is a member of a DNA-encoded library (DEL).
[00217] In some embodiments, the DNA molecule comprises a sequence that encodes the identity of a small molecule member of a DNA-encoded library (DEL).
[00218] In some embodiments, the nucleic acid target is an RNA molecule that is about 30-1,000 ribonucleotides in length.
[00219] In some embodiments, the 5'-overhang of step (i) is 2-5 nucleotides.
[00220] In some embodiments, the 5'-overhang of step (i) is 2, 3, 4, or 5 nucleotides.
[00221] In some embodiments, the 5'-overhang has a guanosine-cytosine content (GC-content) that is 50% or greater, 60% or greater, 70% or greater, 80% or greater, 90% or greater, or 100%.
[00222] In some embodiments, the 5'-overhang has at least 2, at least 3, at least 4, or at least 5 guanosine and/or cytosine nucleotides.
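The 5'-overhang variant mirrors the 3'-overhang case. Here the overhang pairs with the 5'-terminal nucleotides of the target, so that the 3'-end of the DNA strand is joined to the 5'-end of the RNA. The following Python sketch models this at the sequence level, assuming the overhang is written 5'->3' and must equal the reverse complement of the RNA's first nucleotides. Function names and sequences are hypothetical, and no chemistry is simulated.

```python
_PAIR = {"A": "T", "T": "A", "U": "A", "G": "C", "C": "G"}


def revcomp_dna(seq: str) -> str:
    """Reverse complement of a DNA/RNA sequence, returned as DNA (5'->3')."""
    return "".join(_PAIR[base] for base in reversed(seq.upper()))


def overhang_pairs_with_rna_5prime(overhang: str, rna: str) -> bool:
    """True if a 5'-overhang (written 5'->3') base-pairs with the RNA's 5'-terminal nucleotides."""
    k = len(overhang)
    return 0 < k <= len(rna) and overhang.upper() == revcomp_dna(rna[:k])


def product_dna_then_rna(dna_strand: str, rna: str, overhang: str) -> str:
    """Sequence (5'->3') of the DNA strand 3'-end joined to the RNA 5'-end."""
    if not overhang_pairs_with_rna_5prime(overhang, rna):
        raise ValueError("overhang is not complementary to the RNA 5'-end")
    return dna_strand.upper() + rna.upper()


print(product_dna_then_rna("TTAGC", "CGAUUACG", "TCG"))  # TTAGCCGAUUACG
```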
[00223] In some embodiments, the DNA strand comprises at the 3'-end a phosphate or analog thereof capable of participating in ligation.
[00224] In some embodiments, the RNA strand comprises at the 5'-end a phosphate or analog thereof capable of participating in ligation.
[00225] In some embodiments, the phosphate or analog thereof capable of participating in ligation has been added by chemical synthesis.
[00226] In some embodiments, the phosphate or analog thereof capable of participating in ligation is a phosphate group.
[00227] In some embodiments, the phosphate group has been added from phosphorylation by a kinase.
[00228] In some embodiments, the phosphorylation is performed before step (ii) is performed.
[00229] In some embodiments, the kinase is allowed to contact the DNA strand during step (ii).
[00230] In some embodiments, the phosphate or analog thereof capable of participating in ligation is a 5'-adenosine diphosphate group.
[00231] In some embodiments, the at least one ligase is selected from T4 RNA ligase 2, SplintR, ElectroLigase®, T4 DNA ligase, T3 DNA ligase, T4 RNA ligase 1, PBCV-1 ligase, RtcB Ligase, bacteriophage TS2126 ligase, Taq DNA Ligase, HiFi Taq DNA Ligase, 9°N™ DNA Ligase, CircLigase RNA ligase, Ampligase, 5' App DNA/RNA ligase, T7 DNA ligase, E. coli DNA ligase, CircLigase I ssDNA ligase, or CircLigase II ssDNA ligase; or a truncated version thereof.
[00232] In some embodiments, the at least one ligase is selected from T4 DNA ligase, T3 DNA ligase, T7 DNA ligase, CircLigase I ssDNA ligase, or CircLigase II ssDNA ligase; or T4 DNA ligase, T4 RNA ligase 2, or SplintR.
[00233] In some embodiments, step (ii) further comprises adding a crowding agent such as a polyethylene glycol (PEG) (e.g., PEG4000), Ficoll, dextran, or albumin.
[00234] In some embodiments, step (ii) is performed at 2-50 °C.
[00235] In some embodiments, step (ii) is performed at 4, 12, 16, 22, or 37 °C.
[00236] In some embodiments, step (ii) is performed in a reaction buffer comprising 25-300 mM of a dissolved salt.
[00237] In another aspect, the present invention provides an RNA/DNA hybrid, prepared by any one of the methods described above.
[00238] In some embodiments, the temperature in step (iv) is 4-60°C.
[00239] In some embodiments, the temperature in step (iv) is 4, 10, 15, 20, 35, 30, 35, 40, 42, 45, 50, or 55°C.
[00240] In some embodiments, the time period in step (iv) is at least about 5 minutes.
[00241] In some embodiments, the time period in step (iv) is at least 10 minutes, at least 30 minutes, at least 1 hour, at least 2 hours, at least 10 hours, at least 24 hours, or at least 48 hours.
[00242] In some embodiments, the reverse transcriptase is selected from Superscript II, Superscript III, Superscript IV, AMV, MMLV (M-MuLV), ProtoScript II, WarmStart RTx, HIV RT, HTLV, EpiScript, MonsterScript, GoScript, TGIRT, MarathonRT, or PyroScript.
[00243] In some embodiments, the reverse transcriptase is Superscript III.
[00244] In some embodiments, step (iii) further comprises contacting with a RNase inhibitor.
[00245] In some embodiments, the RNase inhibitor is selected from SUPERase-In, RNaseOUT, or RNAsecure.
[00246] In some embodiments, the RNase inhibitor is SUPERase-In.
[00247] In some embodiments, the method further comprises:
(v) incubating for a time period at a temperature sufficient to heat inactivate the reverse transcriptase.
[00248] In some embodiments, the temperature in step (v) is at least 60°C.
[00249] In some embodiments, the temperature in step (v) is about 75°C.
[00250] In some embodiments, the time period in step (v) is at least 5 minutes.
[00251] In some embodiments, the time period in step (v) is at least 15 minutes.
[00252] In another aspect, the present invention provides an enriched DNA-encoded library (DEL) comprising library members comprising:
(i) a DNA barcode allowing identification of each library member;
(ii) a small molecule covalently conjugated to the DNA barcode;
(iii) a nucleic acid target ligated to the DNA barcode; and
(iv) an enriched DEL cDNA of the nucleic acid target hybridized to the nucleic acid target.
[00253] In another aspect, the present invention provides a partially double-stranded RNA-DNA ligation product comprising:
(i) an RNA strand comprising at least a portion of a biologically relevant target RNA, or a homolog, isoform, mutant, or analog thereof; and
(ii) an at least partially double-stranded synthetic DNA molecule comprising a DNA strand at least partially hybridized to a single-stranded oligonucleic acid and comprising a 5'-overhang of 2-5 nucleotides;
wherein the 5'-end of the RNA strand and the 3'-end of the DNA strand have been ligated to form a contiguous sequence.
[00254] In another aspect, the present invention provides a method of producing an enriched DNA-encoded library (DEL), comprising:
(i) providing a DEL of small molecules covalently conjugated to DNA barcodes, wherein each DNA barcode has a 5'-overhang comprising at least one nucleotide;
(ii) contacting the DEL of step (i) with a nucleic acid target under conditions selected to allow binding between the small molecules and the nucleic acid target to form a mixture of complexes, and wherein the 5'-overhang has sequence complementarity to the 5'-end of the nucleic acid target;
(iii) performing a high factor dilution of the mixture of step (ii) and incubating for a time period and at a temperature sufficient for the dissociation of at least one weakly-associated complex in the mixture, thereby producing an enriched mixture of bound complexes;
(iv) optionally, contacting the enriched mixture of step (iii) with an aqueous emulsion preparation under conditions selected to allow compartmentalization of at least one complex into at least one aqueous emulsion droplet;
(v) ligating the DEL and the nucleic acid target of step (iii) to form a ligated complex;
(vi) contacting, with a reverse transcriptase (RT) and optionally with a primer sequence, the ligated complex and incubating at a temperature and for a time period sufficient to enable reverse transcription of the nucleic acid target into at least one complementary DNA (cDNA);
(vii) optionally, disrupting the at least one aqueous emulsion droplet of step (vi); and
(viii) optionally, isolating the ligated product, the cDNA, or a duplex thereof.
[00255] In some embodiments, steps (iv) and (vii) are omitted.
[00256] In some embodiments, reverse transcription takes place at a higher rate for the bound complexes in comparison with the weakly-bound complexes.
[00257] In another aspect, the present invention provides a method of performing a multiplexed DEL screen, comprising:
(i) providing a DEL of small molecules covalently conjugated to DNA barcodes, wherein each DNA barcode has a 5'-overhang comprising at least one nucleotide;
(ii) contacting the DEL of step (i) with a plurality of nucleic acid targets of different sequences under conditions selected to allow binding between the small molecules and the nucleic acid targets to form a mixture of complexes, and wherein the 5'-overhang has sequence complementarity to the 5'-end of the nucleic acid targets;
(iii) performing a high factor dilution of the mixture of step (ii) and incubating for a time period and at a temperature sufficient for the dissociation of at least one weakly-associated complex in the mixture, thereby producing an enriched mixture of bound complexes;
(iv) optionally, contacting the enriched mixture of step (iii) with an aqueous emulsion preparation under conditions selected to allow compartmentalization of at least one complex into at least one aqueous emulsion droplet;
(v) ligating the enriched mixture of bound complexes to form a ligated complex;
(vi) contacting, with a reverse transcriptase (RT) and optionally with a primer sequence, the ligated complex and incubating at a temperature and for a time period sufficient to enable reverse transcription of the nucleic acid target into at least one complementary DNA (cDNA);
(vii) optionally, disrupting the at least one aqueous emulsion droplet of step (vi); and
(viii) optionally, isolating the ligated product, the cDNA, or a duplex thereof.
[00258] In some embodiments, steps (iv) and (vii) are omitted.
[00259] In some embodiments, the ligation and/or reverse transcription takes place at a higher rate for the bound complexes in comparison with the weakly-bound complexes.
[00260] In some embodiments, at least 10 nucleic acid targets of different sequence are screened in parallel.
[00261] In some embodiments, the reverse transcriptase is selected from Superscript II, Superscript III, Superscript IV, AMV, MMLV (M-MuLV), ProtoScript II, WarmStart RTx, HIV RT, HTLV, EpiScript, MonsterScript, GoScript, TGIRT, MarathonRT, or PyroScript.
[00262] In some embodiments, at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 25, 30, 35, 40, 45, 50, 75, 100, 150, 200, 150, 350, 500, 1,000, 5,000, 10,000, 100,000, or 1,000,000 nucleic acid targets of different sequence are screened in parallel. In some embodiments, about 10-100, 10-1,000, 10-100,000, or 10-1,000,000 nucleic acid targets of different sequence are screened in parallel.
[00263] In some embodiments, the DEL comprises at least 1 × 10^3 library members. In some embodiments, the DEL comprises at least 1 × 10^4 library members. In some embodiments, the DEL comprises at least 1 × 10^5 library members. In some embodiments, the DEL comprises at least 1 × 10^6 library members. In some embodiments, the DEL comprises at least 1 × 10^7, 1 × 10^8, 1 × 10^9, 1 × 10^10, 1 × 10^11, or 1 × 10^12 library members. In some embodiments, the DEL comprises from about 1 × 10^3 to about 1 × 10^12 library members. In some embodiments, the DEL comprises from about 1 × 10^4 to about 1 × 10^11 library members. In some embodiments, the DEL comprises from about 1 × 10^5 to about 1 × 10^10 library members. In some embodiments, the DEL comprises from about 1 × 10^6 to about 1 × 10^9 library members. In some embodiments, the DEL comprises about 1 × 10^3, about 1 × 10^4, about 1 × 10^5, about 1 × 10^6, about 1 × 10^7, about 1 × 10^8, about 1 × 10^9, about 1 × 10^10, about 1 × 10^11, or about 1 × 10^12 library members. In some embodiments, the DEL comprises approximately a number of library members shown in Table A above or in FIG. 3.
[00264] In some embodiments, the DEL comprises about 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1,000 BB-oligos per position; or about 50-1,000, 50-900, 50-800, 50-700, 50-600, 50-500, 50-400, 100-800, 100-600, 100-500, 100-300, 150-250, 100-200, 200-300, 300-500, or 150-450 BB-oligos per position, for example per position in a YoctoReactor® used to prepare the DEL.
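The BB-oligo counts per position relate to the library sizes listed above by simple multiplication, provided the design is fully combinatorial, meaning every building block at each position is combined with every building block at the other positions. That assumption, and the number of positions used in the example, are illustrative and are not stated in this section. A short Python sketch:

```python
from math import prod


def combinatorial_library_size(bbs_per_position):
    """Distinct members if every building block at each position is combined
    with every building block at every other position (assumed design)."""
    if not bbs_per_position or any(n <= 0 for n in bbs_per_position):
        raise ValueError("need at least one position with a positive BB count")
    return prod(bbs_per_position)


size = combinatorial_library_size([200, 200, 200])  # hypothetical 3-position design
print(size, f"{size:.1e}")  # 8000000 8.0e+06
```

For example, 200 BB-oligos at each of three positions gives about 8 × 10^6 members, which falls inside the 1 × 10^6 to 1 × 10^9 range recited above.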
[00265] In some embodiments, an enriched library or partially double-stranded, ligated nucleic acid prepared according to the presently described methods lacks a primer binding site. In some embodiments, a nucleic acid comprising a primer binding site is optionally ligated onto the nucleic acid target, such as at the 5'-end. In some embodiments, the primer binding site is about 10-40, 10-30, 10-20, 20-40, 20-30, or 15-30 nucleotides. In some embodiments, the nucleic acid comprising a primer binding site is optionally ligated to cDNA produced after a binding screen and reverse transcription.
[00266] In some embodiments, the target nucleic acid comprises a primer binding site. In some embodiments, the primer binding site is present at or near the 3'-end or the 5'-end of the nucleic acid.
Enriched Libraries Prepared by Reverse Transcription of a Nucleic Acid Target Bound to a DEL
[00267] In one aspect, the present invention provides enriched libraries, compositions comprising such libraries, as well as methods of preparing such enriched libraries and processing samples of such enriched libraries, wherein reverse transcription of a bound-together library member-nucleic acid target (bound complex) is used to capture information about the binding event. In some embodiments, a ligation of the nucleic acid target to the DEL library member’s barcode is not used. In some embodiments, such a ligation is optionally included.
[00268] The present invention provides methods and kits for screening an encoded library against a nucleic acid target, such as a target RNA, wherein reverse transcription is used to capture information about the binding event. The present invention further provides methods of producing enriched encoded libraries and processing samples from such libraries, as well as compositions comprising such enriched encoded libraries, wherein reverse transcription is used to capture information about the binding event, e.g., to record which encoded compounds are hits (i.e., bind to the nucleic acid target). In some embodiments, the encoded library is a DNA-encoded library (DEL) of small molecules.
[00269] In one aspect, the present invention provides a method of producing an enriched DEL cDNA of a nucleic acid target, comprising:
(i) contacting, with a reverse transcriptase (RT), a mixture of a nucleic acid target and a partially double-stranded DNA molecule of a DEL having a 3'-overhang comprising at least two nucleotides, wherein the 3'-overhang has sequence complementarity to nucleotides at the 3'-end of the nucleic acid target; and
(ii) incubating at a temperature and for a time period sufficient to enable reverse transcription of the nucleic acid target into complementary DNA (cDNA);
thereby forming an enriched DEL cDNA of a nucleic acid target.
[00270] In some embodiments, the DEL encodes a small molecule candidate compound and the nucleic acid target is a target to which the small molecule binds.
[00271] In some embodiments, the nucleic acid target is a target RNA.
[00272] In some embodiments, step (ii) is performed after a high-factor dilution that optionally comprises in vitro compartmentalization.
[00273] In some embodiments, the in vitro compartmentalization is an aqueous emulsion based technique.
[00274] In some embodiments, the in vitro compartmentalization is Binder Trap Enrichment® (BTE).
[00275] In some embodiments, the in vitro compartmentalization creates more compartments than there are members of the DEL in the sample.
[00276] In some embodiments, the 3'-overhang is 1-10 nucleotides.
[00277] In some embodiments, the 3'-overhang is 2, 3, 4, or 5 nucleotides.
[00278] In some embodiments, the 3'-overhang has a guanosine-cytosine content (GC-content) that is 50% or greater, 60% or greater, 70% or greater, 80% or greater, 90% or greater, or 100%.
[00279] In some embodiments, the 3'-overhang has at least 2, at least 3, at least 4, or at least 5 guanosine and/or cytosine nucleotides.
[00280] In some embodiments, the temperature is about 4-60 °C.
[00281] In some embodiments, the temperature is about 4, 10, 15, 20, 35, 30, 35, or 40 °C.
[00282] In some embodiments, the time period is at least 5 minutes.
[00283] In some embodiments, the time period is at least 10 minutes, at least 30 minutes, at least 1 hour, about 10 minutes to about 1 hour, about 5 minutes to 2 hours, about 5 minutes to 10 hours, about 10 minutes to 24 hours, or about 15 minutes to 48 hours.
[00284] In some embodiments, the reverse transcriptase is selected from Superscript II, Superscript III, Superscript IV, AMV, MMLV (M-MuLV), ProtoScript II, WarmStart RTx, HIV RT, HTLV, EpiScript, MonsterScript, GoScript, TGIRT, MarathonRT, or PyroScript.
[00285] In some embodiments, the reverse transcriptase is Superscript III.
[00286] In some embodiments, the method further comprises carrying out step (i) or (ii), or both, in the presence of an RNase inhibitor.
[00287] In some embodiments, the RNase inhibitor is selected from SUPERase-In, RNaseOUT, or RNAsecure.
[00288] In some embodiments, the RNase inhibitor is SUPERase-In.
[00289] In some embodiments, the method further comprises:
(iii) incubating for a time period at a temperature sufficient to heat inactivate the reverse transcriptase.
[00290] In some embodiments, the temperature in step (iii) is at least about 60 °C.
[00291] In some embodiments, the temperature in step (iii) is about 75 °C.
[00292] In some embodiments, the time period in step (iii) is at least about 5 minutes.
[00293] In some embodiments, the time period in step (iii) is at least about 15 minutes.
[00294] In another aspect, the present invention provides an enriched DNA-encoded library (DEL) comprising library members comprising:
(i) a DNA barcode allowing identification of each library member;
(ii) a 3'-overhang of 1-10 nucleotides attached to the DNA barcode;
(iii) a small molecule covalently conjugated to the DNA barcode;
(iv) a nucleic acid target hybridized to the 3' overhang; and
(v) an enriched DEL cDNA of the nucleic acid target hybridized to the nucleic acid target.
[00295] In another aspect, the present invention provides a method of producing an enriched DNA-encoded library (DEL), comprising:
(i) providing a DEL of small molecules covalently conjugated to DNA barcodes, wherein the DEL has a 3'-overhang comprising at least two nucleotides;
(ii) contacting the DEL of step (i) with a nucleic acid target, wherein the 3'-overhang has sequence complementarity to the 3'-end of the nucleic acid target, under conditions selected to allow binding between the small molecules and the nucleic acid target to form a mixture of complexes;
(iii) performing a high factor dilution of the mixture of step (ii) and incubating for a time period and at a temperature sufficient for the dissociation of at least one weakly-associated complex in the mixture, thereby producing an enriched mixture of bound complexes;
(iv) optionally, contacting the enriched mixture of step (iii) with an aqueous emulsion preparation under conditions selected to allow compartmentalization of at least one complex into at least one aqueous emulsion droplet;
(v) contacting, with a reverse transcriptase (RT), at least one of the bound complexes and incubating at a temperature and for a time period sufficient to enable reverse transcription of the nucleic acid target into at least one complementary DNA (cDNA);
(vi) optionally, disrupting the at least one aqueous emulsion droplet of step (v); and
(vii) optionally, isolating the at least one cDNA.
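Step (ii) requires the DEL 3'-overhang to be complementary to the 3'-end of the nucleic acid target, so that, like a primer, the overhang pairs with the target's 3'-terminal nucleotides and can be extended by the reverse transcriptase in step (v). The following is a minimal, hypothetical sketch (the function names are illustrative and not from the source) of how such a design could be checked in silico, assuming exact Watson-Crick pairing and both sequences written 5'->3':

```python
# Hypothetical helper (not from the source): does a DEL 3'-overhang, written
# 5'->3' in DNA letters, pair with the 3'-terminal nucleotides of a target,
# written 5'->3' in RNA or DNA letters?

_COMPLEMENT_DNA = {"A": "T", "T": "A", "U": "A", "G": "C", "C": "G"}


def reverse_complement_dna(seq: str) -> str:
    """Return the reverse complement of seq (RNA or DNA) in DNA letters, 5'->3'."""
    return "".join(_COMPLEMENT_DNA[base] for base in reversed(seq.upper()))


def overhang_matches_target(overhang: str, target: str, min_length: int = 2) -> bool:
    """True if the overhang is the exact reverse complement of the target's 3'-end.

    Only Watson-Crick pairs are accepted; G-U wobble pairs, mismatches and
    partial complementarity are deliberately not modelled.
    """
    n = len(overhang)
    if n < min_length or n > len(target):
        return False
    return reverse_complement_dna(target[-n:]) == overhang.upper()


if __name__ == "__main__":
    print(overhang_matches_target("GCTG", "GGACUUCAGC"))  # True
    print(overhang_matches_target("GCTA", "GGACUUCAGC"))  # False
```

The default `min_length=2` mirrors the "at least two nucleotides" limitation of step (i); a real design tool would likely also score partial complementarity and duplex stability, which this sketch omits.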
[00296] In some embodiments, steps (iv) and (vi) are omitted.
[00297] In some embodiments, reverse transcription takes place at a higher rate for the bound complexes in comparison with the weakly-bound complexes.
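The enrichment in step (iii) relies on weakly associated complexes falling apart faster than tightly bound ones once dilution prevents rebinding. A minimal sketch, assuming simple first-order dissociation with negligible rebinding after the high-factor dilution (the off-rates and incubation time below are hypothetical illustration values, not data from the source):

```python
import math


def fraction_remaining(k_off: float, t: float) -> float:
    """Fraction of complexes still intact after t seconds at off-rate k_off (1/s).

    Assumes the dilution makes rebinding negligible, so decay is first order:
    f(t) = exp(-k_off * t).
    """
    if k_off < 0 or t < 0:
        raise ValueError("k_off and t must be non-negative")
    return math.exp(-k_off * t)


def half_life(k_off: float) -> float:
    """Half-life in seconds of a complex that dissociates with rate k_off (1/s)."""
    if k_off <= 0:
        raise ValueError("k_off must be positive")
    return math.log(2) / k_off


if __name__ == "__main__":
    t = 30 * 60  # hypothetical 30-minute incubation after dilution
    for label, k_off in [("tight binder", 1e-4), ("weak binder", 1e-1)]:
        print(f"{label}: t1/2 = {half_life(k_off):.0f} s, "
              f"fraction left = {fraction_remaining(k_off, t):.3g}")
```

With these assumed values, roughly 84% of the tight-binder complexes survive the incubation while essentially none of the weak-binder complexes do, which is the kind of differential that would let reverse transcription favor the bound complexes.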
[00298] In another aspect, the present invention provides a method of performing a multiplexed DEL screen, comprising:
(i) providing a DEL of small molecules covalently conjugated to DNA barcodes, wherein the DEL has a 3'-overhang comprising at least two nucleotides;
(ii) contacting the DEL of step (i) with a plurality of nucleic acid targets that have different sequences, wherein the 3'-overhang has sequence complementarity to the 3'-end of each nucleic acid target, under conditions selected to allow binding between the small molecules and the nucleic acid targets to form a mixture of complexes;
(iii) performing a high factor dilution of the mixture of step (ii) and incubating for a time period and at a temperature sufficient for the dissociation of at least one weakly-associated complex in the mixture, thereby producing an enriched mixture of bound complexes;
(iv) optionally, contacting the enriched mixture of step (iii) with an aqueous emulsion preparation under conditions selected to allow compartmentalization of at least one complex into at least one aqueous emulsion droplet;
(v) contacting, with a reverse transcriptase (RT), at least one of the bound complexes and incubating at a temperature and for a time period sufficient to enable reverse transcription of at least one of the nucleic acid targets into a complementary DNA (cDNA);
(vi) optionally, disrupting the at least one aqueous emulsion droplet of step (v); and
(vii) optionally, isolating the at least one cDNA.
[00299] In some embodiments, steps (iv) and (vi) are omitted.
[00300] In some embodiments, reverse transcription takes place at a higher rate for the bound complexes in comparison with the weakly-bound complexes.
[00301] In some embodiments, at least 10 nucleic acid targets of different sequence are screened in parallel.
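In a multiplexed screen, each recovered cDNA carries both the DEL barcode (identifying the compound) and sequence derived from the target it was reverse-transcribed from. A minimal sketch of the tallying step, assuming reads have already been decoded into hypothetical `(target_id, barcode)` pairs (the decoding itself is not shown and the identifiers are invented for illustration):

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def tally_by_target(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count DEL barcodes separately for each nucleic acid target.

    `pairs` holds one (target_id, barcode) tuple per decoded read.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for target_id, barcode in pairs:
        counts[target_id][barcode] += 1
    return dict(counts)


def top_hits(counts: Dict[str, Counter], n: int = 3) -> Dict[str, List[Tuple[str, int]]]:
    """Return the n most frequently observed barcodes for each target."""
    return {target: c.most_common(n) for target, c in counts.items()}


if __name__ == "__main__":
    reads = [("T1", "BC7"), ("T1", "BC7"), ("T1", "BC2"), ("T2", "BC5")]
    print(top_hits(tally_by_target(reads), n=1))
```

Raw counts are only a starting point; a practical analysis would typically normalize for sequencing depth and compare against control selections before calling hits.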
[00302] In some embodiments, the reverse transcriptase is selected from Superscript II, Superscript III, Superscript IV, AMV, MMLV (M-MuLV), ProtoScript II, WarmStart RTx, HIV RT, HTLV, EpiScript, MonsterScript, GoScript, TGIRT, MarathonRT, or PyroScript.
[00303] In some embodiments, at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 25, 30, 35, 40, 45, 50, 75, 100, 150, 200, 250, 350, 500, 1,000, 5,000, 10,000, 100,000, or 1,000,000 nucleic acid targets of different sequence are screened in parallel. In some embodiments, about 10-100, 10-1,000, 10-100,000, or 10-1,000,000 nucleic acid targets of different sequence are screened in parallel.
[00304] In some embodiments, the nucleic acid target is a target RNA, e.g. an RNA selected from a naturally occurring RNA or chimera, homolog, isoform, mutant, fragment, or analog thereof such as those described in detail herein. In some embodiments, the target RNA is associated with or implicated in a disease, such as those diseases described herein. In some embodiments, the target RNA is one of those listed in Table 1, 2, 3, or 4 herein.
[00305] In some embodiments, the DEL is prepared using a split-and-pool, DNA-recording, or YoctoReactor® (yR) method of combinatorial synthesis. In some embodiments, the DEL is screened using Binder Trap Enrichment® (BTE) or emulsion-free, proximity-enhanced ligation conditions.
[00306] In some embodiments, the DEL comprises at least 1 × 10³ library members. In some embodiments, the DEL comprises at least 1 × 10⁴ library members. In some embodiments, the DEL comprises at least 1 × 10⁵ library members. In some embodiments, the DEL comprises at least 1 × 10⁶ library members. In some embodiments, the DEL comprises at least 1 × 10⁷, 1 × 10⁸, 1 × 10⁹, 1 × 10¹⁰, 1 × 10¹¹, or 1 × 10¹² library members. In some embodiments, the DEL comprises from about 1 × 10³ to about 1 × 10¹² library members. In some embodiments, the DEL comprises from about 1 × 10⁴ to about 1 × 10¹¹ library members. In some embodiments, the DEL comprises from about 1 × 10⁵ to about 1 × 10¹⁰ library members. In some embodiments, the DEL comprises from about 1 × 10⁶ to about 1 × 10⁹ library members. In some embodiments, the DEL comprises about 1 × 10³, about 1 × 10⁴, about 1 × 10⁵, about 1 × 10⁶, about 1 × 10⁷, about 1 × 10⁸, about 1 × 10⁹, about 1 × 10¹⁰, about 1 × 10¹¹, or about 1 × 10¹² library members. In some embodiments, the DEL comprises approximately a number of library members shown in Table A above or in FIG. 3.
[00307] In some embodiments, the DEL comprises about 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1,000 BB-oligos per position; or about 50-1,000, 50-900, 50-800, 50-700, 50-600, 50-500, 50-400, 100-800, 100-600, 100-500, 100-300, 150-250, 100-200, 200-300, 300-500, or 150-450 BB-oligos per position in a YoctoReactor® used to prepare the DEL.
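The number of BB-oligos per position and the library sizes given above are linked by simple multiplication when every building block at each position is combined with every building block at every other position. A minimal sketch under that full-combination assumption (in practice some combinations may be excluded, so real libraries can be smaller):

```python
import math
from typing import Sequence


def library_size(bbs_per_position: Sequence[int]) -> int:
    """Distinct members of a library in which every building block at each
    position is combined with every building block at every other position."""
    if not bbs_per_position or any(n < 1 for n in bbs_per_position):
        raise ValueError("need at least one position, each with >= 1 building block")
    return math.prod(bbs_per_position)


if __name__ == "__main__":
    print(f"{library_size([200, 200, 200]):.1e}")           # 8.0e+06
    print(f"{library_size([1000, 1000, 1000, 1000]):.1e}")  # 1.0e+12
```

For example, three positions with 200 BB-oligos each give 8 × 10⁶ members, and four positions with 1,000 each reach the 1 × 10¹² upper bound recited above.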
[00308] In some embodiments, an enriched library or partially double-stranded, ligated nucleic acid prepared according to the presently described methods lacks a primer binding site. In some embodiments, a nucleic acid comprising a primer binding site is optionally ligated onto the nucleic acid target, such as at the 5'-end. In some embodiments, the primer binding site is about 10-40, 10-30, 10-20, 20-40, 20-30, or 15-30 nucleotides. In some embodiments, the nucleic acid comprising a primer binding site is optionally ligated to cDNA produced after a binding screen and reverse transcription.
[00309] In some embodiments, the target nucleic acid comprises a primer binding site. In some embodiments, the primer binding site is present at or near the 3'-end or the 5'-end of the nucleic acid.
Ligases
[00310] A variety of ligases may be used in the present invention. In some embodiments, the ligase is an RNA ligase capable of ligating an RNA strand to a DNA strand. In some embodiments, a combination of two or more ligases is used. In some embodiments, a DNA ligase is used. In some embodiments, an RNA ligase is used. In some embodiments, an RNA ligase is used in combination with a DNA ligase. In some embodiments, the RNA ligase ligates the RNA strand to the DNA strand, and the DNA ligase ligates the DNA splint to the helper oligonucleotide. When used in the context of a BTE method, the ligase is generally added or included in the assay mixture before the emulsion forming step or concurrently with it. In some embodiments, the ligase is added during the dilution step.
[00311] Methods of “tagging” RNA molecules by ligating them to a natural or modified nucleic acid are known in the art. US 2011/0104785, which is hereby incorporated by reference, describes methods of tagging RNA.
[00312] RNA Ligase 1 from bacteriophage T4-infected E. coli (T4 RNA Ligase 1) catalyzes the adenosine triphosphate (ATP)-dependent formation of a 3' to 5' phosphodiester bond between an RNA molecule with a 3'-hydroxyl group (the acceptor molecule) and another molecule bearing a 5'-phosphoryl group (the donor molecule). The reaction occurs in three steps, involving covalent intermediates (see, e.g., Silverman, S. “Practical and general synthesis of 5'-adenylated RNA (5'-AppRNA),” RNA 2004, 10, 731-746; England, T. E. et al., “Dinucleoside pyrophosphates are substrates for T4-induced RNA ligase,” Proc. Natl. Acad. Sci. USA, 1977, 74, 4839-4842):
(i) T4 RNA Ligase 1 reacts with ATP to form a covalent enzyme-AMP intermediate (“adenylated enzyme”).
(ii) The adenyl group is transferred from the adenylated enzyme to the 5'-phosphoryl end of an RNA molecule to form a 5',5'-phosphoanhydride bond (5'-App-RNA), regenerating the free enzyme.
(iii) The 5'-App-RNA donor reacts with the 3'-hydroxyl group of another (acceptor) RNA molecule, in the absence of ATP, to form a standard 3' to 5' phosphodiester bond between the acceptor and donor RNA molecules, with the release of adenosine monophosphate (AMP).
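The three-step mechanism above can be traced as a sequence of changes to the chemistry of the donor and acceptor ends. The following is a toy model only (the `Strand` class and `rnl1_ligate` function are hypothetical and track end labels, not real chemistry); it shows why a 5'-phosphate donor needs ATP for steps (i)-(ii) while a pre-adenylated donor can go straight to step (iii):

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Strand:
    """A linear strand written 5'->3' with labelled end chemistry (toy model)."""
    seq: str
    five_prime: str   # "p" = 5'-phosphate, "App" = 5'-adenylate, "OH" = 5'-hydroxyl
    three_prime: str  # "OH" = 3'-hydroxyl; any other label is treated as blocked


def rnl1_ligate(donor: Strand, acceptor: Strand,
                atp: bool) -> Tuple[Optional[Strand], List[str]]:
    """Walk through steps (i)-(iii) for joining donor's 5' end to acceptor's 3' end."""
    log: List[str] = []
    if donor.five_prime == "p":
        if not atp:
            return None, log  # without ATP the enzyme cannot be adenylated
        log.append("(i) enzyme + ATP -> enzyme-AMP")
        log.append("(ii) AMP moved to donor 5'-phosphate -> App-donor + free enzyme")
    elif donor.five_prime != "App":
        return None, log  # a 5'-OH donor cannot be activated
    if acceptor.three_prime != "OH":
        return None, log  # no 3'-OH to attack the App-donor
    log.append("(iii) acceptor 3'-OH attacks App-donor -> phosphodiester + AMP")
    product = Strand(acceptor.seq + donor.seq, acceptor.five_prime, donor.three_prime)
    return product, log


if __name__ == "__main__":
    acceptor = Strand("ACGU", "OH", "OH")
    product, steps = rnl1_ligate(Strand("GGA", "p", "OH"), acceptor, atp=True)
    print(product.seq, *steps, sep="\n")
```

Modelling ends as labels keeps the sketch short; it ignores enzyme kinetics, secondary structure, and the accumulation of unligated App-donor that can occur in practice.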
[00313] In the first step, an adenosine 5'-monophosphate (AMP) group is transferred from the cofactor NAD+ or ATP to a lysine residue in the adenylation motif KXDG (in single-letter amino-acid code, where X denotes any amino acid) through a phosphoamide linkage. In the second step, the AMP group is transferred to the 5'-phosphate at the nick through a pyrophosphate linkage to form a DNA- or RNA-adenylate intermediate (AppDNA or AppRNA, respectively). Finally, in the third step, a phosphodiester bond is formed to seal the nick and release AMP. DNA ligase is an essential component involved in various DNA transactions, including replication, repair and recombination. DNA ligases can be classified into two families based on adenylation cofactor dependence. ATP-dependent ligases are found in bacterial and eukaryotic viruses, Archaea, yeast, mammals and eubacteria. NAD+-dependent ligases are found almost exclusively in eubacteria, with the exception of the sequenced entomopoxvirus genomes Melanoplus sanguinipes and Amsacta moorei. Some simple eubacterial genomes encode both NAD+- and ATP-dependent ligases, whereas many eukaryotic organisms encode multiple ATP-dependent ligases to fulfil diverse biological functions.
[00314] A chemically adenylated or pre-5'-adenylated DNA or RNA molecule may also be used as the donor molecule in step (iii) above. This approach has proven useful in 3'-ligation-tagging of RNA molecules. In one example of this method, a 5'-adenylated donor oligonucleotide (5'-App-DNA) is ligated to the 3' end of a miRNA acceptor using T4 RNA Ligase 1 in the absence of ATP (Ebhardt, H. A. et al., “Extensive 3' modification of plant small RNAs is modulated by helper component-proteinase expression,” Proc. Natl. Acad. Sci. USA 2005, 102, 13398-13403). In another example of this method, the 5'-adenylated donor oligonucleotide additionally contains a blocking group at its 3' end (5'-App-DNA-X), thereby preventing self-ligation of the donor oligonucleotide (Hafner, M. et al., “Identification of microRNAs and other small regulatory RNAs using cDNA library sequencing,” Methods 2008, 44, 3-12); the reaction is catalyzed by T4 RNA Ligase 2.
[00315] Such 5'-adenylated, 3'-blocked oligonucleotides are available commercially (US Patent Application 2009/0011422 and Vigneault, F. et al., “Efficient microRNA capture and bar-coding via enzymatic oligonucleotide adenylation,” Nature Meth. 2008, 5, 777-779, hereby incorporated by reference). However, commercial synthesis of such 5'- and 3'-modified oligonucleotides can be inefficient and expensive compared to standard (unmodified) oligonucleotides, especially when many oligonucleotides are required for applications such as preparing barcoded libraries for RNA-Seq (Vigneault, 2008).
[00316] Chemical and enzymatic methods are known in the art for 5'-adenylating oligonucleotides (e.g., see Vigneault, 2008). For large-scale production, chemical methods are more common.
[00317] In some embodiments, the ligase is selected from T7 DNA Ligase, Thermostable 5' AppDNA/RNA Ligase, T3 DNA Ligase, ElectroLigase®, T4 RNA Ligase 1, T4 RNA Ligase 2, truncated T4 RNA Ligase 2, a mixture comprising T4 RNA Ligase 1 and truncated T4 RNA Ligase 2, T4 RNA Ligase 2 (Truncated K227Q), T4 RNA Ligase 2 (Truncated KQ), RtcB Ligase, SplintR, bacteriophage TS2126 ligase, CircLigase I ssDNA Ligase, CircLigase II ssDNA Ligase (EPICENTRE), E. coli DNA Ligase, Taq DNA Ligase, HiFi Taq DNA Ligase, or 9°N™ DNA Ligase.
[00318] In some embodiments, the ligase is a truncated ligase, such as a truncated version of any of the foregoing full-length ligases. For example, Truncated T4 RNA Ligase 2 (T4 Rnl2tr) specifically ligates the pre-adenylated 5'-end of DNA or RNA to the 3'-end of RNA. The enzyme does not require ATP for ligation but does need the pre-adenylated substrate. T4 Rnl2tr is expressed in E. coli from a plasmid that encodes the first 249 amino acids of full-length T4 RNA Ligase 2. Unlike the full-length ligase, T4 Rnl2tr cannot ligate the phosphorylated 5'-end of RNA or DNA to the 3'-end of RNA. This enzyme, also known as Rnl2(1-249), has been used for optimized linker ligation in the cloning of microRNAs. It reduces background ligation because it can only use pre-adenylated linkers.
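The substrate requirements described in paragraphs [00314]-[00318] can be summarized as a small decision rule. This is a simplified, hypothetical encoding (the `End` type, enzyme labels, and the rule that every listed enzyme needs an RNA acceptor with a 3'-OH are assumptions drawn loosely from the text, not a complete description of these enzymes). It also illustrates why a 3'-blocking group suppresses donor self-ligation: a blocked donor offers no 3'-OH to act as an acceptor.

```python
from typing import NamedTuple

KNOWN_ENZYMES = {"T4 Rnl1", "T4 Rnl2", "T4 Rnl2tr"}


class End(NamedTuple):
    """Toy description of a strand's chemistry (hypothetical representation)."""
    kind: str         # "RNA" or "DNA"
    five_prime: str   # "p" (phosphate), "App" (pre-adenylated) or "OH"
    three_prime: str  # "OH" or "blocked"


def can_ligate(enzyme: str, donor: End, acceptor: End, atp: bool) -> bool:
    """Simplified rule: can `enzyme` join donor's 5' end to acceptor's 3' end?"""
    if enzyme not in KNOWN_ENZYMES:
        raise ValueError(f"unknown enzyme: {enzyme}")
    # All enzymes here are treated as needing an RNA acceptor with a 3'-OH.
    if acceptor.kind != "RNA" or acceptor.three_prime != "OH":
        return False
    if donor.five_prime == "App":
        return True  # pre-adenylated donors skip adenylation; no ATP needed
    if donor.five_prime == "p":
        # Only full-length enzymes can adenylate a 5'-phosphate, and only with ATP.
        return atp and enzyme != "T4 Rnl2tr"
    return False  # 5'-OH donors cannot be activated


if __name__ == "__main__":
    mirna = End("RNA", "p", "OH")
    linker = End("DNA", "App", "blocked")
    print(can_ligate("T4 Rnl2tr", linker, mirna, atp=False))  # True
```

A rule table like this could help pre-screen adapter designs before ordering modified oligonucleotides, but it does not replace empirical testing.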
Crowding Reagents
[00319] Crowding agents may be included to increase ligation efficiency or obtain other desired results. For example, in some embodiments a crowding agent such as a polymer is present during the ligation reaction at a concentration of about 1%, 2%, 5%, 8%, 10%, 12%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, or 75% w/w. In some embodiments, the crowding agent is present during the ligation reaction at a concentration of more than about 6%, 10%, 18%, 20%, 25%, 30%, 36%, 40%, or 50% w/w. In some embodiments, the crowding agent is present during the ligation reaction at a concentration of less than about 6%, 10%, 18%, 20%, 30%, 36%, 40%, or 50% w/w.
[00320] In some embodiments, the crowding agent is a polymer or protein. In some embodiments, the crowding agent is a water-soluble or hydrophilic polymer, such as polyethylene glycol (PEG). In some embodiments, the crowding agent is selected from PEG4000, Ficoll, an albumin protein such as bovine serum albumin (BSA) or ovalbumin, hemoglobin, or dextran.
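Because the concentrations above are expressed as w/w (mass fraction of the final mixture), the amount of crowding agent to add to an existing mixture is slightly more than the fraction times that mixture's mass. A minimal sketch of the arithmetic, assuming the agent is added to a mixture whose other components have a known total mass (the 20 mg figure is a hypothetical example):

```python
def agent_mass_to_add(target_fraction: float, other_mass: float) -> float:
    """Mass of crowding agent to add so it makes up target_fraction (w/w) of the mix.

    Solves f = m / (m + other_mass) for m, giving m = f * other_mass / (1 - f).
    The result has the same units as other_mass.
    """
    if not 0.0 <= target_fraction < 1.0:
        raise ValueError("target_fraction must be in [0, 1)")
    if other_mass < 0:
        raise ValueError("other_mass must be non-negative")
    return target_fraction * other_mass / (1.0 - target_fraction)


if __name__ == "__main__":
    # Hypothetical: 20 mg of other reaction components, 15% w/w crowding agent.
    m = agent_mass_to_add(0.15, 20.0)
    print(f"add {m:.2f} mg; final fraction = {m / (m + 20.0):.2f}")
```

The distinction matters most at the high end of the recited range: at 50% w/w, the agent's mass must equal the mass of everything else in the reaction.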
RNA Targets and Association with Diseases and Disorders
[00321] The vast majority of molecular targets that have been addressed therapeutically are proteins. However, it is now understood that a variety of RNA molecules play important regulatory roles in both healthy and diseased cells. While only 1-2% of the human genome codes for proteins, it is now known that the majority of the genome is transcribed (Carninci et al., Science 309: 1559-1563; 2005). Thus, the noncoding transcripts (the noncoding transcriptome) represent a large group of new therapeutic targets. Noncoding RNAs such as microRNA (miRNA) and long noncoding RNA (lncRNA) regulate transcription, splicing, mRNA stability/decay, and translation. In addition, the noncoding regions of mRNA such as the 5' untranslated region (5' UTR), the 3' UTR, and introns can play regulatory roles in affecting mRNA expression levels, alternative splicing, translational efficiency, and mRNA and protein subcellular localization. RNA secondary and tertiary structures are critical for these regulatory activities. Thus, in some embodiments, the target RNA is a non-coding RNA or non-coding region of an RNA that includes both non-coding and coding regions.
[00322] In some embodiments, the target RNA is a coding RNA, such as an mRNA or coding region of an RNA that includes both non-coding and coding regions. Targeting mRNA allows modulation of downstream production of proteins. This provides a new approach to modulating the function of otherwise intractable protein targets as well as proteins that are capable of being targeted by conventional drug discovery methods (e.g., by small molecules or biologics). In some embodiments, the target RNA is an mRNA or the coding or non-coding region of an mRNA.
[00323] Remarkably, GWAS studies have shown that there are far more single nucleotide polymorphisms (SNPs) associated with human disease in the noncoding transcriptome relative to the coding transcripts (Maurano et al., Science 337: 1190-1195; 2012). Therefore, the therapeutic targeting of noncoding RNAs and noncoding regions of mRNA can yield novel agents to treat previously intractable human diseases.
[00324] Current therapeutic approaches to interdict mRNA require methods such as gene therapy (Naldini, Nature 2015, 526, 351-360), genome editing (Cox et al., Nature Medicine 2015, 21, 121-131), or a wide range of oligonucleotide technologies (antisense, RNAi, etc.) (Bennett & Swayze, Annu. Rev. Pharmacol. Toxicol. 2010, 50, 259-293). Oligonucleotides modulate the action of RNA via canonical base/base hybridization. The appeal of this approach is that the basic pharmacophore of an oligonucleotide can be defined in a straightforward fashion from the sequence subject to interdiction. Each of these therapeutic modalities suffers from substantial technical, clinical, and regulatory challenges. Some limitations of oligonucleotides as therapeutics (e.g., antisense, RNAi) include unfavorable pharmacokinetics, lack of oral bioavailability, and lack of blood-brain-barrier penetration, with the latter precluding delivery to the brain or spinal cord after parenteral drug administration for the treatment of neurological diseases. In addition, oligonucleotides are not taken up effectively into solid tumors without a complex delivery system such as lipid nanoparticles. Lastly, a vast majority of the oligonucleotides that are taken up into cells and tissues remain in a non-functional compartment such as endosomes, and only a small fraction of the material escapes to gain access to the cytosol and/or nucleus where the target is located.
[00325] Small molecules can be optimized to exhibit excellent absorption from the gut, excellent distribution to target organs, and excellent cell penetration. The use of “conventional” small molecules (e.g., “Lipinski-compliant” compounds; Lipinski et al., Adv. Drug Deliv. Rev. 2001, 46, 3-26) with favorable drug properties that bind and modulate the activity of a target RNA would solve many of the problems noted above.
[00326] In one aspect, the present invention provides a method of identifying the identity or structure of a binding or active site to which a small molecule binds in a target RNA, comprising the steps of (i) contacting the target RNA with a disclosed small molecule DEL library member and (ii) capturing information about binding of the DEL library member to the target RNA by a method disclosed herein, optionally in combination with sequencing and/or a computational method to process the information about binding and thus identify hits. In some embodiments, the target RNA is selected from an mRNA or a noncoding RNA. In some embodiments, the target RNA is an aptamer or riboswitch. In some embodiments, the RNA is the FMN riboswitch, PreQ1, or Aptamer 21. In some embodiments, the assay identifies the location in the primary sequence of the binding site(s) on the target RNA.
[00327] In some embodiments, the target RNA is a full-length transcript or may be a truncated version thereof. For example, in embodiments where the target is an mRNA, the polyA tail present in the full-length mRNA is optionally omitted from the target to simplify or streamline the encoded library screen. As another non-limiting example, if the target RNA contributes to or causes a triplet repeat expansion disease (TRED) such as a CAG repeat, the number of repeats may be reduced to make the screen simpler, streamlined, or more tractable, while still yielding useful binding data.
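The two simplifications described above (omitting a poly(A) tail and reducing the number of triplet repeats) are straightforward sequence edits. A minimal, hypothetical sketch, assuming the tail is any 3'-terminal run of at least `min_polya` A residues and that repeat runs longer than `max_repeats` copies should be shortened to exactly that many (both thresholds are illustrative choices, not values from the source):

```python
import re


def simplify_target(seq: str, repeat: str = "CAG", max_repeats: int = 20,
                    min_polya: int = 8) -> str:
    """Hypothetical pre-processing of an RNA target before a DEL screen.

    1. Removes a 3'-terminal run of at least `min_polya` A residues (poly(A) tail).
    2. Shortens every tandem run of `repeat` longer than `max_repeats` copies
       to exactly `max_repeats` copies.
    """
    seq = seq.upper()
    seq = re.sub(rf"A{{{min_polya},}}$", "", seq)
    unit = repeat.upper()
    run = rf"(?:{re.escape(unit)}){{{max_repeats + 1},}}"
    return re.sub(run, lambda _m: unit * max_repeats, seq)


if __name__ == "__main__":
    expanded = "GGC" + "CAG" * 50 + "UUC" + "A" * 30
    print(len(expanded), len(simplify_target(expanded)))
```

Note that the tail rule would also remove any genuine A residues immediately upstream of the tail; where the exact transcript end matters, trimming from an annotated polyadenylation site would be safer.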
[00328] In some embodiments, the nucleic acid target is an analog of the corresponding naturally occurring nucleic acid target, e.g., target RNA. An “analog,” as used herein, includes a nucleic acid modified at one or more positions. Such modifications include, but are not limited to, replacing a nucleotide with a nucleotide analog, replacing a sugar with a modified sugar, replacing a nucleobase with a modified nucleobase, conjugating a fluorophore or reporter group, or the like.
[00329] In some embodiments, the nucleic acid target is a chimera, for example a chimeric sequence that combines portions of the sequences of two or more nucleic acid targets, such as two target RNAs.
[00330] In some embodiments, the nucleic acid target is a homolog or isoform of a naturally occurring nucleic acid target, such as a bacterial or murine homolog or isoform of a corresponding human RNA. This may be advantageous where the target of interest is too long, of unknown sequence, or not amenable to study in a model system or assay.
[00331] Furthermore, the nucleic acid target may be modified by appending primer binding sites, a fluorophore, a radioactive isotope, a pull-down group such as a hapten (e.g., fluorescein, biotin, digoxigenin, or dinitrophenol), or an artificial sequence. In some embodiments, a primer binding sequence is appended to the 3'-end or 5'-end of the nucleic acid target. In some embodiments, an oligonucleic acid region is appended to the 3'-end or 5'-end of the nucleic acid target that is at least partially complementary to a 3'-overhang (or 5'-overhang, respectively) present in a DEL library member and/or at least partially complementary to the helper oligonucleic acid used in certain embodiments of the present invention.
[00332] In some embodiments, the nucleic acid target, such as a target RNA, is single-stranded. In some embodiments, the nucleic acid target is double-stranded or partially double-stranded. In some embodiments, the nucleic acid target is a pair of nucleic acids engaged in an interaction, such as an miRNA-mRNA hybridized (or partially hybridized) pair. In some embodiments, the nucleic acid target comprises one, two, or more miRNAs bound to an mRNA. In some embodiments, the nucleic acid target is an mRNA, miRNA, pre-miRNA, or a viral or fungal RNA.
[00333] In some embodiments, the nucleic acid target includes structural features such as at least some intramolecular base pairing, a junction (e.g., cis or trans three-way junctions (3WJ)), quadruplex, hairpin, triplex, bulge loop, pseudoknot, or internal loop, etc., and any transient forms or structures adopted by the nucleic acid. In some embodiments, the nucleic acid target includes a bound protein, such as a chaperone, RNA-binding protein (RBP), or other nucleic acid-binding protein. In some embodiments, the assay conditions are selected such that the structure and/or structural dynamics of the nucleic acid target in the assay conditions match, as closely as possible, the native (in vivo) structure and/or structural dynamics of the nucleic acid target, at least during the step in which the small molecule DEL library member is allowed to bind to the target.
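For the 3'-overhang case described above, the appended region is simply the reverse complement of the DEL overhang, written in RNA letters and added to the target's 3'-end. A minimal, hypothetical sketch (the function name is illustrative; only the fully complementary 3' case is shown, not partial complementarity or the 5'-overhang variant):

```python
# Hypothetical helper: append to an RNA target a 3'-terminal "landing site"
# that is fully complementary to a DEL 3'-overhang (DNA, written 5'->3').

_DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def add_overhang_landing_site(target_rna: str, del_overhang_dna: str) -> str:
    """Return target_rna (5'->3') extended at its 3'-end with the RNA reverse
    complement of del_overhang_dna, so the overhang can pair with the new 3'-end."""
    site = "".join(_DNA_TO_RNA_COMPLEMENT[base]
                   for base in reversed(del_overhang_dna.upper()))
    return target_rna.upper() + site


if __name__ == "__main__":
    print(add_overhang_landing_site("GGACUUC", "GCTG"))  # GGACUUCCAGC
```

Appending a common landing site in this way would let a single DEL overhang sequence serve many different targets, at the cost of adding non-native nucleotides to each target.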
[00334] Nucleic acid targets, such as target RNAs, of various lengths are compatible with the present invention. For example, the target may be from 20-10,000 nucleotides in length. In some embodiments, the target is a relatively short sequence of, e.g., less than 250, less than 100, or less than 50 nucleotides in length. In some embodiments, the target is 100 or more nucleotides in length. In some embodiments, the target is 250 or more nucleotides in length. In some embodiments, the target is up to about 350, 450, 500, 600, 750, 1,000, 2,000, 3,000, 4,000, 5,000, 7,500, 10,000, 15,000, 25,000, 50,000, or more than 50,000 nucleotides in length. In some embodiments, the target is between about 30 and about 500 nucleotides in length. In some embodiments, the target is between about 250 and about 1,000 nucleotides in length. In some embodiments, the target is between about 20-50, 30-60, 40-70, 50-80, 20-100, 30-100, 40-100, 50-100, 20-200, 30-200, 40-200, 50-200, 20-300, 50-300, 75-300, 100-300, 20-400, 50-400, 100-400, 200-400, 20-500, 50-500, 100-500, 250-500, 20-750, 50-750, 100-750, 250-750, 500-750, 20-1,000, 100-1,000, 250-1,000, 500-1,000, 20-2,000, 100-2,000, 500-2,000, 1,000-2,000, 20-5,000, 100-5,000, 1,000-5,000, 20-10,000, 100-10,000, 1,000-10,000, or 20-25,000 nucleotides in length.
[00335] Where the target or other referenced nucleic acid is an RNA, “nucleotides” refers to ribonucleotides. Where the target or other referenced nucleic acid is DNA, “nucleotides” refers to 2'-deoxyribonucleotides.
[00336] In some embodiments, the target is an RNA such as a pre-mRNA, pre-miRNA, or pretranscript. In some embodiments, the RNA is a non-coding RNA (ncRNA), messenger RNA (mRNA), micro-RNA (miRNA), a ribozyme, riboswitch, lncRNA, lincRNA, snoRNA, snRNA, scaRNA, piRNA, rRNA, ceRNA, or pseudo-gene, wherein each of the foregoing may be selected from a human or non-human RNA, such as viral RNA, fungal RNA, or bacterial RNA.
[00337] In some embodiments, one nucleic acid target is screened in a disclosed method. However, screening of more than one target is also contemplated. For example, in some embodiments 2, 3, 4, 5, 6, 7, 8, 9, 10, or more targets are screened at one time. In some embodiments, different or partially identical portions of a single nucleic acid target are screened at once. For example, nucleotides 1-50 of a hypothetical target comprising 300 nucleotides may form one target, nucleotides 10-60 may form a second target, nucleotides 40-100 may form a third target, and so on. Without wishing to be bound by theory, this might yield information about the influence of different portions of the sequence on a putative or known binding site on the full-length target. In some embodiments, the targets are different nucleic acids, e.g., they have little or no sequence homology and/or are from distinct portions of the genome and/or have unrelated biological roles.
[00338] Multiplexing (parallel screening of multiple targets and small-molecule DEL compounds at once) is also contemplated as within the scope of the present invention. For example, in some embodiments a screen of a small molecule DEL is performed against 2-10, 2-100, 2-1,000, 2-10,000, 2-100,000, or 2-1,000,000 different nucleic acid targets, which have only partial, minimal, or no sequence homology. In some embodiments, the different nucleic acid targets have some sequence homology, for example if they are nucleic acids of a similar function or group of functions. In other embodiments, the different nucleic acid targets have little or no sequence homology, for example if they are nucleic acids that have no particular relationship to one another. In some embodiments of multiplexing experiments, sequencing (e.g., next generation or massively parallel sequencing) and computational methods are used to derive target-binder association information due to the potentially enormous amount of information obtained. In some embodiments, the nucleic acid target is selected from a panel of natural or artificial RNAs such as those derived from in vitro transcription (IVT) or cell lysates, or may be an artificially generated library of nucleic acid targets, such as aptamers.
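The overlapping sub-targets described above (e.g., nucleotides 1-50, 10-60, 40-100 of a 300-nucleotide target) can be generated systematically. The example in the text uses irregular offsets; the sketch below assumes, for simplicity, a fixed window length and step, uses 1-based inclusive coordinates to match the text, and adds a final window so the 3'-terminal nucleotides are always covered. All names are hypothetical.

```python
from typing import List, Tuple


def tile_target(length: int, window: int, step: int) -> List[Tuple[int, int]]:
    """1-based, inclusive (start, end) windows tiling a target of `length` nt."""
    if length < 1 or window < 1 or step < 1:
        raise ValueError("length, window and step must all be positive")
    if window >= length:
        return [(1, length)]
    tiles = []
    start = 1
    while start + window - 1 <= length:
        tiles.append((start, start + window - 1))
        start += step
    if tiles[-1][1] < length:  # ensure the 3'-terminal nucleotides are covered
        tiles.append((length - window + 1, length))
    return tiles


def tile_sequences(seq: str, window: int, step: int) -> List[str]:
    """Sub-target sequences corresponding to tile_target windows."""
    return [seq[s - 1:e] for s, e in tile_target(len(seq), window, step)]


if __name__ == "__main__":
    print(tile_target(100, 50, 30))  # [(1, 50), (31, 80), (51, 100)]
```

Smaller steps give finer resolution of where a binder's site lies, at the cost of more sub-targets to synthesize and screen.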
Definitions
[00339] The terms “sample” and “biological sample” are used in their broadest sense and encompass samples or specimens obtained from any source, including biological and environmental sources. As used herein, the term “sample,” when used to refer to biological samples obtained from organisms, includes bodily fluids, isolated cells, fixed cells, cell lysates and the like. The organisms include bacteria, viruses, fungi, plants, animals, and humans. In some embodiments, “sample” refers to a mixture of encoded library compounds or other test compounds being screened for activity against a target RNA. The sample may be taken from any step along the process of screening the compounds, including the final step comprising isolated, PCR-amplified fragments encoding hits from a screen. However, these examples are not to be construed as limiting the types of samples or organisms that find use with the present invention.
[00340] As used herein, the term “incubating” and variants thereof mean contacting one or more components of a reaction with another component or components, under conditions and for sufficient time such that a desired reaction product is formed.
[00341] As used herein, a “nucleoside” refers to a molecule consisting of a guanine (G), adenine (A), thymine (T), uracil (U), or cytosine (C) base covalently linked to a pentose sugar, whereas “nucleotide” or “mononucleotide” refers to a nucleoside phosphorylated at one of the hydroxyl groups of the pentose sugar. “Nucleoside” also encompasses analogs of G, A, T, C, or U and natural or non-natural nucleic acid components wherein the base, sugar, and/or phosphate backbone have been modified or replaced. Nucleoside analogs are known in the art and include those described herein. Also included are endogenous, post-transcriptionally modified nucleosides, such as methylated nucleosides.
[00342] Linear nucleic acid molecules are said to have a “5' terminus” (5'-end) and a “3' terminus” (3'-end) because, except with respect to adenylation (as described elsewhere herein), mononucleotides are joined in one direction via a phosphodiester linkage (or analog thereof) to make oligonucleotides, in a manner such that a phosphate (or analog thereof) on the 5' carbon of one mononucleotide sugar is joined to an oxygen on the 3' carbon of the sugar of its neighboring mononucleotide. Therefore, an end of an oligonucleotide is referred to as the “5' end” if its 5' phosphate (or analog thereof) is not linked to the oxygen of the 3' carbon of a mononucleotide sugar, and as the “3' end” if its 3' oxygen is not linked to a 5' phosphate (or analog thereof) of a subsequent mononucleotide sugar. A “terminal nucleotide,” as used herein, is the nucleotide at the end position of the 3' or 5' terminus. The 3' or 5' terminus may alternatively end in a 3'-OH or 5'-OH if the terminal nucleotide is not phosphorylated.
[00343] As used herein, the term “nucleic acid” refers to a covalently linked sequence of nucleotides in which the 3' position of the sugar of one nucleotide is joined by a phosphodiester bond to the 5' position of the sugar of the next nucleotide (i.e., a 3' to 5' phosphodiester bond), and in which the nucleotides are linked in specific sequence; i.e., a linear order of nucleotides. “Nucleic acid” includes analogs of the foregoing wherein one or more nucleotides are modified at the base, sugar, or phosphodiester. Such analogs are known in the art and include those described elsewhere herein. As used herein, “polynucleotide” or “polynucleic acid” refers to a long nucleic acid sequence (or analog thereof) of many nucleotides. For example, but without limitation, a polynucleotide (or polynucleic acid) may be greater than 60, 61-1,000, or 201-1,000, or greater than 1,000 nucleotides in length. As used herein, an “oligonucleotide” or “oligonucleic acid” is a short polynucleotide or a portion of a polynucleotide. For example, but without limitation, an oligonucleotide may be between 5-10, 10-60, or 10-200 nucleotides in length.
[00344] In some embodiments, a nucleic acid, oligonucleotide, or polynucleotide consists of, consists primarily of, or is mostly 2'-deoxyribonucleotides (DNA) or ribonucleotides (RNA). In some embodiments, an oligonucleotide consists of or comprises 2'-deoxyribonucleotides (DNA). In some embodiments, the oligonucleotide consists of or comprises ribonucleotides (RNA). In some embodiments, the oligonucleotide is a DNA-RNA hybrid, such as a DNA sequence of contiguous nucleotides linked to an RNA sequence of contiguous nucleotides, or with some regions of RNA and some regions of DNA.
[00345] The term“RNA” (ribonucleic acid) as used herein, means a naturally occurring or synthetic oligoribonucleotide or polyribonucleotide independent of source (e.g., the RNA may be produced by a human, animal, plant, virus, or bacterium, or may be synthetic in origin), biological context (e.g., the RNA may be in the nucleus, circulating in the blood, in vitro , cell lysate, or isolated or pure form), or physical form (e.g., the RNA may be in single-, double-, or triple- stranded form (including RNA-DNA hybrids), may include epigenetic modifications, native post- transcriptional modifications, artificial modifications (e.g., obtained by chemical or in vitro modification), or other modifications, may be bound to, e.g., metal ions, small molecules, protein chaperones, or co-factors, or may be in a denatured, partially denatured, or folded state including any native or unnatural secondary or tertiary structure such as junctions (e.g., cis or trans three- way junctions (3WJ)), quadruplexes, hairpins, triplexes, hairpins, bulge loops, pseudoknots, and internal loops, etc., and any transient forms or structures adopted by the RNA). In some embodiments, the RNA is 100 or more nucleotides in length. In some embodiments, the RNA is 250 or more nucleotides in length. In some embodiments, the RNA is 350, 450, 500, 600, 750, or 1,000, 2,000, 3,000, 4,000, 5,000, 7,500, 10,000, 15,000, 25,000, 50,000, or more ribonucleotides in length. In some embodiments, the RNA is between 250 and 1,000 ribonucleotides in length. In some embodiments, the RNA is a pre-RNA, pre-miRNA, or pretranscript. In some embodiments, the RNA is a non-coding RNA (ncRNA), messenger RNA (mRNA), micro-RNA (miRNA), a ribozyme, riboswitch, lncRNA, lincRNA, snoRNA, snRNA, scaRNA, piRNA, ceRNA, or pseudo- gene, wherein each of the foregoing may be selected from a human or non-human RNA, such as viral RNA, fungal RNA, or bacterial RNA. 
The term “target RNA” or “RNA target,” as used herein, means any type of RNA having a secondary or tertiary structure capable of binding a small-molecule ligand described herein. The target RNA may be inside a cell, in a cell lysate, or in isolated form prior to contacting the compound.
[00346] As used herein, “RNA ligase” means an enzyme that is capable of catalyzing the joining or ligating of an RNA acceptor molecule, which has a hydroxyl group on its 3' or 5' terminus, to an RNA or DNA donor molecule. As used herein, “DNA ligase” means an enzyme that is capable of catalyzing the joining or ligating of a DNA acceptor molecule, which has a hydroxyl group on its 3' or 5' terminus, to an RNA or DNA donor molecule. In some embodiments, the donor molecule has a 5' phosphate group on its 5' terminus and/or a 3' phosphate on its 3' terminus. The invention is not limited with respect to the RNA ligase: any RNA ligase from any source that is capable of effecting the required ligation reaction can be used in an embodiment of the methods and kits of the present invention. For example, in some embodiments, the RNA ligase is a polypeptide (gp63) encoded by bacteriophage T4 gene 63. This enzyme, which is commonly referred to simply as “T4 RNA Ligase,” is now more correctly called “T4 RNA Ligase 1,” since a second RNA ligase (gp24.1), encoded by bacteriophage T4 gene 24.1, is known and is now called “T4 RNA Ligase 2” (Ho, C. K. and Shuman, S., “Bacteriophage T4 RNA ligase 2 (gp24.1) exemplifies a family of RNA ligases found in all phylogenetic domains,” Proc. Natl. Acad. Sci. USA 2002, 99, 12709-12714, hereby incorporated by reference). Unless otherwise stated, when “T4 RNA Ligase” is used in the present specification, “T4 RNA Ligase 1” is meant. Also as defined herein, “truncated T4 RNA Ligase 2” refers to the T4 RNA Ligase 2 mutant containing N-terminal residues 1-249, also known as Rnl2(1-249) (Ho, 2004).
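The end-chemistry requirement in these definitions (a hydroxyl on the acceptor terminus, a phosphate on the donor terminus being joined) can be expressed as a simple compatibility check. The following is a minimal sketch under a deliberately simplified model in which each strand end is labeled only as hydroxyl or phosphate; the `End`, `Strand`, and `can_ligate` names are hypothetical illustrations, and the check ignores enzyme identity, splint or template requirements, and sequence context.

```python
from dataclasses import dataclass
from enum import Enum


class End(Enum):
    """Simplified chemistry of a strand terminus (hypothetical model)."""
    HYDROXYL = "OH"
    PHOSPHATE = "P"


@dataclass(frozen=True)
class Strand:
    name: str
    five_prime: End
    three_prime: End


def can_ligate(acceptor: Strand, donor: Strand) -> bool:
    """Return True if the termini permit joining under the simplified rule.

    Case 1: acceptor 3'-OH joined to donor 5'-phosphate.
    Case 2: acceptor 5'-OH joined to donor 3'-phosphate.
    """
    three_to_five = (acceptor.three_prime is End.HYDROXYL
                     and donor.five_prime is End.PHOSPHATE)
    five_to_three = (acceptor.five_prime is End.HYDROXYL
                     and donor.three_prime is End.PHOSPHATE)
    return three_to_five or five_to_three


# Usage: a 5'-phosphorylated DNA donor can be joined to the 3'-OH of an RNA acceptor.
rna_acceptor = Strand("RNA", five_prime=End.PHOSPHATE, three_prime=End.HYDROXYL)
dna_donor = Strand("DNA tag", five_prime=End.PHOSPHATE, three_prime=End.HYDROXYL)
print(can_ligate(rna_acceptor, dna_donor))  # True
```

A donor lacking any terminal phosphate fails both cases, which mirrors the embodiment in which the donor carries a 5' and/or 3' phosphate.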
[00347] Other truncated ligases or mutants thereof are known in the art and may be used in the present invention.
[00348] As used herein, a “single-strand ligase” is a DNA or RNA ligase enzyme that is active on single-stranded DNA or RNA molecules. In some embodiments, the ligase is a single-strand ligase.
[00349] In some embodiments, the ligase is a double-strand ligase. In some embodiments, the ligase is non-truncated T4 RNA ligase 2, a dsRNA ligase, or SplintR, each of which is a double-strand ligase.
[00350] As used herein, the terms “buffer” or “buffering agents” refer to materials that, when added to a solution, cause the solution to resist changes in pH. As used herein, the term “reaction buffer” refers to a buffering solution in which an enzymatic or chemical reaction is performed.
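The resistance to pH change described in this definition can be illustrated with the textbook Henderson-Hasselbalch approximation, pH = pKa + log10([A-]/[HA]). The specification does not itself invoke this relation; the sketch below is an illustration only, and the pKa of 8.0 and the concentrations are arbitrary values, not a recommended reaction-buffer composition.

```python
import math


def buffer_ph(pka: float, base_molar: float, acid_molar: float) -> float:
    """Henderson-Hasselbalch estimate of pH for a weak acid/conjugate base pair."""
    return pka + math.log10(base_molar / acid_molar)


def ph_after_adding_acid(pka: float, base_molar: float, acid_molar: float,
                         added_strong_acid_molar: float) -> float:
    """A strong acid converts conjugate base (A-) to weak acid (HA) one-for-one,
    so the base/acid ratio, and hence the pH, shifts only modestly while both remain."""
    return buffer_ph(pka,
                     base_molar - added_strong_acid_molar,
                     acid_molar + added_strong_acid_molar)


pka = 8.0  # illustrative value only
start = buffer_ph(pka, 0.05, 0.05)                    # equal parts, so pH equals pKa
after = ph_after_adding_acid(pka, 0.05, 0.05, 0.005)  # add 5 mM strong acid
print(round(start, 2), round(after, 2))  # 8.0 7.91
```

By contrast, adding the same 5 mM of strong acid to unbuffered water would give a pH of about 2.3 (-log10 0.005), which is the behavior a buffer is meant to prevent.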
[00351] The terms “isolated” or “purified,” when used in relation to a polynucleotide or nucleic acid, as in “isolated RNA” or “purified RNA,” refer to a nucleic acid that is identified and separated from at least one contaminant with which it is ordinarily associated in its source. Thus, an isolated or purified nucleic acid (e.g., DNA and RNA) is present in a form or setting different from that in which it is found in nature, or a form or setting different from that which existed prior to subjecting it to a treatment or purification method. For example, a given DNA sequence (e.g., a gene) is found on the host cell chromosome together with other genes as well as structural and functional proteins, and a specific RNA (e.g., a specific mRNA encoding a specific protein) is found in the cell as a mixture with numerous other RNAs and other cellular components. The isolated or purified polynucleotide or nucleic acid may be present in single-stranded or double-stranded form.
[00352] As used herein, the term “RNA-mediated” disorder, disease, and/or condition means any disease or other deleterious condition in which RNA, such as an overexpressed, underexpressed, mutant, misfolded, expanded, pathogenic, or oncogenic RNA, is known to play a role.
[00353] As used herein, the term “inhibitor” is defined as a compound that binds to and/or modulates or inhibits a target RNA with measurable affinity. In certain embodiments, an inhibitor has an IC50 and/or binding constant of less than about 100 µM, less than about 50 µM, less than about 1 µM, less than about 500 nM, less than about 100 nM, less than about 10 nM, or less than about 1 nM.
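Because the thresholds in this definition span several concentration units, comparing a measured IC50 or binding constant against them is easiest after converting everything to molar. The following is a minimal sketch, assuming the micromolar reading of the first three thresholds (100, 50, and 1 µM) and treating "less than about" as a strict comparison; `to_molar` and `thresholds_met` are hypothetical helper names.

```python
# Thresholds from the definition, as (value, unit) pairs; the first three are
# read as micromolar (an assumption about the intended units).
THRESHOLDS = [(100, "uM"), (50, "uM"), (1, "uM"),
              (500, "nM"), (100, "nM"), (10, "nM"), (1, "nM")]

UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def to_molar(value: float, unit: str) -> float:
    """Convert a concentration to molar units."""
    return value * UNIT_TO_MOLAR[unit]


def thresholds_met(ic50_value: float, unit: str) -> list[str]:
    """Return the thresholds that an IC50 (or binding constant) is strictly below."""
    ic50 = to_molar(ic50_value, unit)
    return [f"{v} {u}" for v, u in THRESHOLDS if ic50 < to_molar(v, u)]


print(thresholds_met(250, "nM"))
# ['100 uM', '50 uM', '1 uM', '500 nM']
```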
[00354] The terms “measurable affinity” and “measurably inhibit,” as used herein, mean a measurable change in a downstream biological effect between a sample comprising a compound of the present invention, or composition thereof, and a target RNA, and an equivalent sample comprising the target RNA in the absence of said compound, or composition thereof.
[00355] “Modulating” the function of a target RNA includes enhancing or increasing the function of the RNA as well as decreasing or antagonizing the function of the RNA.
Targeting mRNA
[00356] In some embodiments, the target RNA is an mRNA. In some embodiments, a provided small molecule binds to a coding region of the mRNA. In some embodiments, a provided small molecule binds to a non-coding region of the mRNA. Within mRNAs, noncoding regions can affect the level of mRNA and protein expression. Briefly, these include internal ribosome entry sites (IRES) and upstream open reading frames (uORF) that affect translation efficiency, intronic sequences that affect splicing efficiency and alternative splicing patterns, 3' UTR sequences that affect mRNA and protein localization, and elements that control mRNA decay and half-life. Therapeutic modulation of these RNA elements can have beneficial effects. Accordingly, in some embodiments, the target RNA is the 3' UTR or 5' UTR of an mRNA. Also, mRNAs may contain expansions of simple repeat sequences such as trinucleotide repeats. These repeat-expansion-containing RNAs can be toxic and have been observed to drive disease pathology, particularly in certain neurological and musculoskeletal diseases (see Gatchel & Zoghbi, Nature Rev. Gen. 2005, 6, 743-755, hereby incorporated by reference). In addition, splicing can be modulated to skip exons having mutations that introduce stop codons in order to relieve premature termination during translation.
[00357] Small molecules can be used to modulate splicing of pre-mRNA for therapeutic benefit in a variety of settings. One example is spinal muscular atrophy (SMA). SMA is a consequence of insufficient amounts of the survival of motor neuron (SMN) protein. Humans have two versions of the SMN gene, SMN1 and SMN2. SMA patients have a mutated SMN1 gene and thus rely solely on SMN2 for their SMN protein. The SMN2 gene has a silent mutation in exon 7 that causes inefficient splicing such that exon 7 is skipped in the majority of mature SMN2 transcripts, leading to the generation of a defective protein that is rapidly degraded in cells, thus limiting the amount of SMN protein produced from this locus. A small molecule that promotes the efficient inclusion of exon 7 during the splicing of SMN2 transcripts would be an effective treatment for SMA (Palacino et al, Nature Chem. Biol., 2015, 11, 511-517, hereby incorporated by reference). Accordingly, in one aspect, the present invention provides a method of identifying a small molecule that modulates the splicing of a target pre-mRNA to treat a disease or disorder, comprising the steps of: screening one or more encoded small molecules for binding to the target pre-mRNA (with or without any RBPs that may normally bind to the target); and analyzing the results by an RNA binding assay disclosed herein. In some embodiments, the pre-mRNA is an SMN2 transcript. In some embodiments, the disease or disorder is spinal muscular atrophy (SMA).
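One common way to analyze an encoded-library binding screen is to compare how often each compound's DNA barcode is sequenced after selection against the target versus after a no-target control. The following is a minimal sketch of such an enrichment calculation; it is not the RNA binding assay disclosed in this specification, and the barcode names, counts, pseudocount, and 2-fold cutoff are all hypothetical.

```python
def enrichment(target_counts: dict[str, int], control_counts: dict[str, int],
               pseudocount: float = 1.0) -> dict[str, float]:
    """Fold enrichment of each barcode's frequency in the target selection over control.

    Counts are converted to frequencies within each selection so that different
    sequencing depths are comparable; the pseudocount avoids division by zero for
    barcodes absent from one of the selections.
    """
    barcodes = set(target_counts) | set(control_counts)
    t_total = sum(target_counts.values()) + pseudocount * len(barcodes)
    c_total = sum(control_counts.values()) + pseudocount * len(barcodes)
    scores = {}
    for bc in barcodes:
        t_freq = (target_counts.get(bc, 0) + pseudocount) / t_total
        c_freq = (control_counts.get(bc, 0) + pseudocount) / c_total
        scores[bc] = t_freq / c_freq
    return scores


# Hypothetical sequencing counts after selection against a target pre-mRNA construct.
target = {"BC001": 400, "BC002": 30, "BC003": 40, "BC004": 30}
control = {"BC001": 25, "BC002": 25, "BC003": 25, "BC004": 25}
scores = enrichment(target, control)
hits = sorted(bc for bc, s in scores.items() if s >= 2.0)  # arbitrary cutoff
print(hits)  # ['BC001']
```

In practice, the cutoff and normalization would be chosen for the particular selection design; frequency normalization is shown here simply because it makes runs of different depth comparable.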
[00358] Even in cases in which defective splicing does not cause the disease, alteration of splicing patterns can be used to correct the disease. Nonsense mutations leading to premature translational termination can be eliminated by exon skipping if the skipped exon sequences are in-frame, i.e., if their removal preserves the downstream reading frame. This can create a protein that is at least partially functional. One example of the use of exon skipping is the dystrophin gene in Duchenne muscular dystrophy (DMD). A variety of different mutations leading to premature termination codons in DMD patients can be eliminated by exon skipping promoted by oligonucleotides (reviewed in Fairclough et al., Nature Rev. Gen., 2013, 14, 373-378, hereby incorporated by reference). Small molecules that bind RNA structures and affect splicing are expected to have a similar effect. Accordingly, in one aspect, the present invention provides a method of identifying a small molecule that modulates (up or down) the splicing pattern of a target pre-mRNA to treat a disease or disorder, comprising the steps of: screening one or more encoded small molecules for binding to the target pre-mRNA; and analyzing the results by an RNA binding assay disclosed herein. In some embodiments, the pre-mRNA is a dystrophin gene transcript. In some embodiments, the small molecule promotes exon skipping to eliminate premature translational termination. In some embodiments, the disease or disorder is Duchenne muscular dystrophy (DMD).
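Whether skipping one or more exons keeps the remaining coding sequence in-frame reduces to simple arithmetic: every downstream codon shifts by the number of removed nucleotides modulo 3, so the skip is in-frame only if that number is divisible by 3. A minimal sketch, assuming exon lengths are given as coding nucleotides; the exon lengths below are hypothetical and are not actual dystrophin exon sizes.

```python
def skipping_preserves_frame(exon_lengths: list[int], skipped: list[int]) -> bool:
    """Return True if removing the exons at the given indices keeps the reading frame.

    The frame of every downstream codon shifts by the total number of removed
    nucleotides modulo 3, so the skip is in-frame only if that sum is divisible by 3.
    """
    removed = sum(exon_lengths[i] for i in skipped)
    return removed % 3 == 0


exons = [120, 97, 150, 62]  # hypothetical coding lengths (nt)
print(skipping_preserves_frame(exons, [2]))     # True: 150 nt is a multiple of 3
print(skipping_preserves_frame(exons, [1]))     # False: 97 nt shifts the frame
print(skipping_preserves_frame(exons, [1, 3]))  # True: 97 + 62 = 159 = 3 * 53
```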
[00359] Lastly, the expression of an mRNA and its translation products could be affected by targeting noncoding sequences and structures in the 5' and 3' UTRs. For instance, RNA structures such as hairpins in the 5' UTR have been shown to affect translational efficiency. In general, RNA structures are believed to play a critical role in translation of mRNA. Two examples of these are internal ribosome entry sites (IRES) and upstream open reading frames (uORF), which can affect the level of translation of the main open reading frame (Komar and Hatzoglou, Frontiers Oncol. 5:233, 2015; Weingarten-Gabbay et al, Science 351:pii:aad4939, 2016; Calvo et al, Proc. Natl. Acad. Sci. USA 106:7507-7512; Le Quesne et al, J. Pathol. 220:140-151, 2010; Barbosa et al., PLOS Genetics 9:e1003529, 2013, hereby incorporated by reference). For example, nearly half of all human mRNAs have uORFs, and many of these reduce the translation of the main ORF. Small molecules targeting these RNAs could be used to modulate specific protein levels for therapeutic benefit. Accordingly, in one aspect, the present invention provides a method of producing a small molecule that modulates the expression or translation efficiency of a target pre-mRNA or mRNA to treat a disease or disorder, comprising the steps of: screening one or more encoded small molecules for binding to the target pre-mRNA or mRNA; and analyzing the results by an RNA binding assay disclosed herein. In some embodiments, the small molecule binding site is a 5' UTR, internal ribosome entry site, or upstream open reading frame.
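Upstream open reading frames of the kind discussed here can be located by scanning a 5' UTR for an AUG start codon followed in-frame by a stop codon. The sketch below is illustrative only: it assumes the UTR is supplied as a plain sequence string (U is converted to T), considers only AUG starts, ignores Kozak context and uORFs that overlap the main start codon, and uses a made-up sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(utr5: str) -> list[tuple[int, int]]:
    """Return (start, end) index pairs of uORFs fully contained in a 5' UTR.

    `end` is exclusive and includes the stop codon. Only ATG starts are
    considered; ORFs that run past the end of the UTR are not reported.
    """
    seq = utr5.upper().replace("U", "T")
    uorfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk in-frame codons until the first stop codon.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                uorfs.append((start, pos + 3))
                break
    return uorfs


utr = "CCATGAAATAAGGATGTTTCCCTAGCA"  # hypothetical 5' UTR
print(find_uorfs(utr))  # [(2, 11), (13, 25)]
```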
Targeting Regulatory or Non-Coding RNA
[00360] The largest set of RNA targets is RNA that is transcribed but not translated into protein, termed “non-coding RNA.” Non-coding RNA is highly conserved, and the many varieties of non-coding RNA play a wide range of regulatory functions. The term “non-coding RNA,” as used herein, includes but is not limited to micro-RNA (miRNA), long non-coding RNA (lncRNA), long intergenic non-coding RNA (lincRNA), Piwi-interacting RNA (piRNA), competing endogenous RNA (ceRNA), and pseudo-genes. Each of these sub-categories of non-coding RNA offers a large number of RNA targets with significant therapeutic potential. Accordingly, in some embodiments, the target RNA is a non-coding RNA. In some embodiments, the non-coding RNA is an miRNA, lncRNA, lincRNA, piRNA, ceRNA, or pseudo-gene.
[00361] miRNAs are short double-strand RNAs that regulate gene expression (see Elliott & Ladomery, Molecular Biology of RNA, 2nd Ed., hereby incorporated by reference). Each miRNA can affect the expression of many human genes. There are nearly 2,000 miRNAs in humans. These RNAs regulate many biological processes, including cell differentiation, cell fate, motility, survival, and function. miRNA expression levels vary between different tissues, cell types, and disease settings. They are frequently aberrantly expressed in tumors versus normal tissue, and their activity may play significant roles in cancer (for reviews, see Croce, Nature Rev. Genet. 10:704-714, 2009; Dykxhoorn, Cancer Res. 70:6401-6406, 2010, hereby incorporated by reference). miRNAs have been shown to regulate oncogenes and tumor suppressors and themselves can act as oncogenes or tumor suppressors. Some have been shown to promote epithelial-mesenchymal transition (EMT) and cancer cell invasiveness and metastasis. In the case of oncogenic miRNAs, their inhibition could be an effective anti-cancer treatment. In some embodiments, the target miRNA regulates an oncogene or tumor suppressor, or acts as an oncogene or tumor suppressor. In some embodiments, the disease is cancer. In some embodiments, the cancer is a solid tumor.
[00362] There are multiple oncogenic miRNAs that could be therapeutically targeted, including miR-155, miR-17~92, miR-19, miR-21, and miR-10b (see Stahlhut & Slack, Genome Med. 2013, 5, 111, hereby incorporated by reference). miR-155 plays pathological roles in inflammation, hypertension, heart failure, and cancer. In cancer, miR-155 triggers oncogenic cascades and apoptosis resistance, as well as increasing cancer cell invasiveness. Altered expression of miR-155 has been described in multiple cancers, reflecting staging, progress, and treatment outcomes. Cancers in which miR-155 overexpression has been reported include breast cancer, thyroid carcinoma, colon cancer, cervical cancer, and lung cancer. It is reported to play a role in drug resistance in breast cancer. miR-17~92 (also called Oncomir-1) is a polycistronic 1 kb primary transcript comprising miR-17, 20a, 18a, 19a, 92-1, and 19b-1. It is activated by MYC. miR-19 alters the gene expression and signal transduction pathways in multiple hematopoietic cells, and it triggers leukemogenesis and lymphomagenesis. It is implicated in a wide variety of human solid tumors and hematological cancers. miR-21 is an oncogenic miRNA that reduces the expression of multiple tumor suppressors. It stimulates cancer cell invasion and is associated with a wide variety of human cancers including breast, ovarian, cervix, colon, lung, liver, brain, esophagus, prostate, pancreas, and thyroid cancers. Accordingly, in some embodiments, the target miRNA is selected from miR-155, miR-17~92, miR-19, miR-21, or miR-10b. In some embodiments, the target miRNA mediates or is implicated in a cancer selected from breast cancer, ovarian cancer, cervical cancer, thyroid carcinoma, colon cancer, liver cancer, brain cancer, esophageal cancer, prostate cancer, lung cancer, leukemia, or lymph node cancer. In some embodiments, the cancer is a solid tumor.
[00363] Beyond oncology, miRNAs play roles in many other diseases, including cardiovascular and metabolic diseases (Quiat and Olson, J. Clin. Invest. 123:11-18, 2013; Olson, Science Trans. Med. 6:239ps3, 2014; Baffy, J. Clin. Med. 4:1977-1988, 2015, hereby incorporated by reference).
[00364] Many mature miRNAs are relatively short in length and thus may lack sufficient folded, three-dimensional structure to be targeted by small molecules. However, it is believed that the levels of such miRNA could be reduced by small molecules that bind the primary transcript or the pre-miRNA to block the biogenesis of the mature miRNA. Accordingly, in some embodiments of the methods described above, the target miRNA is a primary transcript or pre-miRNA.
[00365] lncRNA are RNAs of over 200 nucleotides (nt) that do not encode proteins (see Rinn & Chang, Ann. Rev. Biochem. 2012, 81, 145-166; for reviews, see Morris and Mattick, Nature Reviews Genetics 15:423-437, 2014; Mattick and Rinn, Nature Structural & Mol. Biol. 22:5-7, 2015; Iyer et al, Nature Genetics 47:199-208, 2015, each hereby incorporated by reference). They can affect the expression of protein-encoding mRNAs at the level of transcription, splicing, and mRNA decay. Considerable research has shown that lncRNA can regulate transcription by recruiting epigenetic regulators that increase or decrease transcription by altering chromatin structure (e.g., Holoch and Moazed, Nature Reviews Genetics 16:71-84, 2015, hereby incorporated by reference). lncRNAs are associated with human diseases including cancer, inflammatory diseases, neurological diseases, and cardiovascular disease (for instance, Prensner and Chinnaiyan, Cancer Discovery 1:391-407, 2011; Johnson, Neurobiology of Disease 46:245-254, 2012; Gutschner and Diederichs, RNA Biology 9:703-719, 2012; Kumar et al, PLOS Genetics 9:e1003201, 2013; van de Vondervoort et al, Frontiers in Molecular Neuroscience, 2013; Li et al, Int. J. Mol. Sci. 14:18790-18808, 2013, hereby incorporated by reference). The targeting of lncRNA could be done to up-regulate or down-regulate the expression of specific genes and proteins for therapeutic benefit (e.g., Wahlestedt, Nature Reviews Drug Discovery 12:433-446, 2013; Guil and Esteller, Nature Structural & Mol. Biol. 19:1068-1075, 2012, hereby incorporated by reference). In general, lncRNA are expressed at a lower level relative to mRNAs. Many lncRNAs are physically associated with chromatin (Werner et al, Cell Reports 12, 1-10, 2015, hereby incorporated by reference) and are transcribed in close proximity to protein-encoding genes.
They often remain physically associated at their site of transcription and act locally, in cis, to regulate the expression of a neighboring mRNA. The mutation and dysregulation of lncRNAs are associated with human diseases; therefore, there are a multitude of lncRNAs that could be therapeutic targets. Accordingly, in some embodiments of the methods described above, the target non-coding RNA is a lncRNA. In some embodiments, the lncRNA is associated with a cancer, inflammatory disease, neurological disease, or cardiovascular disease.
[00366] lncRNAs regulate the expression of protein-encoding genes, acting at multiple different levels to affect transcription, alternative splicing and mRNA decay. For example, lncRNA has been shown to bind to the epigenetic regulator PRC2 to promote its recruitment to genes whose transcription is then repressed via chromatin modification. lncRNA may form complex structures that mediate their association with various regulatory proteins. A small molecule that binds to these lncRNA structures could be used to modulate the expression of genes that are normally regulated by an individual lncRNA.
[00367] One exemplary target lncRNA is HOTAIR, a lncRNA expressed from the HoxC locus on human chromosome 12. Its expression level is low (~100 RNA copies per cell). Unlike many lncRNAs, HOTAIR can act in trans to affect the expression of distant genes. It binds the epigenetic repressor PRC2 as well as the LSD1/CoREST/REST complex, another repressive epigenetic regulator (Tsai et al, Science 329, 689-693, 2010, hereby incorporated by reference). HOTAIR is a highly structured RNA with over 50% of its nucleotides being involved in base pairing. It is frequently dysregulated (often up-regulated) in various types of cancer (Yao et al, Int. J Mol. Sci. 15:18985-18999, 2014; Deng et al, PLOS One 9:e110059, 2014, hereby incorporated by reference). Cancer patients with high expression levels of HOTAIR have a significantly poorer prognosis compared with those with low expression levels. HOTAIR has been reported to be involved in the control of apoptosis, proliferation, metastasis, angiogenesis, DNA repair, chemoresistance, and tumor cell metabolism. It is highly expressed in metastatic breast cancers. High levels of expression in primary breast tumors are a significant predictor of subsequent metastasis and death. HOTAIR also has been reported to be associated with esophageal squamous cell carcinoma, and it is a prognostic factor in colorectal cancer, cervical cancer, gastric cancer, and endometrial carcinoma. Therefore, HOTAIR-binding small molecules are novel anti-cancer drug candidates. Accordingly, in some embodiments of the methods described above, the target non-coding RNA is HOTAIR. In some embodiments, the disease or disorder is breast cancer, esophageal squamous cell carcinoma, colorectal cancer, cervical cancer, gastric cancer, or endometrial carcinoma.
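A statement such as "over 50% of its nucleotides being involved in base pairing" is typically computed from a secondary-structure model. The following is a minimal sketch, assuming the structure is available in dot-bracket notation (for example, as output by an RNA folding program); the structure string is made up for illustration and is not a HOTAIR model.

```python
def paired_fraction(dot_bracket: str) -> float:
    """Fraction of nucleotides that are base-paired in a dot-bracket structure.

    '(' and ')' mark paired positions and '.' marks unpaired ones. Brackets are
    checked for balance so that a malformed structure is rejected.
    """
    if not dot_bracket:
        raise ValueError("empty structure")
    depth = 0
    for ch in dot_bracket:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced structure")
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if depth != 0:
        raise ValueError("unbalanced structure")
    paired = sum(ch in "()" for ch in dot_bracket)
    return paired / len(dot_bracket)


structure = "((((....))))..((((...))))"  # hypothetical
print(round(paired_fraction(structure), 2))  # 0.64
```

Here 16 of 25 positions are paired, so this toy structure would meet a "more than 50% paired" description.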
[00368] Another potential cancer target among lncRNA is MALAT-1 (metastasis-associated lung adenocarcinoma transcript 1), also known as NEAT2 (nuclear-enriched abundant transcript 2) (Gutschner et al, Cancer Res. 73:1180-1189, 2013; Brown et al, Nat. Structural & Mol. Biol. 21:633-640, 2014, hereby incorporated by reference). It is a highly conserved 7 kb nuclear lncRNA that is localized in nuclear speckles. It is ubiquitously expressed in normal tissues, but is up-regulated in many cancers. MALAT-1 is a predictive marker for metastasis development in multiple cancers, including lung cancer. It appears to function as a regulator of gene expression, potentially affecting transcription and/or splicing. MALAT-1 knockout mice have no phenotype, indicating that it has limited normal function. However, MALAT-1-deficient cancer cells are impaired in migration and form fewer tumors in mouse xenograft tumor models. Antisense oligonucleotides (ASO) blocking MALAT-1 prevent metastasis formation after tumor implantation in mice. Some mouse xenograft tumor model data indicate that MALAT-1 knockdown by ASOs may inhibit both primary tumor growth and metastasis. Thus, a small molecule targeting MALAT-1 is expected to be effective in inhibiting tumor growth and metastasis. Accordingly, in some embodiments of the methods described above, the target non-coding RNA is MALAT-1 or a fragment thereof. In some embodiments, the disease or disorder is a cancer in which MALAT-1 is upregulated, such as lung cancer.
Targeting Toxic RNA (Repeat RNA)
[00369] Simple repeats in mRNA often are associated with human disease. These are often, but not exclusively, repeats of three nucleotides such as CAG (“triplet repeats”) (for reviews, see Gatchel and Zoghbi, Nature Reviews Genetics 6:743-755, 2005; Krzyzosiak et al, Nucleic Acids Res. 40:11-26, 2012; Budworth and McMurray, Methods Mol. Biol. 1010:3-17, 2013, hereby incorporated by reference). Triplet repeats are abundant in the human genome, and they tend to undergo expansion over generations. Approximately 40 human diseases are associated with the expansion of repeat sequences. Diseases caused by triplet expansions are known as Triplet Repeat Expansion Diseases (TRED). Healthy individuals have a variable number of triplet repeats, but there is a threshold beyond which a higher repeat number causes disease. The threshold varies in different disorders. The triplet repeat can be unstable: as the gene is inherited, the number of repeats may increase, and the condition may be more severe or have an earlier onset from generation to generation. When an individual has a number of repeats in the normal range, the repeat is not expected to expand when passed to the next generation. When the repeat number is in the premutation range (a normal, but unstable, repeat number), the repeats may or may not expand upon transmission to the next generation. Normal individuals who carry a premutation do not have the condition, but are at risk of having a child who inherits a triplet repeat in the full mutation range and who will be affected. TREDs can be autosomal dominant, autosomal recessive, or X-linked. The more common triplet repeat disorders are autosomal dominant.
[00370] The repeats can be in the coding or noncoding portions of the mRNA. In the case of repeats within noncoding regions, the repeats may lie in the 5' UTR, introns, or 3' UTR sequences. Some examples of diseases caused by repeat sequences within coding regions are shown in Table 1.
Table 1: Repeat Expansion Diseases in Which the Repeat Resides in the Coding Regions of mRNA
[Table 1: image not reproduced]
[00371] In some embodiments, the target RNA is one of those listed in Table 1, or a fragment or analog thereof.
[00372] Some examples of diseases caused by repeat sequences within noncoding regions of mRNA are shown in Table 2.
Table 2: Repeat Expansion Diseases in Which the Repeat Resides in the Noncoding Regions of mRNA
[Table 2: image not reproduced]
[00373] In some embodiments, the target RNA is one of those listed in Table 2, or a fragment or analog thereof.
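The normal, premutation, and full-mutation ranges described above for triplet repeat disorders can be illustrated by counting the longest uninterrupted run of a repeat unit in a sequence and comparing it against disorder-specific cutoffs. A minimal sketch; the cutoff values in the usage line are hypothetical placeholders (the thresholds vary between disorders, as noted above), and real genotyping must also handle repeat interruptions and sequencing error.

```python
def longest_repeat_run(seq: str, unit: str) -> int:
    """Length, in repeat units, of the longest uninterrupted tandem run of `unit`."""
    seq, unit = seq.upper(), unit.upper()
    k = len(unit)
    best = 0
    for offset in range(k):  # try every phase of the repeat
        run = 0
        for i in range(offset, len(seq) - k + 1, k):
            if seq[i:i + k] == unit:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


def classify(repeats: int, normal_max: int, premutation_max: int) -> str:
    """Assign a repeat count to a range; both cutoffs are disorder-specific inputs."""
    if repeats <= normal_max:
        return "normal"
    if repeats <= premutation_max:
        return "premutation"
    return "full mutation"


# Hypothetical sequence: 42 uninterrupted CAG units, then an interruption and 3 more.
seq = "GCT" + "CAG" * 42 + "CAA" + "CAG" * 3
n = longest_repeat_run(seq, "CAG")
print(n, classify(n, normal_max=30, premutation_max=40))  # 42 full mutation
```

The same function works for longer units, such as the GGGGCC hexanucleotide repeat discussed below, by passing that unit instead of CAG.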
[00374] The toxicity that results from the repeat sequence can be a direct consequence of the action of the toxic RNA itself or, in cases in which the repeat expansion is in the coding sequence, can be due to the toxicity of the RNA and/or the aberrant protein. The repeat expansion RNA can act by sequestering critical RNA-binding proteins (RBP) into foci. One example of a sequestered RBP is the Muscleblind family protein MBNL1. Sequestration of RBPs leads to defects in splicing as well as defects in nuclear-cytoplasmic transport of RNA and proteins. Sequestration of RBPs also can affect miRNA biogenesis. These perturbations in RNA biology can profoundly affect neuronal function and survival, leading to a variety of neurological diseases.
[00375] Repeat sequences in RNA form secondary and tertiary structures that bind RBPs and affect normal RNA biology. One specific example disease is myotonic dystrophy (DM1; dystrophia myotonica), a common inherited form of muscle disease characterized by muscle weakness and slow relaxation of the muscles after contraction (Machuca-Tzili et al, Muscle Nerve 32:1-18, 2005, hereby incorporated by reference). It is caused by a CUG expansion in the 3' UTR of the dystrophia myotonica protein kinase (DMPK) gene. This repeat-containing RNA causes the misregulation of alternative splicing of several developmentally regulated transcripts through effects on the splicing regulators MBNL1 and the CUG repeat binding protein (CELF1) (Wheeler et al, Science 325:336-339, 2009, hereby incorporated by reference). Small molecules that bind the CUG repeat within the DMPK transcript would alter the RNA structure, prevent focus formation, and alleviate the effects on these splicing regulators. Fragile X Syndrome (FXS), the most common inherited form of intellectual disability, is the consequence of a CGG repeat expansion within the 5' UTR of the FMR1 gene (Lozano et al, Intractable Rare Dis. Res. 3:134-146, 2014, hereby incorporated by reference). FMRP, the protein encoded by FMR1, is critical for the regulation of translation of many mRNAs and for protein trafficking, and it is an essential protein for synaptic development and neural plasticity. Thus, its deficiency leads to neuropathology. A small molecule targeting this CGG repeat RNA may alleviate the suppression of FMR1 mRNA and FMRP protein expression. Another TRED having a very high unmet medical need is Huntington’s disease (HD). HD is a progressive neurological disorder with motor, cognitive, and psychiatric changes (Zuccato et al, Physiol Rev. 90:905-981, 2010, hereby incorporated by reference).
It is characterized as a polyglutamine or polyQ disorder since the CAG repeat within the coding sequence of the HTT gene leads to a protein having a polyglutamine repeat that appears to have detrimental effects on transcription, vesicle trafficking, mitochondrial function, and proteasome activity. However, the HTT CAG repeat RNA itself also demonstrates toxicity, including the sequestration of MBNL1 protein into nuclear inclusions. One other specific example is the GGGGCC repeat expansion in the C9orf72 (chromosome 9 open reading frame 72) gene that is prevalent in both familial frontotemporal dementia (FTD) and amyotrophic lateral sclerosis (ALS) (Ling et al, Neuron 79:416-438, 2013; Haeusler et al., Nature 507:195-200, 2014, hereby incorporated by reference). The repeat RNA structures form nuclear foci that sequester critical RNA binding proteins. The GGGGCC repeat RNA also binds and sequesters RanGAP1 to impair nucleocytoplasmic transport of RNA and proteins (Zhang et al, Nature 525:56-61, 2015, hereby incorporated by reference). Selectively targeting any of these repeat expansion RNAs could add therapeutic benefit in these neurological diseases.
[00376] The present invention contemplates a method of treating a disease or disorder wherein aberrant RNAs themselves cause pathogenic effects, rather than acting through the agency of protein expression or regulation of protein expression. In some embodiments, the target RNA is a repeat RNA, such as those described herein or in Tables 1 or 2. In some embodiments, the repeat RNA mediates or is implicated in a repeat expansion disease in which the repeat resides in the coding regions of mRNA. In some embodiments, the disease or disorder is a repeat expansion disease in which the repeat resides in the noncoding regions of mRNA. In some embodiments, the disease or disorder is selected from Huntington’s disease (HD), dentatorubral-pallidoluysian atrophy (DRPLA), spinal-bulbar muscular atrophy (SBMA), or a spinocerebellar ataxia (SCA) selected from SCA1, SCA2, SCA3, SCA6, SCA7, or SCA17. In some embodiments, the disease or disorder is selected from Fragile X Syndrome, myotonic dystrophy (DM1 or dystrophia myotonica), Friedreich’s Ataxia (FRDA), a spinocerebellar ataxia (SCA) selected from SCA8, SCA10, or SCA12, or C9FTD (amyotrophic lateral sclerosis or ALS).
[00377] In some embodiments, the disease is amyotrophic lateral sclerosis (ALS), Huntington’s disease (HD), frontotemporal dementia (FTD), myotonic dystrophy (DM1 or dystrophia myotonica), or Fragile X Syndrome.
[00378] Also provided is a method of producing a small molecule that modulates the activity of a target repeat expansion RNA to treat a disease or disorder, comprising the steps of: screening one or more disclosed compounds for binding to the target repeat expansion RNA; and analyzing the results by an RNA binding assay disclosed herein. In some embodiments, the repeat expansion RNA causes a disease or disorder selected from HD, DRPLA, SBMA, SCA1, SCA2, SCA3, SCA6, SCA7, or SCA17. In some embodiments, the disease or disorder is selected from Fragile X Syndrome, DM1, FRDA, SCA8, SCA10, SCA12, or C9FTD.
Other Target RNAs and Diseases/Conditions
[00379] An association is known to exist between a large number of additional RNAs and diseases or conditions, some of which are shown below in Table 3. Accordingly, in some embodiments of the methods described above, the target RNA is selected from those in Table 3. In some embodiments, the target RNA mediates or is implicated in a disease or disorder selected from one of those in Table 3.
Table 3: Target RNAs and Associated Diseases/Conditions
[Table 3: image not reproduced]
Table 4: Additional Target RNAs
[Table 4: image not reproduced]
2. Compounds, Compound Libraries, and Uses Thereof
[00380] Encoded compounds of the present invention, and pharmaceutically acceptable salts thereof, are capable of binding to an active site or allosteric site(s) and/or the tertiary structure of a nucleic acid target, such as a target RNA. Libraries of compounds, which may be produced as described herein or using other methods known in the art, are similarly useful in drug discovery, probing RNA structure, and discovering new RNA targets for treatment of disease. In some embodiments, such libraries are used to generate lead compound structures for further optimization. In some embodiments, hit compounds from a first compound library are used to generate further libraries.
[00381] In some embodiments, encoded compounds are synthesized in a combinatorial manner using randomly or semi-randomly selected building blocks as starting points. Such building blocks may be selected according to principles of combinatorial library construction as are known in the art. In other embodiments, a building block is selected because it is a known binder to nucleic acids, or a fragment of a known binder. Exemplary known nucleic acid binders include those described herein.
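One common construction for encoded combinatorial libraries is split-and-pool synthesis, in which each building block used at each cycle is paired with a short DNA tag and a compound's barcode is the ordered concatenation of its tags. The sketch below shows only that bookkeeping, assuming two synthesis cycles with fixed-length tags; it is not necessarily the construction used in this specification, the building-block names and tag sequences are hypothetical, and real libraries add primer, spacer, and error-tolerant tag design that is omitted here.

```python
from itertools import product

# Hypothetical building blocks for two synthesis cycles, each with its DNA tag.
CYCLE_1 = {"amine_A": "ACGT", "amine_B": "TGCA"}
CYCLE_2 = {"acid_X": "GGCC", "acid_Y": "CCGG", "acid_Z": "AATT"}


def enumerate_library(*cycles: dict[str, str]) -> dict[str, tuple[str, ...]]:
    """Map each DNA barcode to the tuple of building blocks it encodes."""
    library = {}
    for combo in product(*(cycle.items() for cycle in cycles)):
        names = tuple(name for name, _ in combo)
        barcode = "".join(tag for _, tag in combo)
        library[barcode] = names
    return library


library = enumerate_library(CYCLE_1, CYCLE_2)
print(len(library))         # 6
print(library["TGCAAATT"])  # ('amine_B', 'acid_Z')
```

Library size is the product of the number of building blocks per cycle (2 x 3 here), and because every tag within a cycle has the same length, decoding a sequenced barcode is an unambiguous dictionary lookup.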
Small-Molecule RNA Ligands
[00382] The design and synthesis of novel small molecules capable of binding nucleic acids such as RNA represents a largely untapped therapeutic opportunity. In some embodiments, the small molecule is selected from a compound known to bind to RNA, such as a heteroaryldihydropyrimidine (HAP), a macrolide (e.g., erythromycin, azithromycin), an alkaloid (e.g., berberine, palmatine), an aminoglycoside (e.g., paromomycin, neomycin B, kanamycin A), a tetracycline (e.g., doxycycline, oxytetracycline), theophylline, ribocil, clindamycin, chloramphenicol, LMI070, a triptycene-based scaffold, an oxazolidinone (e.g., linezolid, tedizolid), or CPNQ.
[00383] In some embodiments, the small molecule is ribocil, which has the following structure:
[Chemical structure of ribocil]
or a pharmaceutically acceptable salt thereof. Ribocil is a drug-like ligand that binds to the FMN riboswitch (PDB 5KX9) and inhibits riboswitch function (Nature 2015, 526, 672-677, hereby incorporated by reference).
[00384] In some embodiments, the small-molecule ligand is an oxazolidinone such as linezolid, tedizolid, eperezolid, or PNU 176798.
[00385] Furthermore, certain compounds comprising a quinoline core, of which CPNQ is one, are capable of binding RNA. CPNQ has the following structure:
[Chemical structure of CPNQ]
[00386] Accordingly, in some embodiments, the small molecule is CPNQ or a pharmaceutically acceptable salt thereof. In other embodiments, the small molecule is a quinoline compound related to CPNQ, or a pharmaceutically acceptable salt thereof.
[00387] Organic dyes, amino acids, biological cofactors, metal complexes, and peptides also show RNA binding ability.
[00388] The terms “small molecule that binds a target,” “small molecule RNA binder,” “affinity moiety,” “ligand,” “small-molecule RNA ligand,” or “ligand moiety,” as used herein, include all compounds generally classified as small molecules that are capable of binding to a nucleic acid target such as an RNA with sufficient affinity and specificity for use in a disclosed method. Small molecules that bind a nucleic acid for use in the present invention may bind to one or more secondary or tertiary structure elements of a nucleic acid target. Such elements include triplexes, hairpins, bulge loops, pseudoknots, internal loops, junctions, G-quadruplexes, and other higher-order structural motifs described or referred to herein.
[00389] Accordingly, in some embodiments, the small molecule is selected from a heteroaryldihydropyrimidine (HAP), a macrolide, an alkaloid, an aminoglycoside, a member of the tetracycline family, an oxazolidinone, an SMN2 pre-mRNA ligand such as LMI070 (NVS-SM1), ribocil or an analog thereof, clindamycin, chloramphenicol, an anthracene, a triptycene, theophylline or an analog thereof, or CPNQ or an analog thereof. In some embodiments, the small molecule is selected from paromomycin, a neomycin (such as neomycin B), a kanamycin (such as kanamycin A), linezolid, tedizolid, pleuromutilin, ribocil, anthracene, triptycene, or CPNQ or an analog thereof; wherein each small molecule may be optionally substituted with one or more “optional substituents” as defined below, such as 1, 2, 3, or 4, for example 1 or 2, optional substituents.
[00390] In some embodiments, the compound or DNA-encoded library member thereof binds to a junction, stem-loop, or bulge in a target RNA. In some embodiments, the compound or DNA-encoded library member thereof binds to a nucleic acid three-way junction (3WJ) or four-way junction (4WJ). In some embodiments, the 3WJ is a trans 3WJ that is capable of forming between a miRNA and an mRNA in vivo or in vitro.
[00391] Compounds of the present invention include those described generally herein, and are further illustrated by the classes, subclasses, and species disclosed herein. As used herein, the following definitions shall apply unless otherwise indicated. For purposes of this invention, the chemical elements are identified in accordance with the Periodic Table of the Elements, CAS version, Handbook of Chemistry and Physics, 75th Ed., hereby incorporated by reference. Additionally, general principles of organic chemistry are described in “Organic Chemistry,” Thomas Sorrell, University Science Books, Sausalito: 1999, and “March’s Advanced Organic Chemistry,” 5th Ed., Ed.: Smith, M.B. and March, J., John Wiley & Sons, New York: 2001, the entire contents of which are hereby incorporated by reference.
[00392] The term “aliphatic” or “aliphatic group,” as used herein, means a straight-chain (i.e., unbranched) or branched, substituted or unsubstituted hydrocarbon chain that is completely saturated or that contains one or more units of unsaturation, or a monocyclic hydrocarbon or bicyclic hydrocarbon that is completely saturated or that contains one or more units of unsaturation, but which is not aromatic (also referred to herein as “carbocycle,” “cycloaliphatic,” or “cycloalkyl”), that has a single point of attachment to the rest of the molecule. Unless otherwise specified, aliphatic groups contain 1-6 aliphatic carbon atoms. In some embodiments, aliphatic groups contain 1-5 aliphatic carbon atoms. In other embodiments, aliphatic groups contain 1-4 aliphatic carbon atoms. In still other embodiments, aliphatic groups contain 1-3 aliphatic carbon atoms, and in yet other embodiments, aliphatic groups contain 1-2 aliphatic carbon atoms. In some embodiments, “cycloaliphatic” (or “carbocycle” or “cycloalkyl”) refers to a monocyclic C3-C6 hydrocarbon that is completely saturated or that contains one or more units of unsaturation, but which is not aromatic, that has a single point of attachment to the rest of the molecule. Suitable aliphatic groups include, but are not limited to, linear or branched, substituted or unsubstituted alkyl, alkenyl, alkynyl groups and hybrids thereof such as (cycloalkyl)alkyl, (cycloalkenyl)alkyl, or (cycloalkyl)alkenyl.
[00393] As used herein, the term “bridged bicyclic” refers to any bicyclic ring system, i.e., carbocyclic or heterocyclic, saturated or partially unsaturated, having at least one bridge. As defined by IUPAC, a “bridge” is an unbranched chain of atoms or an atom or a valence bond connecting two bridgeheads, where a “bridgehead” is any skeletal atom of the ring system which is bonded to three or more skeletal atoms (excluding hydrogen). In some embodiments, a bridged bicyclic group has 7-12 ring members and 0-4 heteroatoms independently selected from nitrogen, oxygen, or sulfur. Such bridged bicyclic groups are well known in the art and include those groups set forth below where each group is attached to the rest of the molecule at any substitutable carbon or nitrogen atom. Unless otherwise specified, a bridged bicyclic group is optionally substituted with one or more substituents as set forth for aliphatic groups. Additionally or alternatively, any substitutable nitrogen of a bridged bicyclic group is optionally substituted. Exemplary bridged bicyclics include:
[Structures of exemplary bridged bicyclic groups]
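The IUPAC definitions of bridgehead and bridge quoted above are graph-theoretic, so they can be applied mechanically to a ring skeleton. The sketch below assumes the skeleton of a bicyclic system is supplied as a list of bonds between skeletal ring atoms only. Substituents must be left out, since they are not skeletal atoms of the ring system. It finds the bridgeheads and counts the atoms in each bridge, with 0 meaning a direct valence bond. For norbornane this gives the von Baeyer descriptor bicyclo[2.2.1]. The function names are illustrative, and the walk assumes exactly the bicyclic case described here.

```python
from collections import defaultdict


def build_graph(bonds):
    """Adjacency sets for the skeletal atoms of a ring system."""
    adj = defaultdict(set)
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def bridgeheads(adj):
    """Skeletal atoms bonded to three or more skeletal atoms."""
    return {atom for atom, nbrs in adj.items() if len(nbrs) >= 3}


def bridge_lengths(adj):
    """Atom counts of the bridges joining the bridgeheads of a bicyclic
    skeleton (0 = a direct valence bond), largest first."""
    heads = bridgeheads(adj)
    seen, lengths = set(), []
    for start in heads:
        for nbr in adj[start]:
            prev, cur, path = start, nbr, []
            # Walk the unbranched chain until another bridgehead is reached;
            # each non-bridgehead ring atom has exactly two skeletal neighbours.
            while cur not in heads:
                path.append(cur)
                (nxt,) = adj[cur] - {prev}
                prev, cur = cur, nxt
            key = (frozenset((start, cur)), frozenset(path))
            if key not in seen:  # each bridge is reached once from each end
                seen.add(key)
                lengths.append(len(path))
    return sorted(lengths, reverse=True)


# Norbornane (bicyclo[2.2.1]heptane), skeletal atoms numbered 1-7.
norbornane = build_graph([(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1), (1, 7), (7, 4)])
print(sorted(bridgeheads(norbornane)))  # [1, 4]
print(bridge_lengths(norbornane))       # [2, 2, 1]
```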
[00394] The term “lower alkyl” refers to a C1-4 straight or branched alkyl group. Exemplary lower alkyl groups are methyl, ethyl, propyl, isopropyl, butyl, isobutyl, and tert-butyl.
[00395] The term “lower haloalkyl” refers to a C1-4 straight or branched alkyl group that is substituted with one or more halogen atoms.
[00396] The term “heteroatom” means one or more of oxygen, sulfur, nitrogen, phosphorus, or silicon (including any oxidized form of nitrogen, sulfur, phosphorus, or silicon; the quaternized form of any basic nitrogen; or a substitutable nitrogen of a heterocyclic ring, for example N (as in 3,4-dihydro-2H-pyrrolyl), NH (as in pyrrolidinyl), or NR+ (as in N-substituted pyrrolidinyl)).
[00397] The term “unsaturated,” as used herein, means that a moiety has one or more units of unsaturation.
[00398] As used herein, the term “bivalent C1-8 (or C1-6) saturated or unsaturated, straight or branched, hydrocarbon chain” refers to bivalent alkylene, alkenylene, and alkynylene chains that are straight or branched as defined herein.
[00399] The term “alkylene” refers to a bivalent alkyl group. An “alkylene chain” is a polymethylene group, i.e., -(CH2)n-, wherein n is a positive integer, preferably from 1 to 6, from 1 to 4, from 1 to 3, from 1 to 2, or from 2 to 3. A substituted alkylene chain is a polymethylene group in which one or more methylene hydrogen atoms are replaced with a substituent. Suitable substituents include those described below for a substituted aliphatic group.
[00400] The term “alkenylene” refers to a bivalent alkenyl group. A substituted alkenylene chain is a polymethylene group containing at least one double bond in which one or more hydrogen atoms are replaced with a substituent. Suitable substituents include those described below for a substituted aliphatic group.
[00401] The term “halogen” means F, Cl, Br, or I.
[00402] The term “aryl,” used alone or as part of a larger moiety as in “aralkyl,” “aralkoxy,” or “aryloxyalkyl,” refers to monocyclic or bicyclic ring systems having a total of 6 to 14 ring members, wherein at least one ring in the system is aromatic and wherein each ring in the system contains 3 to 7 ring members. The term “aryl” may be used interchangeably with the term “aryl ring.” In certain embodiments of the present invention, “aryl” refers to an aromatic ring system which includes, but is not limited to, phenyl, biphenyl, naphthyl, anthracyl, and the like, which may bear one or more substituents. Also included within the scope of the term “aryl,” as it is used herein, is a group in which an aromatic ring is fused to one or more non-aromatic rings, such as indanyl, phthalimidyl, naphthimidyl, phenanthridinyl, or tetrahydronaphthyl, and the like.
[00403] The terms “heteroaryl” and “heteroar-,” used alone or as part of a larger moiety, e.g., “heteroaralkyl” or “heteroaralkoxy,” refer to groups having 5 to 10 ring atoms, preferably 5, 6, or 9 ring atoms; having 6, 10, or 14 π electrons shared in a cyclic array; and having, in addition to carbon atoms, from one to five heteroatoms. The term “heteroatom” refers to nitrogen, oxygen, or sulfur, and includes any oxidized form of nitrogen or sulfur, and any quaternized form of a basic nitrogen. Heteroaryl groups include, without limitation, thienyl, furanyl, pyrrolyl, imidazolyl, pyrazolyl, triazolyl, tetrazolyl, oxazolyl, isoxazolyl, oxadiazolyl, thiazolyl, isothiazolyl, thiadiazolyl, pyridyl, pyridazinyl, pyrimidinyl, pyrazinyl, indolizinyl, purinyl, naphthyridinyl, and pteridinyl. The terms “heteroaryl” and “heteroar-,” as used herein, also include groups in which a heteroaromatic ring is fused to one or more aryl, cycloaliphatic, or heterocyclyl rings, where the radical or point of attachment is on the heteroaromatic ring. Nonlimiting examples include indolyl, isoindolyl, benzothienyl, benzofuranyl, dibenzofuranyl, indazolyl, benzimidazolyl, benzthiazolyl, quinolyl, isoquinolyl, cinnolinyl, phthalazinyl, quinazolinyl, quinoxalinyl, 4H-quinolizinyl, carbazolyl, acridinyl, phenazinyl, phenothiazinyl, phenoxazinyl, tetrahydroquinolinyl, tetrahydroisoquinolinyl, and pyrido[2,3-b]-1,4-oxazin-3(4H)-one. A heteroaryl group may be mono- or bicyclic. The term “heteroaryl” may be used interchangeably with the terms “heteroaryl ring,” “heteroaryl group,” or “heteroaromatic,” any of which terms include rings that are optionally substituted. The term “heteroaralkyl” refers to an alkyl group substituted with a heteroaryl, wherein the alkyl and heteroaryl portions independently are optionally substituted.
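The counting criteria in the first sentence of this definition can be written as a simple check: 5 to 10 ring atoms; 6, 10, or 14 π electrons; and one to five ring heteroatoms chosen from N, O, and S. The values 6, 10, and 14 are the Hückel numbers 4n + 2 for n = 1, 2, 3. This is an illustrative sketch only. It assumes the caller supplies the ring-atom element symbols and the π-electron count, so it does not perceive aromaticity itself. It also does not cover the fused-ring extension described later in the paragraph. The function names are hypothetical.

```python
HETEROATOMS = {"N", "O", "S"}


def is_huckel_number(pi_electrons):
    """True if the count has the form 4n + 2 with n >= 0."""
    return pi_electrons >= 2 and (pi_electrons - 2) % 4 == 0


def meets_heteroaryl_counts(ring_atoms, pi_electrons):
    """Check ring size, pi-electron count, and ring heteroatom count."""
    n_hetero = sum(1 for atom in ring_atoms if atom in HETEROATOMS)
    return (
        5 <= len(ring_atoms) <= 10
        and pi_electrons in (6, 10, 14)
        and 1 <= n_hetero <= 5
    )


pyridine = ["N", "C", "C", "C", "C", "C"]
benzene = ["C"] * 6
quinoline = ["N"] + ["C"] * 9
print(meets_heteroaryl_counts(pyridine, 6))           # True
print(meets_heteroaryl_counts(benzene, 6))            # False (no heteroatom)
print(meets_heteroaryl_counts(quinoline, 10))         # True
print(all(is_huckel_number(n) for n in (6, 10, 14)))  # True
```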
[00404] As used herein, the terms “heterocycle,” “heterocyclyl,” “heterocyclic radical,” and “heterocyclic ring” are used interchangeably and refer to a stable 5- to 7-membered monocyclic or 7- to 10-membered bicyclic heterocyclic moiety that is either saturated or partially unsaturated, and having, in addition to carbon atoms, one or more, preferably one to four, heteroatoms, as defined above. When used in reference to a ring atom of a heterocycle, the term “nitrogen” includes a substituted nitrogen. As an example, in a saturated or partially unsaturated ring having 0-3 heteroatoms selected from oxygen, sulfur, or nitrogen, the nitrogen may be N (as in 3,4-dihydro-2H-pyrrolyl), NH (as in pyrrolidinyl), or +NR (as in N-substituted pyrrolidinyl).
[00405] A heterocyclic ring can be attached to its pendant group at any heteroatom or carbon atom that results in a stable structure, and any of the ring atoms can be optionally substituted. Examples of such saturated or partially unsaturated heterocyclic radicals include, without limitation, tetrahydrofuranyl, tetrahydrothiophenyl, pyrrolidinyl, piperidinyl, pyrrolinyl, tetrahydroquinolinyl, tetrahydroisoquinolinyl, decahydroquinolinyl, oxazolidinyl, piperazinyl, dioxanyl, dioxolanyl, diazepinyl, oxazepinyl, thiazepinyl, morpholinyl, and quinuclidinyl. The terms “heterocycle,” “heterocyclyl,” “heterocyclyl ring,” “heterocyclic group,” “heterocyclic moiety,” and “heterocyclic radical” are used interchangeably herein, and also include groups in which a heterocyclyl ring is fused to one or more aryl, heteroaryl, or cycloaliphatic rings, such as indolinyl, 3H-indolyl, chromanyl, phenanthridinyl, or tetrahydroquinolinyl. A heterocyclyl group may be mono- or bicyclic. The term “heterocyclylalkyl” refers to an alkyl group substituted with a heterocyclyl, wherein the alkyl and heterocyclyl portions independently are optionally substituted.
[00406] As used herein, the term “partially unsaturated” refers to a ring that includes at least one double or triple bond. The term “partially unsaturated” is intended to encompass rings having multiple sites of unsaturation, but is not intended to include aryl or heteroaryl moieties, as herein defined.
[00407] As described herein, compounds of the invention may contain “optionally substituted” moieties. In general, the term “substituted,” whether preceded by the term “optionally” or not, means that one or more hydrogens of the designated moiety are replaced with a suitable substituent. Unless otherwise indicated, an “optionally substituted” group may have a suitable substituent (“optional substituent”) at each substitutable position of the group, and when more than one position in any given structure may be substituted with more than one substituent selected from a specified group, the substituent may be either the same or different at every position. Combinations of substituents envisioned by this invention are preferably those that result in the formation of stable or chemically feasible compounds. The term “stable,” as used herein, refers to compounds that are not substantially altered when subjected to conditions to allow for their production, detection, and, in certain embodiments, their recovery, purification, and use for one or more of the purposes disclosed herein.
[00408] Suitable monovalent substituents on a substitutable carbon atom of an “optionally substituted” group are independently halogen; -(CH2)0-4R°; -(CH2)0-4OR°; -O(CH2)0-4R°; -O-(CH2)0-4C(O)OR°; -(CH2)0-4CH(OR°)2; -(CH2)0-4SR°; -(CH2)0-4Ph, which may be substituted with R°; -(CH2)0-4O(CH2)0-1Ph, which may be substituted with R°; -CH=CHPh, which may be substituted with R°; -(CH2)0-4O(CH2)0-1-pyridyl, which may be substituted with R°; -NO2; -CN; -N3; -(CH2)0-4N(R°)2; -(CH2)0-4N(R°)C(O)R°; -N(R°)C(S)R°; -(CH2)0-4N(R°)C(O)NR°2; -N(R°)C(S)NR°2; -(CH2)0-4N(R°)C(O)OR°; -N(R°)N(R°)C(O)R°; -N(R°)N(R°)C(O)NR°2; -N(R°)N(R°)C(O)OR°; -(CH2)0-4C(O)R°; -C(S)R°; -(CH2)0-4C(O)OR°; -(CH2)0-4C(O)SR°; -(CH2)0-4C(O)OSiR°3; -(CH2)0-4OC(O)R°; -OC(O)(CH2)0-4SR°; -SC(S)SR°; -(CH2)0-4SC(O)R°; -(CH2)0-4C(O)NR°2; -C(S)NR°2; -C(S)SR°; -SC(S)SR°; -(CH2)0-4OC(O)NR°2; -C(O)N(OR°)R°; -C(O)C(O)R°; -C(O)CH2C(O)R°; -C(NOR°)R°; -(CH2)0-4SSR°; -(CH2)0-4S(O)2R°; -(CH2)0-4S(O)2OR°; -(CH2)0-4OS(O)2R°; -S(O)2NR°2; -(CH2)0-4S(O)R°; -N(R°)S(O)2NR°2; -N(R°)S(O)2R°; -N(OR°)R°; -C(NH)NR°2; -P(O)2R°; -P(O)R°2; -OP(O)R°2; -OP(O)(OR°)2; -SiR°3; -(C1-4 straight or branched alkylene)O-N(R°)2; or -(C1-4 straight or branched alkylene)C(O)O-N(R°)2, wherein each R° may be substituted as defined below and is independently hydrogen, C1-6 aliphatic, -CH2Ph, -O(CH2)0-1Ph, -CH2-(5-6-membered heteroaryl ring), or a 5-6-membered saturated, partially unsaturated, or aryl ring having 0-4 heteroatoms independently selected from nitrogen, oxygen, or sulfur, or, notwithstanding the definition above, two independent occurrences of R°, taken together with their intervening atom(s), form a 3-12-membered saturated, partially unsaturated, or aryl mono- or bicyclic ring having 0-4 heteroatoms independently selected from nitrogen, oxygen, or sulfur, which may be substituted as defined below.
[00409] Suitable monovalent substituents on R° (or the ring formed by taking two independent occurrences of R° together with their intervening atoms) are independently halogen, -(CH2)0-2R•, -(haloR•), -(CH2)0-2OH, -(CH2)0-2OR•, -(CH2)0-2CH(OR•)2; -O(haloR•), -CN, -N3, -(CH2)0-2C(O)R•, -(CH2)0-2C(O)OH, -(CH2)0-2C(O)OR•, -(CH2)0-2SR•, -(CH2)0-2SH, -(CH2)0-2NH2, -(CH2)0-2NHR•, -(CH2)0-2NR•2, -NO2, -SiR•3, -OSiR•3, -C(O)SR•, -(C1-4 straight or branched alkylene)C(O)OR•, or -SSR•, wherein each R• is unsubstituted or, where preceded by “halo,” is substituted only with one or more halogens, and is independently selected from C1-4 aliphatic, -CH2Ph, -O(CH2)0-1Ph, or a 5-6-membered saturated, partially unsaturated, or aryl ring having 0-4 heteroatoms independently selected from nitrogen, oxygen, or sulfur. Suitable divalent substituents on a saturated carbon atom of R° include =O and =S.
[00410] Suitable divalent substituents on a saturated carbon atom of an “optionally substituted” group include the following: =O, =S, =NNR*2, =NNHC(O)R*, =NNHC(O)OR*, =NNHS(O)2R*, =NR*, =NOR*, -O(C(R*2))2-3O-, or -S(C(R*2))2-3S-, wherein each independent occurrence of R* is selected from hydrogen, C1-6 aliphatic which may be substituted as defined below, or an unsubstituted 5-6-membered saturated, partially unsaturated, or aryl ring having 0-4 heteroatoms independently selected from nitrogen, oxygen, or sulfur. Suitable divalent substituents that are bound to vicinal substitutable carbons of an “optionally substituted” group include: -O(CR*2)2-3O-, wherein each independent occurrence of R* is selected from hydrogen, C1-6 aliphatic which may be substituted as defined below, or an unsubstituted 5-6-membered saturated, partially unsaturated, or aryl ring having 0-4 heteroatoms independently selected from nitrogen, oxygen, or sulfur.
[00411] Suitable substituents on the aliphatic group of R* include halogen, -R•, -(haloR•), -OH, -OR•, -O(haloR•), -CN, -C(O)OH, -C(O)OR•, -NH2, -NHR•, -NR•2, or -NO2, wherein each R• is unsubstituted or, where preceded by “halo,” is substituted only with one or more halogens, and is independently C1-4 aliphatic, -CH2Ph, -O(CH2)0-1Ph, or a 5-6-membered saturated, partially unsaturated, or aryl ring having 0-4 heteroatoms independently selected from nitrogen, oxygen, or sulfur.
[00412] Suitable substituents on a substitutable nitrogen of an “optionally substituted” group include -R', -NR'2, -C(O)R', -C(O)OR', -C(O)C(O)R', -C(O)CH2C(O)R', -S(O)2R', -S(O)2NR'2, -C(S)NR'2, -C(NH)NR'2, or -N(R')S(O)2R'; wherein each R' is independently hydrogen, C1-6 aliphatic which may be substituted as defined below, unsubstituted -OPh, or an unsubstituted 5-6-membered saturated, partially unsaturated, or aryl ring having 0-4 heteroatoms independently selected from nitrogen, oxygen, or sulfur, or, notwithstanding the definition above, two independent occurrences of R', taken together with their intervening atom(s), form an unsubstituted 3-12-membered saturated, partially unsaturated, or aryl mono- or bicyclic ring having 0-4 heteroatoms independently selected from nitrogen, oxygen, or sulfur.
[00413] Suitable substituents on the aliphatic group of R' are independently halogen, -R•, -(haloR•), -OH, -OR•, -O(haloR•), -CN, -C(O)OH, -C(O)OR•, -NH2, -NHR•, -NR•2, or -NO2, wherein each R• is unsubstituted or, where preceded by “halo,” is substituted only with one or more halogens, and is independently C1-4 aliphatic, -CH2Ph, -O(CH2)0-1Ph, or a 5-6-membered saturated, partially unsaturated, or aryl ring having 0-4 heteroatoms independently selected from nitrogen, oxygen, or sulfur.
[00414] As used herein, the term “pharmaceutically acceptable salt” refers to those salts which are, within the scope of sound medical judgment, suitable for use in contact with the tissues of humans and lower animals without undue toxicity, irritation, allergic response, and the like, and are commensurate with a reasonable benefit/risk ratio. Pharmaceutically acceptable salts are well known in the art. For example, S. M. Berge et al. describe pharmaceutically acceptable salts in detail in J. Pharmaceutical Sciences, 1977, 66, 1-19, incorporated herein by reference. Pharmaceutically acceptable salts of the compounds of this invention include those derived from suitable inorganic and organic acids and bases. Examples of pharmaceutically acceptable, nontoxic acid addition salts are salts of an amino group formed with inorganic acids such as hydrochloric acid, hydrobromic acid, phosphoric acid, sulfuric acid, and perchloric acid, or with organic acids such as acetic acid, oxalic acid, maleic acid, tartaric acid, citric acid, succinic acid, or malonic acid, or by using other methods used in the art such as ion exchange. Other pharmaceutically acceptable salts include adipate, alginate, ascorbate, aspartate, benzenesulfonate, benzoate, bisulfate, borate, butyrate, camphorate, camphorsulfonate, citrate, cyclopentanepropionate, digluconate, dodecyl sulfate, ethanesulfonate, formate, fumarate, glucoheptonate, glycerophosphate, gluconate, hemisulfate, heptanoate, hexanoate, hydroiodide, 2-hydroxy-ethanesulfonate, lactobionate, lactate, laurate, lauryl sulfate, malate, maleate, malonate, methanesulfonate, 2-naphthalenesulfonate, nicotinate, nitrate, oleate, oxalate, palmitate, pamoate, pectinate, persulfate, 3-phenylpropionate, phosphate, pivalate, propionate, stearate, succinate, sulfate, tartrate, thiocyanate, p-toluenesulfonate, undecanoate, valerate salts, and the like.
[00415] Salts derived from appropriate bases include alkali metal, alkaline earth metal, ammonium, and N+(C1-4alkyl)4 salts. Representative alkali or alkaline earth metal salts include sodium, lithium, potassium, calcium, magnesium, and the like. Further pharmaceutically acceptable salts include, when appropriate, nontoxic ammonium, quaternary ammonium, and amine cations formed using counterions such as halide, hydroxide, carboxylate, sulfate, phosphate, nitrate, loweralkyl sulfonate, and aryl sulfonate.
[00416] Unless otherwise stated, structures depicted herein are also meant to include all isomeric (e.g., enantiomeric, diastereomeric, and geometric (or conformational)) forms of the structure; for example, the R and S configurations for each asymmetric center, Z and E double bond isomers, and Z and E conformational isomers. Therefore, single stereochemical isomers as well as enantiomeric, diastereomeric, and geometric (or conformational) mixtures of the present compounds are within the scope of the invention. Unless otherwise stated, all tautomeric forms of the compounds of the invention are within the scope of the invention. Additionally, unless otherwise stated, structures depicted herein are also meant to include compounds that differ only in the presence of one or more isotopically enriched atoms. For example, compounds having the present structures including the replacement of hydrogen by deuterium or tritium, or the replacement of a carbon by a 13C- or 14C-enriched carbon are within the scope of this invention. Such compounds are useful, for example, as analytical tools, as probes in biological assays, or as therapeutic agents in accordance with the present invention.
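A depicted structure covers both the R and S configuration at each asymmetric center and both the E and Z geometry at each stereogenic double bond. That coverage can be made concrete by enumerating the descriptor combinations. The sketch below assumes the stereocenters and stereogenic double bonds have already been identified, and the atom labels are hypothetical. It yields 2^k assignments for k stereo elements. That number is an upper bound, since meso forms and molecular symmetry can make some assignments describe the same compound.

```python
from itertools import product


def stereo_assignments(stereocenters, double_bonds):
    """Yield every combination of R/S and E/Z descriptors as a dict."""
    options = [[(c, d) for d in ("R", "S")] for c in stereocenters]
    options += [[(b, d) for d in ("E", "Z")] for b in double_bonds]
    for combo in product(*options):
        yield dict(combo)


forms = list(stereo_assignments(["C2", "C3"], ["C5=C6"]))
print(len(forms))  # 8 = 2**3
print(forms[0])    # {'C2': 'R', 'C3': 'R', 'C5=C6': 'E'}
```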
Tethering Groups (Linkers)
[00417] The present invention contemplates the use of a wide variety of tethering groups (“tethers” or “linkers”) for covalent conjugation of the small molecule compound or building block to the DNA barcode. Either a cleavable or non-cleavable tether is used depending on the context. In some embodiments, the tether is a non-cleavable group such as a polyethylene glycol (PEG) group of, e.g., 1-10 ethylene glycol subunits. In some embodiments, the tether is a non-cleavable group such as an optionally substituted C1-12 aliphatic group or a peptide comprising 1-8 amino acids. In some embodiments, the tether comprises a click reaction product resulting from the so-called “click” reaction between two click-ready groups.
[00418] Generally, a small molecule or building block is conjugated to a DNA barcode by a single covalent linker. A small molecule assembled from two or more building blocks will therefore be attached to the resulting DNA barcode by two or more linkers (i.e., the number of linkers will equal the number of building blocks used to prepare the small molecule). In some embodiments, all but one of the linkers are cleavable, so that the small molecule is attached to the DNA barcode by a single, non-cleavable linker. In some embodiments, the cleavable linkers are cleaved before the small molecule is screened against a target. In some embodiments, the linker is selected so as to avoid interfering with amplification (such as RT-PCR) of screening hits, for example by selecting a sufficiently long linker or by conjugating it to a position on the DNA barcode where it does not interfere.
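The linker bookkeeping in this paragraph can be modeled with a small data structure: one linker per building block, all but one cleavable, and the cleavable linkers removed before screening. The class and method names below are hypothetical. The model ignores chemistry entirely and only enforces the counting rules stated above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Linker:
    building_block: str
    cleavable: bool


@dataclass
class EncodedCompound:
    barcode: str
    linkers: List[Linker] = field(default_factory=list)

    def validate(self, building_blocks):
        """Require one linker per building block and exactly one non-cleavable linker."""
        if sorted(lk.building_block for lk in self.linkers) != sorted(building_blocks):
            raise ValueError("expected exactly one linker per building block")
        fixed = [lk for lk in self.linkers if not lk.cleavable]
        if len(fixed) != 1:
            raise ValueError(f"expected 1 non-cleavable linker, found {len(fixed)}")

    def cleave_for_screening(self, building_blocks):
        """Return a copy that keeps only the non-cleavable linker."""
        self.validate(building_blocks)
        kept = [lk for lk in self.linkers if not lk.cleavable]
        return EncodedCompound(self.barcode, kept)


blocks = ["BB1", "BB2", "BB3"]
compound = EncodedCompound(
    "ACGTGGCCTTAA",
    [Linker("BB1", False), Linker("BB2", True), Linker("BB3", True)],
)
screened = compound.cleave_for_screening(blocks)
print([lk.building_block for lk in screened.linkers])  # ['BB1']
```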
Click-Ready Groups
[00419] A variety of bioorthogonal reaction partners may be used in the present invention to tether a chemical building block or compound to its DNA barcode. The term “bioorthogonal chemistry” or “bioorthogonal reaction,” as used herein, refers to any chemical reaction that can take place in living systems without interfering significantly with native biochemical processes. Accordingly, a “bioorthogonal reaction partner” is a chemical group capable of undergoing a bioorthogonal reaction with an appropriate reaction partner to couple a compound described herein to its DNA barcode. In some embodiments, the bioorthogonal reaction partner is selected from a click-ready group or a group capable of undergoing a nitrone/cyclooctyne reaction, oxime/hydrazone formation, a tetrazine ligation, an isocyanide-based click reaction, or a quadricyclane ligation.
[00420] In some embodiments, the bioorthogonal reaction partner is a click-ready group. The term “click-ready group” refers to a chemical moiety capable of undergoing a click reaction, such as an azide or alkyne.
[00421] Click reactions tend to involve high-energy (“spring-loaded”) reagents with well-defined reaction coordinates that give rise to selective bond-forming events of wide scope. Examples include nucleophilic trapping of strained-ring electrophiles (epoxides, aziridines, aziridinium ions, episulfonium ions), certain carbonyl reactivity (e.g., the reaction between aldehydes and hydrazines or hydroxylamines), and several cycloaddition reactions. The azide-alkyne 1,3-dipolar cycloaddition and the Diels-Alder cycloaddition are two such reactions.
[00422] Such click reactions (i.e., dipolar cycloadditions) are associated with a high activation energy and therefore require heat or a catalyst. Indeed, a copper catalyst is routinely employed in click reactions. However, in certain instances where click chemistry is particularly useful (e.g., in bioconjugation reactions), the presence of copper can be detrimental (see Wolbers, F. et al.; Electrophoresis 2006, 27, 5073, hereby incorporated by reference). Accordingly, methods of performing dipolar cycloaddition reactions without metal catalysis were developed. Such “metal-free” click reactions utilize activated moieties in order to facilitate cycloaddition. Therefore, the present invention provides click-ready groups suitable for metal-free click chemistry.
[00423] Certain metal-free click moieties are known in the literature. Examples include 4-dibenzocyclooctynol (DIBO) (from Ning et al.; Angew. Chem. Int. Ed. 2008, 47, 2253); gem-difluorinated cyclooctynes (DIFO or DFO) (from Codelli et al.; J. Am. Chem. Soc. 2008, 130, 11486-11493); biarylazacyclooctynone (BARAC) (from Jewett et al.; J. Am. Chem. Soc. 2010, 132, 3688); and bicyclononyne (BCN) (from Dommerholt et al.; Angew. Chem. Int. Ed. 2010, 49, 9422-9425); each of which is hereby incorporated by reference.
[00424] As used herein, the phrase “a moiety suitable for metal-free click chemistry” refers to a functional group capable of dipolar cycloaddition without use of a metal catalyst. Such moieties include an activated alkyne (such as a strained cyclooctyne), an oxime (such as a nitrile oxide precursor), or oxanorbornadiene, for coupling to an azide to form a cycloaddition product (e.g., triazole or isoxazole).
[00425] Thus, in certain embodiments, the click-ready group is selected from an azide, an alkyne, 4-dibenzocyclooctynol (DIBO), gem-difluorinated cyclooctynes (DIFO or DFO), biarylazacyclooctynone (BARAC), bicyclononyne (BCN), a strained cyclooctyne, an oxime, or oxanorbornadiene.
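Choosing a click-ready group for conjugation to a DNA barcode can be framed as a lookup of reaction partner, product, and catalyst requirement. The table below is a hypothetical and deliberately partial sketch. It covers only the azide couplings named in this section. A terminal alkyne reacts with an azide under copper(I) catalysis, whereas strained cyclooctynes (DIBO, DIFO, BARAC, BCN) and oxanorbornadiene react with azides without a metal catalyst. The table and function names are illustrative assumptions.

```python
# Hypothetical, partial table: group -> (partner, product, needs copper catalyst)
CLICK_PAIRS = {
    "terminal alkyne": ("azide", "1,2,3-triazole", True),
    "DIBO": ("azide", "1,2,3-triazole", False),
    "DIFO": ("azide", "1,2,3-triazole", False),
    "BARAC": ("azide", "1,2,3-triazole", False),
    "BCN": ("azide", "1,2,3-triazole", False),
    "oxanorbornadiene": ("azide", "1,2,3-triazole", False),
}


def metal_free_options(partner="azide"):
    """Groups in the table that couple to `partner` without a metal catalyst."""
    return sorted(
        group
        for group, (p, _, needs_copper) in CLICK_PAIRS.items()
        if p == partner and not needs_copper
    )


def plan_conjugation(group, avoid_copper=True):
    """Return (partner, product) for a group, rejecting copper-dependent
    choices when copper must be avoided (e.g., in bioconjugation)."""
    partner, product, needs_copper = CLICK_PAIRS[group]
    if avoid_copper and needs_copper:
        raise ValueError(f"{group} needs a copper catalyst; pick a metal-free group")
    return partner, product


print(metal_free_options())     # ['BARAC', 'BCN', 'DIBO', 'DIFO', 'oxanorbornadiene']
print(plan_conjugation("BCN"))  # ('azide', '1,2,3-triazole')
```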
3. General Methods of Providing the Present Compounds and Encoded Libraries Thereof
[00426] The compounds of this invention may be prepared or isolated in general by synthetic and/or semi-synthetic methods known to those skilled in the art for analogous compounds and by methods described in detail in the Examples and Figures, herein.
[00427] In the schemes and chemical reactions depicted in the detailed description, Examples, and Figures, where a particular protecting group (“PG”), leaving group (“LG”), or transformation condition is depicted, one of ordinary skill in the art will appreciate that other protecting groups, leaving groups, and transformation conditions are also suitable and are contemplated. Such groups and transformations are described in detail in March’s Advanced Organic Chemistry: Reactions, Mechanisms, and Structure, M. B. Smith and J. March, 5th Edition, John Wiley & Sons, 2001; Comprehensive Organic Transformations, R. C. Larock, 2nd Edition, John Wiley & Sons, 1999; and Protective Groups in Organic Synthesis, T. W. Greene and P. G. M. Wuts, 3rd Edition, John Wiley & Sons, 1999, the entirety of each of which is hereby incorporated herein by reference.
[00428] As used herein, the phrase“leaving group” (LG) includes, but is not limited to, halogens (e.g. fluoride, chloride, bromide, iodide), sulfonates (e.g. mesylate, tosylate, benzenesulfonate, brosylate, nosylate, triflate), diazonium, and the like.
[00429] As used herein, the phrase “oxygen protecting group” includes, for example, carbonyl protecting groups, hydroxyl protecting groups, etc. Hydroxyl protecting groups are well known in the art and include those described in detail in Protective Groups in Organic Synthesis, T. W. Greene and P. G. M. Wuts, 3rd Edition, John Wiley & Sons, 1999, the entirety of which is incorporated herein by reference. Examples of suitable hydroxyl protecting groups include, but are not limited to, esters, allyl ethers, ethers, silyl ethers, alkyl ethers, arylalkyl ethers, and alkoxyalkyl ethers. Examples of such esters include formates, acetates, carbonates, and sulfonates. Specific examples include formate, benzoyl formate, chloroacetate, trifluoroacetate, methoxyacetate, triphenylmethoxyacetate, p-chlorophenoxyacetate, 3-phenylpropionate, 4-oxopentanoate, 4,4-(ethylenedithio)pentanoate, pivaloate (trimethylacetyl), crotonate, 4-methoxycrotonate, benzoate, p-benzylbenzoate, and 2,4,6-trimethylbenzoate, and carbonates such as methyl, 9-fluorenylmethyl, ethyl, 2,2,2-trichloroethyl, 2-(trimethylsilyl)ethyl, 2-(phenylsulfonyl)ethyl, vinyl, allyl, and p-nitrobenzyl. Examples of such silyl ethers include trimethylsilyl, triethylsilyl, t-butyldimethylsilyl, t-butyldiphenylsilyl, triisopropylsilyl, and other trialkylsilyl ethers. Alkyl ethers include methyl, benzyl, p-methoxybenzyl, 3,4-dimethoxybenzyl, trityl, t-butyl, allyl, and allyloxycarbonyl ethers or derivatives. Alkoxyalkyl ethers include acetals such as methoxymethyl, methylthiomethyl, (2-methoxyethoxy)methyl, benzyloxymethyl, beta-(trimethylsilyl)ethoxymethyl, and tetrahydropyranyl ethers. Examples of arylalkyl ethers include benzyl, p-methoxybenzyl (MPM), 3,4-dimethoxybenzyl, o-nitrobenzyl, p-nitrobenzyl, p-halobenzyl, 2,6-dichlorobenzyl, p-cyanobenzyl, and 2- and 4-picolyl.
[00430] Amino protecting groups are well known in the art and include those described in detail in Protective Groups in Organic Synthesis, T. W. Greene and P. G. M. Wuts, 3rd Edition, John Wiley & Sons, 1999, the entirety of which is incorporated herein by reference. Suitable amino protecting groups include, but are not limited to, aralkylamines, carbamates, cyclic imides, allyl amines, amides, and the like. Examples of such groups include t-butyloxycarbonyl (BOC), ethyloxycarbonyl, methyloxycarbonyl, trichloroethyloxycarbonyl, allyloxycarbonyl (Alloc), benzyloxycarbonyl (CBZ), allyl, phthalimide, benzyl (Bn), fluorenylmethyloxycarbonyl (Fmoc), formyl, acetyl, chloroacetyl, dichloroacetyl, trichloroacetyl, phenylacetyl, trifluoroacetyl, benzoyl, and the like.
[00431] One of skill in the art will appreciate that various functional groups present in compounds of the invention, such as aliphatic groups, alcohols, carboxylic acids, esters, amides, aldehydes, halogens, and nitriles, can be interconverted by techniques well known in the art including, but not limited to, reduction, oxidation, esterification, hydrolysis, partial oxidation, partial reduction, halogenation, dehydration, partial hydration, and hydration. See “March’s Advanced Organic Chemistry,” 5th Ed., Ed.: Smith, M.B. and March, J., John Wiley & Sons, New York: 2001, the entirety of which is incorporated herein by reference. Such interconversions may require one or more of the aforementioned techniques.
[00432] All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference in their entireties for all purposes and to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.
EXEMPLIFICATION
[00433] Certain embodiments of the invention having been generally described above, the invention will be further illustrated by the following Examples. It will be appreciated that, although the Examples depict particular methods, compounds, sequences, assays, conditions, and so on, these may be varied according to the knowledge of one of ordinary skill in the art, and that such variations can be applied to the embodiments of the invention described herein.
Example 1: Model System and Initial Ligation Experiment
Summary
[00434] The goal of these experiments was to develop a method that would allow an RNA species to ligate directly to dsDNA with a short overhang of, e.g., 2 bp, thus enabling RT-PCR of the whole product. The dsDNA mimics the DNA barcode on DNA-encoded libraries such as those used by Vipergen, and the RNA mimics a target RNA that is being screened for binding to a small-molecule ligand. Direct ligation of the RNA to the DNA followed by RT-PCR would create a DNA molecule containing both the sequence encoding the small-molecule ligand in the DNA-encoded library and the target RNA to which it bound, enabling convenient multiplexed screening.
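Because each RT-PCR product would carry both a target-derived sequence and the compound-encoding barcode, pooled screening data could in principle be tallied by splitting each sequencing read into those two parts. The sketch below is illustrative only: it assumes a hypothetical read layout (a fixed-length target segment immediately followed by a fixed-length barcode), and the names `decode_reads`, `TARGET_LEN`, `BARCODE_LEN`, and all sequences are invented for the example. Real DEL barcode designs and read structures will differ.

```python
from collections import Counter

# Hypothetical read layout (not from the source): a fixed-length segment
# copied from the target RNA, immediately followed by a fixed-length
# DNA barcode that encodes the small molecule.
TARGET_LEN = 8
BARCODE_LEN = 6


def decode_reads(reads, target_lookup, barcode_lookup):
    """Count (compound, target) pairs from sequenced RT-PCR products.

    target_lookup and barcode_lookup map exact sequence segments to names.
    Reads whose segments are missing from either table are skipped.
    """
    counts = Counter()
    for read in reads:
        target_seq = read[:TARGET_LEN]
        barcode_seq = read[TARGET_LEN:TARGET_LEN + BARCODE_LEN]
        target = target_lookup.get(target_seq)
        compound = barcode_lookup.get(barcode_seq)
        if target is not None and compound is not None:
            counts[(compound, target)] += 1
    return counts


if __name__ == "__main__":
    targets = {"GGATCCAT": "target_A", "CCTAGGTA": "target_B"}
    barcodes = {"ACGTAC": "cmpd_1", "TTGCAA": "cmpd_2"}
    reads = ["GGATCCATACGTAC", "GGATCCATACGTAC",
             "CCTAGGTATTGCAA", "NNNNNNNNACGTAC"]
    print(decode_reads(reads, targets, barcodes))
```

Reads that match neither table exactly are dropped rather than error-corrected; a practical pipeline would likely need to tolerate sequencing errors.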
[00435] Example 5 described below demonstrates feasibility of this approach using T4 DNA ligase and a helper oligo.
Table 5: Oligonucleotides Used in Ligation Experiments
[00436] As shown in FIG. 4, the “splint” and “ligation partner” form a DNA duplex that is meant to mimic the DNA tag encoding a compound structure in a DNA-encoded library (DEL). The splint oligo forms a 3'-overhang that is designed to pair with the RNA. The overhang is either 2 bp (DirLig_Splint2bp-1) or 5 bp (DirLig_Splint5bp-1). The 2 bp overhang is designed to mimic the DNA tags in the Vipergen library design. The 5 bp overhang is designed as a positive control, since RNA ligases are reported to ligate across 5 bp splints. The control splints (DirLig_CtlSplint2bp-1 and DirLig_CtlSplint5bp-1) have overhang sequences that do not match the RNA.
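The matched and control splints differ only in whether the 3'-overhang can base-pair with the RNA 3'-end. A minimal sketch of that check, assuming both sequences are written 5'->3' and that the overhang abuts the ligation junction with no gap; the RNA sequence used below is made up, since the actual oligo sequences of Table 5 are not reproduced here.

```python
# Watson-Crick partners; RNA U and DNA T both pair with A.
_PAIR = {"A": "T", "T": "A", "U": "A", "G": "C", "C": "G"}


def reverse_complement_dna(seq):
    """Reverse complement of a DNA or RNA sequence, returned as DNA (5'->3')."""
    return "".join(_PAIR[base] for base in reversed(seq.upper()))


def overhang_matches_rna(overhang, rna):
    """Check whether a splint 3'-overhang can fully pair with the RNA 3'-end.

    Both sequences are written 5'->3'. Because the strands are antiparallel,
    the overhang must equal the reverse complement of the RNA's last
    len(overhang) nucleotides. Wobble pairs and mismatches are not allowed.
    """
    n = len(overhang)
    if n == 0 or n > len(rna):
        return False
    return overhang.upper() == reverse_complement_dna(rna[-n:])
```

For example, with the made-up RNA `GGCAUCGAGU`, a matched 2-nt overhang is `AC` and a matched 5-nt overhang is `ACTCG`, while a control overhang such as `TG` fails the check, mirroring how the control splints were designed not to pair.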
[00437] The RNA has a 3'-end that is complementary to the splints. The 5'-end of the RNA is tagged with a Cy5 fluorescent dye for easy imaging by gel electrophoresis. The RNA has a binding site for a helper oligo next to the region that binds the splint. The helper oligo is designed to enhance stacking interactions with the splint. If the helper oligo is phosphorylated (DirLig_pHelp2bp-1 and DirLig_pHelp5bp-1), then it can be ligated to the splint to extend the splint and increase its affinity for the RNA.
[00438] After ligation between the RNA and ligation partner, the splint (or splint + helper) acts as a primer for reverse transcription to copy the RNA sequence. At the 5'-end of the splint and the 5'-end of the RNA there are primer binding sites for PCR.
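Following the orientation described above (RNA 3'-end joined to the DNA 5'-end, with the splint priming reverse transcription along the RNA), the expected products can be written out as strings. This is a simplified sketch with made-up sequences: it assumes the extended splint ends up complementary to the entire ligated product, and it ignores primer binding sites, the Cy5 label, and the helper oligo. The function names are illustrative.

```python
# Complement table; U is accepted so RNA segments can be copied into DNA.
_COMPLEMENT = {"A": "T", "T": "A", "U": "A", "G": "C", "C": "G"}


def ligate_rna_to_dna(rna, ligation_partner):
    """Join the RNA 3'-end to the DNA 5'-end.

    Inputs and result are written 5'->3', so the ligated product is the
    RNA sequence followed directly by the DNA sequence.
    """
    return rna.upper() + ligation_partner.upper()


def first_strand_cdna(template):
    """DNA strand complementary to the whole template, written 5'->3'.

    Simplification: assumes the primer (splint or splint + helper) plus
    its reverse-transcribed extension covers the full template.
    """
    return "".join(_COMPLEMENT[base] for base in reversed(template.upper()))
```

With the made-up inputs `GGCAUC` (RNA) and `ACGT` (DNA), the ligated product is `GGCAUCACGT` and its full-length copy is `ACGTGATGCC`, whose 5'-end is the complement of the DNA portion; the single copied strand thus links the barcode-derived and target-derived sequences.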
First Experiment
Goal
[00439] Determine if T4 RNA ligase 2 (T4 Rnl2) can ligate the RNA to the dsDNA complex. T4 Rnl2 has been reported to ligate the 5'-phosphate of DNA to the 3'-hydroxyl of RNA across from a DNA splint (Nandakumar and Shuman, 2004, Mol Cell, 16: 211-221), but such ligation has not been reported for a 2 bp splint.
Method
Pre-annealing of mock DNA barcode
[00440] The ligation partner oligo (DirLig_LigPartner-1) was mixed with the splint oligo (DirLig_Splint2bp-1, DirLig_Splint5bp-1, DirLig_CtlSplint2bp-1, or DirLig_CtlSplint5bp-1) at a concentration of 2 mM of each oligo in water. The solution was heated to 60 °C for 5 min and then cooled on ice for 5 min to anneal the two oligos.
Ligation reaction
[00441] The pre-annealed mock DNA barcode was ligated to the RNA oligo (DirLig_RNA-1) using T4 Rnl2 from New England Biolabs (catalog # M0239) with 2.67 µM DNA, 13.3 µM RNA, 1X T4 Rnl2 buffer, and 1 U/µL T4 Rnl2 in a total volume of 7.5 µL. The reaction was incubated at 37 °C for 30 min and then quenched by adding 7.5 µL 2X TBE-urea sample loading buffer (Bio-Rad catalog # 1610768) and heating to 95 °C for 5 min.
PAGE analysis
[00442] For each sample, 13 µL was analyzed on a pre-cast, 8.6 x 6.7 cm, 10% TBE-urea PAGE gel (Bio-Rad catalog # 4566036). A sample containing 5 pmol of DirLig_RNA-1 was included for reference. The gel was imaged on an Azure c600 gel imaging system (Azure Biosystems) in the Cy5 channel with auto-exposure.
Results and Conclusions
[00443] As shown in FIG. 5, successful ligation was observed with the 5 bp splint (DirLig_Splint5bp-1), but not with the 2 bp splint (DirLig_Splint2bp-1). The control splints (DirLig_CtlSplint2bp-1 and DirLig_CtlSplint5bp-1) did not enable ligation.
[00444] These results indicate that the 2 bp splint is less efficient for ligation under standard conditions.
Example 2: Modified Ligation Conditions
Goal
[00445] Test alternative ligation conditions (lower temperature, presence of a helper oligo, presence of PEG crowding agent) to see if they enable ligation across the 2 bp splint with T4 Rnl2.
Method
Pre-annealing of mock DNA barcode
[00446] The ligation partner oligo (DirLig_LigPartner-1) was mixed with the splint oligo (DirLig_Splint2bp-1, DirLig_Splint5bp-1, or DirLig_CtlSplint2bp-1) at a concentration of 4 µM of each oligo in water. The solution was heated to 70 °C for 5 min and then cooled on ice for 5 min to anneal the two oligos.
Ligation reaction
[00447] The pre-annealed mock DNA barcode was ligated to the RNA oligo (DirLig_RNA-1) using T4 Rnl2 from New England Biolabs (catalog # M0239) with 1 µM DNA, 2 µM RNA, 1X T4 Rnl2 buffer, and 1 U/µL T4 Rnl2 in a total volume of 10 µL. In addition, some reactions contained PEG or a helper oligo according to the table below. The reactions were incubated at 22 or 37 °C, according to the table below, for 2 h and then quenched by adding 10 µL of 2X TBE-urea sample loading buffer (Bio-Rad catalog # 1610768) and heating to 95 °C for 5 min.
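Assembling each ligation reaction reduces to C1V1 = C2V2 arithmetic per component. The sketch below uses the 4 µM pre-annealed DNA stock and the 1 µM DNA, 2 µM RNA, 1X buffer, 1 U/µL enzyme, and 10 µL final values stated in the text; the 10 µM RNA stock, 10X buffer stock, and 10 U/µL enzyme stock are assumptions for illustration only, and PEG or helper additions are not modeled. The function names are hypothetical.

```python
def c1v1_volume(stock, final, total_volume):
    """Stock volume giving `final` concentration in `total_volume` (C1*V1 = C2*V2).

    stock and final must use the same units (uM, X, U/uL, ...).
    """
    if stock <= 0 or final < 0:
        raise ValueError("stock must be positive and final non-negative")
    if final > stock:
        raise ValueError("final concentration exceeds stock concentration")
    return final * total_volume / stock


def reaction_setup(total_ul, components):
    """Per-component volumes (uL) plus nuclease-free water to reach total_ul.

    components maps a name to a (stock, final) pair in matching units.
    """
    volumes = {name: c1v1_volume(stock, final, total_ul)
               for name, (stock, final) in components.items()}
    water = total_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("components exceed the total reaction volume")
    volumes["water"] = water
    return volumes


# Final concentrations and the 4 uM annealed-DNA stock follow the text;
# the RNA, buffer, and enzyme stock concentrations are assumptions.
EXAMPLE_COMPONENTS = {
    "annealed DNA (uM)": (4.0, 1.0),
    "RNA (uM, assumed 10 uM stock)": (10.0, 2.0),
    "T4 Rnl2 buffer (X, assumed 10X stock)": (10.0, 1.0),
    "T4 Rnl2 (U/uL, assumed 10 U/uL stock)": (10.0, 1.0),
}
```

With these inputs, `reaction_setup(10.0, EXAMPLE_COMPONENTS)` gives 2.5 µL DNA, 2.0 µL RNA, 1.0 µL buffer, 1.0 µL enzyme, and 3.5 µL water.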
Table 6: Modified Ligation Conditions Experiments
PAGE analysis
[00448] For each sample, 13 µL was analyzed on a pre-cast, 8.6 x 6.7 cm, 10% TBE-urea PAGE gel (Bio-Rad catalog # 4566036). A sample containing 5 pmol of DirLig_RNA-1 was included for reference. The gel was imaged on an Azure c600 gel imaging system (Azure Biosystems) in the Cy5 channel with auto-exposure.
Results and Conclusions
[00449] As shown in FIG. 6, the inclusion of 5% PEG400 or a helper oligo with incubation at 22 °C did not enable observable ligation across the 2 bp splint.
Example 3: Further Ligation Experiments With Lower Temperatures, SplintR Ligase, and Crowding Agents
Goal
[00450] In these experiments, we tested lower temperatures (4 and 15 °C), the SplintR ligase, and high amounts of crowding agents (20% PEG4000) to see if they enhance ligation across the 2 bp splint.
Methods
[00451] These experiments were performed as in Example 2, except that T4 Rnl2 was used at 0.5 U/µL and SplintR was used at 0.25 U/µL. The incubation temperature, ligase, PEG, and splint used in each experiment are outlined in the table below.
Table 7: Experimental Conditions
Results and Conclusions
[00452] As shown in FIG. 7, there was a minimal amount of ligation with 20% PEG4000 for both enzymes, but no efficient ligation across the 2 bp splint in any condition.
Example 4: Ligation Experiments Including Helper Oligos and Combinations of Ligases
Goal
[00453] In these experiments, we tested a combined DNA (splint-helper) and RNA (RNA-ligation partner) ligation scheme to enable more efficient RNA ligation. DNA ligases are known to ligate across shorter splints, so ligating the helper to the splint first can create a longer splint for the subsequent RNA ligation. SplintR was tested in combination with T4 and T7 DNA ligases. T4 DNA ligase was also tested alone, since it is known to ligate RNA to DNA in some contexts.
Methods
[00454] We performed the experiments as per Examples 2 and 3 above, except that different combinations of buffers, enzymes, and helper oligos were used according to the table below. All reactions were incubated at 22 °C for 1 h.
Table 8: Experimental Conditions
Calculation of ligation efficiency
[00455] To calculate the ligation efficiency, the intensities of the Cy5 bands representing the ligated and unligated RNA species were integrated using ImageJ software, and the intensity of the ligated RNA was divided by half the intensity of the unligated RNA (since the RNA was added in 2-fold excess over the DNA).
Results and Conclusions
[00456] As shown in FIG. 8, the use of DNA ligases and a ligatable helper oligo enabled efficient ligation of the RNA to DNA across a 2 bp splint. The best condition was T4 DNA ligase with no RNA ligase.
Example 5: Verification of Ligation Mechanism and RT-PCR
Goal
[00457] In these experiments, we verified that ligation via T4 DNA ligase proceeds through the expected mechanism by comparing it to negative controls lacking individual components of the reaction. We also demonstrated that the ligated product can be amplified by RT-PCR.
Methods
Pre-annealing of mock DNA barcode
[00458] The ligation partner oligo (DirLig_LigPartner-1) was mixed with the splint oligo (DirLig_Splint2bp-1, DirLig_Splint5bp-1, or DirLig_CtlSplint2bp-1) at a concentration of 4 µM of each oligo in water. The solution was heated to 70 °C for 5 min and then cooled on ice for 5 min to anneal the two oligos.
Ligation reaction
[00459] The pre-annealed mock DNA barcode, at a final concentration of 1 µM, was mixed with 2 µM RNA oligo (DirLig_RNA-1) and 2 µM helper oligo (DirLig_pHelp2bp-1) in 1X T4 DNA ligase buffer with 20 U/µL T4 DNA ligase from New England Biolabs (catalog # M0202) in a total volume of 10 µL. The reaction was incubated at 22 °C for 2 h. A sample for gel analysis was prepared by taking 7.5 µL of the reaction, adding 7.5 µL of 2X TBE-urea sample loading buffer (Bio-Rad catalog # 1610768), and then heating to 95 °C for 5 min.
[00460] Negative control reactions were prepared as above, except that each one lacked the helper oligo, used the mismatched splint DirLig_CtlSplint2bp-1, or lacked the ligase.
[00461] For reference, pre-annealed mock DNA barcode containing DirLig_Splint5bp-1 was mixed with 2 µM RNA oligo (DirLig_RNA-1) in 1X T4 Rnl2 buffer with or without 0.5 U/µL T4 Rnl2 from New England Biolabs (catalog # M0239) in a total volume of 10 µL. The reactions were incubated and samples prepared in the same manner as the others.
PAGE analysis
[00462] For each reaction, the entire 15 µL sample was analyzed on a pre-cast, 8.6 x 6.7 cm, 10% TBE-urea PAGE gel (Bio-Rad catalog # 4566036). After running, the gel was stained with 1X SYBR Gold in 1X TBE buffer for 15 min. The gel was imaged on an Azure c600 gel imaging system (Azure Biosystems) with auto-exposure in the Cy5 channel for the labeled RNA and in the Cy3 channel for the SYBR Gold stain.
[00463] To calculate the ligation efficiency, the intensities of the Cy5 bands representing the ligated and unligated RNA species were integrated using ImageJ software, and the intensity of the ligated RNA was divided by half the intensity of the unligated RNA (since the RNA was added in 2-fold excess over the DNA).
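The efficiency calculation described above can be written as a short function. This sketch follows the stated formula literally (ligated band intensity divided by half the unligated band intensity), with the factor of one half generalized to a hypothetical `rna_excess` parameter that defaults to the 2-fold excess used here; the band intensities in the example are made up, and the function is not taken from the source.

```python
def ligation_efficiency(ligated, unligated, rna_excess=2.0):
    """Ligation efficiency following the description in the text.

    ligated, unligated: integrated Cy5 band intensities (e.g., from ImageJ).
    rna_excess: molar excess of RNA over DNA (2-fold in these examples);
    the unligated intensity is divided by this factor before normalizing.
    """
    if unligated <= 0:
        raise ValueError("unligated band intensity must be positive")
    if rna_excess <= 0:
        raise ValueError("rna_excess must be positive")
    return ligated / (unligated / rna_excess)
```

For instance, made-up intensities of 2000 (ligated) and 8000 (unligated) give 2000 / 4000 = 0.5, i.e., 50%.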
Reverse transcription and PCR
[00464] The ligation product was reverse transcribed using SuperScript III according to the standard protocol (Thermo catalog # 18080093). Briefly, the reaction mixture was composed of 1X First-Strand Buffer, 0.5 mM dNTPs, 5 mM DTT, 1 U/µL SUPERase•In RNase Inhibitor (Thermo catalog # AM2694), and 10 U/µL SuperScript III with 0.6 µL ligation reaction in a total volume of 20 µL. The reaction was incubated at 55 °C for 30 min and then heat inactivated at 75 °C for 15 min. No primer was added since the splint or splint-helper ligation product acted as the primer.
[00465] After reverse transcription, each reaction was PCR amplified using standard Taq DNA polymerase (Thermo catalog # 10342020) according to the standard protocol. Briefly, the reaction mixture was composed of 1X PCR buffer, 1.5 mM MgCl2, 0.2 mM dNTPs, 0.2 µM DirLig_For-1, 0.2 µM DirLig_Rev-1, and 0.04 U/µL Taq DNA polymerase with 2 µL reverse transcription reaction in a total volume of 50 µL. The PCR program was 94 °C for 3 min; 35 cycles of 94 °C for 45 s, 55 °C for 30 s, and 72 °C for 30 s; and finally 72 °C for 3 min. A 2 µL sample of each PCR reaction was run on a 2% agarose E-gel (Thermo catalog # G402002) and then imaged by epi-fluorescence on an Azure c600 gel imager.
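The cycling conditions above can be represented as data, which makes the total hold time easy to check: 3 min + 35 × (45 + 30 + 30) s + 3 min = 4035 s, or about 67 min before ramp time is added. This is a minimal sketch; the list-of-(temperature, seconds) representation and the function names are arbitrary choices, not a thermocycler file format.

```python
# PCR program from the text, as (temperature in deg C, hold time in s).
INITIAL_DENATURATION = [(94, 180)]
CYCLE = [(94, 45), (55, 30), (72, 30)]
N_CYCLES = 35
FINAL_EXTENSION = [(72, 180)]


def expand_program(initial, cycle, n_cycles, final):
    """Flatten a cycled thermocycler program into an ordered list of steps."""
    return list(initial) + list(cycle) * n_cycles + list(final)


def total_hold_seconds(steps):
    """Sum of the hold times; temperature ramps are not included."""
    return sum(seconds for _temp, seconds in steps)


if __name__ == "__main__":
    steps = expand_program(INITIAL_DENATURATION, CYCLE, N_CYCLES, FINAL_EXTENSION)
    total = total_hold_seconds(steps)
    print(f"{len(steps)} steps, {total} s hold time ({total / 60:.2f} min)")
```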
Results
[00466] FIG. 9 shows (panel A) 10% PAGE results of the ligation reactions, with Cy5-labeled RNA in red (appearing as dark grey spots in the picture of the PAGE gel as shown) and SYBR Gold-stained RNA and DNA in green (appearing as light grey spots in the gel picture). Panel B shows 2% agarose E-gel analysis of the RT-PCR reactions, with the Exactgene mini-DNA ladder in lane M and the desired product indicated with an arrow. Panel C provides a description of samples 1-6 from panels A and B.
[00467] Ligation of the RNA to the DNA barcode across a 2 bp splint using T4 DNA ligase and a helper DNA oligo was confirmed. The ligation product gave the desired PCR product after RT-PCR. Under the conditions tested here, the ligation and RT-PCR reactions for the 2 bp splint worked best with the helper oligo, the correct splint sequence, and the enzyme all present. For the 5 bp splint with T4 Rnl2 and no helper oligo, ligation and RT-PCR were also successful. Notably, the splint was able to act as a primer for RT-PCR even without ligation, as indicated by sample 6, which lacked T4 Rnl2.
[00468] While we have described a number of embodiments of this invention, it is apparent that our basic examples may be altered to provide other embodiments that utilize the compounds, compositions, and methods of this invention. Therefore, it will be appreciated that the scope of this invention is to be defined by the appended claims rather than by the specific embodiments that have been represented by way of example.

Claims

We claim:
1. A method of ligating an RNA strand to a DNA strand, comprising:
(i) providing a partially double-stranded DNA molecule having a 3'-overhang comprising at least one nucleotide, wherein the 3'-overhang has at least partial sequence complementarity to the 3'-end of the RNA strand; and
(ii) contacting the RNA strand and the partially double-stranded DNA molecule with at least one ligase, wherein the at least one ligase catalyzes ligation of the 3'-end of the RNA strand and the 5'-end of the DNA strand;
thereby producing a ligated product comprising the RNA strand ligated to the DNA strand.
2. The method of claim 1, wherein the partially double-stranded DNA molecule is produced by ligating together shorter fragments of DNA; or is produced by the steps of:
(i) ligating together single-stranded DNA fragments; and
(ii) performing primer extension.
3. The method of claim 1, wherein the partially double-stranded DNA molecule is produced by ligating together shorter fragments of DNA; or is produced by the steps of:
(i) ligating together single-stranded DNA fragments;
(ii) performing primer extension; and
(iii) ligating to the DNA strand or appending by chemical synthesis a single-stranded DNA sequence having at least partial sequence complementarity to the 3'-end of the RNA strand.
4. The method of claim 1, wherein the partially double-stranded DNA molecule is produced by hybridizing the DNA strand to a single-stranded DNA splint having at least partial sequence complementarity to the DNA strand.
5. The method of claim 1, wherein step (ii) further comprises contacting the RNA strand with a helper oligonucleic acid, wherein the helper oligonucleic acid has at least partial sequence complementarity to a region of the RNA strand adjacent to the portion of the RNA strand that has sequence complementarity to the 3'-overhang.
6. The method of claim 5, wherein the helper oligonucleic acid is an oligonucleotide.
7. The method of claim 6, wherein the helper oligonucleotide is DNA, RNA, or LNA.
8. The method of claim 7, wherein the helper oligonucleotide further comprises one or more nucleotide modifications selected from:
(i) a sugar modification selected from 2'-OMe, 2'-F, 2',2'-difluoro, 2'-Me, 2'-methoxyethyl, 2'-propyl, or replacement of a ribose or deoxyribose with an arabinose sugar, a BNA (Bridged Nucleic Acid) sugar, an LNA (Locked Nucleic Acid) sugar, or an ENA (2'-O,4'-C-ethylene-bridged nucleic acid) sugar;
(ii) a base modification selected from pseudouracil, 2-methyladenine, 2,6-diaminopurine, 2-Cl adenine, 2-F adenine, 5-azauracil, 5-azacytidine, N2-methylguanine, N7-methylguanine, N6-methyladenine, or a C-nucleobase (7-deazapurine or 1-deazapyrimidine); or
(iii) a nucleoside modification selected from 2'-Deoxypseudouridine, 2'-Deoxyuridine, 2-Thiothymidine, 4-Thio-2'-deoxyuridine, 4-Thiothymidine, 5'-Aminothymidine, 5-(1-Pyrenylethynyl)-2'-deoxyuridine, 5-(Carboxy)vinyl-2'-deoxyuridine, 5,6-Dihydro-2'-deoxyuridine, 5-Bromo-2'-deoxycytidine, 5-Bromo-2'-deoxyuridine, 5-Carboxy-2'-deoxycytidine, 5-Fluoro-2'-deoxyuridine, 5-Formyl-2'-deoxycytidine, 5-Hydroxy-2'-deoxycytidine, 5-Hydroxy-2'-deoxyuridine, 5-Hydroxymethyl-2'-deoxycytidine, 5-Hydroxymethyl-2'-deoxyuridine, 5-Hydroxybutynl-2'-deoxyuridine, 5-Iodo-2'-deoxycytidine, 5-Iodo-2'-deoxyuridine, 5-Methyl-2'-deoxycytidine, 5-Methyl-2'-deoxyisocytidine, 5-Propynyl-2'-deoxycytidine, 5-Propynyl-2'-deoxyuridine, 2-Aminopurine-2'-deoxyriboside, 6-Thio-2'-deoxyguanosine, 7-Deaza-2'-deoxyguanosine, 7-Deaza-2'-deoxyxanthosine, 7-Deaza-8-aza-2'-deoxyadenosine, 2,6-diaminopurine-riboside, 2-Aminopurine-riboside, Pseudouridine, Puromycin, Pyrrolocytidine, 2,6-diaminopurine-2'-O-methylriboside, N6-diaminopurine-riboside, 2-aminopurine-2'-O-methylriboside, 2'-O-Methylinosine, 3-Deaza-5-Aza-2'-O-methylcytidine, 5-Bromo-2'-O-methyluridine, 5-Fluoro-2'-O-Methyluridine, 5-Fluoro-4-O-TMP-2'-O-Methyluridine, 5-Methyl-2'-O-Methylcytidine, 5-Methyl-2'-O-Methylthymidine, 2'-Deoxynebularine, 2'-Deoxyinosine, 2'-Deoxyisoguanosine, 3-Nitropyrrole-2'-deoxyribose, 5-Nitroindole-2'-deoxyriboside, pyridin-4-one ribonucleoside, 5-aza-uridine, 2-thio-5-aza-uridine, 2-thiouridine, 4-thio-pseudouridine, 2-thio-pseudouridine, 5-hydroxyuridine, 3-methyluridine, 5-carboxymethyl-uridine, 1-carboxymethyl-pseudouridine, 5-propynyl-uridine, 1-propynyl-pseudouridine, 5-taurinomethyluridine, 1-taurinomethyl-pseudouridine, 5-taurinomethyl-2-thio-uridine, 1-taurinomethyl-4-thio-uridine, 5-methyl-uridine, 1-methylpseudouridine, 4-thio-1-methyl-pseudouridine, 2-thio-1-methyl-pseudouridine, 1-methyl-1-deaza-pseudouridine, 2-thio-1-methyl-1-deaza-pseudouridine, dihydrouridine, dihydropseudouridine, 2-thio-dihydrouridine, 2-thio-dihydropseudouridine, 2-methoxyuridine, 2-methoxy-4-thio-uridine, 4-methoxy-pseudouridine, 4-methoxy-2-thio-pseudouridine, 5-aza-cytidine, pseudoisocytidine, 3-methyl-cytidine, N4-acetylcytidine, 5-formylcytidine, N4-methylcytidine, 5-hydroxymethylcytidine, 1-methyl-pseudoisocytidine, pyrrolo-cytidine, pyrrolo-pseudoisocytidine, 2-thio-cytidine, 2-thio-5-methyl-cytidine, 4-thio-pseudoisocytidine, 4-thio-1-methyl-pseudoisocytidine, 4-thio-1-methyl-1-deaza-pseudoisocytidine, 1-methyl-1-deaza-pseudoisocytidine, zebularine, 5-aza-zebularine, 5-methyl-zebularine, 5-aza-2-thio-zebularine, 2-thio-zebularine, 2-methoxy-cytidine, 2-methoxy-5-methyl-cytidine, 4-methoxy-pseudoisocytidine, 4-methoxy-1-methyl-pseudoisocytidine, 2-aminopurine, 2,6-diaminopurine, 7-deaza-adeninosine, 7-deaza-deoxyadenosine, 7-deaza-8-aza-adenine, 7-deaza-2-aminopurine, 7-deaza-8-aza-2-aminopurine, 7-deaza-2,6-diaminopurine, 7-deaza-8-aza-2,6-diaminopurine, 1-methyladenosine, N6-methyladenosine, N6-isopentenyladenosine, N6-(cis-hydroxyisopentenyl)adenosine, 2-methylthio-N6-(cis-hydroxyisopentenyl)adenosine, N6-glycinylcarbamoyladenosine, N6-threonylcarbamoyladenosine, 2-methylthio-N6-threonylcarbamoyladenosine, N6,N6-dimethyladenosine, 7-methyladenine, 2-methylthio-adenine, 2-methoxy-adenine, inosine, 1-methyl-inosine, wyosine, wybutosine, 7-deaza-guanosine, 7-deaza-8-aza-guanosine, 6-thio-guanosine, 6-thio-7-deaza-guanosine, 6-thio-7-deaza-8-aza-guanosine, 7-methyl-guanosine, 6-thio-7-methyl-guanosine, 7-methylinosine, 6-methoxy-guanosine, 1-methylguanosine, N2-methylguanosine, N2,N2-dimethylguanosine, 8-oxo-guanosine, 7-methyl-8-oxo-guanosine, 1-methyl-6-thio-guanosine, N2-methyl-6-thio-guanosine, 2'-3'-dideoxyadenine, 2'-3'-dideoxycytidine, 2'-3'-dideoxyguanosine, 2'-3'-dideoxythymidine, inverted thymidine, or N2,N2-dimethyl-6-thio-guanosine.
9. The method of claim 5, wherein the helper oligonucleic acid further comprises a modification to the phosphate backbone selected from boranophosphate, methylphosphonate, P-ethoxy, phosphonoacetate, phosphorothioate, or phosphorodithioate.
10. The method of claim 5, wherein the helper oligonucleic acid is PNA or a morpholino oligomer.
11. The method of any one of claims 5-10, wherein the at least one ligase catalyzes ligation of the 3'-overhang to the 5'-end of the helper oligonucleic acid.
12. The method of any one of claims 5-11, wherein a first ligase and a second ligase are used in step (ii); the first ligase catalyzes ligation of the 3'-overhang and the 5'-end of the helper oligonucleic acid; and the second ligase catalyzes ligation of the 3'-end of the RNA strand and the 5'-end of the DNA strand.
13. The method of any one of claims 5-12, wherein the helper oligonucleic acid hybridizes to the RNA strand and facilitates the ligation of the 3'-end of the RNA strand and the 5'-end of the DNA strand.
14. The method of any one of claims 5-13, further comprising, before or after step (i), hybridizing the helper oligonucleic acid to the RNA strand under appropriate conditions to effect the hybridization.
15. The method of any one of claims 1-14, wherein the DNA strand comprises at the 5'-end a phosphate or analog thereof capable of participating in ligation.
16. The method of any one of claims 1-15, wherein the RNA strand comprises at the 3'-end a phosphate or analog thereof capable of participating in ligation.
17. The method of any one of claims 4-16, wherein the single-stranded DNA splint comprises at the 3'-end a phosphate or analog thereof capable of participating in ligation.
18. The method of any one of claims 5-17, wherein the helper oligonucleotide comprises at the 5'-end a phosphate or analog thereof capable of participating in ligation.
19. The method of any one of claims 15-18, wherein the phosphate or analog thereof capable of participating in ligation has been added by chemical synthesis.
20. The method of any one of claims 15-19, wherein the phosphate or analog thereof capable of participating in ligation is a phosphate group.
21. The method of claim 20, wherein the phosphate group has been added from phosphorylation by a kinase.
22. The method of claim 21, wherein the phosphorylation is performed before step (ii) is performed.
23. The method of claim 21, wherein the kinase is allowed to contact the DNA strand during step (ii).
24. The method of claim 15, wherein the phosphate or analog thereof capable of participating in ligation is a 5'-adenosine diphosphate group.
25. The method of any one of claims 1-24, wherein the ligated product is a template for reverse transcription.
26. The method of claim 25, wherein the ligated product comprises at least one binding site for a primer sequence for a reverse transcriptase.
27. The method of any one of claims 1-24, wherein the ligated product is a template for PCR.
28. The method of claim 27, wherein the ligated product comprises at least one binding site for a primer for PCR.
29. The method of any one of claims 1-28, further comprising the step of ligating a single-stranded oligonucleic acid to the 5'-end of the RNA strand, or to the DNA strand, thereby producing an extended ligated product.
30. The method of claim 29, wherein the extended ligated product is a template for reverse transcription.
31. The method of claim 30, wherein the extended ligated product comprises at least one binding site for a primer sequence for a reverse transcriptase.
32. The method of any one of claims 29-31, wherein the extended ligated product is a template for PCR.
33. The method of claim 31, wherein the extended ligated product comprises at least one binding site for a primer for PCR.
34. The method of any one of claims 1-33, wherein the partially double-stranded DNA molecule is a member of a DNA-encoded library (DEL).
35. The method of claim 34, wherein the partially double-stranded DNA molecule comprises a sequence that encodes the identity of a small molecule member of a DNA-encoded library (DEL).
36. The method of any one of claims 1-35, wherein the RNA strand is 30-1,000 nucleotides in length.
37. The method of any one of claims 1-36, wherein the 3'-overhang is 1-20 nucleotides.
38. The method of claim 37, wherein the 3'-overhang is 2-10 nucleotides.
39. The method of claim 38, wherein the 3'-overhang is 2, 3, 4, or 5 nucleotides.
40. The method of claim 39, wherein the 3'-overhang is 2 or 3 nucleotides.
41. The method of any one of claims 5-40, wherein the helper oligonucleic acid is at least 5 nucleotides in length.
42. The method of claim 41, wherein the helper oligonucleic acid is about 10 to about 75 nucleotides in length.
43. The method of claim 42, wherein the helper oligonucleic acid is 10-50, 12-30, 14-25, 16-22, 17-20, 18-19, or 18 nucleotides in length.
44. The method of claim 42, wherein the helper oligonucleic acid is about 10 to about 50, about 12 to about 30, about 14 to about 25, about 16 to about 22, about 17 to about 20, about 18 to about 19, or about 18 nucleotides in length.
45. The method of any one of claims 1-44, wherein the at least one ligase is selected from T4 RNA ligase 2, SplintR, ElectroLigase®, T4 DNA ligase, T3 DNA ligase, T4 RNA ligase 1, PBCV-1 ligase, RtcB Ligase, bacteriophage TS2126 ligase, PBCV-1 ligase, Taq DNA Ligase, HiFi Taq DNA Ligase, or 9°N™ DNA Ligase, CircLigase RNA ligase, Ampligase, 5' App DNA/RNA ligase, T7 DNA ligase, E. coli DNA ligase, CircLigase I ssDNA ligase, or CircLigase II ssDNA ligase; or a truncated version thereof.
46. The method of claim 45, wherein the at least one ligase is selected from T4 RNA ligase 2, SplintR, T4 DNA ligase, or T3 DNA ligase.
47. The method of claim 46, wherein the at least one ligase is T4 RNA ligase 2.
48. The method of claim 46, wherein the at least one ligase is SplintR.
49. The method of claim 46, wherein the at least one ligase is T4 DNA ligase or T3 DNA ligase.
50. The method of claim 12, wherein the first ligase is selected from T4 RNA ligase 2, SplintR, T4 DNA ligase, T3 DNA ligase, T4 RNA ligase 1, Ampligase, 5' App DNA/RNA ligase, T7 DNA ligase, E. coli DNA ligase, CircLigase I ssDNA ligase, or CircLigase II ssDNA ligase; or a truncated version thereof; and the second ligase is selected from T4 RNA ligase 2, SplintR, T4 DNA ligase, T4 RNA ligase I, RtcB Ligase, PBCV-1 ligase, CircLigase RNA ligase, 5' App DNA/RNA ligase, or a truncated version thereof.
51. The method of claim 46, wherein the at least one ligase is a combination of T4 DNA ligase and SplintR.
52. The method of any one of claims 1-51, wherein step (ii) further comprises adding a crowding agent selected from polyethylene glycol (PEG), Ficoll, dextran, or albumin.
53. The method of any one of claims 1-52, wherein step (ii) is performed at about 2-50 °C.
54. The method of any one of claims 1-53, wherein step (ii) is performed at about 4, 12, 16, 22, or 37 °C.
55. The method of any one of claims 1-54, wherein step (ii) is performed in a reaction buffer comprising about 25-300 mM salt.
56. A ligation product prepared by the method of any one of claims 1-55.
57. A composition comprising:
(i) an RNA strand comprising at least a portion of a biologically relevant target RNA; and
(ii) an at least partially double-stranded synthetic DNA molecule comprising a DNA strand at least partially hybridized to a single-stranded oligonucleic acid and comprising a 3'-overhang of 2-5 nucleotides.
58. The composition of claim 57, wherein the 3'-overhang has sequence complementarity to the 3'-end of the RNA strand.
59. The composition of claim 57 or 58, further comprising one or more ligases capable of ligating the RNA strand to the DNA molecule.
60. A partially double-stranded RNA-DNA ligation product comprising:
(i) an RNA strand comprising at least a portion of a biologically relevant target RNA, or a homolog, isoform, or analog thereof; and
(ii) an at least partially double-stranded synthetic DNA molecule comprising a DNA strand at least partially hybridized to a single-stranded oligonucleic acid and comprising a 3'-overhang of 2-5 nucleotides;
wherein the 3'-end of the RNA strand and the 5'-end of the DNA strand have been ligated to form a contiguous sequence.
61. An enriched DNA-encoded library (DEL) comprising library members comprising:
(i) a DNA barcode allowing identification of each library member;
(ii) a small molecule covalently conjugated to the DNA barcode; and
(iii) a nucleic acid target ligated to the DNA barcode.
62. A method of producing an enriched DNA-encoded library (DEL), comprising:
(i) providing a DEL of small molecules covalently conjugated to DNA barcodes;
(ii) contacting the DEL with a nucleic acid target under conditions selected to allow binding between the small molecules and the nucleic acid target to form a mixture of complexes;
(iii) performing a high factor dilution of the mixture of step (ii) and incubating for a time period and at a temperature sufficient for the dissociation of at least one weakly-associated complex in the mixture, thereby producing an enriched mixture;
(iv) contacting the enriched mixture of step (iii) with an aqueous emulsion preparation under conditions selected to allow compartmentalization of at least one complex into at least one aqueous emulsion droplet;
(v) ligating the DEL and the nucleic acid target of the at least one complex in the at least one aqueous emulsion droplet of step (iv) to form at least one ligated product;
(vi) optionally, disrupting the at least one aqueous emulsion droplet of step (v); and
(vii) optionally, isolating the at least one ligated product.
63. A method of producing an enriched DNA-encoded library (DEL), comprising:
(i) providing a DEL of small molecules covalently conjugated to DNA barcodes;
(ii) contacting the DEL with a nucleic acid target under conditions selected to allow binding between the small molecules and the nucleic acid target to form a mixture of complexes;
(iii) performing a high factor dilution of the mixture of step (ii) and incubating for a time period and at a temperature sufficient for the dissociation of at least one weakly-associated complex in the mixture, thereby producing an enriched mixture;
(iv) ligating the DEL and the nucleic acid target of the at least one complex to form at least one ligated product; and
(v) optionally, isolating the at least one ligated product.
64. The method of claim 62 or 63, wherein the high factor dilution of step (iii) is 1:2 to 1:10,000.
65. The method of any one of claims 62-64, wherein the time period of step (iii) is 1 minute to 48 hours.
66. The method of any one of claims 62-65, wherein the temperature of step (iii) is 4 °C to 65 °C.
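Claims 64-66 set the dilution factor, incubation time, and temperature for the high factor dilution. One way to see why dilution enriches tight binders is a simple kinetic model: after dilution the free target concentration drops, rebinding slows, and each complex relaxes toward a much lower bound fraction at a rate set largely by its off-rate. The Python sketch below assumes 1:1 binding, target in large excess over each library member (pseudo-first-order kinetics), kon = koff / Kd, and equilibrium before dilution; none of these assumptions or the example numbers come from the patent. Temperature (claim 66) would act through koff and kon and is not modeled.

```python
import math


def equilibrium_bound_fraction(target_m: float, kd_m: float) -> float:
    """Fraction of a library member bound when target is in large excess."""
    return target_m / (target_m + kd_m)


def bound_fraction_after_dilution(
    target_m: float,
    kd_m: float,
    koff_per_s: float,
    dilution_factor: float,
    t_s: float,
) -> float:
    """Fraction of one library member still bound t_s seconds after dilution.

    Illustrative model only: 1:1 binding, pseudo-first-order kinetics,
    kon = koff / Kd, and equilibrium before dilution.
    """
    f0 = equilibrium_bound_fraction(target_m, kd_m)
    target_after = target_m / dilution_factor
    kon = koff_per_s / kd_m  # association rate constant, M^-1 s^-1
    f_eq = equilibrium_bound_fraction(target_after, kd_m)
    # Relaxation toward the new equilibrium at rate kon*[T] + koff.
    k_obs = kon * target_after + koff_per_s
    return f_eq + (f0 - f_eq) * math.exp(-k_obs * t_s)


if __name__ == "__main__":
    # Hypothetical numbers: 1 uM target, 1:1,000 dilution, 10 min incubation.
    tight = bound_fraction_after_dilution(1e-6, 1e-8, 1e-4, 1_000, 600)
    weak = bound_fraction_after_dilution(1e-6, 1e-5, 1e-1, 1_000, 600)
    print(f"tight binder: {tight:.2f}, weak binder: {weak:.1e}")
```

With these made-up parameters the tight binder (Kd 10 nM, koff 1e-4 s^-1) stays mostly bound, at about 0.93, while the weak binder (Kd 10 uM, koff 0.1 s^-1) falls to roughly 1e-4, which is the kind of separation the dilution step is meant to exploit.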
67. The method of any one of claims 62-66, wherein the ligation is performed according to the method of any one of claims 1-55.
68. The method of claim 62, wherein step (vi) is performed by contacting with at least one reagent selected from a surfactant, an alcohol, or a halogenated hydrocarbon solvent.
69. The method of claim 62 or 68, wherein the aqueous emulsion preparation in step (iv) is a water-in-oil emulsion.
70. The method of claim 69, wherein the surfactant is selected from Triton X-100, or Tween 80.
71. The method of any one of claims 62-70, wherein the nucleic acid target is an RNA, or a homolog, isoform, chimera, fragment, or analog thereof.
72. The method of any one of claims 62 or 68-69, wherein step (iv) is performed using Binder Trap Enrichment® (BTE).
73. The method of claim 72, wherein the compartmentalization in step (iv) creates more compartments than there are members of the DEL in the sample.
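Claims 73 and 86 require more compartments than library members, so the mean occupancy per compartment (lambda) is below 1. Under the common assumption that members are loaded into droplets at random, Poisson statistics estimate how often a member shares its droplet with another one, which is what would allow a target to be ligated to a barcode other than the one whose small molecule bound it. The sketch below is an illustrative calculation under that random-loading assumption, which the patent does not state.

```python
import math


def poisson_occupancy(n_items: int, n_compartments: int) -> dict:
    """Poisson approximation for items distributed at random into compartments."""
    lam = n_items / n_compartments  # mean items per compartment
    empty = math.exp(-lam)
    single = lam * math.exp(-lam)
    multiple = 1.0 - empty - single
    # Approximate chance that a given item shares its compartment with at
    # least one other item (the other items in its compartment ~ Poisson(lam)).
    item_shares = 1.0 - math.exp(-lam)
    return {
        "lambda": lam,
        "empty": empty,
        "single": single,
        "multiple": multiple,
        "item_shares": item_shares,
    }


if __name__ == "__main__":
    # Hypothetical: 10^6 library members split into 10^8 droplets.
    stats = poisson_occupancy(10**6, 10**8)
    print({k: f"{v:.3g}" for k, v in stats.items()})
```

For 10^6 members in 10^8 droplets (lambda = 0.01), just under 1% of members would share a droplet; raising the compartment excess lowers this roughly in proportion.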
74. The method of any one of claims 62-73, wherein the small molecules are covalently bound to their DNA barcodes by an amino-thiol linkage.
75. A method of processing a sample from an enriched DEL, comprising:
(i) providing an enriched DEL, wherein the DEL is enriched to bind a nucleic acid target;
(ii) performing a PCR amplification of the enriched DEL to form amplified products; and
(iii) sequencing the amplified products from step (ii) to produce a DEL library screen result.
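Claim 75 ends with sequencing the amplified products to produce a screen result but does not say how the read counts are summarized. A common approach for DEL selections is to compare each barcode's frequency after selection with its frequency in the unselected (naive) library. The Python sketch below assumes reads have already been decoded to barcode identifiers and uses a pseudocount of 1 to handle zero counts; both are assumptions for illustration, not the patent's analysis.

```python
from collections import Counter


def enrichment_scores(selected_barcodes, naive_barcodes, pseudocount=1.0):
    """Per-barcode enrichment: frequency after selection divided by frequency
    in the unselected (naive) library, with a pseudocount for zero counts."""
    selected = Counter(selected_barcodes)
    naive = Counter(naive_barcodes)
    barcodes = set(selected) | set(naive)
    n = len(barcodes)
    sel_total = sum(selected.values()) + pseudocount * n
    naive_total = sum(naive.values()) + pseudocount * n
    return {
        bc: ((selected[bc] + pseudocount) / sel_total)
        / ((naive[bc] + pseudocount) / naive_total)
        for bc in barcodes
    }


if __name__ == "__main__":
    # Hypothetical decoded reads.
    selected = ["BC1"] * 50 + ["BC2"] * 2
    naive = ["BC1"] * 10 + ["BC2"] * 10 + ["BC3"] * 10
    ranked = sorted(enrichment_scores(selected, naive).items(),
                    key=lambda kv: kv[1], reverse=True)
    for bc, score in ranked:
        print(bc, round(score, 2))
```

Scores above 1 mark barcodes over-represented after selection; the pseudocount keeps barcodes absent from one sample from producing division by zero or infinite ratios.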
76. The method of claim 75, further comprising, before step (ii), ligating a single-stranded oligonucleic acid to the 5'-end of the RNA strand of the enriched DEL or to the DNA strand of the enriched DEL.
77. The method of claim 75 or 76, further comprising, before step (ii), contacting the enriched DEL with a reverse transcriptase (RT) to form an enriched DEL cDNA of the nucleic acid target.
78. A composition comprising a plurality of enriched DEL cDNAs produced according to the method of claim 77.
79. The method of any one of claims 75-77, wherein the sequencing in step (iii) is selected from microarray-based sequencing or high-throughput sequencing.
80. A method of screening an encoded library against a nucleic acid target, comprising:
(i) providing a DNA-encoded library (DEL) of small molecules covalently conjugated to DNA barcodes;
(ii) contacting the DEL with a nucleic acid target under conditions selected to allow the small molecules to bind to the nucleic acid target;
(iii) performing an in vitro compartmentalization and/or high-factor dilution on at least a portion of the mixture from (ii);
(iv) performing a ligation of the nucleic acid target to a DNA barcode whose covalently conjugated small molecule has bound to the nucleic acid target, thus producing a ligated nucleic acid; and
(v) optionally, contacting the ligated nucleic acid with a reverse transcriptase (RT) under conditions selected such that the RT synthesizes a complementary DNA strand to the nucleic acid target to produce a double-stranded, ligated nucleic acid.
81. The method of claim 80, further comprising the step of:
(vi) performing a polymerase chain reaction (PCR) amplification of the double-stranded, ligated nucleic acid.
82. The method of claim 81, further comprising the step of:
(vii) sequencing the mixture of products obtained from the PCR amplification in (vi) to produce a DEL library screen result.
83. The method of any one of claims 80-82, wherein the nucleic acid target is an RNA, or a homolog, isoform, mutant, chimera, fragment, or analog thereof.
84. The method of any one of claims 80-83, wherein the in vitro compartmentalization is an emulsion technique.
85. The method of any one of claims 80-84, wherein the in vitro compartmentalization is Binder Trap Enrichment® (BTE).
86. The method of any one of claims 80-85, wherein the in vitro compartmentalization creates more compartments than there are members of the DEL in the sample.
87. The method of any one of claims 80-83, wherein a high-factor dilution is performed instead of an in vitro compartmentalization technique, and the dilution enables proximity-based ligation.
88. The method of any one of claims 80-87, wherein the DNA barcode is attached to a nucleotide overhang complementary to the 3'-end of the nucleic acid target.
89. The method of any one of claims 80-88, wherein the overhang is 1-20 nucleotides.
90. The method of any one of claims 80-89, wherein the small molecules are covalently bound to their DNA barcodes by an amino-thiol linkage.
91. The method of any one of claims 80-90, wherein the method further comprises phosphorylating the 3' end of the nucleic acid target prior to the ligation of step (iv).
92. The method of any one of claims 80-86, wherein step (iv) further comprises breaking up the compartments created by an in vitro compartmentalization.
93. The method of any one of claims 80-92, wherein step (ii) is performed in an aqueous solution.
94. The method of any one of claims 80-93, wherein the method further comprises purifying the ligated nucleic acid and/or double-stranded, ligated nucleic acid.
95. An enriched DEL cDNA produced by the method of any one of claims 80-94.
96. A composition comprising a plurality of enriched DEL cDNA molecules that encode an enriched DEL, wherein the enriched DEL is produced according to the method of any one of claims 80-95.
97. A method of performing a multiplexed DEL screen, comprising:
(i) providing a DNA-encoded library (DEL) of small molecules covalently conjugated to DNA barcodes;
(ii) contacting the DEL with a plurality of nucleic acid targets that have different sequences under conditions selected to allow the small molecules to bind to the nucleic acid targets;
(iii) performing an in vitro compartmentalization and/or high-factor dilution on at least a portion of the mixture from (ii);
(iv) performing a ligation of at least one of the nucleic acid targets to a DNA barcode whose covalently conjugated small molecule has bound to the nucleic acid target, thus producing a ligated nucleic acid; and
(v) optionally, contacting the ligated nucleic acid with a reverse transcriptase (RT) under conditions selected such that the RT synthesizes a complementary DNA strand to the nucleic acid target to produce a double-stranded, ligated nucleic acid.
98. The method of claim 97, wherein at least 10 nucleic acid targets of different sequence are screened in parallel.
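In the multiplexed screen of claims 97-98, the target itself is ligated to the barcode, so each sequenced product can in principle report both which target was bound and which library member bound it. The Python sketch below tallies (target, barcode) pairs under a hypothetical read layout: a target-derived segment followed by a fixed-length barcode at the 3' end, mirroring an RNA 3'-end ligated to a DNA 5'-end. Real reads would also carry primer and adapter sequences; the layout, the identifying tags, and the function name are all assumptions.

```python
from collections import Counter


def tally_multiplexed_reads(reads, target_tags, barcode_len):
    """Count (target, barcode) pairs in ligated-product reads.

    Hypothetical layout: each read is target-derived sequence followed by a
    barcode of length barcode_len. target_tags maps each target name to a
    subsequence that identifies it uniquely. Reads matching zero or several
    targets are counted as unassigned.
    """
    counts = Counter()
    unassigned = 0
    for read in reads:
        target_part, barcode = read[:-barcode_len], read[-barcode_len:]
        hits = [name for name, tag in target_tags.items() if tag in target_part]
        if len(hits) == 1:
            counts[(hits[0], barcode)] += 1
        else:
            unassigned += 1
    return counts, unassigned


if __name__ == "__main__":
    tags = {"T1": "GGACTT", "T2": "CCATGA"}
    reads = ["AAGGACTTAAACGT", "AAGGACTTAAACGT", "TTCCATGAAATGCA", "NNNNNNNNAAACGT"]
    counts, unassigned = tally_multiplexed_reads(reads, tags, barcode_len=6)
    print(dict(counts), unassigned)
```

Requiring exactly one target hit per read is a conservative choice; tags should be chosen so that no two targets share one.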
99. A kit for producing a ligated product comprising an RNA strand ligated to a DNA strand, comprising:
(i) an oligonucleic acid splint molecule;
(ii) at least one ligase;
(iii) a buffer comprising a buffering molecule, a chloride salt of a divalent cation, and ATP;
(iv) optionally, a helper oligonucleotide; and
(v) directions for producing a ligated product.
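Claim 99 lists the buffer components (a buffering molecule, a chloride salt of a divalent cation, and ATP) but gives no concentrations. The Python sketch below shows the C1 x V1 = C2 x V2 arithmetic for assembling such a buffer from stock solutions. The example values (50 mM Tris-HCl, 10 mM MgCl2, 1 mM ATP in 20 uL) are assumptions chosen to resemble a typical T4 ligase reaction buffer, not figures from the patent.

```python
def stock_volumes(final_mm: dict, stock_mm: dict, total_ul: float) -> dict:
    """Volumes (uL) of each stock needed for the given final concentrations,
    using C1 * V1 = C2 * V2, with water making up the remainder."""
    volumes = {}
    for name, final in final_mm.items():
        stock = stock_mm[name]
        if final > stock:
            raise ValueError(f"{name}: final concentration exceeds stock")
        volumes[name] = final * total_ul / stock
    water = total_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks are too dilute for this total volume")
    volumes["water"] = water
    return volumes


if __name__ == "__main__":
    # Illustrative 1x composition only; the patent does not give these numbers.
    final = {"Tris-HCl": 50, "MgCl2": 10, "ATP": 1}     # mM
    stocks = {"Tris-HCl": 1000, "MgCl2": 1000, "ATP": 100}  # mM
    for component, ul in stock_volumes(final, stocks, 20).items():
        print(f"{component}: {ul:.2f} uL")
```

In practice small volumes such as 0.2 uL are usually avoided by preparing a concentrated (for example 10x) buffer first; the same function applies with a smaller total volume ratio.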
PCT/US2019/035481 2018-06-05 2019-06-05 Encoded libraries and methods of use for screening nucleic acid targets WO2019236644A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201862680946P 2018-06-05 2018-06-05
US62/680,946 2018-06-05

Publications (1)

Publication Number Publication Date
WO2019236644A1 (en) 2019-12-12

Family

ID=68769560

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2019/035481 WO2019236644A1 (en) 2018-06-05 2019-06-05 Encoded libraries and methods of use for screening nucleic acid targets

Country Status (1)

Country Link
WO (1) WO2019236644A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001007657A1 (en) * 1999-07-27 2001-02-01 Phylos, Inc. Peptide acceptor ligation methods
WO2006071776A2 (en) * 2004-12-23 2006-07-06 Ge Healthcare Bio-Sciences Corp. Ligation-based rna amplification
WO2009077173A2 (en) * 2007-12-19 2009-06-25 Philochem Ag Dna-encoded chemical libraries
WO2014100473A1 (en) * 2012-12-21 2014-06-26 New England Biolabs, Inc. A novel ligase activity
WO2017136450A2 (en) * 2016-02-01 2017-08-10 Arrakis Therapeutics, Inc. Compounds and methods of treating rna-mediated diseases

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
IYER, EPR ET AL.: "Barcoded Oligonucleotides Ligated on RNA Amplified for Multiplex and Parallel In-Situ Analyses", BIORXIV, 13 March 2018 (2018-03-13), pages 1 - 49, XP055661867, DOI: 10.1101/281121 *
MINIKEL, E: "How to do a DNA-encoded Library Selection", CUREFFI.ORG, 17 November 2016 (2016-11-17), pages 1 - 15, XP055661873, Retrieved from the Internet <URL:http://www.cureffi.org/2016/11/17/how-to-do-a-dna-encoded-library-selection> *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111508563A (en) * 2020-05-22 2020-08-07 四川大学华西医院 Cancer-related alternative splicing database system of long non-coding RNA
CN111508563B (en) * 2020-05-22 2023-04-18 四川大学华西医院 Cancer-related alternative splicing database system of long non-coding RNA
CN111620921A (en) * 2020-05-25 2020-09-04 上海药明康德新药开发有限公司 Method for preparing On-DNA amide compound through oxidative amidation in construction of DNA coding compound library
CN111620921B (en) * 2020-05-25 2023-06-13 上海药明康德新药开发有限公司 Method for preparing On-DNA amide compound by oxidative amidation in construction of DNA coding compound library
CN113897414A (en) * 2021-10-11 2022-01-07 湖南大地同年生物科技有限公司 Trace nucleic acid library construction method
WO2023221234A1 (en) * 2022-05-18 2023-11-23 清华大学 In vitro screening method for covalent inhibitor and use thereof

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 19814278

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 19814278

Country of ref document: EP

Kind code of ref document: A1