WO2023086767A1 - High-throughput drug discovery methods - Google Patents

High-throughput drug discovery methods

Info

Publication number
WO2023086767A1
Authority
WO
WIPO (PCT)
Prior art keywords
compound
target protein
protein
dsdna
attached
Prior art date
Application number
PCT/US2022/079382
Other languages
French (fr)
Inventor
Ian QUIGLEY
Andrew BLEVINS
Original Assignee
Leash Labs, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Leash Labs, Inc.
Publication of WO2023086767A1

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/53 Immunoassay; Biospecific binding assay; Materials therefor
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1041 Ribosome/Polysome display, e.g. SPERT, ARM
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1062 Isolating an individual clone by screening libraries mRNA-Display, e.g. polypeptide and encoding template are connected covalently
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1065 Preparation or screening of tagged libraries, e.g. tagged microorganisms by STM-mutagenesis, tagged polynucleotides, gene tags
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1075 Isolating an individual clone by screening libraries by coupling phenotype to genotype, not provided for in other groups of this subclass
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/58 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving labelled substances
    • C CHEMISTRY; METALLURGY
    • C40 COMBINATORIAL TECHNOLOGY
    • C40B COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B20/00 Methods specially adapted for identifying library members
    • C40B20/04 Identifying library members by means of a tag, label, or other readable or detectable entity associated with the library members, e.g. decoding processes
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2458/00 Labels used in chemical analysis of biological material
    • G01N2458/10 Oligonucleotides as tagging agents for labelling antibodies

Definitions

  • both target based screening and phenotypic screening are carried out by exposing the target to one compound at a time, which is time consuming and labor intensive and presents difficulties in scaling.
  • DNA-encoded chemical libraries which are libraries of small molecules that have a small, unique DNA barcode on each small molecule.
  • aspects of the present disclosure are directed to methods for identifying compound-protein binding pairs in a high throughput assay.
  • the methods include providing a compound with a unique barcode and providing a target protein with a unique barcode.
  • the unique barcode for the compound and the unique barcode for the protein are attached to form a chimeric nucleic acid sequence including the unique barcode for the compound and the unique barcode for the protein.
  • the chimeric nucleic acid sequence is sequenced, the barcodes identified and the compound and protein forming the binding pair are identified.
  • Fig. 1 is a schematic depicting a target protein having attached thereto a double stranded DNA (“dsDNA”) including a barcode unique to the target protein.
  • dsDNA double stranded DNA
  • a candidate compound (depicted as a small molecule) is depicted as binding to a binding site on the target protein forming a candidate compound-target protein binding pair in solution.
  • the candidate compound has attached thereto a dsDNA including a barcode unique to the candidate compound.
  • the unbound end of the dsDNA attached to the protein is attached or ligated to the unbound end of the dsDNA attached to the candidate compound (identified as “ligate and sequence DNA”) generating a DNA construct including the barcode unique to the target protein and the barcode unique to the candidate compound.
  • the DNA construct is sequenced and the barcodes are identified which identify the candidate compound and the target protein.
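The decoding step described for Fig. 1 can be sketched as follows. This is only an illustration under stated assumptions: the read layout (protein barcode, then a fixed junction sequence, then compound barcode), the junction sequence, the barcode tables and the function name `decode_chimeric_read` are all hypothetical and are not taken from the disclosure.

```python
# Hypothetical lookup tables; real barcode sets would come from library design.
PROTEIN_BARCODES = {"ACGTACGTAC": "protein_A", "TTGCAAGGCT": "protein_B"}
COMPOUND_BARCODES = {"GGATCCTTAA": "compound_1", "CATGCATGCA": "compound_2"}

BARCODE_LEN = 10        # the text mentions 10-nucleotide barcodes as one option
JUNCTION = "AGCTTGCA"   # assumed fixed sequence left at the ligation junction


def decode_chimeric_read(read):
    """Return (compound, protein) for one read, or None if it cannot be decoded.

    Assumed layout: <protein barcode><JUNCTION><compound barcode>.
    """
    pos = read.find(JUNCTION)
    if pos < BARCODE_LEN:
        # Junction missing (find returned -1) or too little sequence upstream.
        return None
    protein_bc = read[pos - BARCODE_LEN:pos]
    start = pos + len(JUNCTION)
    compound_bc = read[start:start + BARCODE_LEN]
    protein = PROTEIN_BARCODES.get(protein_bc)
    compound = COMPOUND_BARCODES.get(compound_bc)
    if protein is None or compound is None:
        return None  # at least one barcode is not in the known sets
    return compound, protein


if __name__ == "__main__":
    read = "ACGTACGTAC" + JUNCTION + "GGATCCTTAA"
    print(decode_chimeric_read(read))
```

Exact-match lookup is the simplest choice here; a real pipeline would likely also need to tolerate sequencing errors, which this sketch does not attempt.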
  • Fig. 2A depicts in schematic a DNA construct including (1) a transcriptional start site such as an Sp6 site (“Sp6”), (2) a universal PCR primer (“primer”), (3) a barcode (“hash”) unique to the protein of interest, which may be a few nucleotides, (4) a universal primer binding site (“bridge” or “bridge landing site”) for binding to a primer on a bridging polynucleotide, (5) an internal ribosomal entry site (“IRES”) to be used for translation, (6) the coding sequence encoding for the protein or protein fragment of interest (“target protein”), (7) a peptide tag such as FLAG, followed by (8) a spacer with no stop codons.
  • the DNA construct is transcribed into mRNA using the transcriptional start site.
  • Fig. 2B depicts in schematic a target protein that has been translated from the mRNA transcribed from the DNA construct of Fig. 2A using ribosome display or ribosome stalling (labeled as “translation stalls”). Since translation begins downstream of the universal primer binding site, the translated protein has attached thereto mRNA including (2) the universal PCR primer if different from the transcriptional start site, (3) the barcode or hash unique to the protein of interest of a few nucleotides, and (4) the universal primer binding site (“bridge landing site”) for binding to a primer on a bridging polynucleotide.
  • the target protein is depicted as having a compound (“small molecule”) bound thereto.
  • the compound has attached thereto a dsDNA including a barcode unique to the candidate compound. Further depicted is a bridging polynucleotide for ligation (“Ligation”) to the dsDNA of the compound and for hybridization (“DNA Bridge”) to the universal primer binding site (“bridge landing site”).
  • Fig. 2D depicts use of a template switching oligo to facilitate second strand synthesis.
  • Fig. 2E depicts second strand synthesis resulting in a dsDNA construct including the barcode of the target protein and the barcode of the small molecule.
  • the dsDNA construct is to be sequenced to identify the barcodes and, accordingly, the target protein and small molecule.
  • Fig. 3B depicts a mRNA for the target protein having puromycin attached thereto for use in a mRNA display method.
  • the 3’ end of the mRNA includes a stem and loop structure with the puromycin attached thereto.
  • Fig. 3C depicts translation of mRNA encoding the target protein and resulting in mRNA display using puromycin to connect the mRNA to the target protein.
  • the mRNA serves as the barcode for the target protein.
  • Fig. 3C also depicts reverse transcription of the mRNA into DNA.
  • a DNA binding protein is used to crosslink the dsDNA of the candidate compound to the target protein, stabilizing the target protein-compound complex.
  • the dsDNA of the candidate compound and the dsDNA of the target protein are ligated together and sequenced.
  • click chemistry is used to bind the dsDNA of the candidate compound to the dsDNA of the target protein.
  • a nucleic acid of the dsDNA of the candidate compound includes a click chemistry moiety.
  • a nucleic acid of the dsDNA of the target protein includes a click chemistry moiety. The corresponding click chemistry moieties are reacted together, stabilizing the target protein-compound complex.
  • Fig. 5A depicts a target protein-compound complex (“small molecule”, “protein”) with ligated barcodes (“Ligation”). Positions labeled “Tn5” depict random insertion events where the transposase Tn5 cuts dsDNA and inserts sequencing adapters into the free ends of the cuts. “DEL primer” indicates the position of a primer identical across all small molecules, with a unique barcode per molecule downstream.
  • Fig. 5B depicts the positions of a sequencing adapter inserted by the Tn5 (“Tn5”) and the DEL primer (“DEL primer”), enabling PCR amplification only of DNA fragments generated by a ligation event between the target protein and the compound, such as when generated using the mRNA display approach.
  • Fig. 5C depicts the positions of a universal primer (“primer”) 5' of the target protein bridge and hash and the DEL primer, enabling PCR amplification only of DNA fragments generated by a ligation event between the target protein and the compound when generated using the ribosome/bridge display approach of Fig. 2A-D.
  • target proteins are uniquely barcoded, such as with either mRNA or DNA. If mRNA, then the barcode is reverse transcribed into cDNA. The DNA barcode of the compound and the DNA barcode of the protein are ligated together forming a chimeric nucleic acid including the compound barcode and the protein barcode. The chimeric nucleic acid is then sequenced, the barcodes identified and, accordingly, the compound-protein binding pairs are identified.
  • the candidate compounds and the target proteins are combined together under conditions to allow formation of candidate compound - target protein interactions.
  • the candidate compound - target protein interactions may be promoted or stabilized, such as by emulsion isolation, chemical crosslinking, DNA intercalation, protein-protein interactions, ligand-ligand interactions, etc.
  • a target candidate compound binds to a target protein.
  • a plurality of target candidate compounds binds to respective target proteins within a plurality of target proteins.
  • a DNA construct for barcoding a target protein.
  • the method contemplates a plurality of DNA constructs for creating a plurality of barcoded target proteins.
  • the DNA construct is a template comprising at least a universal primer hybridization site for amplifying the DNA construct, a barcode sequence, a second primer hybridization site to facilitate reverse transcription of the barcode, an internal ribosome entry site, and a target protein coding sequence.
  • In vitro transcription is carried out to synthesize a barcoded mRNA template.
  • In vitro translation is then carried out to generate a mRNA-ribosome-protein complex.
  • Methods of ribosome stalling useful in the present disclosure and adaptable to the present methods are known to those of skill in the art and are described in Hanes et al., In vitro selection and evolution of functional proteins by using ribosome display, Proc. Natl. Acad. Sci. USA, (1997); 94(10): 4937-4942, hereby incorporated by reference in its entirety for teaching methods of ribosome display.
  • the mRNA portion of the complex includes the barcode sequence and the two primer hybridization sites.
  • a candidate compound having a unique barcode attached thereto binds the protein of the mRNA-ribosome-protein complex.
  • a mRNA construct for barcoding a target protein.
  • the method contemplates a plurality of mRNA constructs for creating a plurality of barcoded target proteins.
  • the mRNA construct includes puromycin which is or becomes covalently linked to a target protein during translation of mRNA into the target protein, resulting in the target protein being barcoded with the mRNA encoding it.
  • Methods of mRNA display or cDNA display useful in the present disclosure and adaptable to the present methods are known to those of skill in the art. See Barendt et al., Streamlined Protocol for mRNA Display, ACS Comb. Sci.
  • cDNA display: a novel screening method for functional disulfide-rich peptides by solid-phase synthesis and stabilization of mRNA-protein fusions, Nucleic Acids Research (2009); 37(16): e108; Ueno, S., & Nemoto, N. (2011).
  • cDNA Display: Rapid Stabilization of mRNA Display. Methods in Molecular Biology, 113-135. doi:10.1007/978-1-61779-379-0_8, each of which is hereby incorporated by reference in its entirety for the teaching of mRNA display or cDNA display. The mRNA is then reverse transcribed into cDNA attached to the target protein.
  • a plurality of target proteins each having its own unique cDNA barcode forms a DNA encoded target protein library.
  • a plurality of candidate compounds each having its own unique cDNA barcode forms a DNA encoded compound library.
  • a candidate compound with its own unique DNA barcode binds the target protein.
  • the nucleic acid barcode unique to the candidate compound and the nucleic acid barcode unique to the target protein are bound to one another, such as by ligation for example proximity ligation, forming a chimeric nucleic acid construct including the nucleic acid barcode unique to the candidate compound and the nucleic acid barcode unique to the target protein.
  • the chimeric nucleic acid construct is then sequenced and the barcodes identified.
  • the identified barcodes identify the candidate compound and the target protein that bound to each other.
  • methods are provided to stabilize the candidate compound and the target protein to each other to facilitate binding of the candidate compound to the target protein.
  • the method includes determining the identity of a plurality of candidate compounds bound to respective target proteins.
  • the protein of interest is expressed as a fusion with a modified form of the 20-kDa monomeric DNA repair enzyme, human O6-alkylguanine-DNA-alkyltransferase (AGT), or SNAP-tag.
  • AGT human O6-alkylguanine-DNA-alkyltransferase
  • SNAP-tag can be specifically labeled with synthetic O6-benzylguanine (BG) derivatives, resulting in a stable thioether bond between a reactive cysteine residue in the tag and the probe.
  • BG O6-benzylguanine
  • the SNAP-tag can be appended onto the N- or C-terminus of proteins without affecting the function of a large number of fusion proteins.
  • Other methods include designing DNA or mRNA to include the barcode.
  • Still further methods include incorporating the barcode into DNA or mRNA using primer/amplification methods known to those of skill in the art with or without in vitro transcription.
  • barcoding protocols include those used in next-generation sequencing methods.
  • a barcode unique to a candidate compound is attached to the candidate compound, for example, by using methods known to those of skill in the art, such as by covalent reaction.
  • barcoding protocols include those used in making DNA encoded libraries for high throughput drug discovery.
  • a candidate compound having a unique nucleic acid barcode binds to a target protein having a unique nucleic acid barcode.
  • the barcoded DNA construct or template includes a polymerase primer binding sequence (e.g., T7 polymerase), and mRNAs are synthesized from the barcoded DNA construct or template by in vitro transcription.
  • a plurality of mRNAs are synthesized from a plurality of barcoded DNA constructs or templates in a single container.
  • reverse transcription is performed, and the cDNA sequences are complementary upstream to a ribosome binding site of the barcoded mRNA template.
  • ribosomes stall at the 3' end of the mRNA sequence during in vitro translation due to one or both of a lack of stop codons or the presence of ribosome stalling peptide sequences.
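Because stalling relies on the absence of stop codons, a spacer or coding sequence can be screened for in-frame stops before use. The sketch below is a minimal illustration; the function name `in_frame_stop_positions` and the example sequences are assumptions introduced here, not part of the disclosure.

```python
# Standard stop codons, written in DNA letters.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stop_positions(seq, frame=0):
    """Return the 0-based start positions of stop codons in the given reading frame.

    `frame` is the offset (0, 1 or 2) of the first codon within `seq`.
    RNA input is accepted by converting U to T first.
    """
    seq = seq.upper().replace("U", "T")
    return [
        i
        for i in range(frame, len(seq) - 2, 3)
        if seq[i:i + 3] in STOP_CODONS
    ]


if __name__ == "__main__":
    spacer_ok = "GGTTCTGGTGGTTCT"    # hypothetical spacer with no in-frame stop
    spacer_bad = "GGTTAAGGT"         # contains TAA as its second codon
    print(in_frame_stop_positions(spacer_ok))   # empty list means no in-frame stop
    print(in_frame_stop_positions(spacer_bad))
```

Only the stated reading frame is checked, so a stop codon that appears out of frame (for example the TAA inside GTAAGG) is deliberately not reported.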
  • the protein coding sequence encodes one or more affinity tags (e.g., FLAG tags and the like), e.g., at the N-terminal or C-terminal of a protein of interest.
  • a method for attaching a barcode to a polypeptide comprising the steps of providing a DNA template comprising at its 5' end an enzyme capable of receiving or otherwise attaching to a ligand, providing a fusion protein comprising an enzyme fragment specific for the ligand, and allowing the enzyme to covalently bind the ligand to produce a polypeptide comprising a barcode.
  • Enzyme fragments capable of this utility are known to those of skill in the art.
  • Exemplary enzyme fragments or tags include HaloTag, CLIP tag, SNAP-tag and the like.
  • the SNAP-tag is an enzyme based self-labeling protein tag.
  • the SNAP-tag protein is a modified form of the human repair protein O6-alkylguanine-DNA-alkyltransferase (AGT), a 20 kDa protein.
  • the SNAP-tag protein undergoes a self-labeling reaction to form a covalent bond with O6-benzylguanine derivatives.
  • O6-Benzylguanine (BG) can be modified with a variety of reporter molecules such as fluorophores, peptides, or oligonucleotides. Using the SNAP-tag approach allows avoiding nonspecific labeling since most SNAP-tag substrates are chemically inert towards other proteins.
  • the method is performed using an automated high-throughput platform.
  • a plurality of uniquely barcoded candidate compounds and a plurality of uniquely barcoded target proteins are combined, such as in an aqueous media.
  • a plurality of uniquely barcoded candidate compounds bind to a plurality of respective uniquely barcoded target proteins.
  • the barcodes of a candidate compound bound to a target protein are attached or linked together to form a chimeric nucleic acid construct including both barcodes.
  • the chimeric nucleic acid construct is sequenced to determine the identity of the barcodes which in turn identifies the candidate compound and target protein as a binding pair.
  • the steps of attaching, sequencing and determining are carried out for a plurality of candidate compound - target protein binding pairs. Accordingly, the method provides a high throughput method for determining a plurality of candidate compound-target protein binding pairs within a mixture of a plurality of candidate compounds and target proteins.
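For a mixture containing many binding pairs, the determining step amounts to tallying how often each compound barcode is observed joined to each protein barcode. The sketch below assumes that reads have already been decoded into (compound barcode, protein barcode) tuples; the `min_reads` threshold and the function name `tally_binding_pairs` are illustrative assumptions, since the text does not specify how pairs are called.

```python
from collections import Counter


def tally_binding_pairs(barcode_pairs, min_reads=2):
    """Count decoded (compound, protein) pairs and keep those seen at least min_reads times.

    min_reads is a hypothetical noise filter, not a value given in the source.
    """
    counts = Counter(barcode_pairs)
    return {pair: n for pair, n in counts.items() if n >= min_reads}


if __name__ == "__main__":
    decoded = [
        ("compound_1", "protein_A"),
        ("compound_1", "protein_A"),
        ("compound_2", "protein_B"),
        ("compound_1", "protein_A"),
    ]
    print(tally_binding_pairs(decoded))
```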
  • a DNA encoded library of candidate compounds is screened against a DNA or RNA encoded library of target proteins for binding of candidate compounds to target proteins.
  • the library of candidate compounds may include at least 1 x 10^2 to 1 x 10^12 or more different candidate compounds.
  • the library of target proteins may include at least 1 x 10^2 to 1 x 10^12 different target proteins.
  • the library of candidate compounds and the library of target proteins may be combined and analyzed in a single assay.
  • target proteins include dsDNA including a barcode attached thereto. See Fig. 1 depicting a target protein as an exemplary candidate compound with a dsDNA including a barcode attached thereto.
  • target proteins include cellular proteins that can be obtained by translation of mRNA obtained from cells. In this manner, the transcriptome of cells can provide target proteins to be used in the methods described herein. For example, mRNAs from a cell or cells are isolated from other RNAs such as by poly-A selection.
  • T7 or similar reverse transcriptase binding site is added to the 5' end with PCR, for example, after cDNA synthesis, generating a library of cDNAs that include a transcription start site such as a T7 site at the 5’ end, the full-length cDNA as produced by reverse transcription, no stop codon in the protein-coding region, and a hybridization site for the puromycin linker.
  • the construct is then transcribed into mRNA using the transcriptional start site.
  • the DNA puromycin linker is ligated to the 3' end of this transcribed RNA using the hybridization site, as for example in Johnson et al., Molecular Cell, Vol. 81, 1-13 (2021) including Supplemental Materials Figure S1. Translation of this puromycin-ligated RNA is then carried out using a eukaryotic translation kit, using the endogenous ribosomal entry sites.
  • target proteins may be obtained from the transcriptome of cells as described above and as known in the art, target proteins may also be obtained in commercially available libraries.
  • the spacer sequence stays attached to the peptidyl tRNA and occupies the ribosomal tunnel, thereby allowing the protein of interest to protrude out of the ribosome and fold.
  • the mRNA includes a barcode sequence which is reverse transcribed into cDNA which becomes attached to the barcode for a candidate compound bound to the target protein.
  • Methods of attaching barcodes to target proteins include mRNA display and cDNA display.
  • In mRNA display, a target protein library is generated in which the target proteins are conjugated with their mRNA, for example by a puromycin linker.
  • the mRNA serves as the unique nucleic acid barcode for each target protein in the library.
  • an additional barcode or barcodes as known in the art beyond the mRNA may be added as desired, such as a UMI (unique molecular identifier) which may be attached to the puromycin linker.
  • UMI unique molecular identifier
  • Such a non-mRNA barcode may be used for any useful barcoding purpose including associating the barcode with the coding sequence of the mRNA via long-read sequencing.
  • Exemplary mRNA methods useful in the present disclosure and adaptable to the present methods include Roberts et al., RNA-peptide fusions for the in vitro selection of peptides and proteins. Proc. Natl. Acad. Sci. USA 94, 12297-12302 (1997); Barendt et al., Streamlined protocol for mRNA display, ACS Comb. Sci. 15, 77-81 (2013); Johnson et al., Molecular Cell 81, 1-13 (2021) (describing SMART-display mRNA display); Seelig, mRNA display for the selection and evolution of enzymes from in vitro-translated protein libraries, Nat. Protoc. 6, 540-552 (2011), Ueno, S., & Nemoto, N.
  • mRNA is collected from cells and purified.
  • a reverse transcription primer containing a random sixteen base pair region followed by the sequences for a FLAG tag or other peptide tag and a GC-rich puromycin linker hybridization site is annealed to the mRNA.
  • a gene-specific primer for each gene that falls short of or changes the endogenous stop codon may also be used in a similar manner.
  • Reverse transcription is then carried out with incorporation of a template switching oligo (TSO). PCR is performed with a primer that partially overlaps the TSO sequences to introduce a T7 promoter and complete the ribosome binding site. Double-stranded DNA is purified.
  • TSO template switching oligo
  • RNA is ligated to a puromycin-containing linker sequence and subsequently translated to form mRNA-protein fusion products. See Johnson et al., Molecular Cell 81, 1-13 (2021) and Ueno, S., & Nemoto, N. (2011). cDNA Display: Rapid Stabilization of mRNA Display. Methods in Molecular Biology, 113-135. doi:10.1007/978-1-61779-379-0_8.
  • barcode refers to a unique oligonucleotide sequence that allows a corresponding candidate compound or target nucleic acid to be identified.
  • barcodes can each have a length within a range of from 8 to 40 nucleotides, or from 10 to 32 nucleotides.
  • a barcode has a length of 10 nucleotides.
  • the melting temperatures of barcodes within a set are within 10 °C of one another, within 5 °C of one another, or within 2 °C of one another.
  • barcodes are members of a minimally cross-hybridizing set.
  • nucleotide sequence of each member of such a set is sufficiently different from that of every other member of the set that no member can form a stable duplex with the complement of any other member under stringent hybridization conditions.
  • nucleotide sequence of each member of a minimally cross-hybridizing set differs from those of every other member by at least two nucleotides. Barcode technologies useful in the present disclosure and adaptable in the present methods are known in the art and are described in Winzeler et al. (1999) Science 285:901; Brenner (2000) Genome Biol. 1:1; Kumar et al. (2001) Nature Rev. 2:302; Giaever et al. (2004) Proc. Natl. Acad. Sci. USA 101:793; Eason et al. (2004) Proc. Natl. Acad. Sci. USA 101:11046; and Brenner (2004) Genome Biol. 5:240.
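The barcode constraints listed above (length range, melting temperatures within a window, at least two differing nucleotides between members) can be checked with a short script. This is a rough sketch under assumptions: Hamming distance stands in for "differs by at least two nucleotides" and does not test duplex stability, and the melting temperature uses the Wallace rule of thumb (2 °C per A/T, 4 °C per G/C), which is only a coarse estimate for short oligonucleotides and is not a method named in the source. The function names are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def wallace_tm(seq):
    """Rough melting temperature in deg C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def check_barcode_set(barcodes, min_len=8, max_len=40,
                      min_distance=2, max_tm_spread=10):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for bc in barcodes:
        if not min_len <= len(bc) <= max_len:
            problems.append(f"{bc}: length {len(bc)} outside {min_len}-{max_len}")
    for a, b in combinations(barcodes, 2):
        # Only equal-length barcodes are compared position by position.
        if len(a) == len(b) and hamming(a, b) < min_distance:
            problems.append(f"{a}/{b}: differ at fewer than {min_distance} positions")
    tms = [wallace_tm(bc) for bc in barcodes]
    if tms and max(tms) - min(tms) > max_tm_spread:
        problems.append(f"Tm spread {max(tms) - min(tms)} exceeds {max_tm_spread}")
    return problems


if __name__ == "__main__":
    print(check_barcode_set(["ACGTACGTAC", "TGCATGCATG"]))
    print(check_barcode_set(["ACGTACGTAC", "ACGTACGTAG"]))
```

The default limits mirror the ranges quoted in the text (8 to 40 nucleotides, within 10 °C, at least two differences); tighter values such as 5 °C or 2 °C can be passed as `max_tm_spread`.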
  • barcodes may be single stranded nucleic acids or double stranded nucleic acids. Double stranded nucleic acid barcodes may be blunt ended or may have a 3’ or a 5’ overhang.
  • mRNA such as in mRNA display methods, can serve as a barcode for a target protein it encodes.
  • a library of DNA barcoded candidate compounds is mixed with a library of DNA or mRNA encoded target proteins to allow interactions.
  • the DNA barcode of the compound and the mRNA or DNA barcode of the target protein are attached together to generate a chimeric nucleic acid including the unique barcode of a candidate compound and the unique barcode of a target protein.
  • each protein of a protein-protein interaction or binding pair may have a double stranded DNA barcode, and the barcodes are attached together to generate a chimeric nucleic acid including the unique barcode of a first protein and the unique barcode of a second protein.
  • interacting small molecules and proteins i.e.
  • each small molecule and each protein of an interacting or binding pair may have a double stranded DNA barcode, and the barcodes are attached together to generate a chimeric nucleic acid including the unique barcode of a small molecule and the unique barcode of a target protein of a small molecule-protein interaction or binding pair.
  • a bridging nucleotide is used which includes at one end a single stranded DNA and at the other end a double stranded DNA. The single stranded DNA portion anneals to the mRNA portion of the mRNA-ribosome-protein complex as described herein.
  • the double stranded DNA portion is ligated to the double stranded DNA barcode of the small molecule interacting with or otherwise bound to the protein of the mRNA-ribosome-protein complex.
  • DNA barcodes may be attached or “stitched” together using methods known to those of skill in the art, such as click methods, enzyme based methods and non-enzyme based methods, and accordingly, sequenced.
  • the DNA barcodes may be ligated together enzymatically.
  • the DNA barcodes may be linked together by a linker.
  • Exemplary methods of attaching or ligating nucleic acid barcodes together and sequencing useful in the present disclosure and adaptable in the present methods include Johnson et al., Molecular Cell 81, 1-13 (2021) (describing INLISE incubation, ligation and sequencing procedure); Dixon et al., Topological Domains in Mammalian Genomes Identified by Analysis of Chromatin Interactions, Nature (2012); 485(7398): 376-380 (describing Hi-C proximity ligation and sequencing); Lieberman-Aiden E, et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science.
  • click methods are used for proximity ligation.
  • click chemistry is used to connect DNA with nucleic acids. See Nicolo Zuin Fantoni, Afaf H. El-Sagheer, and Tom Brown, Chem. Rev. 2021, 121, 12, 7122-7154. Such connections may not interfere with enzymatic activity.
  • El-Sagheer et al. Efficient RNA synthesis by in vitro transcription of a triazole-modified DNA template, Chem Commun (Camb). 2011 Nov 28;47(44):12057-8.
  • a polynucleotide may be used to connect or bridge the binding pair as is depicted in Fig. 2B.
  • a polynucleotide is referred to herein as a “bridging polynucleotide.”
  • the bridging polynucleotide includes a single stranded DNA portion at one end and a double stranded portion at the other end as depicted in Fig. 2B.
  • the double stranded portion attaches to (for example is ligated to) a DNA sequence including a barcode and attached to a candidate compound where the candidate compound is bound to a target polynucleotide such as a target protein attached to its coding mRNA via ribosome display as depicted in Fig. 2B.
  • the single stranded portion attaches to (for example hybridizes with) a bridge landing hybridization site on a mRNA including a unique barcode attached to the target polynucleotide such as a target protein as depicted in Fig. 2B.
  • the bridging polynucleotide becomes bound to the DNA sequence attached to the candidate compound and hybridizes to the mRNA bound to the target protein.
  • the bridging polynucleotide may include a universal primer and a template switching oligonucleotide instead of the universal primer and the template switching oligonucleotide being present in the original DNA construct between the Sp6/T7 and the barcode.
  • DNA constructs as described herein include (1) a transcriptional start site, (2) a universal primer binding site, (3) a barcode, (4) a primer binding site, (5) an internal ribosome entry site, (6) a protein coding sequence, (7) a peptide tag/FLAG, and (8) a spacer with no stop codons.
  • a transcriptional start site as described herein is provided in the DNA constructs of the present disclosure so as to transcribe the DNA into mRNA.
  • Exemplary transcriptional start sites include Sp6, T7, T3 and the like as are known in the art.
  • a universal primer as described herein is provided in the DNA constructs of the present disclosure so as to function as amplification or sequencing primers.
  • Exemplary universal primers bind to many different cognate sequences as is known in the art.
  • a barcode as described herein is used in the DNA construct according to the present disclosure to uniquely identify target proteins within a plurality of target proteins.
  • a barcode as described herein is also used according to the present disclosure to uniquely identify candidate compounds within a plurality of candidate compounds.
  • a primer binding site as described herein is provided in the DNA constructs of the present disclosure so as to facilitate binding of a primer for purposes of transcription or reverse transcription as is known in the art.
  • An internal ribosome entry site as described herein is provided in the DNA constructs of the present disclosure so as to facilitate translation of mRNA into a target protein as is known in the art and as described herein.
  • protein synthesis is regulated by the sequence and structure of the 5' untranslated region (UTR) of the mRNA transcript.
  • UTR: 5' untranslated region
  • RBS: ribosome binding site
  • This purine-rich sequence of 5' UTR is complementary to the UCCU core sequence of the 3'-end of 16S rRNA (located within the 30S small ribosomal subunit).
  • Shine-Dalgarno sequences have been found in prokaryotic mRNAs. These sequences lie about 10 nucleotides upstream from the AUG start codon. Activity of a RBS can be influenced by the length and nucleotide composition of the spacer separating the RBS and the initiator AUG.
  • the Kozak sequence A/GCCACCAUGG (SEQ ID NO: 1)
  • An mRNA lacking the Kozak consensus sequence may be translated efficiently in in vitro systems (Ambion) if it possesses a moderately long 5' UTR that lacks stable secondary structure.
  • Eukaryotic ribosomes (such as those found in reticulocyte lysate) can efficiently use either the Shine-Dalgarno or the Kozak ribosomal binding sites.
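  • By way of illustration only, the spacing between a ribosome binding site and the initiator AUG, and the presence of the Kozak pattern quoted above, may be checked with a short Python sketch. The function names, the choice of AGGA as the Shine-Dalgarno core (taken here as the complement of the UCCU core named above), and any test sequences are assumptions made for this example and are not part of the disclosed methods.

```python
import re

# Assumption: AGGA is used as the Shine-Dalgarno core because it is the
# complement of the UCCU core of 16S rRNA mentioned in the text.
SD_CORE = "AGGA"
# Kozak pattern as written in the text: A/GCCACCAUGG.
KOZAK = re.compile(r"[AG]CCACCAUGG")


def to_rna(seq: str) -> str:
    """Upper-case a sequence and write it in the RNA alphabet."""
    return seq.upper().replace("T", "U")


def sd_spacer_length(seq: str):
    """Return the number of nucleotides between the first SD core and the
    next AUG downstream of it, or None if either element is missing."""
    rna = to_rna(seq)
    sd_start = rna.find(SD_CORE)
    if sd_start == -1:
        return None
    sd_end = sd_start + len(SD_CORE)
    aug = rna.find("AUG", sd_end)
    return None if aug == -1 else aug - sd_end


def has_kozak(seq: str) -> bool:
    """True if the sequence contains the Kozak pattern quoted in the text."""
    return KOZAK.search(to_rna(seq)) is not None
```

  • A spacer length returned by such a check could be compared against the approximate spacing described above when choosing between prokaryotic and eukaryotic translation systems.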
  • a protein coding sequence as described herein is provided in the DNA constructs of the present disclosure so as to facilitate translation from mRNA into the target protein as is known in the art and as described herein.
  • Exemplary protein coding sequences include those encoding proteins that are the target of drug screening libraries.
  • Such exemplary proteins encoded by genes include those described in Finan et al., The druggable genome and support for target identification and validation in drug development, Sci. Transl. Med. (2017); 9(383): eaag1166; hereby incorporated by reference in its entirety.
  • target proteins include targets of approved drugs and drugs in clinical development.
  • Such proteins that are targets of approved small molecule and biotherapeutic drugs may be identified using manually curated efficacy target information from release 17 of the ChEMBL database (see Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, Light Y, McGlinchey S, Michalovich D,
  • Proteins closely related to drug targets or with associated drug-like compounds may be identified through a BLAST search (blastp) of Ensembl peptide sequences against the set of approved drug efficacy targets identified from ChEMBL previously (see Bento AP, Gaulton A, Hersey A, Bellis LJ, Chambers J, Davies M, Kruger Fa, Light Y, Mak L, McGlinchey S, Nowotka M, et al.
  • The ChEMBL bioactivity database: an update. Nucleic Acids Res.
  • Extracellular proteins and members of key drug-target families may be identified through a BLAST search against the set of approved drug targets (as above), with any proteins sharing >25% identity over >75% of the sequence and with E-value ≤ 0.001 being included in the set.
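  • As a minimal sketch of the similarity criteria above, the following Python example filters alignment hits by percent identity, percent coverage and E-value. The `Hit` record, its field names and the identifiers used with it are hypothetical, and the E-value threshold is treated as an inclusive upper bound, which is an assumption; the sketch does not parse any particular BLAST output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One alignment of a query peptide against an approved-drug target.

    All field names are hypothetical and chosen for this example only.
    """
    query_id: str
    target_id: str
    percent_identity: float
    percent_coverage: float  # percent of the sequence covered by the alignment
    evalue: float


def passes_similarity_filter(hit, min_identity=25.0, min_coverage=75.0,
                             max_evalue=0.001):
    """Apply the thresholds from the text: >25% identity over >75% of the
    sequence, with the E-value at or below the cutoff (assumed inclusive)."""
    return (hit.percent_identity > min_identity
            and hit.percent_coverage > min_coverage
            and hit.evalue <= max_evalue)


def related_proteins(hits):
    """Return the sorted, de-duplicated query IDs that pass the filter."""
    return sorted({h.query_id for h in hits if passes_similarity_filter(h)})
```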
  • GPCRs, kinases, ion channels, nuclear hormone receptors, and phosphodiesterases
  • IUPHARdb (see Pawson AJ, Sharman JL, Benson HE, Faccenda E, Alexander SPH, Buneman OP, Davenport AP, McGrath JC, Peters JA, Southan C, Spedding M, et al. NC-IUPHAR, The IUPHAR/BPS Guide to PHARMACOLOGY: an expert-driven knowledgebase of drug targets and their ligands.
  • Extracellular proteins may be identified using annotation in UniProt (see Pawson AJ, Sharman JL, Benson HE, Faccenda E, Alexander SPH, Buneman OP, Davenport AP, McGrath JC, Peters JA, Southan C, Spedding M, et al. NC-IUPHAR, The IUPHAR/BPS Guide to PHARMACOLOGY: an expert-driven knowledgebase of drug targets and their ligands.
  • Drugs in clinical development may be identified from a number of sources: investor pipeline information from a number of large pharmaceutical companies [including Pfizer, Roche, GlaxoSmithKline, Novartis (oncology only), AstraZeneca, Sanofi, Lilly, Merck, Bayer, and Johnson & Johnson - accessed June-August 2013], monoclonal antibody candidates and USAN applications from the ChEMBL database (release 17), and drugs in active clinical trials from the NIH world wide website clinicaltrials.gov. Targets for these drug candidates may be assigned from company pipeline information and scientific literature, where available.
  • a spacer sequence or linker as described herein is provided in the DNA constructs of the present disclosure so as to provide spacing between components of the DNA construct and ultimately the fusion proteins.
  • the spacer provides distance between the terminal FLAG tag for example and the target protein to allow movement of the terminal FLAG tag relative to the target protein.
  • Exemplary spacer sequences include linkers described in Chen et al., Fusion Protein Linkers: Property, Design and Functionality, Adv. Drug Deliv. Rev. (2013); 65(10): 1357-1369, hereby incorporated by reference in its entirety, and lack stop codons.
  • “Complementary” or “substantially complementary” refers to the hybridization or base pairing or the formation of a duplex between nucleotides or nucleic acids, such as, for instance, between the two strands of a double stranded DNA molecule or between an oligonucleotide primer and a primer binding site on a single stranded nucleic acid.
  • Complementary nucleotides are, generally, A and T (or A and U), or C and G.
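  • The base-pairing rule above may be sketched in Python as follows. This is an illustrative example only; the function names are assumptions, and the sketch handles unambiguous DNA letters (A, C, G, T) without degenerate bases or mismatches.

```python
# Watson-Crick pairs for DNA as stated in the text: A with T, C with G.
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the strand that would form a duplex with `seq`, written 5'-3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(seq.upper()))


def is_complementary(strand_a: str, strand_b: str) -> bool:
    """True if the two 5'-3' strands are perfectly complementary."""
    return reverse_complement(strand_a) == strand_b.upper()
```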
  • Kit refers to any delivery system for delivering materials or reagents for carrying out a method of the invention.
  • delivery systems include systems that allow for the storage, transport, or delivery of reaction reagents (e.g., primers, enzymes, microarrays, etc. in the appropriate containers) and/or supporting materials (e.g., buffers, written instructions for performing the assay etc.) from one location to another.
  • kits include one or more enclosures (e.g., boxes) containing the relevant reaction reagents and/or supporting materials for assays of the invention.
  • Such contents may be delivered to the intended recipient together or separately.
  • a first container may contain an enzyme for use in an assay, while a second container contains primers.
  • Nucleic acid molecules may be isolated from natural sources or purchased from commercial sources.
  • Oligonucleotide sequences (e.g., barcodes) may also be prepared by any suitable method, e.g., standard phosphoramidite methods such as those described by Beaucage and Carruthers ((1981) Tetrahedron Lett. 22: 1859) or the triester method according to Matteucci et al. ((1981) J. Am. Chem. Soc. 103:3185), or by other chemical methods using either a commercial automated oligonucleotide synthesizer or high-throughput, high-density array methods known in the art (see U.S. Patent Nos.
  • Isolation, extraction or derivation of nucleic acid sequences may be carried out by any suitable method.
  • Isolating nucleic acid sequences from a biological sample generally includes treating a biological sample in such a manner that nucleic acid sequences present in the sample are extracted and made available for analysis. Any isolation method that results in extracted nucleic acid sequences may be used in the practice of the present invention. It will be understood that the particular method used to extract nucleic acid sequences will depend on the nature of the source.
  • Primer includes an oligonucleotide, either natural or synthetic, that is capable, upon forming a duplex with a polynucleotide template, of acting as a point of initiation of nucleic acid synthesis and being extended from its 3' end along the template so that an extended duplex is formed.
  • the sequence of nucleotides added during the extension process is determined by the sequence of the template polynucleotide.
  • primers are extended by a DNA polymerase. Primers usually have a length in the range of 3 to 36 nucleotides, also 5 to 24 nucleotides, also from 14 to 36 nucleotides.
  • Primers within the scope of the invention include orthogonal primers, amplification primers, construction primers and the like. Pairs of primers can flank a sequence of interest or a set of sequences of interest. Primers and probes can be degenerate in sequence. Universal primers are contemplated. Universal primers are complementary to nucleotide sequences that are very common in a particular set of DNA molecules and cloning vectors. Thus, they are able to bind to a wide variety of DNA templates. Primers within the scope of the present invention bind adjacent to a target sequence (e.g., an oligonucleotide fragment, a barcode sequence or the like).
  • “Specific” or “specificity” in reference to the binding of one molecule to another molecule means the recognition, contact, and formation of a stable complex between the two molecules, together with substantially less recognition, contact, or complex formation of that molecule with other molecules.
  • “specific” in reference to the binding of a first molecule to a second molecule means that to the extent the first molecule recognizes and forms a complex with other molecules in a reaction or sample, it forms the largest number of the complexes with the second molecule. In certain aspects, this largest number is at least fifty percent.
  • molecules involved in a specific binding event have areas on their surfaces or in cavities giving rise to specific recognition between the molecules binding to each other.
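  • The numerical sense of “specific” given above can be expressed as a small Python check, offered only as a sketch. The function name and the representation of complexes as a dictionary of counts per binding partner are assumptions made for this example.

```python
def is_specific(complex_counts: dict, partner: str, threshold: float = 0.5) -> bool:
    """Return True if `partner` accounts for the largest number of complexes
    formed by the first molecule and for at least `threshold` of all of them.

    `complex_counts` maps each binding partner to the number of complexes
    observed with it (a hypothetical data layout).
    """
    total = sum(complex_counts.values())
    if total == 0:
        return False
    with_partner = complex_counts.get(partner, 0)
    is_largest = with_partner == max(complex_counts.values())
    return is_largest and with_partner / total >= threshold
```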
  • DNA barcodes in the form of short DNA fragments are conjugated to candidate compounds that serve as unique identification barcodes for each candidate compound. See Brenner et al., PNAS USA 89 (12): 5381-5383 (1992); Nielsen et al., JACS 115 (21): 9812-9813 (1993); Needels et al., PNAS USA 90 (22): 10700-4 (1993).
  • a library of target proteins each bearing a barcoding sequence is generated.
  • a Snap-tag protein library can be generated and such a library can be used to attach barcodes to target proteins. See Chan et al., Discovery of a Covalent Kinase Inhibitor from a DNA-encoded Small Molecule Library x Protein Library Selection, J. Am. Chem. Soc., 2017; 139(30): pp. 10192-10195 and Supplemental Materials and Methods at 10.1021/jacs.7b04880 hereby incorporated by reference for the teaching of DNA encoded libraries and libraries of barcoded target proteins, such as SNAP-tagged, DNA-barcoded target proteins.
  • a library of target proteins bearing a barcoding sequence is generated using ribosome display.
  • One barcoding approach is to in vitro translate and display proteins on mRNA-ribosome-protein complexes, in which the mRNA contains a synthetic barcode.
  • the ribosome display is performed by using mRNA as a template and an in vitro translation (IVT) system, where the mRNA template lacks a stop codon such that translation stops to produce an mRNA-ribosome-protein complex.
  • mRNA-ribosome-protein complexes may be purified or enriched by FLAG-tag affinity purification.
  • the following oligonucleotides are generated or otherwise provided.
  • a DNA oligo construct containing the coding sequence of the gene(s) of interest which also includes, in the following 5’-3’ order: T7 or Sp6 or other transcription start site; a universal PCR primer site that is common to all genes in this library; a unique, short barcode (about 10 nucleotides) per coding region; a bridge landing site common to all genes in this library, e.g. GGGCGGCGGGGAAA (SEQ ID NO: 18); a ribosomal entry site (either endogenous or added); a coding sequence of a gene of interest; and lacking a stop codon. See Fig. 2A.
  • a reverse primer for PCR (“primer”) of the ligated construct as depicted in Fig. 5C 5' of the target protein bridge and hash and the DEL primer, enabling PCR amplification only of DNA fragments generated by a ligation event between the target protein and the compound when generated using the ribosome/bridge display approach of Fig. 2A-D.
  • the coding sequence oligo (a pool of which is a cDNA library) is made by chemical synthesis (e.g. gBlocks from IDT) or from mRNAs isolated from cells or tissue. If the latter, random primers may be used to start first-strand cDNA synthesis (primers including the landing site) or gene-specific primers designed to be upstream of, or replace, the stop codon at the end of the protein-coding region.
  • a template-switching oligo is used for second-strand synthesis and to provide a site to add T7 or similar promoter, a universal PCR primer, a unique barcode, and a bridge landing site with PCR. If made from mRNA, sequencing is used to associate the unique barcode with the coding region of the gene.
  • the bridge and its primer are made by chemical synthesis.
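  • The 5’-3’ ordering of elements in the DNA oligo construct listed above may be sketched in Python as follows. Only the bridge landing site is taken from the text; the T7 promoter string is a commonly cited minimal promoter sequence, and the universal primer site, the ribosomal entry site and the function names are hypothetical placeholders chosen for this example.

```python
# Sketch: assemble a display construct in the 5'-3' order listed above.
T7_PROMOTER = "TAATACGACTCACTATAG"               # commonly cited minimal T7 promoter
UNIVERSAL_PRIMER_SITE = "GTTCAGAGTTCTACAGTCCGA"  # hypothetical placeholder
BRIDGE_LANDING_SITE = "GGGCGGCGGGGAAA"           # example landing site from the text
RIBOSOME_ENTRY_SITE = "AAGGAGATATACC"            # hypothetical placeholder
STOP_CODONS = {"TAA", "TAG", "TGA"}


def strip_terminal_stop(cds: str) -> str:
    """Remove a terminal stop codon so that translation can stall."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return cds[:-3] if cds[-3:] in STOP_CODONS else cds


def build_construct(cds: str, barcode: str) -> str:
    """Concatenate the elements in the order given in the text."""
    barcode = barcode.upper()
    if not barcode or set(barcode) - set("ACGT"):
        raise ValueError("barcode must be a non-empty A/C/G/T string")
    return "".join([
        T7_PROMOTER,
        UNIVERSAL_PRIMER_SITE,
        barcode,
        BRIDGE_LANDING_SITE,
        RIBOSOME_ENTRY_SITE,
        strip_terminal_stop(cds),
    ])
```

  • In such a sketch, a distinct barcode would be supplied for each coding region so that each construct in the library can later be identified by sequencing.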
  • the cDNA library is transcribed into RNA using a bacteriophage RNA polymerase, e.g. HiScribe T7 kit from New England Biolabs. Prior to translation, the RNA is denatured and the bridge and its primer are added, annealing the bridge to its 5’ landing site on the RNAs.
  • RNAs in the library are then subject to in vitro translation using a commercially available system selected based on the ribosomal entry site used. For example, if Shine-Dalgarno sequences are used, a prokaryotic kit like NEBExpress Cell-free E. coli Protein Synthesis System from New England Biolabs may be used. If endogenous sequences are used, they are likely to have canonical Kozak sequences in the 5’ UTR, which would be preserved by the template-switching oligo approach in cDNA synthesis. Accordingly, a system including a wheat germ extract or rabbit reticulocyte commercially available from Promega, or other eukaryotic approach for translation can be used. The result is a ribosome-displayed library of proteins with a dsDNA oligo attached which can then be screened.
  • a library of target proteins bearing a barcoding sequence is generated using mRNA display shown generally at Fig. 3A-D.
  • Methods of barcoding a protein using mRNA are known to those of skill in the art as described herein.
  • a library of cDNA constructs is constructed as described herein.
  • the DNA construct includes a T7 RNA polymerase binding site or Sp6 transcription factor binding site or other transcription start site at the 5’ end of the DNA construct.
  • the DNA construct then includes a ribosomal entry site (which may be either endogenous or added).
  • the DNA construct lacks a stop codon.
  • a landing site for a DNA linker including a puromycin (e.g. GGGCGGCGGGGAAA) (SEQ ID NO: 19) is provided. See Fig. 3A and Fig. 3B.
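  • The features of the mRNA display construct described above (a transcription start site at the 5’ end, no stop codon, and a landing site for the puromycin linker) may be checked with a sketch such as the following. The function name is hypothetical, and two assumptions are made for illustration: that the landing site sits at the 3’ end of the construct, and that the first ATG after the promoter is the start codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_mrna_display_construct(construct: str, promoter: str,
                                 landing_site: str) -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    seq = construct.upper()
    problems = []
    if not seq.startswith(promoter):
        problems.append("transcription start site is not at the 5' end")
    if not seq.endswith(landing_site):
        problems.append("linker landing site is not at the 3' end")
    # Assumption: the first ATG after the promoter is the start codon.
    start = seq.find("ATG", len(promoter))
    if start == -1:
        problems.append("no start codon found")
        return problems
    # Exclude the landing site (when present) from the reading frame scan.
    orf_end = len(seq) - len(landing_site) if seq.endswith(landing_site) else len(seq)
    orf = seq[start:orf_end]
    stops = [i for i in range(0, len(orf) - 2, 3) if orf[i:i + 3] in STOP_CODONS]
    if stops:
        problems.append(f"in-frame stop codon at ORF offset {stops[0]}")
    return problems
```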
  • the DNA construct can be made by chemical synthesis (e.g. gBlocks from IDT) or from mRNAs isolated from cells or tissue. If the latter, random primers may be used to start first-strand cDNA synthesis (primers including the landing site) or gene-specific primers designed to be upstream of, or replace, the stop codon at the end of the protein-coding region.
  • a template-switching oligo is used for second-strand synthesis and to provide a site to add T7 or similar promoter with PCR.
  • a puromycin is covalently attached to a DNA oligo (commercially available from IDT or Trilink or Baseclick). See Fig. 3B. See for example, Barendt et al., Streamlined protocol for mRNA Display, ACS Comb Sci. 2013; 15(2): 77-81; Reyes et al., PURE mRNA display and cDNA display provide rapid detection of core epitope motif via high-throughput sequencing, Biotechnology and Bioengineering, vol. 118, issue 4, pp.
  • cDNA display: a novel screening method for functional disulfide-rich peptides by solid-phase synthesis and stabilization of mRNA-protein fusions, Nucleic Acids Res., 37(16) e108 (2009); Ueno et al., cDNA display: rapid stabilization of mRNA display, Methods Mol Biol (2012); 805: 113-135 (Fig. 1a referring to an “initiation site for reverse transcription”); Ueno et al., Improvement of a Puromycin-linker to Extend the Selection Target Varieties in cDNA Display Method, J. Biotechnol. (2012); 162(2-3): pp. 299-302; each of which is hereby incorporated by reference in its entirety for the teaching of mRNA and cDNA display methods.
  • For example, if Shine-Dalgarno sequences are used, a prokaryotic kit like NEBExpress Cell-free E. coli Protein Synthesis System from New England Biolabs may be used. If endogenous sequences are used, they are likely to have canonical Kozak sequences in the 5’ UTR, which would be preserved by the template-switching oligo approach in cDNA synthesis. Accordingly, a system including a wheat germ extract or rabbit reticulocyte commercially available from Promega, or other eukaryotic approach for translation can be used. The result is a library of proteins covalently attached to RNAs encoding them through a puromycin linker. cDNA is then synthesized from the puromycin linker on the protein. See Fig. 3C showing reverse transcription of the mRNA attached to the target protein via puromycin.
  • Prior to reverse transcription, the 3’ end of the mRNA is trimmed with a restriction enzyme (see Fig. 3D showing trimmed 3’ end), allowing the mRNA strand to be displaced during second-strand synthesis and removed.
  • the puromycin linker is DNA and has a landing site for the reverse transcriptase (e.g., Ueno et al., Methods Mol Biol (2012); 805: 113-35). See Fig. 3C.
  • a template-switching oligo is then used for the second strand, generating a blunt-ended double-stranded cDNA of the gene’s RNA covalently attached to the protein.
  • See Fig. 3D, where the arrow indicates use of a template switching oligo for second strand synthesis.
  • a DNA-encoded protein library is generated for screening.
  • a candidate compound and a target protein are combined under conditions promoting binding of the compound to the target protein.
  • the DNA barcode attached to the compound and the DNA barcode attached to the target protein are attached to each other generating a DNA construct including the barcode of the compound and the barcode of the protein.
  • the DNA construct is then sequenced.
  • the barcodes are identified thereby identifying the compound and the protein bound to each other. See generally Fig. 1.
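  • The identification step above may be sketched in Python as follows: a sequenced DNA construct is scanned for the protein barcode and the compound barcode, and each is looked up in a table. The read layout, the universal and DEL primer sequences, the barcode lengths (10 and 12 nucleotides) and the lookup tables are hypothetical choices for this example; only the landing site sequence is taken from the text.

```python
import re

UNIVERSAL_PRIMER_SITE = "GTTCAGAGTTCTACAGTCCGA"  # hypothetical placeholder
LANDING_SITE = "GGGCGGCGGGGAAA"                  # example landing site from the text
DEL_PRIMER_SITE = "CAGACGTGTGCTCTTCCGATCT"       # hypothetical placeholder

# Assumed read layout (illustration only):
# [universal primer][protein barcode][landing site] ... [DEL primer][compound barcode]
READ_PATTERN = re.compile(
    rf"{UNIVERSAL_PRIMER_SITE}(?P<protein>[ACGT]{{10}}){LANDING_SITE}"
    rf".*?{DEL_PRIMER_SITE}(?P<compound>[ACGT]{{12}})"
)


def decode_read(read: str, protein_barcodes: dict, compound_barcodes: dict):
    """Return (compound, protein) for a read, or None if the read does not
    match the assumed layout or a barcode is not in the lookup tables."""
    match = READ_PATTERN.search(read.upper())
    if match is None:
        return None
    protein = protein_barcodes.get(match.group("protein"))
    compound = compound_barcodes.get(match.group("compound"))
    if protein is None or compound is None:
        return None
    return compound, protein
```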
  • a commercially available DNA-encoded library as described herein is combined with or mixed with a library of nucleic acid encoded target proteins as described above, such as the library of mRNA display proteins or ribosome display proteins under suitable concentrations and temperature for a period of time to reach equilibrium and to form candidate compound-target protein binding pairs or complexes.
  • T4 ligase such as is commercially available as NEB’s Blunt/TA Ligase Master Mix which is a ready-to-use solution of T4 DNA Ligase, ligation enhancer, and optimized reaction buffer.
  • This master mix is specifically formulated to improve ligation and transformation of both blunt-end and single-base overhang substrates.
  • Other T4 DNA Ligase products include Quick Ligation Kit, Salt-T4, and Hi-T4.
  • first-strand synthesis is performed after ligation using the bridge as a primer to the protein-encoding RNA. See Fig. 2A-C.
  • As the reverse transcriptase proceeds, it transcribes the target protein barcode upstream of the bridge binding site.
  • a template-switching oligo is then used to initiate second-strand synthesis, which will proceed down the target protein barcode, the bridge binding site, past the bridge primer and into the ligated small compound barcode.
  • RNase may then be used to remove unwanted RNA products.
  • a candidate compound and a target protein are combined under conditions promoting binding of the compound to the target protein.
  • the DNA barcode attached to the compound and the DNA barcode attached to the target protein are attached to each other using Click chemistry generating a chimeric DNA construct including the barcode of the compound and the barcode of the protein.
  • the chimeric DNA construct is then sequenced.
  • the barcodes are identified thereby identifying the compound and the protein bound to each other.
  • the dsDNA with the barcode of the candidate compound and the dsDNA with the barcode of the target protein include click chemistry moieties that bind together under suitable conditions. See Fig. 4D.
  • the two dsDNA are connected together using click chemistry.
  • click-modified nucleotides are added to the end of the barcode of the candidate compound and the end of the barcode of the target protein with terminal transferase (New England Biolabs). The click-modified nucleotides at the ends of the barcodes are then reacted together when the candidate compounds and the target proteins form complexes, i.e.
  • the ends of the barcodes may be rendered blunt using NEBNext End Repair Module commercially available from New England Biolabs to reduce chance annealing between the two barcodes, especially via overhangs.
  • the click chemistry moieties bind together thereby forming a chimeric dsDNA that can be sequenced and the barcode of the candidate compound and the barcode of the target protein can be identified.
  • a plurality of candidate compound-target protein binding pairs are identified within a mixture of candidate compounds and target proteins as follows.
  • a plurality of uniquely barcoded candidate compounds and a plurality of uniquely barcoded target proteins are combined under conditions promoting binding of the candidate compounds to the target proteins to form a plurality of binding pairs.
  • the DNA barcode attached to the compound and the DNA barcode attached to the target protein are attached to each other generating a chimeric DNA construct including the barcode of the compound and the barcode of the protein.
  • the chimeric DNA construct is then sequenced.
  • the barcodes are identified thereby identifying the compound and the protein bound to each other.
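  • When a plurality of chimeric constructs is sequenced, the identified barcode pairs may be tallied to list candidate binding pairs, as in the following sketch. The function name and the minimum read count used to report a pair are assumptions made for illustration; the text does not specify a read-count threshold.

```python
from collections import Counter


def tally_binding_pairs(decoded_reads, min_reads: int = 2) -> dict:
    """Count (compound, protein) pairs across reads and keep those seen at
    least `min_reads` times (the threshold is an illustrative assumption)."""
    counts = Counter(decoded_reads)
    return {pair: n for pair, n in counts.items() if n >= min_reads}
```

  • Each element of `decoded_reads` is expected to be a (compound, protein) tuple obtained from one sequenced chimeric DNA construct.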
  • a sequencing library is constructed.
  • Fig. 5A depicts a target protein-compound complex ("small molecule", “protein”) with ligated barcodes ("Ligation”). Positions labeled “Tn5" depict random insertion events where the transposase Tn5 cuts dsDNA and inserts sequencing adapters into the free ends of the cuts. "DEL primer” indicates the position of a primer identical across all small molecules, with a unique barcode per molecule downstream.
  • tagmentation is used to cut and insert sequencing primers along the dsDNA encoding the target protein.
  • transposases randomly cut the DNA into fragments of 50 to 500 bp and add adaptors simultaneously. See Clark, David P. (2 November 2018). Molecular biology. Pazdernik, Nanette Jean, McGehee, Michelle R. (Third ed.). London. ISBN 978-0-12-813289-0. OCLC 1062496183, hereby incorporated by reference for the teaching of tagmentation techniques adaptable to the methods described herein.
  • in some fragments, the dsDNA will be ligated to a small molecule library barcode; in other fragments, the dsDNA will not be ligated to a small molecule library barcode.
  • the fragments are then amplified using one primer against the Tn5-inserted sequencing primer site and one primer directed against the universal portion of the small molecule library barcode, so that only protein-linked cDNA ligated to a small molecule barcode will be amplified.
  • Sequencing primer sequences are added to the primers directed against the universal portion of the small molecule library barcode, allowing for high-throughput sequencing.
  • the target protein is identified by the 3’ end of the coding sequence and the bound small molecule is identified by its barcode.
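  • The selective amplification described above may be illustrated in Python: a fragment is treated as amplifiable only if it carries both a Tn5-inserted primer site and the universal portion of the small molecule library barcode. The Tn5 site used here is the widely cited Tn5 mosaic-end sequence serving as a stand-in, the DEL primer site is a hypothetical placeholder, and the requirement that the Tn5 site lie upstream of the DEL primer site is an assumed orientation.

```python
TN5_ADAPTER_SITE = "AGATGTGTATAAGAGACAG"    # Tn5 mosaic-end sequence, used as a stand-in
DEL_PRIMER_SITE = "CAGACGTGTGCTCTTCCGATCT"  # hypothetical placeholder


def is_amplifiable(fragment: str) -> bool:
    """True if the fragment has a Tn5-inserted site followed, further
    downstream, by the universal DEL primer site (assumed orientation)."""
    frag = fragment.upper()
    tn5 = frag.find(TN5_ADAPTER_SITE)
    if tn5 == -1:
        return False
    return frag.find(DEL_PRIMER_SITE, tn5 + len(TN5_ADAPTER_SITE)) != -1


def amplified_fragments(fragments):
    """Keep only fragments that both primers can amplify."""
    return [f for f in fragments if is_amplifiable(f)]
```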
  • fragments are amplified using one primer directed against the primer site upstream of the protein barcode and one primer directed against the universal portion of the small molecule library barcode.
  • Library construction is then completed using end-repair and dA-tailing and sequencing primer ligation using NEBNext® Ultra™ II DNA Library Prep Kit for Illumina®.
  • the target protein is then identified by its upstream barcode by sequencing the cDNA library as referenced above to link barcodes to proteins and the bound small molecule is identified by its barcode.
  • Protein barcodes are identified in advance by sequencing the cDNA constructs (those including the Sp6, the barcode, and the gene of interest) in order to link the protein barcode to the coding sequence. When screening for small molecules, the protein barcode is identified, which identifies the protein.
  • a method for screening DNA-encoded libraries against target proteins uses water-in-oil emulsion technology to isolate within a droplet an individual compound and an individual protein or a plurality of compounds and a plurality of target proteins to facilitate binding of a compound to a target protein in a single-tube approach.
  • (1) the plurality of compounds and (2) the plurality of target proteins are combined under conditions creating an emulsion having a plurality of emulsion droplets, wherein each emulsion droplet of the plurality includes (1) a compound of the plurality, and (2) a target protein of the plurality under conditions creating a bound compound-protein binding pair, wherein the dsDNA attached to the compound is ligated to the dsDNA attached to the protein to create a dsDNA construct comprising the unique barcode sequence for the target protein and the unique barcode sequence of the compound.
  • Various water in oil emulsion techniques for isolating binding pairs and adaptable to the present disclosure are described by Petersen et al., Med. Chem.
  • the present disclosure provides various methods to promote binding of a candidate compound to a target protein to facilitate ligation of barcodes to provide a chimeric DNA construct for sequencing.
  • a sequence is added to the DNA barcode that is recognized by a small DNA binding protein.
  • the small DNA binding protein is also added. See for example Blanco et al., A Synthetic Miniprotein that Binds Specific DNA Sequences by Contacting Both the Major and Minor Groove, Chemistry & Biology, vol. 10, issue 8, (2003), pages 713- 722 hereby incorporated by reference in its entirety. This generates a small molecule library with attached DNA barcode and a small protein attached to the DNA barcode at high affinity.
  • the small molecule library is then mixed with the dsDNA barcoded protein library and the small DNA binding protein is crosslinked to the target protein with formaldehyde, increasing the stability of transient interactions. See Fig. 4B.
  • a similar approach is commonly used to stabilize transient higher-order chromatin interactions such as loops (see Lieberman-Aiden et al., Comprehensive mapping of Long Range Interactions Reveals Folding Principles of the Human Genome, Science, vol. 326, Issue 5950, pp. 289-293 (2009), hereby incorporated by reference in its entirety).
  • the two dsDNA fragments are ligated to form a chimeric
  • Formaldehyde may also directly crosslink DNA to DNA so that a DNA binding protein is not required. See Kawanishi et al., Front. Environ. Sci., 2014; vol. 2, article 36 pp. 1-8 (formaldehyde induces N-hydroxymethyl mono-adducts on guanine, adenine and cytosine, and N-methylene crosslinks between adjacent purines in DNA) hereby incorporated by reference in its entirety for the teaching of formaldehyde crosslinking DNA to DNA.
  • the dsDNA barcodes of the candidate compounds of the DNA encoded library are provided with an intercalating agent having maleimide attached thereto, such as a doxorubicin-maleimide conjugate.
  • the maleimide covalently reacts with neighboring cysteines of the target protein. See Fig. 4A. See Ravasco et al., Bioconjugation with Maleimides: A Useful Tool for Chemical Biology, Chemistry Europe, Vol. 25, Issue 1, pp. 43-59 (2019) hereby incorporated by reference in its entirety.
  • the DNA encoded library with the intercalator-maleimide conjugate is combined with the target protein library, where the intercalator-maleimide conjugate attached to the small molecule DNA barcode will bind to cysteines in its bound protein target partner, increasing the stability of transient interactions.
  • the two dsDNA fragments are ligated together to form a chimeric DNA molecule. See Fig. 4A.
  • nucleotides modified with click chemistry moieties are commercially available from Integrated DNA Technologies (IDT).
  • Appropriate pairs of click-compatible chemistries are provided on the dsDNA of the target protein and the dsDNA of the candidate compound/small molecule. See Fig. 4C.
  • Appropriate pairs of click-compatible chemistries may be provided on the puromycin linker or the bridge oligo for the display approaches and the barcodes for the small molecule library (e.g., azide on the puromycin linker, alkyne on the small molecule library).
  • a one-pot saturation mutagenesis technique described in Wrenbeck et al., Nat Methods (2016) (11): 928-930 hereby incorporated by reference in its entirety is a PCR-based approach for generating a customizable comprehensive mutagenesis library that is ready to be tested in a functional screen.
  • the following steps can be carried out for the one-pot saturation mutagenesis technique: 1.
  • the present disclosure provides a method for determining interactions between a plurality of compounds and a plurality of target proteins, wherein each compound of the plurality has a dsDNA attached thereto wherein the dsDNA comprises a barcode unique to the compound, wherein each target protein of the plurality has a mRNA attached thereto, wherein the mRNA comprises (i) a first hybridization site, (ii) a barcode unique to the target protein, (iii) a universal PCR primer binding site and (iv) a transcriptional start site, the method includes combining (1) a bridging polynucleotide, (2) the plurality of compounds and (3) the plurality of target proteins under conditions creating a plurality of bound compound-protein binding pairs, for each bound compound-protein binding pair, (A) the bridging polynucleotide hybridizes to the first hybridization site of the mRNA attached to the target protein, (B) the bridging polynucleotide is attached to the dsDNA attached to the compound,
  • each target protein of the plurality having a mRNA attached thereto is created by (A) transcribing a DNA construct comprising (1) a transcriptional start site, (2) a universal primer binding site which may be the transcriptional start site, (3) a barcode unique to a target protein, (4) a first hybridization site, (5) an internal ribosomal entry site, (6) a nucleic acid encoding the target protein, and (7) a nucleic acid encoding a peptide tag into mRNA, (B) reverse transcribing the mRNA using reverse transcription primers that bind to the 3’ end of the mRNA.
  • binding of a compound and a target protein is stabilized to facilitate hybridization and attachment of the bridging polynucleotide.
  • (1) the bridging polynucleotide, (2) the plurality of compounds and (3) the plurality of target proteins are combined under conditions creating an emulsion having a plurality of emulsion droplets, wherein each emulsion droplet of the plurality includes (1) bridging polynucleotide, (2) a compound of the plurality, and (3) a target protein of the plurality under conditions creating a bound compound-protein with bridging polynucleotide hybridized to the mRNA attached to the protein, which is then subject to ligation to the dsDNA attached to the compound and reverse transcription creating the first strand DNA sequence.
  • the dsDNA of the compound is crosslinked to the target protein to promote binding of the compound to the target protein.
  • a protein is attached to the dsDNA of the compound and the protein is covalently attached to the target protein.
  • a DNA intercalator is bound to the target protein via a moiety and the intercalator intercalates the dsDNA of the compound.
  • the present disclosure provides a method for determining interactions between a plurality of compounds and a plurality of target proteins, wherein each compound of the plurality has a dsDNA attached thereto wherein the dsDNA comprises a barcode unique to the compound, wherein each target protein of the plurality has a dsDNA attached thereto wherein the dsDNA comprises a barcode unique to the target protein, the method includes combining (1) the plurality of compounds and (2) the plurality of target proteins under conditions creating a plurality of bound compound-target protein binding pairs, for each compound-target protein binding pair, attaching the dsDNA attached to the compound to the dsDNA attached to the target protein to create a dsDNA construct comprising the unique barcode sequence for the target protein and the unique barcode sequence of the compound, sequencing the dsDNA construct to identify the unique barcode sequence for the target protein and the unique barcode sequence for the compound so as to identify the target protein and the compound bound thereto.
  • a protein is attached to the dsDNA of the compound and the protein is covalently attached to the target protein to promote binding of the compound to the target protein.
  • a DNA intercalator is bound to the target protein via a moiety and the intercalator intercalates the dsDNA of the compound to promote binding of the compound to the target protein.
  • binding of a compound and a target protein is stabilized to facilitate hybridization and attachment of the bridging polynucleotide.
  • the plurality of compounds and the plurality of target proteins are combined under conditions creating an emulsion having a plurality of emulsion droplets, wherein each emulsion droplet of the plurality includes a compound of the plurality and a target protein of the plurality under conditions creating a bound compound-protein, to facilitate attachment of the dsDNA attached to the compound to the dsDNA attached to the protein.
  • the dsDNA of the compound is crosslinked to the target protein to promote binding of the compound to the target protein and to facilitate attachment of the dsDNA attached to the compound to the dsDNA attached to the protein.
  • a protein is attached to the dsDNA of the compound and the protein is covalently attached to the target protein to promote binding of the compound to the target protein and to facilitate attachment of the dsDNA attached to the compound to the dsDNA attached to the protein.
  • a DNA intercalator is bound to the target protein via a moiety and the intercalator intercalates the dsDNA of the compound to promote binding of the compound to the target protein and to facilitate attachment of the dsDNA attached to the compound to the dsDNA attached to the protein.

Abstract

Methods for identifying compound-protein binding pairs in a high throughput assay are provided. The methods include providing a compound with a unique barcode and providing a target protein with a unique barcode. When a binding pair of a compound and protein is formed, the unique barcode for the compound and the unique barcode for the protein are attached to form a chimeric nucleic acid sequence including the unique barcode for the compound and the unique barcode for the protein. The chimeric nucleic acid sequence is sequenced, the barcodes identified and the compound and protein forming the binding pair are identified.

Description

HIGH-THROUGHPUT DRUG DISCOVERY METHODS
RELATED APPLICATION DATA
This application claims the benefit of U.S. Provisional Application No. 63/278,651, filed on November 12, 2021, which is hereby incorporated by reference in its entirety.
SEQUENCE LISTING
The instant application contains a Sequence Listing which has been submitted electronically in XML format and is hereby incorporated by reference in its entirety. Said XML copy, created on October 20, 2022, is named “Sequence_Listing_009458_00002_ST26” and is 24 KB in size.
FIELD
The present invention relates to methods and compositions for high-throughput analysis of binding of candidate compounds to candidate proteins.
BACKGROUND
High-throughput drug discovery methods are known. One general method includes target based screening where a compound is screened against a target and the properties of the target are analyzed. Another general method includes phenotypic screening where a compound is screened in an animal or cellular disease model to determine if the compound causes a desirable change in phenotype.
According to certain methods, both target based screening and phenotypic screening are carried out by exposing the target to one compound at a time, which is time consuming and labor intensive and presents difficulties in scaling. More recently, efforts have been made to use DNA-encoded chemical libraries, which are libraries of small molecules that have a small, unique DNA barcode on each small molecule. One can then screen the DNA-encoded library of chemicals by exposing them to the protein target simultaneously, capturing only those that bind the best and identifying them by sequencing the barcodes. See for example Patel et al., Developments in Photoredox-Mediated Alkylation for DNA-Encoded Libraries, Trends Chem. 2021 Mar;3(3):161-175 and Mannocci et al., High-throughput sequencing allows the identification of binding molecules isolated from DNA-encoded chemical libraries, PNAS USA 2008 Nov. 18; 105(46): 17670-17675.
However, further methods are needed for high throughput or massively parallelized compound screening against many targets simultaneously.
SUMMARY
Aspects of the present disclosure are directed to methods for identifying compound-protein binding pairs in a high throughput assay. The methods include providing a compound with a unique barcode and providing a target protein with a unique barcode. When a binding pair of a compound and protein is formed, the unique barcode for the compound and the unique barcode for the protein are attached to form a chimeric nucleic acid sequence including the unique barcode for the compound and the unique barcode for the protein. The chimeric nucleic acid sequence is sequenced, the barcodes identified and the compound and protein forming the binding pair are identified.
BRIEF DESCRIPTION OF THE DRAWINGS
The foregoing and other features and advantages of the present invention will be more fully understood from the following detailed description of illustrative embodiments taken in conjunction with the accompanying drawings in which:
Fig. 1 is a schematic depicting a target protein having attached thereto a double stranded DNA (“dsDNA”) including a barcode unique to the target protein. A candidate compound (depicted as a small molecule) is depicted as binding to a binding site on the target protein forming a candidate compound-target protein binding pair in solution. The candidate compound has attached thereto a dsDNA including a barcode unique to the candidate compound. According to methods described herein, the unbound end of the dsDNA attached to the protein is attached or ligated to the unbound end of the dsDNA attached to the candidate compound (identified as “ligate and sequence DNA”) generating a DNA construct including the barcode unique to the target protein and the barcode unique to the candidate compound. The DNA construct is sequenced and the barcodes are identified which identify the candidate compound and the target protein.
Fig. 2A depicts in schematic a DNA construct including (1) a transcriptional start site such as an Sp6 site (“Sp6”), (2) a universal PCR primer (“primer”), (3) a barcode (“hash”) unique to the protein of interest, which may be a few nucleotides, (4) a universal primer binding site (“bridge” or “bridge landing site”) for binding to a primer on a bridging polynucleotide, (5) an internal ribosomal entry site (“IRES”) to be used for translation, (6) the coding sequence encoding for the protein or protein fragment of interest (“target protein”), (7) a peptide tag such as FLAG, followed by (8) a spacer with no stop codons. The DNA construct is transcribed into mRNA using the transcriptional start site.
Fig. 2B depicts in schematic a target protein that has been translated from the mRNA transcribed from the DNA construct of Fig. 2A using ribosome display or ribosome stalling (labeled as “translation stalls”). Since translation begins downstream of the universal primer binding site, the translated protein has attached thereto mRNA including (2) the universal PCR primer if different from the transcriptional start site, (3) the barcode or hash unique to the protein of interest of a few nucleotides, and (4) the universal primer binding site (“bridge landing site”) for binding to a primer on a bridging polynucleotide. The target protein is depicted as having a compound (“small molecule”) bound thereto. The compound has attached thereto a dsDNA including a barcode unique to the candidate compound. Further depicted is a bridging polynucleotide for ligation (“Ligation”) to the dsDNA of the compound and for hybridization (“DNA Bridge”) to the universal primer binding site (“bridge landing site”).
Fig. 2C depicts in schematic reverse transcription of the portion of the mRNA in a first strand synthesis resulting in a DNA construct including the barcode unique to the target protein and the barcode unique to the compound resulting from use of the bridging polynucleotide of Fig. 2B.
Fig. 2D depicts use of a template switching oligo to facilitate second strand synthesis.
Fig. 2E depicts second strand synthesis resulting in a dsDNA construct including the barcode of the target protein and the barcode of the small molecule. The dsDNA construct is to be sequenced to identify the barcodes and, accordingly, the target protein and small molecule.
Fig. 3A depicts a dsDNA construct used to barcode a protein using mRNA display of the mRNA construct to the protein.
Fig. 3B depicts a mRNA for the target protein having puromycin attached thereto for use in a mRNA display method. The 3’ end of the mRNA includes a stem and loop structure with the puromycin attached thereto.
Fig. 3C depicts translation of mRNA encoding the target protein and resulting in mRNA display using puromycin to connect the mRNA to the target protein. The mRNA serves as the barcode for the target protein. Fig. 3C also depicts reverse transcription of the mRNA into DNA.
Fig. 3D depicts generation of dsDNA which is attached to the target protein.
Figs. 4A-4D depict various embodiments of facilitating attachment of the dsDNA of a target protein to the dsDNA of a candidate compound such as a small molecule.
In Fig. 4A, the dsDNA of a candidate compound is attached to the target protein by use of an intercalator (such as ethidium bromide) having a protein binding moiety (such as maleimide) attached thereto. The intercalator binds to the dsDNA and the protein binding moiety binds to the target protein, stabilizing the target protein-compound complex. The dsDNA of the candidate compound and the dsDNA of the target protein are ligated together and sequenced.
In Fig. 4B, a DNA binding protein is used to crosslink the dsDNA of the candidate compound to the target protein, stabilizing the target protein-compound complex. The dsDNA of the candidate compound and the dsDNA of the target protein are ligated together and sequenced.
In Fig. 4C, click chemistry is used to bind the dsDNA of the candidate compound to the dsDNA of the target protein. A nucleic acid of the dsDNA of the candidate compound includes a click chemistry moiety. A nucleic acid of the dsDNA of the target protein includes a click chemistry moiety. The corresponding click chemistry moieties are reacted together, stabilizing the target protein-compound complex. The dsDNA of the candidate compound and the dsDNA of the target protein are ligated together and sequenced.
In Fig. 4D, the terminal portion of the dsDNA of the candidate compound includes a nucleic acid with a click chemistry moiety. The terminal portion of the dsDNA of the target protein includes a nucleic acid with a click chemistry moiety. The corresponding click chemistry moieties are reacted together, and the joined dsDNA is sequenced.
Fig. 5A depicts a target protein-compound complex ("small molecule", "protein") with ligated barcodes ("Ligation"). Positions labeled "Tn5" depict random insertion events where the transposase Tn5 cuts dsDNA and inserts sequencing adapters into the free ends of the cuts. "DEL primer" indicates the position of a primer identical across all small molecules, with a unique barcode per molecule downstream.
Fig. 5B depicts the positions of a sequencing adapter inserted by the Tn5 ("Tn5") and the DEL primer ("DEL primer"), enabling PCR amplification only of DNA fragments generated by a ligation event between the target protein and the compound, such as when generated using the mRNA display approach.
Fig. 5C depicts the positions of a universal primer ("primer") 5' of the target protein bridge and hash and the DEL primer, enabling PCR amplification only of DNA fragments generated by a ligation event between the target protein and the compound when generated using the ribosome/bridge display approach of Figs. 2A-2D.
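The selective amplification described for Figs. 5B and 5C can be illustrated with a minimal in silico sketch. It assumes a fragment is amplifiable only when it carries the forward primer site and, downstream of it, the reverse complement of the second primer. All sequences, the primer orientation, and the function names below are hypothetical placeholders chosen for illustration; they are not taken from the disclosure or its Sequence Listing.

```python
# Hypothetical primer sequences (illustrative only).
UNIVERSAL_PRIMER = "ACGTACGTAC"   # stands in for the protein-side primer (or a Tn5 adapter)
DEL_PRIMER = "TTGGCCAATT"         # stands in for the compound-side "DEL primer"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplifiable(fragment: str, fwd_primer: str, rev_primer: str) -> bool:
    """True if the fragment holds the forward primer site and, downstream
    of it, the site to which the reverse primer anneals."""
    start = fragment.find(fwd_primer)
    if start < 0:
        return False
    return fragment.find(revcomp(rev_primer), start + len(fwd_primer)) >= 0


# A ligated (chimeric) fragment carries both primer sites.
ligated = UNIVERSAL_PRIMER + "GATTACA" + revcomp(DEL_PRIMER)
# Unligated fragments carry only one of the two sites.
protein_only = UNIVERSAL_PRIMER + "GATTACA"
compound_only = revcomp(DEL_PRIMER) + "CCCC"

for name, frag in [("ligated", ligated),
                   ("protein_only", protein_only),
                   ("compound_only", compound_only)]:
    print(name, amplifiable(frag, UNIVERSAL_PRIMER, DEL_PRIMER))
```

In this sketch only the ligated fragment passes the filter, which mirrors the point of the figures: exponential amplification requires one primer site contributed by each partner of the binding pair.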
DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS
Embodiments of the present disclosure are directed to methods of determining binding of candidate compounds to target proteins within a mixture of a plurality of candidate compounds and a plurality of target proteins. According to one aspect, each candidate compound has a nucleic acid barcode unique to the candidate compound attached thereto. A plurality of candidate compounds each with its own unique barcode is referred to herein as a DNA-encoded compound library. According to one aspect, each target protein has a nucleic acid barcode unique to the target protein attached thereto. A plurality of target proteins each with its own unique barcode is referred to herein as a DNA-encoded target protein library. One aspect of the present disclosure is to combine a barcoded compound library with a barcoded protein library to generate compound-protein binding pairs. Since the barcode of the compound and the barcode of the protein are in proximity to one another in a compound-protein binding pair, a chimeric barcode is generated for each compound-protein binding pair, thereby generating a plurality of chimeric barcodes. High throughput DNA sequencing can then be used to decode the chimeric barcodes and, accordingly, the identities of the compound-protein binding pairs.
According to one aspect, therefore, target proteins are uniquely barcoded, such as with either mRNA or DNA. If mRNA, then the barcode is reverse transcribed into cDNA. The DNA barcode of the compound and the DNA barcode of the protein are ligated together forming a chimeric nucleic acid including the compound barcode and the protein barcode. The chimeric nucleic acid is then sequenced, the barcodes identified and, accordingly, the compound-protein binding pairs are identified.
The candidate compounds and the target proteins are combined together under conditions to allow formation of candidate compound - target protein interactions. According to one aspect, the candidate compound - target protein interactions may be promoted or stabilized, such as by emulsion isolation, chemical crosslinking, DNA intercalation, protein-protein interactions, ligand-ligand interactions, etc. According to one aspect, a target candidate compound binds to a target protein. According to one aspect, a plurality of target candidate compounds binds to respective target proteins within a plurality of target proteins. For a bound candidate compound-target protein, the nucleic acid barcode unique to the candidate compound and the nucleic acid barcode unique to the target protein are attached to one another, such as by ligation, forming a chimeric nucleic acid construct including the nucleic acid barcode unique to the candidate compound and the nucleic acid barcode unique to the target protein. The chimeric nucleic acid construct is then sequenced and the barcodes identified. The identified barcodes identify the candidate compound and the target protein that bound to each other. According to one aspect, methods are provided to stabilize the candidate compound and the target protein to each other to facilitate binding of the candidate compound to the target protein. The method includes determining the identity of a plurality of candidate compounds bound to respective target proteins. According to one aspect, a target candidate compound as described above binds to a target protein as described above. The nucleic acid barcode unique to the candidate compound and the nucleic acid barcode unique to the target protein are ligated to one another, such as by proximity ligation, forming a chimeric nucleic acid construct including the nucleic acid barcode unique to the candidate compound and the nucleic acid barcode unique to the target protein. 
The chimeric nucleic acid construct is then sequenced and the barcodes identified. The identified barcodes identify the candidate compound and the target protein that bound to each other. According to one aspect, methods are provided to stabilize the candidate compound and the target protein to each other to facilitate binding of the candidate compound to the target protein. The method includes determining the identity of a plurality of candidate compounds bound to respective target proteins.
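The decoding step described above, in which each sequenced chimeric construct is resolved into a protein barcode and a compound barcode, can be sketched as follows. This is a minimal illustration only: it assumes fixed-length barcodes immediately flanking a known ligation junction and exact barcode matches. The barcode tables, the junction sequence, and the function names are hypothetical and do not come from the disclosure.

```python
from collections import Counter

# Hypothetical lookup tables mapping barcode sequences to library members.
PROTEIN_BARCODES = {"ACGT": "protein_A", "TGCA": "protein_B"}
COMPOUND_BARCODES = {"GGAA": "compound_1", "CCTT": "compound_2"}
JUNCTION = "GATC"      # assumed sequence left at the ligation junction
BARCODE_LENGTH = 4     # assumed fixed barcode length


def decode(read: str):
    """Return (protein, compound) for a chimeric read, or None if the read
    lacks the junction or carries an unrecognized barcode."""
    left, junction, right = read.partition(JUNCTION)
    if not junction:
        return None
    protein = PROTEIN_BARCODES.get(left[-BARCODE_LENGTH:])
    compound = COMPOUND_BARCODES.get(right[:BARCODE_LENGTH])
    if protein is None or compound is None:
        return None
    return protein, compound


def count_pairs(reads):
    """Tally how often each compound-protein pair is observed."""
    decoded = (decode(read) for read in reads)
    return Counter(pair for pair in decoded if pair is not None)


reads = [
    "ACGTGATCGGAA",  # protein_A + compound_1
    "ACGTGATCGGAA",  # protein_A + compound_1 (second observation)
    "TGCAGATCCCTT",  # protein_B + compound_2
    "ACGTGATCAAAA",  # unknown compound barcode, discarded
    "ACGTGGAA",      # no junction, discarded
]
pair_counts = count_pairs(reads)
print(pair_counts)
```

A practical pipeline would additionally need to tolerate sequencing errors and to normalize counts against library representation; those steps are outside the scope of this sketch.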
According to one aspect, a DNA construct is provided for barcoding a target protein. The method contemplates a plurality of DNA constructs for creating a plurality of barcoded target proteins. The DNA construct is a template comprising at least a universal primer hybridization site for amplifying the DNA construct, a barcode sequence, a second primer hybridization site to facilitate reverse transcription of the barcode, an internal ribosome entry site, and a target protein coding sequence. In vitro transcription is carried out to synthesize a barcoded mRNA template. In vitro translation is then carried out to generate a mRNA-ribosome-protein complex. Methods of ribosome stalling (ribosome display) useful in the present disclosure and adaptable to the present methods are known to those of skill in the art and are described in Hanes et al., In vitro selection and evolution of functional proteins by using ribosome display, Proc. Natl. Acad. Sci. USA, (1997); 94(10): 4937-4942 hereby incorporated by reference in its entirety for teaching methods of ribosome display. The mRNA portion of the complex includes the barcode sequence and the two primer hybridization sites. A candidate compound having a unique barcode attached thereto binds the protein of the mRNA-ribosome-protein complex. A single stranded polynucleotide, referred to herein as a bridging polynucleotide, then hybridizes and is ligated to the barcode unique to the candidate compound. The single stranded polynucleotide hybridizes to the second primer binding site of the mRNA portion of the mRNA-ribosome-protein complex and is used as a primer to reverse transcribe the mRNA which includes the barcode unique to the target protein into cDNA generating a chimeric nucleic acid construct including the nucleic acid barcode unique to the target protein and the nucleic acid barcode unique to the candidate compound. The chimeric nucleic acid construct is then sequenced and the barcodes identified. 
The identified barcodes identify the candidate compound and the target protein that bound to each other. According to one aspect, methods are provided to stabilize the candidate compound and the target protein to each other to facilitate binding of the candidate compound to the target protein. The method includes determining the identity of a plurality of candidate compounds bound to respective target proteins.
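The element order of the DNA construct described above can be made concrete with a short sketch that assembles a template from its parts and records where each part lies. The element sequences are hypothetical placeholders (the internal ribosome entry site is represented by a run of N), and the names are assumptions made for illustration; only the order of the elements follows the description.

```python
# Ordered construct elements; sequences are hypothetical placeholders.
ELEMENTS = [
    ("universal_primer_site", "ACGTACGTAC"),
    ("barcode", "GATTACA"),
    ("second_primer_site", "TTGGCCAATT"),
    ("ires", "N" * 12),
    ("target_cds", "ATGGCTGGT"),
]


def assemble(elements):
    """Concatenate the elements in order and return the template together
    with the (start, end) coordinates of each element."""
    template = ""
    coords = {}
    for name, part in elements:
        coords[name] = (len(template), len(template) + len(part))
        template += part
    return template, coords


template, coords = assemble(ELEMENTS)

# Everything upstream of the IRES is untranslated and stays on the mRNA
# that remains associated with the stalled ribosome and the target protein.
leader = template[:coords["ires"][0]]

barcode_start, barcode_end = coords["barcode"]
print(template[barcode_start:barcode_end])  # GATTACA
print(len(leader))                          # 27
```

The sketch shows why the barcode and both primer hybridization sites travel with the protein: they sit in the leader upstream of the internal ribosome entry site, where the bridging polynucleotide can reach them.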
According to one aspect, a mRNA construct is provided for barcoding a target protein. The method contemplates a plurality of mRNA constructs for creating a plurality of barcoded target proteins. The mRNA construct includes puromycin which is or becomes covalently linked to a target protein during translation of mRNA into the target protein, resulting in the target protein being barcoded with the mRNA encoding it. Methods of mRNA display or cDNA display useful in the present disclosure and adaptable to the present methods are known to those of skill in the art. See Barendt et al., Streamlined Protocol for mRNA Display, ACS Comb. Sci. (2013); 15(2): 77-81 and Yamaguchi et al., cDNA display: a novel screening method for functional disulfide-rich peptides by solid-phase synthesis and stabilization of mRNA-protein fusions, Nucleic Acids Research (2009); 37(16): e108; Ueno, S., & Nemoto, N. (2011). cDNA Display: Rapid Stabilization of mRNA Display. Methods in Molecular Biology, 113-135. doi:10.1007/978-1-61779-379-0_8, each of which is hereby incorporated by reference in its entirety for the teaching of mRNA display or cDNA display. The mRNA is then reverse transcribed into cDNA attached to the target protein. A plurality of target proteins each having its own unique cDNA barcode forms a DNA encoded target protein library. A plurality of candidate compounds each having its own unique cDNA barcode forms a DNA encoded compound library. A candidate compound with its own unique DNA barcode binds the target protein. The nucleic acid barcode unique to the candidate compound and the nucleic acid barcode unique to the target protein are bound to one another, such as by ligation, for example proximity ligation, forming a chimeric nucleic acid construct including the nucleic acid barcode unique to the candidate compound and the nucleic acid barcode unique to the target protein. The chimeric nucleic acid construct is then sequenced and the barcodes identified. 
The identified barcodes identify the candidate compound and the target protein that bound to each other. According to one aspect, methods are provided to stabilize the candidate compound and the target protein to each other to facilitate binding of the candidate compound to the target protein. The method includes determining the identity of a plurality of candidate compounds bound to respective target proteins.
According to one aspect, a barcode unique to a target protein is attached to the target protein, for example, by using methods known to those of skill in the art, such as by covalent reaction exemplified by SNAP-tag described by Cole, Site-Specific Protein Labeling with SNAP-Tags, Curr. Protoc. Protein Sci. (2013); 73:30.1.1-30.1.16 published online doi:10.1002/0471140864.ps3001s73 hereby incorporated by reference in its entirety for the teaching of the use of SNAP-tags. With a self-labeling SNAP-tag, the protein of interest is expressed as a fusion with a modified form of the 20-kDa monomeric DNA repair enzyme, human O6-alkylguanine-DNA-alkyltransferase (AGT), or SNAP-tag. The SNAP-tag can be specifically labeled with synthetic O6-benzylguanine (BG) derivatives, resulting in a stable thioether bond between a reactive cysteine residue in the tag and the probe. The SNAP-tag can be appended onto the N- or C-terminus of proteins without affecting the function of a large number of fusion proteins. Other methods include designing DNA or mRNA to include the barcode. Still further methods include incorporating the barcode into DNA or mRNA using primer/amplification methods known to those of skill in the art with or without in vitro transcription. Such barcoding protocols include those used in next-generation sequencing methods. According to one aspect, a barcode unique to a candidate compound is attached to the candidate compound, for example, by using methods known to those of skill in the art, such as by covalent reaction. Such barcoding protocols include those used in making DNA encoded libraries for high throughput drug discovery. A candidate compound having a unique nucleic acid barcode binds to a target protein having a unique nucleic acid barcode. 
The nucleic acid barcode unique to the candidate compound and the nucleic acid barcode unique to the target protein are bound or ligated to one another, such as by proximity ligation, forming a chimeric nucleic acid construct including the nucleic acid barcode unique to the candidate compound and the nucleic acid barcode unique to the target protein. The chimeric nucleic acid construct is then sequenced and the barcodes identified. The identified barcodes identify the candidate compound and the target protein that bound to each other. According to one aspect, methods are provided to stabilize the candidate compound and the target protein to each other to facilitate binding of the candidate compound to the target protein. The method includes determining the identity of a plurality of candidate compounds bound to respective target proteins.
In certain aspects, the barcoded DNA construct or template includes a polymerase primer binding sequence (e.g., T7 polymerase), and mRNAs are synthesized from the barcoded DNA construct or template by in vitro transcription. A plurality of mRNAs are synthesized from a plurality of barcoded DNA constructs or templates in a single container. In other aspects, reverse transcription is performed, and the cDNA sequences are complementary upstream to a ribosome binding site of the barcoded mRNA template. In certain aspects, ribosomes stall at the 3' end of the mRNA sequence during in vitro translation due to one or both of a lack of stop codons or the presence of ribosome stalling peptide sequences. In yet other aspects, the protein coding sequence encodes one or more affinity tags (e.g., FLAG tags and the like), e.g., at the N-terminal or C-terminal of a protein of interest.
In certain exemplary embodiments, a method for attaching a barcode to a polypeptide is provided, comprising the steps of providing a DNA template comprising at its 5' end an enzyme capable of receiving or otherwise attaching to a ligand, providing a fusion protein comprising an enzyme fragment specific for the ligand, and allowing the enzyme to covalently bind the ligand to produce a polypeptide comprising a barcode. Enzyme fragments capable of this utility are known to those of skill in the art. Exemplary enzyme fragments or tags include HaloTag, CLIP tag, SNAP-tag and the like. As is known in the art, the SNAP-tag is an enzyme based self-labeling protein tag. The SNAP-tag protein is a modified form of the human repair protein O6-alkylguanine-DNA-alkyltransferase (AGT), a 20 kDa protein. The SNAP-tag protein undergoes a self-labeling reaction to form a covalent bond with O6-benzylguanine derivatives. O6-Benzylguanine (BG) can be modified with a variety of reporter molecules such as fluorophores, peptides, or oligonucleotides. Using the SNAP-tag approach allows avoiding nonspecific labeling since most SNAP-tag substrates are chemically inert towards other proteins.
In certain aspects, the method is performed using an automated high-throughput platform. A plurality of uniquely barcoded candidate compounds and a plurality of uniquely barcoded target proteins are combined, such as in an aqueous media. A plurality of uniquely barcoded candidate compounds bind to a plurality of respective uniquely barcoded target proteins. The barcodes of a candidate compound bound to a target protein are attached or linked together to form a chimeric nucleic acid construct including both barcodes. The chimeric nucleic acid construct is sequenced to determine the identity of the barcodes which in turn identifies the candidate compound and target protein as a binding pair. The steps of attaching, sequencing and determining are carried out for a plurality of candidate compound - target protein binding pairs. Accordingly, the method provides a high throughput method for determining a plurality of candidate compound-target protein binding pairs within a mixture of a plurality of candidate compounds and target proteins.
In certain aspects, a DNA encoded library of candidate compounds is screened against a DNA or RNA encoded library of target proteins for binding of candidate compounds to target proteins. The library of candidate compounds may include at least 1 x 10^2 to 1 x 10^12 or more different candidate compounds. The library of target proteins may include at least 1 x 10^2 to 1 x 10^12 different target proteins. The library of candidate compounds and the library of target proteins may be combined and analyzed in a single assay.
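As a worked illustration of the library sizes stated above (read as 10^2 to 10^12 members), the following sketch computes the shortest barcode, in nucleotides, that can give every library member a distinct sequence, along with the number of possible compound-protein pairings. It assumes a four-letter alphabet and no error tolerance, so the result is a theoretical lower bound rather than a design recommendation; the function name is hypothetical.

```python
def min_barcode_length(n_members: int) -> int:
    """Smallest length L such that 4**L >= n_members, computed with
    integer arithmetic to avoid floating-point rounding."""
    length = 0
    capacity = 1
    while capacity < n_members:
        capacity *= 4
        length += 1
    return length


for size in (10**2, 10**12):
    print(size, min_barcode_length(size))
# 100 -> 4 nucleotides (4**4 = 256)
# 1000000000000 -> 20 nucleotides (4**20 is about 1.1e12)

# Number of distinct compound-protein pairings probed in a single assay
# when both libraries are at the small or the large end of the range.
print(10**2 * 10**2)    # 10000
print(10**12 * 10**12)  # 10**24
```

Barcodes used in practice are typically longer than this minimum so that sequencing errors do not convert one valid barcode into another.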
CANDIDATE COMPOUNDS AND ATTACHING BARCODES THERETO
According to the present disclosure, candidate compounds include small molecules, macrocycles, proteins, polypeptides, ligands, aptamers, antibodies, carbohydrates, lipids, metabolites, and nucleic acids. The candidate compounds have a barcode attached thereto and may be included in a DNA encoded library. See Fig. 1 depicting a small molecule as an exemplary candidate compound with a dsDNA including a barcode attached thereto. The barcode may be single stranded or double stranded. The DNA encoded library may include from 1 x 10^2 to 1 x 10^12 candidate compounds. Methods of making DNA encoded libraries of candidate compounds for screening assays useful in the present disclosure and adaptable to the present methods are known in the art. See for example, Clark et al., Design, synthesis and selection of DNA-encoded small molecule libraries, Nature Chemical Biology, 5, 647-654 (2009); Castanon et al., Design and Development of a Technology Platform for DNA-Encoded Library Production and Affinity Selection, SLAS Discovery, 2018, Vol. 23(5), 387-396; Gartner et al., DNA-templated organic synthesis and selection of a library of macrocycles, Science (2004); 305(5690): 1601-1605 each of which is hereby incorporated by reference in its entirety for the teaching of making and using DNA-encoded small molecule libraries.
TARGET PROTEINS AND ATTACHING BARCODES THERETO
According to the present disclosure, target proteins have attached thereto a dsDNA including a barcode. See Fig. 1 depicting a target protein with a dsDNA including a barcode attached thereto. According to the present disclosure, target proteins include cellular proteins that can be obtained by translation of mRNA obtained from cells. In this manner, the transcriptome of cells can provide target proteins to be used in the methods described herein. For example, mRNAs from a cell or cells are isolated from other RNAs such as by poly-A selection. Reverse transcription is then carried out, using either random primers or gene-specific primers, where either primer set provides a site to initiate first-strand cDNA synthesis, hybridizes upstream of the endogenous stop codon and also includes a site for a puromycin linker to hybridize. A template-switching oligo is then used to capture the 5' end and start second-strand synthesis of the cDNA. A T7 or similar RNA polymerase binding site is added to the 5' end with PCR, for example, after cDNA synthesis, generating a library of cDNAs that include a transcription start site such as a T7 site at the 5’ end, the full-length cDNA as produced by reverse transcription, no stop codon in the protein-coding region, and a hybridization site for the puromycin linker. The construct is then transcribed into mRNA using the transcriptional start site. The DNA puromycin linker is ligated to the 3' end of this transcribed RNA using the hybridization site, as for example in Johnson et al., Molecular Cell, Vol. 81, 1-13 (2021) including Supplemental Materials Figure S1. Translation of this puromycin-ligated RNA is then carried out using a eukaryotic translation kit, using the endogenous ribosomal entry sites.
It is to be understood that while target proteins may be obtained from the transcriptome of cells as described above and as known in the art, target proteins may also be obtained in commercially available libraries.
Methods of attaching barcodes to target proteins are known to those of skill in the art and include direct attachment by linker chemistry, binding pairs and the like. According to one aspect, SNAP proteins/methods are used as is known in the art. Alternatively, useful methods are described in Shimada et al., Conjugation of DNA with protein using His-tag chemistry and its application to the aptamer-based detection system, Biotechnol. Lett. 2008; 30(11): 2001-2006, hereby incorporated by reference in its entirety for the description of conjugating DNA with protein using His-tag chemistry.
Methods of attaching barcodes to target proteins include ribosome display. As is known in the art, ribosome display is a process that results in translated proteins that are associated with their mRNA progenitor. Ribosome display begins with a native library of DNA sequences coding for polypeptides, such as target proteins described herein. Each sequence is transcribed, and then translated in vitro into polypeptide. However, the DNA library coding for a particular library of target proteins includes a spacer sequence lacking a stop codon before its end. The lack of a stop codon prevents release factors from binding and triggering the disassembly of the translational complex. The spacer sequence stays attached to the peptidyl tRNA and occupies the ribosomal tunnel, thereby allowing the protein of interest to protrude out of the ribosome and fold. What results is a complex of mRNA, ribosome, and protein. According to aspects described herein, the mRNA includes a barcode sequence which is reverse transcribed into cDNA which becomes attached to the barcode for a candidate compound bound to the target protein. The following references are instructive in creating a complex of mRNA, ribosome, and protein useful in the present disclosure and adaptable to the present methods. Hanes, J.; Plückthun, A. (1997). "In vitro selection and evolution of functional proteins by using ribosome display". Proc. Natl. Acad. Sci. U.S.A. 94 (10): 4937-42; Lipovsek, D.; Plückthun, A. (2004). "In-vitro protein evolution by ribosome display and mRNA display". J. Imm. Methods. 290 (1-2): 51-67; He, M.; Taussig, M. (2007). "Eukaryotic ribosome display with in situ DNA recovery". Nature Methods. 4 (3): 281-288; X Yan; Z Xu (2006). "Ribosome-display technology: applications for directed evolution of functional proteins". Drug Discovery Today. 11 (19-20): 911-916, each of which is hereby incorporated by reference for its teaching of using ribosome display to attach a barcode to a target protein.
Methods of attaching barcodes to target proteins include mRNA display and cDNA display. In mRNA display, a target protein library is generated in which the target proteins are conjugated with their mRNA, for example by a puromycin linker. The mRNA serves as the unique nucleic acid barcode for each target protein in the library. According to one aspect, an additional barcode or barcodes as known in the art beyond the mRNA may be added as desired, such as a UMI (unique molecular identifier) which may be attached to the puromycin linker. Such a non-mRNA barcode may be used for any useful barcoding purpose including associating the barcode with the coding sequence of the mRNA via long-read sequencing. Exemplary mRNA methods useful in the present disclosure and adaptable to the present methods include Roberts et al., RNA-peptide fusions for the in vitro selection of peptides and proteins. Proc. Natl. Acad. Sci. USA 94, 12297-12302 (1997); Barendt et al., Streamlined protocol for mRNA display, ACS Comb. Sci. 15, 77-81 (2013); Johnson et al., Molecular Cell 81, 1-13 (2021) (describing SMART-display mRNA display); Seelig, mRNA display for the selection and evolution of enzymes from in vitro-translated protein libraries, Nat. Protoc. 6, 540-552 (2011); and Ueno, S., & Nemoto, N. (2011). cDNA Display: Rapid Stabilization of mRNA Display. Methods in Molecular Biology, 113-135. doi:10.1007/978-1-61779-379-0_8 (describing cDNA display methods useful and adaptable herein). Such methods are useful for creating mRNA-protein fusions by adding an amino acid analog puromycin near the 3' end of the mRNA. The translated protein from this mRNA is then covalently linked with its mRNA when puromycin enters the A site of the ribosome and is joined to the amino acid chain. This generates an mRNA-protein fusion, which is then released from the ribosome.
According to one aspect, mRNA is collected from cells and purified. According to one aspect, a reverse transcription primer containing a random sixteen base pair region followed by the sequences for a FLAG tag or other peptide tag and a GC-rich puromycin linker hybridization site is annealed to the mRNA. According to an additional aspect, a gene-specific primer for each gene that falls short of or changes the endogenous stop codon may also be used in a similar manner. Reverse transcription is then carried out with incorporation of a template switching oligo (TSO). PCR is performed with a primer that partially overlaps the TSO sequences to introduce a T7 promoter and complete the ribosome binding site. Double-stranded DNA is purified. Transcribed RNA is ligated to a puromycin-containing linker sequence and subsequently translated to form mRNA-protein fusion products. See Johnson et al., Molecular Cell 81, 1-13 (2021) and Ueno, S., & Nemoto, N. (2011). cDNA Display: Rapid Stabilization of mRNA Display. Methods in Molecular Biology, 113-135. doi:10.1007/978-1-61779-379-0_8.
According to one aspect, sequencing libraries are made after ligation of the dsDNA barcode of the candidate compound to the dsDNA barcode of the target protein/mRNA display complex using a Tn5 transposase kit commercially available from Illumina (described at world wide website illumina.com/products/by-type/sequencing-kits/library-prep-kits/nextera-xt-dna.html). The Tn5 transposase kit is used to cut and insert primers randomly along the length of the ligated barcode construct.
BARCODES
As used herein, the term “barcode” refers to a unique oligonucleotide sequence that allows a corresponding candidate compound or target nucleic acid to be identified. In certain embodiments, barcodes can each have a length within a range of from 8 to 40 nucleotides, or from 10 to 32 nucleotides. In certain exemplary embodiments, a barcode has a length of 10 nucleotides. In certain aspects, the melting temperatures of barcodes within a set are within 10 °C of one another, within 5 °C of one another, or within 2 °C of one another. In other aspects, barcodes are members of a minimally cross-hybridizing set. That is, the nucleotide sequence of each member of such a set is sufficiently different from that of every other member of the set that no member can form a stable duplex with the complement of any other member under stringent hybridization conditions. In one aspect, the nucleotide sequence of each member of a minimally cross-hybridizing set differs from those of every other member by at least two nucleotides. Barcode technologies useful in the present disclosure and adaptable in the present methods are known in the art and are described in Winzeler et al. (1999) Science 285:901; Brenner (2000) Genome Biol. 1:1; Kumar et al. (2001) Nature Rev. 2:302; Giaever et al. (2004) Proc. Natl. Acad. Sci. USA 101:793; Eason et al. (2004) Proc. Natl. Acad. Sci. USA 101:11046; and Brenner (2004) Genome Biol. 5:240.
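By way of non-limiting illustration, the two set-level criteria above (members differing by at least two nucleotides, and melting temperatures within a stated window) can be checked computationally. The sketch below is an assumption-laden simplification: it uses Hamming distance between equal-length barcodes as the difference measure and the Wallace rule as a rough melting temperature estimate, neither of which is prescribed by the present disclosure, and the function names are hypothetical.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length barcodes."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in deg C by the Wallace rule: 2*(A+T) + 4*(G+C).

    Assumption: this rule is only a coarse estimate for short oligonucleotides.
    """
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def check_barcode_set(barcodes, min_distance=2, max_tm_spread=10):
    """Return (ok, problems) for the pairwise-difference and Tm-window criteria."""
    problems = []
    for a, b in combinations(barcodes, 2):
        if hamming(a, b) < min_distance:
            problems.append(f"{a}/{b} differ at fewer than {min_distance} positions")
    tms = [wallace_tm(b) for b in barcodes]
    if tms and max(tms) - min(tms) > max_tm_spread:
        problems.append(f"Tm spread {max(tms) - min(tms)} exceeds {max_tm_spread}")
    return (not problems, problems)
```

A set that passes this check is not thereby shown to be minimally cross-hybridizing under stringent conditions; a fuller check would also compare each member against the complements of the others.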
According to certain aspects, barcodes may be single stranded nucleic acids or double stranded nucleic acids. Double stranded nucleic acid barcodes may be blunt ended or may have a 3’ or a 5’ overhang. According to certain aspects, mRNA, such as in mRNA display methods, can serve as a barcode for a target protein it encodes.
LIGATING DOUBLE STRANDED DNA TOGETHER
According to certain aspects, a library of DNA barcoded candidate compounds is mixed with a library of DNA or mRNA encoded target proteins to allow interactions. For interacting compounds and target proteins, the DNA barcode of the compound and the mRNA or DNA barcode of the target protein are attached together to generate a chimeric nucleic acid including the unique barcode of a candidate compound and the unique barcode of a target protein. For example, for interacting proteins, i.e. protein-protein interactions, each protein may have a double stranded DNA barcode which are attached together to generate a chimeric nucleic acid including the unique barcode of a first protein and the unique barcode of a second protein of a protein-protein interaction or binding pair. For example, for interacting small molecules and proteins, i.e. small molecule-protein interactions, each small molecule and each protein of an interacting or binding pair may have a double stranded DNA barcode which are attached together to generate a chimeric nucleic acid including the unique barcode of a small molecule and the unique barcode of a target protein of a small molecule-protein interaction or binding pair. For certain methods which utilize the ribosome display method described herein, a bridging nucleotide is used which includes at one end a single stranded DNA and at the other end a double stranded DNA. The single stranded DNA portion anneals to the mRNA portion of the mRNA-ribosome-protein complex as described herein. The double stranded DNA portion is ligated to the double stranded DNA barcode of the small molecule interacting with or otherwise bound to the protein of the mRNA-ribosome-protein complex. DNA barcodes may be attached or “stitched” together using methods known to those of skill in the art, such as click methods, enzyme based methods and non-enzyme based methods, and accordingly, sequenced. According to one aspect, the DNA barcodes may be ligated together enzymatically.
According to one aspect, the DNA barcodes may be linked together by a linker. Exemplary methods of attaching or ligating nucleic acid barcodes together and sequencing useful in the present disclosure and adaptable in the present methods include Johnson et al., Molecular Cell 81, 1-13 (2021) (describing INLISE incubation, ligation and sequencing procedure); Dixon et al., Topological Domains in mammalian Genomes Identified by Analysis of Chromatin Interactions, Nature (2012); 485(7398): 376-380 (describing Hi-C proximity ligation and sequencing); Lieberman-Aiden E, et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009;326:289-293 (describing Hi-C proximity ligation and sequencing), each of which is hereby incorporated by reference for the teaching of making and using paired end libraries such as by using proximity ligation and sequencing methods. According to one aspect, click methods are used for proximity ligation. According to the present disclosure, click chemistry is used to connect DNA with nucleic acids. See Nicolo Zuin Fantoni, Afaf H. El-Sagheer, and Tom Brown, Chem. Rev. 2021, 121, 12, 7122-7154. Such connections may not interfere with enzymatic activity. See El-Sagheer et al., Efficient RNA synthesis by in vitro transcription of a triazole-modified DNA template, Chem Commun (Camb). 2011 Nov 28; 47(44): 12057-8.
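By way of non-limiting illustration, once the barcode of a candidate compound and the barcode of a target protein have been joined into a chimeric nucleic acid and sequenced, each read can be split back into its two barcodes and the pairs tallied. The sketch below assumes a hypothetical read layout (compound barcode, then a fixed junction sequence, then protein barcode); the junction sequence `LINKER`, the function names, and the exact layout are illustrative assumptions and are not taken from the present disclosure.

```python
from collections import Counter

LINKER = "GGATCCGAATTC"  # hypothetical junction sequence between the two barcodes
BARCODE_LEN = 10         # 10 nt is one exemplary barcode length


def split_chimeric_read(read, linker=LINKER, n=BARCODE_LEN):
    """Return (compound_barcode, protein_barcode), or None if the read does not fit."""
    i = read.find(linker)
    # Need n bases before the junction and n bases after it.
    if i < n or len(read) < i + len(linker) + n:
        return None
    compound = read[i - n:i]
    protein = read[i + len(linker):i + len(linker) + n]
    return compound, protein


def count_pairs(reads):
    """Tally how often each compound/protein barcode pair is observed."""
    pairs = (split_chimeric_read(r) for r in reads)
    return Counter(p for p in pairs if p is not None)
```

Pair counts obtained in this way could then be compared across the libraries to rank candidate compound and target protein interactions; error-tolerant barcode matching is omitted here for brevity.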
BRIDGING POLYNUCLEOTIDES
According to the present disclosure where a candidate compound is interacting with or otherwise bound to a target protein, a polynucleotide may be used to connect or bridge the binding pair as is depicted in Fig. 2B. Such a polynucleotide is referred to herein as a “bridging polynucleotide.” The bridging polynucleotide includes a single stranded DNA portion at one end and a double stranded portion at the other end as depicted in Fig. 2B. The double stranded portion attaches to (for example is ligated to) a DNA sequence including a barcode and attached to a candidate compound where the candidate compound is bound to a target polynucleotide such as a target protein attached to its coding mRNA via ribosome display as depicted in Fig. 2B. The single stranded portion attaches to (for example hybridizes with) a bridge landing hybridization site on a mRNA including a unique barcode attached to the target polynucleotide such as a target protein as depicted in Fig. 2B. The bridging polynucleotide becomes bound to the DNA sequence attached to the candidate compound and hybridizes to the mRNA bound to the target protein. The hybridization site can serve as a reverse transcription primer to reverse transcribe the mRNA including the unique barcode for the target polynucleotide into cDNA which becomes bound to the bridging polynucleotide. The result is a DNA sequence including the unique barcode for the target polypeptide, the bridging polynucleotide and the unique barcode for the candidate compound. Upstream of the unique barcode for the target polypeptide/protein is a universal primer to selectively amplify ligated fragments. This DNA sequence can be sequenced and the unique barcodes identified, as described herein. According to one aspect, the bridging polynucleotide may include a universal primer and a template switching oligonucleotide instead of the universal primer and the template switching oligonucleotide being present in the original DNA construct between the Sp6/T7 and the barcode.
DNA CONSTRUCTS
DNA constructs as described herein (see Fig. 2A for example) include (1) a transcriptional start site, (2) a universal primer binding site, (3) a barcode, (4) a primer binding site, (5) an internal ribosome entry site, (6) a protein coding sequence, (7) a peptide tag/FLAG, and (8) a spacer with no stop codons.
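By way of non-limiting illustration, the eight elements of the DNA construct can be represented as an ordered record, with a check that the reading frame stays open through the coding sequence, peptide tag and spacer. The class name, field names and the assumption that the three translated elements are concatenated in frame are hypothetical and serve only as a sketch.

```python
from dataclasses import dataclass, astuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_in_frame_stop(seq: str) -> bool:
    """True if any codon read in frame from position 0 is a stop codon."""
    s = seq.upper()
    return any(s[i:i + 3] in STOP_CODONS for i in range(0, len(s) - 2, 3))


@dataclass
class DnaConstruct:
    transcription_start: str    # (1) e.g. an Sp6, T7 or T3 site
    universal_primer_site: str  # (2)
    barcode: str                # (3)
    primer_binding_site: str    # (4)
    ires: str                   # (5) internal ribosome entry site
    coding_sequence: str        # (6)
    peptide_tag: str            # (7) e.g. sequence encoding a FLAG tag
    spacer: str                 # (8) must contain no stop codons

    def sequence(self) -> str:
        """Concatenate the elements in the order listed above."""
        return "".join(astuple(self))

    def reads_through(self) -> bool:
        """True if coding sequence + tag + spacer contain no in-frame stop codon."""
        orf = self.coding_sequence + self.peptide_tag + self.spacer
        return not has_in_frame_stop(orf)
```

A construct for which `reads_through()` is false would be expected to release the translated protein before the spacer, which is contrary to the purpose of the stop-free spacer described herein.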
(1) Transcriptional start sites
A transcriptional start site as described herein is provided in the DNA constructs of the present disclosure so as to transcribe the DNA into mRNA. Exemplary transcriptional start sites include Sp6, T7, T3 and the like as are known in the art.
(2) Universal primers
A universal primer as described herein is provided in the DNA constructs of the present disclosure so as to function as amplification or sequencing primers. Exemplary universal primers bind to many different cognate sequences as is known in the art.
(3) Barcodes
A barcode as described herein is used in the DNA construct according to the present disclosure to uniquely identify target proteins within a plurality of target proteins. A barcode as described herein is also used according to the present disclosure to uniquely identify candidate compounds within a plurality of candidate compounds.
(4) Primer binding sites
A primer binding site as described herein is provided in the DNA constructs of the present disclosure so as to facilitate binding of a primer for purposes of transcription or reverse transcription as is known in the art.
(5) Internal ribosome entry sites
An internal ribosome entry site as described herein is provided in the DNA constructs of the present disclosure so as to facilitate translation of mRNA into a target protein as is known in the art and as described herein. As is known in the art, protein synthesis is regulated by the sequence and structure of the 5' untranslated region (UTR) of the mRNA transcript. In prokaryotes, one ribosome binding site (RBS), which promotes efficient and accurate translation of mRNA, is called the Shine-Dalgarno sequence. This purine-rich sequence of 5' UTR is complementary to the UCCU core sequence of the 3'-end of 16S rRNA (located within the 30S small ribosomal subunit). Various Shine-Dalgarno sequences have been found in prokaryotic mRNAs. These sequences lie about 10 nucleotides upstream from the AUG start codon. Activity of a RBS can be influenced by the length and nucleotide composition of the spacer separating the RBS and the initiator AUG. In eukaryotes, the Kozak sequence A/GCCACCAUGG (SEQ ID NO:1), which lies within a short 5' untranslated region, directs translation of mRNA. An mRNA lacking the Kozak consensus sequence may be translated efficiently in in vitro systems (Ambion) if it possesses a moderately long 5' UTR that lacks stable secondary structure. Eukaryotic ribosomes (such as those found in reticulocyte lysate) can efficiently use either the Shine-Dalgarno or the Kozak ribosomal binding sites.
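By way of non-limiting illustration, the Kozak sequence A/GCCACCAUGG recited above can be located in a transcript with a simple pattern search. The sketch below matches only that exact consensus (with A or G at the first position); the function name is hypothetical, and real transcripts frequently deviate from the consensus, so an empty result does not by itself indicate that a transcript cannot be translated.

```python
import re

# A/GCCACCAUGG: the first position is A or G, and the AUG begins at offset 6.
KOZAK = re.compile(r"[AG]CCACCAUGG")


def find_kozak_starts(mrna: str):
    """Return 0-based positions of the A of each AUG embedded in a Kozak match.

    DNA input is accepted; T is converted to U before matching.
    """
    rna = mrna.upper().replace("T", "U")
    return [m.start() + 6 for m in KOZAK.finditer(rna)]
```

For example, `find_kozak_starts("GGGACCACCAUGGCU")` reports the start codon at position 9 of that sequence.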
(6) Protein coding sequences
A protein coding sequence as described herein is provided in the DNA constructs of the present disclosure so as to facilitate translation from mRNA into the target protein as is known in the art and as described herein. Exemplary protein coding sequences include those encoding proteins that are the target of drug screening libraries. Such exemplary proteins encoded by genes include those described in Finan et al., The druggable genome and support for target identification and validation in drug development, Sci. Transl. Med. (2017); 9(383): eaag1166, hereby incorporated by reference in its entirety. Such target proteins include targets of approved drugs and drugs in clinical development. Such proteins that are targets of approved small molecule and biotherapeutic drugs may be identified using manually curated efficacy target information from release 17 of the ChEMBL database (see Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, Light Y, McGlinchey S, Michalovich D, Al-Lazikani B, Overington JP, Nucleic Acids Res. 2012 Jan; 40(Database issue):D1100-7, hereby incorporated by reference in its entirety). Proteins closely related to drug targets or with associated drug-like compounds may be identified through a BLAST search (blastp) of Ensembl peptide sequences against the set of approved drug efficacy targets identified from ChEMBL previously (see Bento AP, Gaulton A, Hersey A, Bellis LJ, Chambers J, Davies M, Kruger FA, Light Y, Mak L, McGlinchey S, Nowotka M, et al. The ChEMBL bioactivity database: an update. Nucleic Acids Res. 2014;42:D1083-90, hereby incorporated by reference in its entirety). Extracellular proteins and members of key drug-target families may be identified through a BLAST search against the set of approved drug targets (as above), with any proteins sharing >25% identity over >75% of the sequence and with E-value <0.001 being included in the set. Members of five major ‘druggable’ protein families (GPCRs, kinases, ion channels, nuclear hormone receptors, and phosphodiesterases) may be extracted from KinaseSARfari, GPCRSARfari, and IUPHARdb (see Pawson AJ, Sharman JL, Benson HE, Faccenda E, Alexander SPH, Buneman OP, Davenport AP, McGrath JC, Peters JA, Southan C, Spedding M, et al. NC-IUPHAR, The IUPHAR/BPS Guide to PHARMACOLOGY: an expert-driven knowledgebase of drug targets and their ligands. Nucleic Acids Res. 2014;42:D1098-D1106, hereby incorporated by reference in its entirety). Extracellular proteins may be identified using annotation in UniProt (see Pawson AJ, Sharman JL, Benson HE, Faccenda E, Alexander SPH, Buneman OP, Davenport AP, McGrath JC, Peters JA, Southan C, Spedding M, et al. NC-IUPHAR, The IUPHAR/BPS Guide to PHARMACOLOGY: an expert-driven knowledgebase of drug targets and their ligands. Nucleic Acids Res. 2014;42:D1098-D1106, hereby incorporated by reference in its entirety) and Gene Ontology (GO) (see Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25:25-9, hereby incorporated by reference in its entirety).
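By way of non-limiting illustration, the BLAST inclusion criteria recited above (>25% identity over >75% of the sequence, with E-value <0.001) can be applied to parsed search results as a simple filter. The record type and field names below are hypothetical, and the sketch assumes that "of the sequence" refers to the fraction of the query sequence covered by the alignment.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    query_id: str
    subject_id: str
    percent_identity: float  # 0 to 100
    alignment_length: int    # aligned positions
    query_length: int        # full length of the query sequence
    evalue: float


def passes_filter(hit, min_identity=25.0, min_coverage=0.75, max_evalue=1e-3):
    """Apply the >25% identity, >75% coverage, E-value <0.001 thresholds."""
    coverage = hit.alignment_length / hit.query_length  # assumed definition
    return (
        hit.percent_identity > min_identity
        and coverage > min_coverage
        and hit.evalue < max_evalue
    )


def select_related_targets(hits):
    """Return the sorted set of query identifiers with at least one passing hit."""
    return sorted({h.query_id for h in hits if passes_filter(h)})
```

All three thresholds are strict inequalities, matching the wording above, so a hit at exactly 25% identity is excluded.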
Target proteins corresponding to cancer driver genes and mutations may be identified by literature search. See Bailey et al., Cell (2018); 173(2): 371-385.e18, hereby incorporated by reference in its entirety. Target proteins for many diseases may also be identified via academic research. See Improving target assessment in biomedical research: the GOT-IT recommendations, Christoph H. Emmerich, Lorena Martinez Gamboa, Martine C. J. Hofmann, Marc Bonin-Andresen, Olga Arbach, Pascal Schendel, Bjorn Gerlach, Katja Hempel, Anton Bespalov, Ulrich Dirnagl & Michael J. Parnham, Nature Reviews Drug Discovery volume 20, pages 64-81 (2021).
Drugs in clinical development may be identified from a number of sources: investor pipeline information from a number of large pharmaceutical companies [including Pfizer, Roche, GlaxoSmithKline, Novartis (oncology only), AstraZeneca, Sanofi, Lilly, Merck, Bayer, and Johnson & Johnson - accessed June-August 2013], monoclonal antibody candidates and USAN applications from the ChEMBL database (release 29), and drugs in active clinical trials from the NIH world wide website clinicaltrials.gov. Targets for these drug candidates may be assigned from company pipeline information and scientific literature, where available. Where no reported target information may be found, a potential target may be assigned through analysis of bioactivity data in ChEMBL, with the target having the highest dose-response measurement < 100 nM for the compound being assigned. Genes involved in ADME/drug disposition (phase I and II metabolic enzymes, transporters, and modifiers) may be identified from the PharmaADME.org extended set.
(7) Protein tags
A protein tag as described herein is provided in the DNA constructs of the present disclosure so as to facilitate binding and separation of the target protein (affinity tags), separation when using chromatography (chromatography tags), visualization (fluorescent tags) and the like as is known in the art. As used herein, the term “protein tag” refers to a heterologous polypeptide sequence linked to a target protein. Protein tags include, but are not limited to, Avi tag (GLNDIFEAQKIEWHE) (SEQ ID NO:2), calmodulin tag (KRRWKKNFIAVSAANRFKKISSSGAL) (SEQ ID NO:3), FLAG tag (DYKDDDDK) (SEQ ID NO:4), HA tag (YPYDVPDYA) (SEQ ID NO:5), His tag (HHHHHH) (SEQ ID NO:6), Myc tag (EQKLISEEDL) (SEQ ID NO:7), S tag (KETAAAKFERQHMDS) (SEQ ID NO:8), SBP tag (MDEKTTGWRGGHVVEGLAGELEQLRARLEHHPQGQREP) (SEQ ID NO:9), Softag 1 (SLAELLNAGLGGS) (SEQ ID NO:10), Softag 3 (TQDPSRVG) (SEQ ID NO:11), V5 tag (GKPIPNPLLGLDST) (SEQ ID NO:12), Xpress tag (DLYDDDDK) (SEQ ID NO:13), Isopep tag (TDKDMTITFTNKKDAE) (SEQ ID NO:14), SpyTag (AHIVMVDAYKPTK) (SEQ ID NO:15), streptactin tag (Strep-tag II: WSHPQFEK) (SEQ ID NO:16), Ty1 (EVHTNQDPLD) (SEQ ID NO:17), and the like.
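By way of non-limiting illustration, the presence and position of a protein tag in a translated fusion protein can be confirmed by searching its amino acid sequence for the tag sequences listed above. The sketch below includes only a subset of those tags; the dictionary and function names are hypothetical.

```python
# Subset of the tag sequences listed above (one-letter amino acid codes).
PROTEIN_TAGS = {
    "FLAG tag": "DYKDDDDK",
    "HA tag": "YPYDVPDYA",
    "His tag": "HHHHHH",
    "Myc tag": "EQKLISEEDL",
    "Strep-tag II": "WSHPQFEK",
}


def find_tags(protein: str, tags=PROTEIN_TAGS):
    """Return {tag name: 0-based position of first occurrence} for tags present."""
    seq = protein.upper()
    found = {}
    for name, tag_seq in tags.items():
        pos = seq.find(tag_seq)
        if pos != -1:
            found[name] = pos
    return found
```

For a fusion protein carrying a FLAG tag followed by a His tag, the returned positions indicate the order in which the tags occur along the polypeptide.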
(8) Spacer sequences
A spacer sequence or linker as described herein is provided in the DNA constructs of the present disclosure so as to provide spacing between components of the DNA construct and ultimately the fusion proteins. In a fusion protein, the spacer provides distance between the terminal FLAG tag, for example, and the target protein to allow movement of the terminal FLAG tag relative to the target protein. Exemplary spacer sequences lack stop codons and include linkers whose property, design and functionality are described in Chen et al., Fusion Protein Linkers: Property, Design and Functionality, Adv. Drug Deliv. Rev. (2013); 65(10): 1357-1369, hereby incorporated by reference in its entirety.
TRANSCRIPTION MATERIALS AND METHODS
In various embodiments, the methods disclosed herein comprise in vitro transcription of a DNA construct as described herein to mRNA. In vitro transcription includes use of a linear DNA template containing a promoter, ribonucleotide triphosphates, a buffer system that includes DTT and magnesium ions, and an appropriate phage RNA polymerase. One of skill will recognize that exact conditions used in the transcription reaction can be tailored for a particular desired result.
An exemplary commercially available transcription kit includes HISCRIBE™ T7 Quick High Yield RNA Synthesis Kit commercially available from New England Biolabs. The HISCRIBE™ T7 Quick High Yield RNA Synthesis Kit is designed for quick set-up and production of large amounts of RNA in vitro. The reaction can be set up conveniently by combining the NTP buffer mix, T7 RNA Polymerase mix and a suitable DNA template. The kit also allows for capped RNA or dye-labeled RNA synthesis by incorporation of cap analog (ARCA) or dye-modified nucleotides. A DNA template, such as linearized plasmid DNA, PCR products or synthetic DNA oligonucleotides, can be used as a template for in vitro transcription with the HISCRIBE™ T7 Quick High Yield RNA Synthesis Kit, provided that the DNA template contains a double-stranded T7 promoter region upstream of the sequence to be transcribed. A minimal T7 promoter sequence is known in the art. Components of commercially available transcription kits include DNase I, T7 RNA polymerase mix, and associated solutions and buffers.
It is to be understood that some translation systems include polymerases for transcription so the system will transcribe a DNA template into RNA which is then translated into the protein. See for example TnT® Coupled Reticulocyte Lysate Systems available from Promega. However, one of skill will recognize that transcription and translation can be carried out as separate steps using different systems.
TRANSLATION MATERIALS AND METHODS
In various embodiments, the methods disclosed herein comprise in vitro translation of mRNA into a protein, such as for use with the mRNA display methods described herein. According to one aspect, exemplary cell-free translation systems include extracts from rabbit reticulocytes, wheat germ and Escherichia coli which include the macromolecular components (70S or 80S ribosomes, tRNAs, aminoacyl-tRNA synthetases, initiation, elongation and termination factors, etc.) required for translation of exogenous RNA. The extracts are supplemented with amino acids, energy sources (ATP, GTP), energy regenerating systems (creatine phosphate and creatine phosphokinase for eukaryotic systems, and phosphoenol pyruvate and pyruvate kinase for the E. coli lysate), and other co-factors (Mg2+, K+, etc.).
Rabbit reticulocyte lysate is an efficient in vitro eukaryotic protein synthesis system used for translation of exogenous RNAs (either natural or generated in vitro). Wheat germ extract is an alternative to the rabbit reticulocyte lysate cell-free system. Wheat germ lysate translates exogenous RNA from a variety of different organisms, from viruses and yeast to higher plants and mammals. E. coli cell-free systems include a relatively simple translational apparatus with less complicated control at the initiation level, allowing this system to be efficient in protein synthesis. The Retic Lysate IVT Kit is an in vitro translation kit commercially available from ThermoFisher. Use of a particular system will require either appropriate Shine-Dalgarno sequences (for a prokaryotic system) or Kozak sequences (for a eukaryotic one). For DNA constructs obtained by reverse transcription from a biological sample, one method is to use the endogenous Kozak sequences in the 5' UTRs of genes from those samples. There are a variety of eukaryotic and prokaryotic protein expression systems; for a review see A User's Guide to Cell-Free Protein Synthesis, Nicole E. Gregorio, Max Z. Levine and Javin P. Oza, Methods Protoc. 2019 Mar; 2(1): 24. Exemplary systems include PUREEXPRESS™ commercially available from New England Biolabs or rabbit or wheat germ based systems commercially available from Promega.
AMPLIFICATION MATERIALS AND METHODS
In various embodiments, the methods disclosed herein comprise amplification of nucleic acids including, for example, polynucleotides, oligonucleotides and/or oligonucleotide fragments, such as the barcodes described herein. Amplification methods may comprise contacting a nucleic acid sequence with one or more primers (e.g., primers that are complementary to barcode sequences or sequences flanking the barcodes, such as for purposes of obtaining sequencing libraries) that specifically hybridize to the nucleic acid under conditions that facilitate hybridization and chain extension. Exemplary methods for amplifying nucleic acids include the polymerase chain reaction (PCR) (see, e.g., Mullis et al. (1986) Cold Spring Harb. Symp. Quant. Biol. 51 Pt 1:263 and Cleary et al. (2004) Nature Methods 1:241; and U.S. Patent Nos. 4,683,195 and 4,683,202), anchor PCR, RACE PCR, ligation chain reaction (LCR) (see, e.g., Landegren et al. (1988) Science 241:1077-1080; and Nakazawa et al. (1994) Proc. Natl. Acad. Sci. U.S.A. 91:360-364), self-sustained sequence replication (Guatelli et al. (1990) Proc. Natl. Acad. Sci. U.S.A. 87:1874), transcriptional amplification system (Kwoh et al. (1989) Proc. Natl. Acad. Sci. U.S.A. 86:1173), Q-Beta Replicase (Lizardi et al. (1988) BioTechnology 6:1197), recursive PCR (Jaffe et al. (2000) J. Biol. Chem. 275:2619; and Williams et al. (2002) J. Biol. Chem. 277:7790), the amplification methods described in U.S. Patent Nos. 6,391,544, 6,365,375, 6,294,323, 6,261,797, 6,124,090 and 5,612,199, isothermal amplification (e.g., isothermal bridge amplification (IBA), rolling circle amplification (RCA), hyperbranched rolling circle amplification (HRCA), strand displacement amplification (SDA), helicase-dependent amplification (HDA)), PWGA, or any other nucleic acid amplification method using techniques well known to those of skill in the art.
“Polymerase chain reaction,” or “PCR,” refers to a reaction for the in vitro amplification of specific DNA sequences by the simultaneous primer extension of complementary strands of DNA. In other words, PCR is a reaction for making multiple copies or replicates of a target nucleic acid flanked by primer binding sites, such reaction comprising one or more repetitions of the following steps: (i) denaturing the target nucleic acid, (ii) annealing primers to the primer binding sites, and (iii) extending the primers by a nucleic acid polymerase in the presence of nucleoside triphosphates. Usually, the reaction is cycled through different temperatures optimized for each step in a thermal cycler instrument. Particular temperatures, durations at each step, and rates of change between steps depend on many factors well-known to those of ordinary skill in the art, e.g., exemplified by the references: McPherson et al., editors, PCR: A Practical Approach and PCR2: A Practical Approach (IRL Press, Oxford, 1991 and 1995, respectively). For example, in a conventional PCR using Taq DNA polymerase, a double stranded target nucleic acid may be denatured at a temperature greater than 90 °C, primers annealed at a temperature in the range 50-75 °C, and primers extended at a temperature in the range 72-78 °C. In certain aspects, a double stranded target nucleic acid may be denatured at a temperature greater than 90 °C in a conventional PCR using Taq DNA polymerase, or by adding formamide at 60 °C in isothermal bridge amplification using Bst polymerase.
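By way of non-limiting illustration, the conventional Taq polymerase temperature ranges recited above (denaturation above 90 °C, annealing at 50-75 °C, extension at 72-78 °C) can be encoded as a simple check on a proposed cycling profile. The class and function names are hypothetical, and the copy-number function assumes perfect doubling at every cycle, which is an idealized upper bound and not a statement of the present disclosure.

```python
from dataclasses import dataclass


@dataclass
class PcrCycle:
    denature_c: float  # denaturation temperature, deg C
    anneal_c: float    # primer annealing temperature, deg C
    extend_c: float    # primer extension temperature, deg C


def within_conventional_ranges(cycle: PcrCycle) -> bool:
    """Check a cycle against the conventional Taq ranges recited above."""
    return (
        cycle.denature_c > 90
        and 50 <= cycle.anneal_c <= 75
        and 72 <= cycle.extend_c <= 78
    )


def ideal_copies(start_copies: int, cycles: int) -> int:
    """Upper bound on copy number, assuming perfect doubling in every cycle."""
    return start_copies * 2 ** cycles
```

In practice, yield falls below this bound as reagents are depleted, so the function is suited only to rough planning of cycle number.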
The term “PCR” encompasses derivative forms of the reaction, including but not limited to, RT-PCR, real-time PCR, nested PCR, quantitative PCR, multiplexed PCR, assembly PCR and the like. Reaction volumes range from a few hundred nanoliters, e.g., 200 nL, to a few hundred microliters, e.g., 200 microliters. “Reverse transcription PCR,” or “RT-PCR,” means a PCR that is preceded by a reverse transcription reaction that converts a target RNA to a complementary single stranded DNA, which is then amplified, e.g., Tecott et al., U.S. Patent No. 5,168,038. “Real-time PCR” means a PCR for which the amount of reaction product, i.e., amplicon, is monitored as the reaction proceeds. There are many forms of real-time PCR that differ mainly in the detection chemistries used for monitoring the reaction product, e.g., Gelfand et al., U.S. Patent No. 5,210,015 (“Taqman”); Wittwer et al., U.S. Patent Nos. 6,174,670 and 6,569,627 (intercalating dyes); Tyagi et al., U.S. Patent No. 5,925,517 (molecular beacons). Detection chemistries for real-time PCR are reviewed in Mackay et al., Nucleic Acids Research, 30:1292-1305 (2002). “Nested PCR” means a two-stage PCR wherein the amplicon of a first PCR becomes the sample for a second PCR using a new set of primers, at least one of which binds to an interior location of the first amplicon. As used herein, “initial primers” in reference to a nested amplification reaction mean the primers used to generate a first amplicon, and “secondary primers” mean the one or more primers used to generate a second, or nested, amplicon. “Multiplexed PCR” means a PCR wherein multiple target sequences (or a single target sequence and one or more reference sequences) are simultaneously amplified in the same reaction mixture, e.g. Bernard et al. (1999) Anal. Biochem., 273:221-228 (two-color real-time PCR). Usually, distinct sets of primers are employed for each sequence being amplified.
“Quantitative PCR” means a PCR designed to measure the abundance of one or more specific target sequences in a sample or specimen. Techniques for quantitative PCR are well-known to those of ordinary skill in the art, as exemplified in the following references: Freeman et al., Biotechniques, 26:112-126 (1999); Becker-Andre et al., Nucleic Acids Research, 17:9437-9447 (1989); Zimmerman et al., Biotechniques, 21:268-279 (1996); Diviacco et al., Gene, 122:3013-3020 (1992); Becker-Andre et al., Nucleic Acids Research, 17:9437-9446 (1989); and the like.
SEQUENCING MATERIALS AND METHODS
In certain embodiments, methods of determining the sequence of one or more nucleic acid sequences of interest, e.g., polynucleotides, oligonucleotides and/or oligonucleotide fragments, such as the barcodes described herein, are provided. Determination of the sequence of a nucleic acid sequence of interest can be performed using a variety of sequencing methods known in the art including, but not limited to, sequencing by hybridization (SBH), sequencing by ligation (SBL), quantitative incremental fluorescent nucleotide addition sequencing (QIFNAS), stepwise ligation and cleavage, fluorescence resonance energy transfer (FRET), molecular beacons, TaqMan reporter probe digestion, pyrosequencing, and multiplex sequencing (Porreca et al. (2007) Nat. Methods 4:931). Commercially available high-throughput sequencing methods, e.g., cyclic array sequencing on platforms such as Roche 454, Illumina Solexa, AB-SOLiD, Helicos, Polonator, Ion Torrent semiconductor sequencing technology, single-molecule real-time (SMRT) sequencing from Pacific Biosciences, nanopore-based sequencing from Oxford Nanopore Technologies, and the like, can be utilized. Exemplary sequencing platforms useful with the present disclosure and adaptable to the methods described herein are described in Reuter et al., High-Throughput Sequencing Technologies, Mol. Cell (2015); 58(4): 586-597, hereby incorporated by reference in its entirety.
DEFINITIONS
Terms and symbols of nucleic acid chemistry, biochemistry, genetics, and molecular biology used herein follow those of standard treatises and texts in the field, e.g., Kornberg and Baker, DNA Replication, Second Edition (W.H. Freeman, New York, 1992); Lehninger, Biochemistry, Second Edition (Worth Publishers, New York, 1975); Strachan and Read, Human Molecular Genetics, Second Edition (Wiley-Liss, New York, 1999); Eckstein, editor, Oligonucleotides and Analogs: A Practical Approach (Oxford University Press, New York, 1991); Gait, editor, Oligonucleotide Synthesis: A Practical Approach (IRL Press, Oxford, 1984); and the like.
“Complementary” or “substantially complementary” refers to the hybridization or base pairing or the formation of a duplex between nucleotides or nucleic acids, such as, for instance, between the two strands of a double stranded DNA molecule or between an oligonucleotide primer and a primer binding site on a single stranded nucleic acid. Complementary nucleotides are, generally, A and T (or A and U), or C and G. Two single-stranded RNA or DNA molecules are said to be substantially complementary when the nucleotides of one strand, optimally aligned and compared and with appropriate nucleotide insertions or deletions, pair with at least about 80% of the nucleotides of the other strand, usually at least about 90% to 95%, and more preferably from about 98 to 100%. Alternatively, substantial complementarity exists when an RNA or DNA strand will hybridize under selective hybridization conditions to its complement. Typically, selective hybridization will occur when there is at least about 65% complementarity over a stretch of at least 14 to 25 nucleotides, preferably at least about 75%, more preferably at least about 90% complementarity. See Kanehisa (1984) Nucl. Acids Res. 12:203.
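By way of non-limiting illustration, the percentage criterion above can be evaluated for two strands as sketched below. The function names are hypothetical, and the sketch assumes equal-length strands aligned antiparallel without gaps; the definition above also permits nucleotide insertions or deletions, which this simplified example does not model.

```python
# Watson-Crick partners: A pairs with T (or U), and C pairs with G.
PAIRS = {
    ("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"),
    ("C", "G"), ("G", "C"),
}


def percent_complementarity(strand_5to3: str, other_5to3: str) -> float:
    """Percentage of positions that base pair when the two strands,
    each written 5' to 3', are aligned antiparallel without gaps."""
    a = strand_5to3.upper()
    b = other_5to3.upper()[::-1]  # reverse so position i of a faces position i of b
    if not a or len(a) != len(b):
        raise ValueError("strands must be non-empty and of equal length")
    paired = sum((x, y) in PAIRS for x, y in zip(a, b))
    return 100.0 * paired / len(a)


def is_substantially_complementary(strand: str, other: str,
                                   threshold: float = 80.0) -> bool:
    """Apply the 'at least about 80%' criterion recited in the definition."""
    return percent_complementarity(strand, other) >= threshold
```

The default threshold reflects the lowest value recited above; the stricter 90% to 95% or 98% to 100% values can be passed in its place.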
“Complex” refers to an assemblage or aggregate of molecules in direct or indirect contact with one another. In one aspect, “contact,” or more particularly, “direct contact,” in reference to a complex of molecules or in reference to specificity or specific binding, means two or more molecules are close enough so that attractive noncovalent interactions, such as van der Waals forces, hydrogen bonding, ionic and hydrophobic interactions, and the like, dominate the interaction of the molecules. In such an aspect, a complex of molecules is stable in that under assay conditions the complex is thermodynamically more favorable than a non-aggregated, or non-complexed, state of its component molecules. As used herein, “complex” refers to a duplex or triplex of polynucleotides, a stable aggregate of two or more proteins, a stable aggregate of a target protein and a candidate compound, such as with drug screening assays, or a stable aggregate formed by an antibody specifically binding to its corresponding antigen.
“Duplex” refers to at least two oligonucleotides and/or polynucleotides that are fully or partially complementary and undergo Watson-Crick type base pairing among all or most of their nucleotides so that a stable complex is formed. The terms “annealing” and “hybridization” are used interchangeably to mean the formation of a stable duplex. In one aspect, stable duplex means that a duplex structure is not destroyed by a stringent wash, e.g., conditions including a temperature of about 5 °C less than the Tm of a strand of the duplex and low monovalent salt concentration, e.g., less than 0.2 M, or less than 0.1 M. “Perfectly matched” in reference to a duplex means that the polynucleotide or oligonucleotide strands making up the duplex form a double stranded structure with one another such that every nucleotide in each strand undergoes Watson-Crick base pairing with a nucleotide in the other strand. The term “duplex” comprehends the pairing of nucleoside analogs, such as deoxyinosine, nucleosides with 2-aminopurine bases, PNAs, and the like, that may be employed. A “mismatch” in a duplex between two oligonucleotides or polynucleotides means that a pair of nucleotides in the duplex fails to undergo Watson-Crick bonding.
“Kit” refers to any delivery system for delivering materials or reagents for carrying out a method of the invention. In the context of assays, such delivery systems include systems that allow for the storage, transport, or delivery of reaction reagents (e.g., primers, enzymes, microarrays, etc. in the appropriate containers) and/or supporting materials (e.g., buffers, written instructions for performing the assay etc.) from one location to another. For example, kits include one or more enclosures (e.g., boxes) containing the relevant reaction reagents and/or supporting materials for assays of the invention. Such contents may be delivered to the intended recipient together or separately. For example, a first container may contain an enzyme for use in an assay, while a second container contains primers.
“Ligation” means to form a covalent bond or linkage between the termini of two or more nucleic acids, e.g., oligonucleotides and/or polynucleotides, in a template-driven reaction. The nature of the bond or linkage may vary widely and the ligation may be carried out enzymatically or chemically. As used herein, ligations may be carried out using known “click” chemistry, non-enzyme mediated stitching together of phosphate backbones, or enzymatic methods to form a phosphodiester linkage between a 5' carbon of a terminal nucleotide of one oligonucleotide and a 3' carbon of another oligonucleotide. A variety of template-driven ligation reactions are described in the following references: Whitely et al., U.S. Patent No. 4,883,750; Letsinger et al., U.S. Patent No. 5,476,930; Fung et al., U.S. Patent No. 5,593,826; Kool, U.S. Patent No. 5,426,180; Landegren et al., U.S. Patent No. 5,871,921; Xu and Kool (1999) Nucl. Acids Res. 27:875; Higgins et al., Meth. in Enzymol. (1979) 68:50; Engler et al. (1982) The Enzymes, 15:3; and Namsaraev, U.S. Patent Pub. 2004/0110213.
Nucleic acid molecules may be isolated from natural sources or purchased from commercial sources. Oligonucleotide sequences (e.g., barcodes) may also be prepared by any suitable method, e.g., standard phosphoramidite methods such as those described by Beaucage and Caruthers ((1981) Tetrahedron Lett. 22: 1859) or the triester method according to Matteucci et al. ((1981) J. Am. Chem. Soc. 103:3185), or by other chemical methods using either a commercial automated oligonucleotide synthesizer or high-throughput, high-density array methods known in the art (see U.S. Patent Nos. 5,602,244, 5,574,146, 5,554,744, 5,428,148, 5,264,566, 5,141,813, 5,959,463, 4,861,571 and 4,659,774, each incorporated herein by reference in its entirety for all purposes). Pre-synthesized oligonucleotides may also be obtained commercially from a variety of vendors. Nucleic acid molecules may be obtained from one or more biological samples. As used herein, a “biological sample” may be a single cell or many cells. A biological sample may comprise a single cell type or a combination of two or more cell types. A biological sample further includes a collection of cells that perform a similar function such as those found, for example, in a tissue. Accordingly, certain aspects of the invention are directed to biological samples containing one or more tissues. As used herein, a tissue includes, but is not limited to, epithelial tissue (e.g., skin, the lining of glands, bowel, and organs such as the liver, lung, kidney), endothelium (e.g., the lining of blood and lymphatic vessels), mesothelium (e.g., the lining of pleural, peritoneal and pericardial spaces), mesenchyme (e.g., cells filling the spaces between the organs, including fat, muscle, bone, cartilage and tendon cells), blood cells (e.g., red and white blood cells), neurons, germ cells (e.g., spermatozoa, oocytes), amniotic fluid cells, placenta, stem cells and the like.
A tissue sample includes microscopic samples as well as macroscopic samples.
In certain aspects, nucleic acid sequences derived or obtained from one or more organisms are provided. The term organism is generally understood to mean an individual animal, plant, or single-celled life form. As used herein, the term “organism” includes, but is not limited to, a human, a non-human primate, a cow, a horse, a sheep, a goat, a pig, a dog, a cat, a rabbit, a mouse, a rat, a gerbil, a frog, a toad, a fish (e.g., Danio rerio), a roundworm (e.g., C. elegans) and any transgenic species thereof. The term “organism” further includes, but is not limited to, a yeast (e.g., S. cerevisiae) cell, a yeast tetrad, a yeast colony, a bacterium, a bacterial colony, a virion, virosome, virus-like particle and/or cultures thereof, and the like. The term “organism” further includes a plant, and crops in particular.
Isolation, extraction or derivation of nucleic acid sequences may be carried out by any suitable method. Isolating nucleic acid sequences from a biological sample generally includes treating a biological sample in such a manner that nucleic acid sequences present in the sample are extracted and made available for analysis. Any isolation method that results in extracted nucleic acid sequences may be used in the practice of the present invention. It will be understood that the particular method used to extract nucleic acid sequences will depend on the nature of the source.
Methods of DNA extraction are well-known in the art. A classical DNA isolation protocol is based on extraction using organic solvents such as a mixture of phenol and chloroform, followed by precipitation with ethanol (J. Sambrook et al., “Molecular Cloning: A Laboratory Manual,” 1989, 2nd Ed., Cold Spring Harbor Laboratory Press: New York, N.Y.). Other methods include: salting out DNA extraction (P. Sunnucks et al., Genetics, 1996, 144: 747-756; S. M. Aljanabi and I. Martinez, Nucl. Acids Res. 1997, 25: 4692-4693), trimethylammonium bromide salts DNA extraction (S. Gustincich et al., BioTechniques, 1991, 11: 298-302) and guanidinium thiocyanate DNA extraction (J. B. W. Hammond et al., Biochemistry, 1996, 240: 298-300). A variety of kits are commercially available for extracting DNA from biological samples (e.g., BD Biosciences Clontech (Palo Alto, CA); Epicentre Technologies (Madison, WI); Gentra Systems, Inc. (Minneapolis, MN); MicroProbe Corp. (Bothell, WA); Organon Teknika (Durham, NC); and Qiagen Inc. (Valencia, CA)).
Methods of RNA extraction are also well known in the art (see, for example, J. Sambrook et al., “Molecular Cloning: A Laboratory Manual” 1989, 2nd Ed., Cold Spring Harbor Laboratory Press: New York) and several kits for RNA extraction from bodily fluids are commercially available (e.g., Ambion, Inc. (Austin, TX); Amersham Biosciences (Piscataway, NJ); BD Biosciences Clontech (Palo Alto, CA); BioRad Laboratories (Hercules, CA); Dynal Biotech Inc. (Lake Success, NY); Epicentre Technologies (Madison, WI); Gentra Systems, Inc. (Minneapolis, MN); GIBCO BRL (Gaithersburg, MD); Invitrogen Life Technologies (Carlsbad, CA); MicroProbe Corp. (Bothell, WA); Organon Teknika (Durham, NC); Promega, Inc. (Madison, WI); and Qiagen Inc. (Valencia, CA)).
“Primer” includes an oligonucleotide, either natural or synthetic, that is capable, upon forming a duplex with a polynucleotide template, of acting as a point of initiation of nucleic acid synthesis and being extended from its 3' end along the template so that an extended duplex is formed. The sequence of nucleotides added during the extension process is determined by the sequence of the template polynucleotide. Usually primers are extended by a DNA polymerase. Primers usually have a length in the range of 3 to 36 nucleotides, also 5 to 24 nucleotides, also from 14 to 36 nucleotides. Primers within the scope of the invention include orthogonal primers, amplification primers, construction primers and the like. Pairs of primers can flank a sequence of interest or a set of sequences of interest. Primers and probes can be degenerate in sequence. Universal primers are contemplated. Universal primers are complementary to nucleotide sequences that are very common in a particular set of DNA molecules and cloning vectors. Thus, they are able to bind to a wide variety of DNA templates. Primers within the scope of the present invention bind adjacent to a target sequence (e.g., an oligonucleotide fragment, a barcode sequence or the like).
“Specific” or “specificity” in reference to the binding of one molecule to another molecule, such as an amplification or sequencing primer to a barcode sequence, means the recognition, contact, and formation of a stable complex between the two molecules, together with substantially less recognition, contact, or complex formation of that molecule with other molecules. In one aspect, “specific” in reference to the binding of a first molecule to a second molecule means that to the extent the first molecule recognizes and forms a complex with other molecules in a reaction or sample, it forms the largest number of the complexes with the second molecule. In certain aspects, this largest number is at least fifty percent. Generally, molecules involved in a specific binding event have areas on their surfaces or in cavities giving rise to specific recognition between the molecules binding to each other. Examples of specific binding include compound-protein interactions, antibody-antigen interactions, enzyme-substrate interactions, formation of duplexes or triplexes among polynucleotides and/or oligonucleotides, receptor-ligand interactions, and the like. As used herein, “contact” in reference to specificity or specific binding means two molecules are close enough that weak non-covalent chemical interactions, such as van der Waals forces, hydrogen bonding, base-stacking interactions, ionic and hydrophobic interactions, and the like, dominate the interaction of the molecules.
It is to be understood that the embodiments of the present invention which have been described are merely illustrative of some of the applications of the principles of the present invention. Numerous modifications may be made by those skilled in the art based upon the teachings presented herein without departing from the true spirit and scope of the invention. The contents of all references, patents and published patent applications cited throughout this application are hereby incorporated by reference in their entirety for all purposes.
The following examples are set forth as being representative of the present invention. These examples are not to be construed as limiting the scope of the invention as these and other equivalent embodiments will be apparent in view of the present disclosure, figures, tables and accompanying claims.
EXAMPLE I
Methods of Making a Library of Barcoded Compounds
To analyze binding of candidate compounds to target proteins in a high throughput method, a library of candidate compounds coupled to DNA bearing a barcoding sequence is used, referred to herein as a DNA encoded library. According to one aspect, DNA-encoded chemical libraries (DEL) are either commercially available or manufactured as known in the art and used for screening collections of compounds, such as small molecule compounds, against a library of DNA barcoded target proteins, where compound-protein binding is determined by sequencing a construct including the barcode of the compound and the barcode of the protein.
According to one aspect, DNA barcodes in the form of short DNA fragments are conjugated to candidate compounds that serve as unique identification barcodes for each candidate compound. See Brenner et al., PNAS USA 89 (12): 5381-5383 (1992); Nielsen et al., JACS 115 (21): 9812-9813 (1993); Needels et al., PNAS USA 90 (22): 10700-4 (1993).
DNA encoded libraries for screening against target proteins are commercially available from Sigma as DyNAbind (world wide website sigmaaldrich.com/US/en/product/sial/dyna001) or Genscript as GenDECL (world wide website genscript.com/dna-encoded-chemical-library-kit.html). Additional commercially available DNA encoded libraries are available from ComInnex Zrt., X-Chem, and AlphaMa.
Specific examples of methods of making DNA encoded libraries are known in the art and include Shi et al., DNA-encoded libraries (DELs): a review of on-DNA chemistries and their output, RSC Adv., 2021, 11, 2359; DOI: 10.1039/d0ra09889b, hereby incorporated by reference in its entirety for the disclosure of methods of making and using DNA encoded libraries.
EXAMPLE II
Methods of Making a Library of Barcoded Target Proteins by Direct Attachment of the Barcode to the Target Protein
To analyze binding of candidate compounds to target proteins in a high throughput method, a library of target proteins each bearing a barcoding sequence is generated. According to one aspect, a Snap-tag protein library can be generated and such a library can be used to attach barcodes to target proteins. See Chan et al., Discovery of a Covalent Kinase Inhibitor from a DNA-encoded Small Molecule Library x Protein Library Selection, J. Am. Chem. Soc., 2017; 139(30): pp. 10192-10195 and Supplemental Materials and Methods at 10.1021/jacs.7b04880 hereby incorporated by reference for the teaching of DNA encoded libraries and libraries of barcoded target proteins, such as SNAP-tagged, DNA-barcoded target proteins.
EXAMPLE III
Methods of Making a Library of Barcoded Target Proteins by Ribosome Display
To analyze binding of candidate compounds to target proteins in a high throughput method, a library of target proteins bearing a barcoding sequence is generated using ribosome display. One barcoding approach is to in vitro translate and display proteins on mRNA-ribosome-protein complexes, in which the mRNA contains a synthetic barcode. Specifically, the ribosome display is performed by using mRNA as a template and an in vitro translation (IVT) system, where the mRNA template lacks a stop codon such that translation stalls, producing an mRNA-ribosome-protein complex. mRNA-ribosome-protein complexes may be purified or enriched by FLAG-tag affinity purification.
According to one aspect, the following oligonucleotides are generated or otherwise provided.
1. A DNA oligo construct containing the coding sequence of the gene(s) of interest which also includes, in the following 5’-3’ order: T7 or Sp6 or other transcription start site; a universal PCR primer site that is common to all genes in this library; a unique, short barcode (about 10 nucleotides) per coding region; a bridge landing site common to all genes in this library, e.g. GGGCGGCGGGGAAA (SEQ ID NO: 18); a ribosomal entry site (either endogenous or added); a coding sequence of a gene of interest; and lacking a stop codon. See Fig. 2A. More particularly, a DNA construct is provided including (1) a transcriptional start site such as an Sp6 site (“Sp6”), (2) a universal PCR primer (“primer”), (3) a barcode (“hash”) unique to the protein of interest, which may be a few nucleotides, (4) a universal primer binding site (“bridge” or “bridge landing site”) for binding to a primer on a bridging polynucleotide, (5) an internal ribosomal entry site (“IRES”) to be used for translation, (6) the coding sequence encoding for the protein or protein fragment of interest (“target protein”), (7) a peptide tag such as FLAG, followed by (8) a spacer with no stop codons. The DNA construct is transcribed into mRNA using the transcriptional start site.
2. A bridge oligo containing the reverse complement of the bridge landing site at its 3’ end and a dsDNA 5’ site roughly the size of a primer (about 20 bp). See Fig. 2B depicting the bridging polynucleotide hybridized to the mRNA at the bridge landing site and being ligated to the dsDNA of the small molecule.
3. A reverse primer for PCR (“primer”) of the ligated construct, as depicted in Fig. 5C, located 5' of the target protein bridge and hash, and the DEL primer, enabling PCR amplification only of DNA fragments generated by a ligation event between the target protein and the compound when generated using the ribosome/bridge display approach of Fig. 2A-D.
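By way of non-limiting illustration, the 5’-3’ element order of the construct of item 1 and the bridge oligo of item 2 can be sketched as follows. Only the bridge landing site sequence is taken from the text above; the function names, the argument names, and any promoter, primer site, barcode, ribosomal entry site, or coding sequences supplied to these functions are hypothetical placeholders. The sketch represents a single strand only and omits the optional peptide tag and spacer.

```python
# Example bridge landing site recited in item 1 (SEQ ID NO: 18).
BRIDGE_LANDING_SITE = "GGGCGGCGGGGAAA"

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def strip_terminal_stop(cds: str) -> str:
    """Drop a trailing stop codon so that the construct lacks a stop codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return cds[:-3] if cds[-3:] in STOP_CODONS else cds


def build_construct(promoter: str, primer_site: str, barcode: str,
                    ribosome_entry: str, cds: str) -> str:
    """Concatenate the elements in the 5'-3' order listed in item 1."""
    return "".join([
        promoter.upper(),
        primer_site.upper(),
        barcode.upper(),
        BRIDGE_LANDING_SITE,
        ribosome_entry.upper(),
        strip_terminal_stop(cds),
    ])


def bridge_oligo(five_prime_site: str) -> str:
    """Single-strand view of the bridge oligo of item 2: a primer-sized
    5' site followed by the reverse complement of the landing site at the 3' end."""
    return five_prime_site.upper() + reverse_complement(BRIDGE_LANDING_SITE)
```

Keeping the landing site as a single shared constant mirrors the requirement that it be common to all genes in the library, while the barcode argument varies per coding region.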
The coding sequence oligo (a pool of which is a cDNA library) is made by chemical synthesis (e.g. gBlocks from IDT) or from mRNAs isolated from cells or tissue. If the latter, random primers may be used to start first-strand cDNA synthesis (primers including the landing site) or gene-specific primers designed to be upstream of, or replace, the stop codon at the end of the protein-coding region. After first-strand synthesis by either method, a template-switching oligo is used for second-strand synthesis and to provide a site to add T7 or similar promoter, a universal PCR primer, a unique barcode, and a bridge landing site with PCR. If made from mRNA, sequencing is used to associate the unique barcode with the coding region of the gene. The bridge and its primer are made by chemical synthesis.
The cDNA library is transcribed into RNA using a bacteriophage RNA polymerase, e.g. HiScribe T7 kit from New England Biolabs. Prior to translation, the RNA is denatured and the bridge and its primer are added, annealing the bridge to its 5’ landing site on the RNAs.
The RNAs in the library are then subject to in vitro translation using a commercially available system selected based on the ribosomal entry site used. For example, if Shine-Dalgarno sequences are used, a prokaryotic kit like NEBExpress Cell-free E. coli Protein Synthesis System from New England Biolabs may be used. If endogenous sequences are used, they are likely to have canonical Kozak sequences in the 5’ UTR, which would be preserved by the template-switching oligo approach in cDNA synthesis. Accordingly, a system including a wheat germ extract or rabbit reticulocyte commercially available from Promega, or other eukaryotic approach for translation can be used. The result is a ribosome-displayed library of proteins with a dsDNA oligo attached which can then be screened.
EXAMPLE IV
Methods of Making a Library of Barcoded Target Proteins by mRNA Display
To analyze binding of candidate compounds to target proteins in a high throughput method, a library of target proteins bearing a barcoding sequence is generated using mRNA display shown generally at Fig. 3A-D. Methods of barcoding a protein using mRNA are known to those of skill in the art as described herein. According to one aspect, a library of cDNA constructs is constructed as described herein. According to one aspect, the DNA construct includes a T7 RNA polymerase binding site or Sp6 transcription factor binding site or other transcription start site at the 5’ end of the DNA construct. The DNA construct then includes a ribosomal entry site (which may be either endogenous or added). The DNA construct lacks a stop codon. Instead, a landing site for a DNA linker including a puromycin (e.g. GGGCGGCGGGGAAA) (SEQ ID NO: 19) is provided. See Fig. 3A and Fig. 3B. According to one aspect, the DNA construct can be made by chemical synthesis (e.g. gBlocks from IDT) or from mRNAs isolated from cells or tissue. If the latter, random primers may be used to start first-strand cDNA synthesis (primers including the landing site) or gene-specific primers designed to be upstream of, or replace, the stop codon at the end of the protein-coding region. After first-strand synthesis by either method, a template-switching oligo is used for second-strand synthesis and to provide a site to add T7 or similar promoter with PCR. According to one aspect, a puromycin is covalently attached to a DNA oligo (commercially available from IDT or Trilink or Baseclick). See Fig. 3B. See, for example, Barendt et al., Streamlined protocol for mRNA Display, ACS Comb. Sci. 2013; 15(2): 77-81; Reyes et al., PURE mRNA display and cDNA display provide rapid detection of core epitope motif via high-throughput sequencing, Biotechnology and Bioengineering, vol. 118, issue 4, pp. 1702-1715 (2021); Yamaguchi et al., cDNA display: a novel screening method for functional disulfide-rich peptides by solid-phase synthesis and stabilization of mRNA-protein fusions, Nucleic Acids Res., 37(16) e108 (2009); Ueno et al., cDNA display: rapid stabilization of mRNA display, Methods Mol Biol (2012);805: 113-135 (Fig. 1a referring to an “initiation site for reverse transcription”); Ueno et al., Improvement of a Puromycin-linker to Extend the Selection Target Varieties in cDNA Display Method, J. Biotechnol. (2012); 162(2-3): pp. 299-302; each of which is hereby incorporated by reference in its entirety for the teaching of mRNA and cDNA display methods.
The cDNA library is transcribed into RNA using a bacteriophage RNA polymerase, e.g. HiScribe T7 kit from New England Biolabs. The result is an RNA library for different genes with ribosomal entry sites, no stop codons, and a landing site for the puromycin linker. The puromycin linker is then linked to the 3’ end of the RNAs in the library using T4 RNA ligase 1. See Fig. 3B. The RNAs in the library are then subject to in vitro translation using a commercially available system selected based on the ribosomal entry site used. For example, if Shine-Dalgarno sequences are used, a prokaryotic kit like NEBExpress Cell-free E. coli Protein Synthesis System from New England Biolabs may be used. If endogenous sequences are used, they are likely to have canonical Kozak sequences in the 5’ UTR, which would be preserved by the template-switching oligo approach in cDNA synthesis. Accordingly, a system including a wheat germ extract or rabbit reticulocyte commercially available from Promega, or other eukaryotic approach for translation can be used. The result is a library of proteins covalently attached to RNAs encoding them through a puromycin linker. cDNA is then synthesized from the puromycin linker on the protein. See Fig. 3C showing reverse transcription of the mRNA attached to the target protein via puromycin. Prior to reverse transcription, the 3’ end of the mRNA is trimmed with a restriction enzyme (see Fig. 3D showing trimmed 3’ end), allowing the mRNA strand to be displaced during second-strand synthesis and removed. The puromycin linker is DNA and has a landing site for the reverse transcriptase (e.g., Ueno et al., Methods Mol Biol (2012);805: 113-35). See Fig. 3C. A template-switching oligo is then used for the second strand, generating a blunt-ended double-stranded cDNA of the gene’s RNA covalently attached to the protein. See Fig. 3D where the arrow indicates use of a template switching oligo for second strand synthesis.
As a result of the above methods, a DNA-encoded protein library is generated for screening.
EXAMPLE V
Creating Candidate Compound-Target Protein Binding Pairs and Ligation of Barcodes
To identify a candidate compound-target protein binding pair, the following method is carried out. A candidate compound and a target protein are combined under conditions promoting binding of the compound to the target protein. The DNA barcode attached to the compound and the DNA barcode attached to the target protein are attached to each other generating a DNA construct including the barcode of the compound and the barcode of the protein. The DNA construct is then sequenced. The barcodes are identified thereby identifying the compound and the protein bound to each other. See generally Fig. 1.
According to one aspect, a commercially available DNA-encoded library as described herein is combined with or mixed with a library of nucleic acid encoded target proteins as described above, such as the library of mRNA display proteins or ribosome display proteins under suitable concentrations and temperature for a period of time to reach equilibrium and to form candidate compound-target protein binding pairs or complexes.
For a candidate compound-target protein binding pair or complex (i.e., small molecule-display target complex), the two blunt dsDNA fragments including the barcodes are ligated together with T4 ligase, such as is commercially available as NEB’s Blunt/TA Ligase Master Mix which is a ready-to-use solution of T4 DNA Ligase, ligation enhancer, and optimized reaction buffer. This master mix is specifically formulated to improve ligation and transformation of both blunt-end and single-base overhang substrates. Other T4 DNA Ligase products include Quick Ligation Kit, Salt-T4, and Hi-T4.
When using the ribosome display library and ligating or attaching the bridging polynucleotide to the barcode of the candidate compound bound to the target protein, first-strand synthesis is performed after ligation using the bridge as a primer to the protein-encoding RNA. See Fig. 2A-C. As the reverse transcriptase proceeds, it transcribes the target protein barcode upstream of the bridge binding site. A template-switching oligo is then used to initiate second-strand synthesis, which will proceed down the target protein barcode, the bridge binding site, past the bridge primer and into the ligated small compound barcode. See Fig. 2D-E. RNase may then be used to remove unwanted RNA products.
EXAMPLE VI
Creating Candidate Compound-Target Protein Binding Pairs and Joining of Barcodes Using Click Chemistry
To identify a candidate compound-target protein binding pair, the following method is carried out. A candidate compound and a target protein are combined under conditions promoting binding of the compound to the target protein. The DNA barcode attached to the compound and the DNA barcode attached to the target protein are attached to each other using Click chemistry generating a chimeric DNA construct including the barcode of the compound and the barcode of the protein. The chimeric DNA construct is then sequenced. The barcodes are identified thereby identifying the compound and the protein bound to each other.
According to one aspect, a commercially available DNA-encoded library as described herein is combined with or mixed with a library of nucleic acid encoded target proteins as described above, such as the library of mRNA display proteins or ribosome display proteins, or proteins having a dsDNA barcode directly attached thereto under suitable concentrations and temperature for a period of time to reach equilibrium and to form candidate compound-target protein binding pairs or complexes.
For a candidate compound-target protein binding pair or complex (i.e., small molecule-display target complex), the dsDNA with the barcode of the candidate compound and the dsDNA with the barcode of the target protein include click chemistry moieties that bind together under suitable conditions. See Fig. 4D. The two dsDNA are connected together using click chemistry. For example, click-modified nucleotides are added to the end of the barcode of the candidate compound and the end of the barcode of the target protein with terminal transferase (New England Biolabs). The click-modified nucleotides at the ends of the barcodes are then reacted together when the candidate compounds and the target proteins form complexes, i.e., small molecule-display target complexes, either with Cu+ or by simple addition. Click-modified nucleotides have been shown to function as wild-type ones in various cellular machineries (see Nicolo Zuin Fantoni, Afaf H. El-Sagheer, and Tom Brown, A Hitchhiker’s Guide to Click-Chemistry with Nucleic Acids, Chem. Rev. 2021, 121, 12, 7122-7154 and El-Sagheer AH, Brown T., Efficient RNA synthesis by in vitro transcription of a triazole-modified DNA template, Chem Commun (Camb). 2011 Nov 28;47(44):12057-8, each of which is hereby incorporated by reference in its entirety for the teaching of click chemistry methods). The ends of the barcodes may be rendered blunt using the NEBNext End Repair Module commercially available from New England Biolabs to reduce chance annealing between the two barcodes, especially via overhangs. The click chemistry moieties bind together thereby forming a chimeric dsDNA that can be sequenced and the barcode of the candidate compound and the barcode of the target protein can be identified.
EXAMPLE VII
High-Throughput Screening
A plurality of candidate compound-target protein binding pairs are identified within a mixture of candidate compounds and target proteins as follows. A plurality of uniquely barcoded candidate compounds and a plurality of uniquely barcoded target proteins are combined under conditions promoting binding of the candidate compounds to the target proteins to form a plurality of binding pairs. For a binding pair, the DNA barcode attached to the compound and the DNA barcode attached to the target protein are attached to each other generating a chimeric DNA construct including the barcode of the compound and the barcode of the protein. The chimeric DNA construct is then sequenced. For the plurality of binding pairs, the barcodes are identified thereby identifying the compound and the protein bound to each other.
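As a rough illustration of the final identification step, the following Python sketch tallies compound-protein pairs from sequenced chimeric constructs. The read layout (a universal compound-side sequence, a fixed-length compound barcode, a junction sequence, then a fixed-length protein barcode) and the names `DEL_PRIMER`, `JUNCTION`, and `BARCODE_LEN` are hypothetical and chosen only for the example; real constructs may be laid out differently.

```python
from collections import Counter

# Hypothetical read layout (all values are illustrative assumptions):
#   DEL_PRIMER + compound barcode + JUNCTION + protein barcode
DEL_PRIMER = "ACGTACGT"
JUNCTION = "TTGGCCAA"
BARCODE_LEN = 6


def split_barcodes(read):
    """Return (compound_barcode, protein_barcode), or None if the read does not fit the layout."""
    start = read.find(DEL_PRIMER)
    if start == -1:
        return None
    cb_start = start + len(DEL_PRIMER)
    cb_end = cb_start + BARCODE_LEN
    # The junction must follow the compound barcode directly.
    if read[cb_end:cb_end + len(JUNCTION)] != JUNCTION:
        return None
    pb_start = cb_end + len(JUNCTION)
    pb_end = pb_start + BARCODE_LEN
    if pb_end > len(read):
        return None
    return read[cb_start:cb_end], read[pb_start:pb_end]


def count_pairs(reads):
    """Count how often each (compound barcode, protein barcode) pair is observed."""
    pairs = Counter()
    for read in reads:
        parsed = split_barcodes(read)
        if parsed is not None:
            pairs[parsed] += 1
    return pairs
```

Each key of the resulting counter names one candidate binding pair, and the count gives the number of reads supporting it. Reads that do not match the assumed layout are simply ignored.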
According to one aspect, a sequencing library is constructed.
In general, Fig. 5A depicts a target protein-compound complex ("small molecule", "protein") with ligated barcodes ("Ligation"). Positions labeled "Tn5" depict random insertion events where the transposase Tn5 cuts dsDNA and inserts sequencing adapters into the free ends of the cuts. "DEL primer" indicates the position of a primer identical across all small molecules, with a unique barcode per molecule downstream.
With reference to Fig. 5B, if mRNA display is used to generate the barcoded protein library, tagmentation is used to cut and insert sequencing primers along the dsDNA encoding the target protein. With tagmentation, transposases randomly cut the DNA into fragments of between 50 and 500 bp and add adaptors simultaneously. See Clark, David P. (2 November 2018). Molecular biology. Pazdernik, Nanette Jean, McGehee, Michelle R. (Third ed.). London. ISBN 978-0-12-813289-0. OCLC 1062496183, hereby incorporated by reference for the teaching of tagmentation techniques adaptable to the methods described herein. According to one aspect, the dsDNA will be ligated to a small molecule library barcode. According to one aspect, the dsDNA will not be ligated to a small molecule library barcode. The fragments are then amplified using one primer against the Tn5-inserted sequencing primer site and one primer directed against the universal portion of the small molecule library barcode, so that only protein-linked cDNA ligated to a small molecule barcode will be amplified. Sequencing primer sequences are added to the primers directed against the universal portion of the small molecule library barcode, allowing for high-throughput sequencing. The target protein is identified by the 3’ end of the coding sequence and the bound small molecule is identified by its barcode. With reference to Fig. 5C, if ribosome display is used to generate the barcoded protein library, fragments are amplified using one primer directed against the primer site upstream of the protein barcode and one primer directed against the universal portion of the small molecule library barcode. Library construction is then completed using end-repair and dA-tailing and sequencing primer ligation using NEBNext® Ultra™ II DNA Library Prep Kit for Illumina®.
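The selectivity of the two-primer amplification can be pictured with a small Python sketch. It treats a fragment as amplifiable only when it carries both a Tn5-inserted primer site and the universal portion of the small molecule library barcode. The sequences `TN5_SITE` and `DEL_UNIVERSAL` are placeholders rather than real adapter or primer sequences, and strand orientation is ignored for simplicity.

```python
# Placeholder sequences; not real adapter or primer sequences.
TN5_SITE = "CTGTCTCT"
DEL_UNIVERSAL = "ACGTACGT"


def is_amplifiable(fragment, tn5_site=TN5_SITE, del_site=DEL_UNIVERSAL):
    """A fragment is amplified only if both primer sites are present (orientation ignored)."""
    return tn5_site in fragment and del_site in fragment


def amplifiable_fragments(fragments):
    """Keep only the fragments that both primers could amplify."""
    return [f for f in fragments if is_amplifiable(f)]
```

Under these assumptions, tagmented cDNA that was never ligated to a small molecule barcode lacks `DEL_UNIVERSAL` and drops out, which mirrors the statement that only protein-linked cDNA ligated to a small molecule barcode is amplified.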
The target protein is then identified by its upstream barcode by sequencing the cDNA library as referenced above to link barcodes to proteins and the bound small molecule is identified by its barcode. Protein barcodes are identified in advance by sequencing the cDNA constructs (those including the Sp6, the barcode, and the gene of interest) in order to link the protein barcode to the coding sequence. When screening for small molecules, the protein barcode is identified, which identifies the protein.
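The advance barcode-to-protein assignment amounts to building a lookup table. The Python sketch below assumes that (barcode, coding sequence name) pairs have already been extracted from sequencing of the cDNA constructs. The function names and the choice to discard barcodes seen with more than one coding sequence are illustrative assumptions, not part of the described method.

```python
def build_barcode_table(construct_reads):
    """Map each protein barcode to a single coding sequence name.

    construct_reads: iterable of (barcode, gene_name) pairs taken from
    sequencing of the barcoded cDNA constructs. Barcodes observed with
    more than one gene are treated as ambiguous and left out.
    """
    seen = {}
    for barcode, gene in construct_reads:
        seen.setdefault(barcode, set()).add(gene)
    return {bc: next(iter(genes)) for bc, genes in seen.items() if len(genes) == 1}


def identify_protein(table, barcode):
    """Return the protein linked to a barcode, or None if the barcode is unknown."""
    return table.get(barcode)
```

During a screen, each protein barcode read from a chimeric construct would be passed to `identify_protein` to recover the target protein.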
EXAMPLE VIII
Emulsion Technique to Promote Binding
A method for screening DNA-encoded libraries against target proteins is provided which uses water-in-oil emulsion technology to isolate within a droplet an individual compound and an individual protein or a plurality of compounds and a plurality of target proteins to facilitate binding of a compound to a target protein in a single-tube approach. According to one aspect, (1) the plurality of compounds and (2) the plurality of target proteins are combined under conditions creating an emulsion having a plurality of emulsion droplets, wherein each emulsion droplet of the plurality includes (1) a compound of the plurality, and (2) a target protein of the plurality under conditions creating a bound compound-protein binding pair, wherein the dsDNA attached to the compound is ligated to the dsDNA attached to the protein to create a dsDNA construct comprising the unique barcode sequence for the target protein and the unique barcode sequence of the compound. Various water-in-oil emulsion techniques for isolating binding pairs and adaptable to the present disclosure are described by Petersen et al., Med. Chem. Commun., 2016, 7, 1332-1339; and Turner et al., Nat Protoc. 2009;4(12):pp. 1711-1783, each of which is hereby incorporated by reference in its entirety for the teaching of emulsion techniques to isolate components for binding.
EXAMPLE IX
Stabilizing Candidate Compound - Target Protein Interactions with Formaldehyde and DNA-binding Proteins Followed by Ligation
The present disclosure provides various methods to promote binding of a candidate compound to a target protein to facilitate ligation of barcodes to provide a chimeric DNA construct for sequencing.
According to one aspect, a sequence that is recognized by a small DNA binding protein is added to each DNA barcode attached to a small molecule of a small molecule library. The small DNA binding protein is also added. See for example Blanco et al., A Synthetic Miniprotein that Binds Specific DNA Sequences by Contacting Both the Major and Minor Groove, Chemistry & Biology, vol. 10, issue 8, (2003), pages 713-722, hereby incorporated by reference in its entirety. This generates a small molecule library with attached DNA barcode and a small protein attached to the DNA barcode at high affinity. The small molecule library is then mixed with the dsDNA barcoded protein library and the small DNA binding protein is crosslinked to the target protein with formaldehyde, increasing the stability of transient interactions. See Fig. 4B. A similar approach is commonly used to stabilize transient higher-order chromatin interactions such as loops (see Lieberman-Aiden et al., Comprehensive mapping of Long Range Interactions Reveals Folding Principles of the Human Genome, Science, vol. 326, Issue 5950, pp. 289-293 (2009), hereby incorporated by reference in its entirety). Once stabilized, the two dsDNA fragments (one from the display target, one from the small molecule) are ligated to form a chimeric DNA molecule.
Formaldehyde may also directly crosslink DNA to DNA so that a DNA binding protein is not required. See Kawanishi et al., Front. Environ. Sci., 2014; vol. 2, article 36, pp. 1-8 (formaldehyde induces N-hydroxymethyl mono-adducts on guanine, adenine and cytosine, and N-methylene crosslinks between adjacent purines in DNA), hereby incorporated by reference in its entirety for the teaching of formaldehyde crosslinking DNA to DNA.
EXAMPLE X
Stabilizing Candidate Compound - Target Protein Interactions with Intercalators and Maleimide Followed by Ligation
A variety of agents noncovalently intercalate into DNA, such as ethidium bromide or doxorubicin. According to one aspect, the dsDNA barcodes of the candidate compounds of the DNA-encoded library are provided with an intercalating agent having maleimide attached thereto, such as a doxorubicin-maleimide conjugate. The maleimide covalently reacts with neighboring cysteines of the target protein. See Fig. 4A. See Ravasco et al., Bioconjugation with Maleimides: A Useful Tool for Chemical Biology, Chemistry Europe, Vol. 25, Issue 1, pp. 43-59 (2019), hereby incorporated by reference in its entirety. The intercalating agent with the maleimide, such as a doxorubicin-maleimide conjugate, is added to the DNA-encoded library where it will intercalate into the DNA barcodes (see Kruger et al., Synthesis and Stability of Four Maleimide Derivatives of the Anticancer Drug Doxorubicin for the Preparation of Chemoimmunoconjugates, Chemical and Pharmaceutical Bulletin, vol. 45, issue 2, pp. 399-401 (1997)). The DNA-encoded library with the intercalator-maleimide conjugate is combined with the target protein library, where the intercalator-maleimide conjugate attached to the small molecule DNA barcode will bind to cysteines in its bound protein target partner, increasing the stability of transient interactions. Once stabilized, the two dsDNA fragments (one from the display target, one from the small molecule) are ligated together to form a chimeric DNA molecule. See Fig. 4A.
EXAMPLE XI
Stabilizing Candidate Compound - Target Protein Interactions with Click Chemistry and Modified Oligos Followed by Ligation
According to one aspect, nucleotides modified with click chemistry moieties (e.g., azide, dibenzocyclooctyl, alkynes) are commercially available from Integrated DNA Technologies (IDT). Appropriate pairs of click-compatible chemistries are provided on the dsDNA of the target protein and the dsDNA of the candidate compound/small molecule. See Fig. 4C. Appropriate pairs of click-compatible chemistries may be provided on the puromycin linker or the bridge oligo for the display approaches and the barcodes for the small molecule library (e.g., azide on the puromycin linker, alkyne on the small molecule library). The pairs react with one another as is known in the art thereby stabilizing the binding of the small molecule to the target protein. Once stabilized, the two dsDNA fragments (one from the display target, one from the small molecule) are ligated to form a chimeric DNA molecule. See Fig. 4C.
EXAMPLE XII
Mutational Scanning Methods and Creating Libraries of Mutants of a Target Protein
The present disclosure provides methods of generating a collection of mutations of the desired coding sequence with imprecise or sloppy PCR, ligating a random barcode to those coding sequences, and performing long-read DNA sequencing to associate a barcode with its unique mutations. Exemplary mutational scanning methods in which protein variants are the target proteins are provided, in which a plasmid library that expresses all desired variants of a protein (a library of mutant proteins) is generated. Applying a selective pressure may winnow the pool down to plasmids expressing variants with optimal function. High-throughput DNA sequencing is then used to measure the frequency of each variant during the selection process. Each variant is assigned a functional score based on its library frequency before selection compared to its library frequency after selection. For example, error-prone PCR can be used where a wild-type template is amplified with a “sloppy” version of PCR that results in a polymerase error rate of up to 2% per nucleotide position. The “sloppy” PCR reaction is created by making some or all of the following modifications: 1) increased concentration of Taq polymerase, 2) increased PCR extension time, 3) increased concentration of MgCl2, 4) increased concentration of dNTP, and/or 5) the addition of MnCl2. As an alternate example, Gibson assembly may be used to join homologous DNA fragments in a single-tube reaction. For mutational scanning experiments, libraries of DNA fragments that contain variable bases (NNN) at any given codon site of interest can be cloned into an expression vector with Gibson assembly. As an alternate example, PFunkel may be used. A uracil-labeled wild-type ssDNA is used as template for a site-directed mutagenesis PCR that uses mutagenic primers to introduce all desired codon changes. A second universal primer is used to synthesize a complementary mutant strand.
Then the uracil-labeled template strand is degraded by uracil DNA glycosylase (UDG) and Exonuclease III (ExoIII). UDG recognizes uracils that are in DNA and ExoIII recognizes nicked duplexed DNA. This results in the final product having a codon change present on both strands of the plasmid. The ssDNA template is generated by transforming a backbone phagemid vector into an E. coli strain that is deficient in dUTPase and uracil deglycosidase. dUTPase prevents the buildup of dUTP and uracil deglycosidase removes uracil from newly synthesized DNA. Lack of both enzymes leads to a buildup of uracil and the insertion of uracil into newly synthesized DNA. This labeled ssDNA is packaged into phage and then extracted. DNA shuffling may be used to combine pairs of mutations together to test combinatory effects. Alternatively, a one-pot saturation mutagenesis technique described in Wrenbeck et al., Nat Methods (2016)(11):928-930, hereby incorporated by reference in its entirety, is a PCR-based approach for generating a customizable comprehensive mutagenesis library that is ready to be tested in a functional screen. The following steps can be carried out for the one-pot saturation mutagenesis technique: 1. Prepare ssDNA template: The wild-type plasmid backbone is nicked by one of a pair of restriction enzymes. These enzymes, Nt.BbvCI and Nb.BbvCI, recognize the same restriction site, but nick opposite strands of DNA. Treating the plasmid with ExoIII and ExoI then fully degrades the nicked strand. 2. Synthesize mutant strand: A mix of degenerate primers and the high-fidelity Phusion polymerase are used to introduce point mutations into the newly synthesized DNA strand. Each degenerate primer set contains three consecutive randomized bases (NNN) at a given codon site and multiple primer sets tile across the protein or region of interest. A low primer-to-template ratio is used to promote annealing of one primer to each template. The PCR product is then column purified. 3. Degrade wild-type template strand: Next, the opposite wild-type DNA strand is nicked by the BbvCI variant not used in step 1. Then this strand is degraded with ExoIII and ExoI. 4. Synthesize 2nd mutant strand: The second mutant strand is synthesized in the same manner as the first mutant strand, but a universal primer is used for this round of PCR. The PCR product is digested with DpnI to remove residual starting template. After transformation and harvest, the library is ready for use. See world wide website blog.addgene.org/deep-mutational-scanning-with-one-pot-saturation-mutagenesis.
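The functional score mentioned in this example compares the library frequency of each variant before and after selection. One common way to express such a comparison, offered here only as an assumed illustration, is a log2 ratio of the two frequencies. In the Python sketch below, the pseudocount, the function names, and the variant labels in the usage note are hypothetical.

```python
import math


def frequencies(counts, pseudocount=0.5):
    """Convert read counts to frequencies, adding a pseudocount to avoid zeros."""
    total = sum(counts.values()) + pseudocount * len(counts)
    return {variant: (count + pseudocount) / total for variant, count in counts.items()}


def functional_scores(before, after, pseudocount=0.5):
    """Return log2(frequency after selection / frequency before selection) per variant."""
    variants = set(before) | set(after)
    freq_before = frequencies({v: before.get(v, 0) for v in variants}, pseudocount)
    freq_after = frequencies({v: after.get(v, 0) for v in variants}, pseudocount)
    return {v: math.log2(freq_after[v] / freq_before[v]) for v in variants}
```

For instance, `functional_scores({"WT": 100, "A10G": 100}, {"WT": 150, "A10G": 50})` yields a positive score for the enriched variant and a negative score for the depleted one. Other scoring schemes are equally compatible with the description above.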
EXAMPLE XIII
Embodiments
The present disclosure provides a method for determining interactions between a plurality of compounds and a plurality of target proteins, wherein each compound of the plurality has a dsDNA attached thereto wherein the dsDNA comprises a barcode unique to the compound, wherein each target protein of the plurality has a mRNA attached thereto, wherein the mRNA comprises (i) a first hybridization site, (ii) a barcode unique to the target protein, (iii) a universal PCR primer binding site and (iv) a transcriptional start site, the method includes combining (1) a bridging polynucleotide, (2) the plurality of compounds and (3) the plurality of target proteins under conditions creating a plurality of bound compound-protein binding pairs, for each bound compound-protein binding pair, (A) the bridging polynucleotide hybridizes to the first hybridization site of the mRNA attached to the target protein, (B) the bridging polynucleotide is attached to the dsDNA attached to the compound, (C) reverse transcribing the mRNA to cDNA using the single stranded bridging polynucleotide hybridized to the mRNA as a primer thereby creating a first strand DNA sequence including the unique barcode sequence for the target protein and the unique barcode sequence for the compound, (D) synthesizing a second strand complementary to the first strand DNA sequence resulting in a dsDNA construct comprising the unique barcode sequence for the compound and the unique barcode sequence for the target protein, (E) sequencing the dsDNA construct to identify the unique barcode sequence for the target protein and the unique barcode sequence for the compound so as to identify the target protein and the compound bound thereto.
According to one aspect, each target protein of the plurality having a mRNA attached thereto is created by (A) transcribing a DNA construct comprising (1) a transcriptional start site, (2) a universal primer binding site which may be the transcriptional start site, (3) a barcode unique to a target protein, (4) a first hybridization site, (5) an internal ribosomal entry site, (6) a nucleic acid encoding the target protein, and (7) a nucleic acid encoding a peptide tag into mRNA, (B) translating a portion of the mRNA downstream of the first hybridization site including the nucleic acid encoding the target protein into the target protein resulting in the target protein being attached to the mRNA including (1) the transcriptional start site, (2) the universal primer binding site which may be the transcriptional start site, (3) the barcode unique to the target protein, and (4) the first hybridization site. According to one aspect, each target protein of the plurality having a mRNA attached thereto is created by (A) transcribing a DNA construct comprising (1) a transcriptional start site, (2) a universal primer binding site which may be the transcriptional start site, (3) a barcode unique to a target protein, (4) a first hybridization site, (5) an internal ribosomal entry site, (6) a nucleic acid encoding the target protein, and (7) a nucleic acid encoding a peptide tag into mRNA, (B) translating the mRNA into the target protein and linking the mRNA having puromycin fused thereto to the target protein. 
According to one aspect, each target protein of the plurality having a mRNA attached thereto is created by (A) transcribing a DNA construct comprising (1) a transcriptional start site, (2) a universal primer binding site which may be the transcriptional start site, (3) a barcode unique to a target protein, (4) a first hybridization site, (5) an internal ribosomal entry site, (6) a nucleic acid encoding the target protein, and (7) a nucleic acid encoding a peptide tag into mRNA, (B) reverse transcribing the mRNA using reverse transcription primers that bind to the 3’ end of the mRNA. According to one aspect, binding of a compound and a target protein is stabilized to facilitate hybridization and attachment of the bridging polynucleotide. According to one aspect, (1) the bridging polynucleotide, (2) the plurality of compounds and (3) the plurality of target proteins are combined under conditions creating an emulsion having a plurality of emulsion droplets, wherein each emulsion droplet of the plurality includes (1) bridging polynucleotide, (2) a compound of the plurality, and (3) a target protein of the plurality under conditions creating a bound compound-protein with bridging polynucleotide hybridized to the mRNA attached to the protein, which is then subject to ligation to the dsDNA attached to the compound and reverse transcription creating the first strand DNA sequence. According to one aspect, the dsDNA of the compound is crosslinked to the target protein to promote binding of the compound to the target protein. According to one aspect, a protein is attached to the dsDNA of the compound and the protein is covalently attached to the target protein. According to one aspect, a DNA intercalator is bound to the target protein via a moiety and the intercalator intercalates the dsDNA of the compound.
The present disclosure provides a method for determining interactions between a plurality of compounds and a plurality of target proteins, wherein each compound of the plurality has a dsDNA attached thereto wherein the dsDNA comprises a barcode unique to the compound, wherein each target protein of the plurality has a dsDNA attached thereto wherein the dsDNA comprises a barcode unique to the target protein, the method includes combining (1) the plurality of compounds and (2) the plurality of target proteins under conditions creating a plurality of bound compound-target protein binding pairs, for each compound-target protein binding pair, attaching the dsDNA attached to the compound to the dsDNA attached to the target protein to create a dsDNA construct comprising the unique barcode sequence for the target protein and the unique barcode sequence of the compound, sequencing the dsDNA construct to identify the unique barcode sequence for the target protein and the unique barcode sequence for the compound so as to identify the target protein and the compound bound thereto. According to one aspect, (1) the plurality of compounds and (2) the plurality of target proteins are combined under conditions creating an emulsion having a plurality of emulsion droplets, wherein each emulsion droplet of the plurality includes (1) a compound of the plurality, and (2) a target protein of the plurality under conditions creating a bound compound-protein binding pair, wherein the dsDNA attached to the compound is ligated to the dsDNA attached to the protein to create a dsDNA construct comprising the unique barcode sequence for the target protein and the unique barcode sequence of the compound. According to one aspect, the dsDNA of the compound is crosslinked to the target protein to promote binding of the compound to the target protein. 
According to one aspect, a protein is attached to the dsDNA of the compound and the protein is covalently attached to the target protein to promote binding of the compound to the target protein. According to one aspect, a DNA intercalator is bound to the target protein via a moiety and the intercalator intercalates the dsDNA of the compound to promote binding of the compound to the target protein. According to one aspect, binding of a compound and a target protein is stabilized to facilitate hybridization and attachment of the bridging polynucleotide. According to one aspect, the plurality of compounds and the plurality of target proteins are combined under conditions creating an emulsion having a plurality of emulsion droplets, wherein each emulsion droplet of the plurality includes a compound of the plurality and a target protein of the plurality under conditions creating a bound compound-protein, to facilitate attachment of the dsDNA attached to the compound to the dsDNA attached to the protein. According to one aspect, the dsDNA of the compound is crosslinked to the target protein to promote binding of the compound to the target protein and to facilitate attachment of the dsDNA attached to the compound to the dsDNA attached to the protein. According to one aspect, a protein is attached to the dsDNA of the compound and the protein is covalently attached to the target protein to promote binding of the compound to the target protein and to facilitate attachment of the dsDNA attached to the compound to the dsDNA attached to the protein. According to one aspect, a DNA intercalator is bound to the target protein via a moiety and the intercalator intercalates the dsDNA of the compound to promote binding of the compound to the target protein and to facilitate attachment of the dsDNA attached to the compound to the dsDNA attached to the protein.
OTHER EMBODIMENTS
Other embodiments will be evident to those of skill in the art. It should be understood that the foregoing description is provided for clarity only and is merely exemplary. The spirit and scope of the present invention are not limited to the above examples, but are encompassed by the following claims. All publications and patent applications cited above are incorporated by reference in their entirety for all purposes to the same extent as if each individual publication or patent application were specifically and individually indicated to be so incorporated by reference.

Claims

1. A method for determining interactions between a plurality of compounds and a plurality of target proteins, wherein each compound of the plurality has a dsDNA attached thereto wherein the dsDNA comprises a barcode unique to the compound, wherein each target protein of the plurality has a mRNA attached thereto, wherein the mRNA comprises (i) a first hybridization site, (ii) a barcode unique to the target protein, (iii) a universal PCR primer binding site and (iv) a transcriptional start site, the method comprising combining (1) a bridging polynucleotide, (2) the plurality of compounds and (3) the plurality of target proteins under conditions creating a plurality of bound compound-protein binding pairs, for each bound compound-protein binding pair,
(A) the bridging polynucleotide hybridizes to the first hybridization site of the mRNA attached to the target protein,
(B) the bridging polynucleotide is attached to the dsDNA attached to the compound,
(C) reverse transcribing the mRNA to cDNA using the single stranded bridging polynucleotide hybridized to the mRNA as a primer thereby creating a first strand DNA sequence including the unique barcode sequence for the target protein and the unique barcode sequence for the compound,
(D) synthesizing a second strand complementary to the first strand DNA sequence resulting in a dsDNA construct comprising the unique barcode sequence for the compound and the unique barcode sequence for the target protein,
(E) sequencing the dsDNA construct to identify the unique barcode sequence for the target protein and the unique barcode sequence for the compound so as to identify the target protein and the compound bound thereto.
2. The method of claim 1 wherein each target protein of the plurality having a mRNA attached thereto is created by (A) transcribing a DNA construct comprising (1) a transcriptional start site, (2) a universal primer binding site which may be the transcriptional start site, (3) a barcode unique to a target protein, (4) a first hybridization site, (5) an internal ribosomal entry site, (6) a nucleic acid encoding the target protein, and (7) a nucleic acid encoding a peptide tag into mRNA,
(B) translating a portion of the mRNA downstream of the first hybridization site including the nucleic acid encoding the target protein into the target protein resulting in the target protein being attached to the mRNA including (1) the transcriptional start site, (2) the universal primer binding site which may be the transcriptional start site, (3) the barcode unique to the target protein, and (4) the first hybridization site.
3. The method of claim 1 wherein each target protein of the plurality having a mRNA attached thereto is created by
(A) transcribing a DNA construct comprising (1) a transcriptional start site, (2) a universal primer binding site which may be the transcriptional start site, (3) a barcode unique to a target protein, (4) a first hybridization site, (5) an internal ribosomal entry site, (6) a nucleic acid encoding the target protein, and (7) a nucleic acid encoding a peptide tag into mRNA,
(B) translating the mRNA into the target protein and linking the mRNA having puromycin fused thereto to the target protein.
4. The method of claim 1 wherein each target protein of the plurality having a mRNA attached thereto is created by
(A) transcribing a DNA construct comprising (1) a transcriptional start site, (2) a universal primer binding site which may be the transcriptional start site, (3) a barcode unique to a target protein, (4) a first hybridization site, (5) an internal ribosomal entry site, (6) a nucleic acid encoding the target protein, and (7) a nucleic acid encoding a peptide tag into mRNA,
(B) reverse transcribing the mRNA using reverse transcription primers that bind to the 3’ end of the mRNA.
5. The method of claim 1 wherein binding of a compound and a target protein is stabilized to facilitate hybridization and attachment of the bridging polynucleotide.
6. The method of claim 1 wherein (1) the bridging polynucleotide, (2) the plurality of compounds and (3) the plurality of target proteins are combined under conditions creating an emulsion having a plurality of emulsion droplets, wherein each emulsion droplet of the plurality includes (1) bridging polynucleotide, (2) a compound of the plurality, and (3) a target protein of the plurality under conditions creating a bound compound-protein with bridging polynucleotide hybridized to the mRNA attached to the protein, which is then subject to ligation to the dsDNA attached to the compound and reverse transcription creating the first strand DNA sequence.
7. The method of claim 1 wherein the dsDNA of the compound is crosslinked to the target protein to promote binding of the compound to the target protein.
8. The method of claim 1 wherein a protein is attached to the dsDNA of the compound and the protein is covalently attached to the target protein.
9. The method of claim 1 wherein a DNA intercalator is bound to the target protein via a moiety and the intercalator intercalates the dsDNA of the compound.
10. A method for determining interactions between a plurality of compounds and a plurality of target proteins, wherein each compound of the plurality has a dsDNA attached thereto wherein the dsDNA comprises a barcode unique to the compound, wherein each target protein of the plurality has a dsDNA attached thereto wherein the dsDNA comprises a barcode unique to the target protein, the method comprising combining (1) the plurality of compounds and (2) the plurality of target proteins under conditions creating a plurality of bound compound-target protein binding pairs, for each compound-target protein binding pair, attaching the dsDNA attached to the compound to the dsDNA attached to the target protein to create a dsDNA construct comprising the unique barcode sequence for the target protein and the unique barcode sequence of the compound, sequencing the dsDNA construct to identify the unique barcode sequence for the target protein and the unique barcode sequence for the compound so as to identify the target protein and the compound bound thereto.
11. The method of claim 10 wherein (1) the plurality of compounds and (2) the plurality of target proteins are combined under conditions creating an emulsion having a plurality of emulsion droplets, wherein each emulsion droplet of the plurality includes (1) a compound of the plurality, and (2) a target protein of the plurality under conditions creating a bound compound-protein binding pair, wherein the dsDNA attached to the compound is ligated to the dsDNA attached to the protein to create a dsDNA construct comprising the unique barcode sequence for the target protein and the unique barcode sequence of the compound.
12. The method of claim 10 wherein the dsDNA of the compound is crosslinked to the target protein to promote binding of the compound to the target protein.
13. The method of claim 10 wherein a protein is attached to the dsDNA of the compound and the protein is covalently attached to the target protein to promote binding of the compound to the target protein.
14. The method of claim 10 wherein a DNA intercalator is bound to the target protein via a moiety and the intercalator intercalates the dsDNA of the compound to promote binding of the compound to the target protein.
15. The method of claim 10 wherein binding of a compound and a target protein is stabilized to facilitate hybridization and attachment of the bridging polynucleotide.
16. The method of claim 10 wherein the plurality of compounds and the plurality of target proteins are combined under conditions creating an emulsion having a plurality of emulsion droplets, wherein each emulsion droplet of the plurality includes a compound of the plurality and a target protein of the plurality under conditions creating a bound compound-protein binding pair, to facilitate attachment of the dsDNA attached to the compound to the dsDNA attached to the protein.
17. The method of claim 10 wherein the dsDNA of the compound is crosslinked to the target protein to promote binding of the compound to the target protein and to facilitate attachment of the dsDNA attached to the compound to the dsDNA attached to the protein.
18. The method of claim 1 wherein a protein is attached to the dsDNA of the compound and the protein is covalently attached to the target protein to promote binding of the compound to the target protein and to facilitate attachment of the dsDNA attached to the compound to the dsDNA attached to the protein.
19. The method of claim 1 wherein a DNA intercalator is bound to the target protein via a moiety and the intercalator intercalates the dsDNA of the compound to promote binding of the compound to the target protein and to facilitate attachment of the dsDNA attached to the compound to the dsDNA attached to the protein.
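The sequencing step of claim 10 ends with a read that carries two barcodes, one naming the compound and one naming the target protein. The following is a minimal Python sketch of how such reads might be decoded into compound-target pairs. It is an illustration only, not the applicants' implementation: the barcode sequences, the fixed barcode length, the constant `LINKER` junction, the compound-then-target orientation, and the names `decode_construct` and `count_pairs` are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical barcode tables; every sequence here is invented for illustration.
COMPOUND_BARCODES = {"ACGTAC": "compound_A", "TTGCAA": "compound_B"}
TARGET_BARCODES = {"GGATCC": "target_X", "CATGAG": "target_Y"}

# Assumed layout of a ligated construct:
#   [compound barcode][constant junction][target barcode]
BARCODE_LEN = 6
LINKER = "AGCT"  # assumed constant sequence at the ligation junction


def decode_construct(read):
    """Return (compound, target) for one read, or None if it cannot be decoded."""
    expected_len = 2 * BARCODE_LEN + len(LINKER)
    if len(read) != expected_len:
        return None  # wrong length: not a complete construct
    compound_bc = read[:BARCODE_LEN]
    junction = read[BARCODE_LEN:BARCODE_LEN + len(LINKER)]
    target_bc = read[BARCODE_LEN + len(LINKER):]
    if junction != LINKER:
        return None  # the two dsDNA tags were not joined as expected
    compound = COMPOUND_BARCODES.get(compound_bc)
    target = TARGET_BARCODES.get(target_bc)
    if compound is None or target is None:
        return None  # unknown barcode on either side
    return compound, target


def count_pairs(reads):
    """Count how many reads support each decoded compound-target pair."""
    counts = Counter()
    for read in reads:
        pair = decode_construct(read)
        if pair is not None:
            counts[pair] += 1
    return counts


reads = [
    "ACGTACAGCTGGATCC",  # compound_A joined to target_X
    "ACGTACAGCTGGATCC",  # second read of the same pair
    "TTGCAAAGCTCATGAG",  # compound_B joined to target_Y
    "ACGTACAGCTNNNNNN",  # unknown target barcode, skipped
    "TTTT",              # truncated read, skipped
]
pair_counts = count_pairs(reads)
for (compound, target), n in pair_counts.most_common():
    print(compound, target, n)
```

Exact-match lookup keeps the sketch short. A real pipeline would likely need to handle sequencing errors, both read orientations, and variable-length junctions, none of which the claims specify.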
PCT/US2022/079382 2021-11-12 2022-11-07 High-throughput drug discovery methods WO2023086767A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202163278651P 2021-11-12 2021-11-12
US63/278,651 2021-11-12

Publications (1)

Publication Number Publication Date
WO2023086767A1 (en) 2023-05-19

Family

ID=86336783

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2022/079382 WO2023086767A1 (en) 2021-11-12 2022-11-07 High-throughput drug discovery methods

Country Status (1)

Country Link
WO (1) WO2023086767A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140018257A1 (en) * 2010-12-03 2014-01-16 The University Of Tokyo Peptide Library Production Method, Peptide Library, and Screening Method
US20210254047A1 (en) * 2018-09-04 2021-08-19 Encodia, Inc. Proximity interaction analysis

Similar Documents

Publication Publication Date Title
EP3377625B1 (en) Method for controlled dna fragmentation
US11965209B2 (en) Method for obtaining structural information concerning an encoded molecule and method for selecting compounds
US10308978B2 (en) Transposon nucleic acids comprising a calibration sequence for DNA sequencing
US20200102598A1 (en) Multiplex End-Tagging Amplification of Nucleic Acids
EP2807292B1 (en) Compositions and methods for targeted nucleic acid sequence enrichment and high efficiency library generation
US20130281308A1 (en) Methods for sorting nucleic acids and preparative in vitro cloning
EP2545183B1 (en) Production of single-stranded circular nucleic acid
JP7058839B2 (en) Cell-free protein expression using rolling circle amplification products
EP2882870A1 (en) High sensitivity mutation detection using sequence tags
US20220090161A1 (en) Devices and methods for producing nucleic acids and proteins
US20160362680A1 (en) Compositions and methods for negative selection of non-desired nucleic acid sequences
WO2018005720A1 (en) Method of determining the molecular binding between libraries of molecules
CN108350492A (en) The nucleic acid cyclisation and amplification of ligase auxiliary
JP2024028958A (en) Composition and method for orderly and continuous synthesis of complementary DNA (cDNA) from multiple discontinuous templates
US20180030435A1 (en) Multiplex characterization of microbial traits using dual barcoded nucleic acid fragment expression library
WO2023086767A1 (en) High-throughput drug discovery methods
JP2023507876A (en) Detection and analysis of methylation in mammalian DNA
US11136576B2 (en) Method for controlled DNA fragmentation
WO2021058145A1 (en) Phage t7 promoters for boosting in vitro transcription
CN112041461A (en) Methods for attaching adaptors to single-stranded regions of double-stranded polynucleotides
US20230002758A1 (en) Tethered ribosomes and methods of making and using thereof
JP2009125001A (en) OLIGO(dT) PRIMER, cDNA LIBRARY-FORMING KIT AND METHOD FOR FORMING cDNA LIBRARY
JP2000184887A (en) Preparation of labeled dna

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 22893777; Country of ref document: EP; Kind code of ref document: A1