WO1999006839A1 - Method of identifying and developing drug leads - Google Patents

Method of identifying and developing drug leads

Info

Publication number
WO1999006839A1
WO1999006839A1 PCT/US1998/015943 US9815943W WO9906839A1 WO 1999006839 A1 WO1999006839 A1 WO 1999006839A1 US 9815943 W US9815943 W US 9815943W WO 9906839 A1 WO9906839 A1 WO 9906839A1
Authority
WO
WIPO (PCT)
Prior art keywords
protein
proteins
descriptors
interest
reactivity
Prior art date
Application number
PCT/US1998/015943
Other languages
English (en)
Inventor
H. Holden Thorp
Original Assignee
Novalon Pharmaceutical Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Novalon Pharmaceutical Corporation
Priority to EP98938201A (EP1002235A1)
Priority to AU86781/98A (AU8678198A)
Priority to CA002298629A (CA2298629A1)
Publication of WO1999006839A1

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/94 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving narcotics or drugs or pharmaceuticals, neurotransmitters or associated receptors
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids

Definitions

  • This invention relates to an improvement in the art of using combinatorial chemistry to develop drug leads.
  • Combinatorial chemistry permits the rapid and relatively inexpensive synthesis of large numbers of compounds in the small quantities suitable for automated assays directed at molecular targets.
  • Numerous small companies and academic laboratories have successfully engineered combinatorial chemical libraries with a significant range of diversity (reviewed in Doyle, 1995; Gordon et al., 1994a; Gordon et al., 1994b).
  • Combinatorial Libraries. In a combinatorial library, chemical building blocks are randomly combined into a large number (as high as 10^15) of different compounds, which are then simultaneously screened for binding (or other) activity against one or more targets. Libraries of thousands, even millions, of random oligopeptides have been prepared by chemical synthesis (Houghten et al., Nature, 354:84-6 (1991)), or gene expression (Marks et al., J Mol Biol, 222:581-97 (1991)), displayed on chromatographic supports (Lam et al., Nature, 354:82-4 (1991)), inside bacterial cells (Colas et al.
  • the first combinatorial libraries were composed of peptides or proteins, in which all or selected amino acid positions were randomized. Peptides and proteins can exhibit high and specific binding activity, and can act as catalysts. In consequence, they are of great importance in biological systems. Unfortunately, peptides per se have limited utility for use as therapeutic entities. They are costly to synthesize, unstable in the presence of proteases and in general do not transit cellular membranes. Other classes of compounds have better properties for drug candidates. Nucleic acids have also been used in combinatorial libraries. Their great advantage is the ease with which a nucleic acid with appropriate binding activity can be amplified. As a result, combinatorial libraries composed of nucleic acids can be of low redundancy and hence, of high diversity.
  • oligonucleotides are not suitable as drugs for several reasons.
  • the oligonucleotides have high molecular weights and cannot be synthesized conveniently in large quantities.
  • deoxy- and ribo-nucleotides are hydrolytically digested by nucleases that occur in all living systems and are therefore usually decomposed before reaching the target.
  • Structure descriptors can be based on a variety of structural features. These approaches provide arrays of molecular descriptors that can be used to assess the similarity of molecules in a library.
  • the "activity" database (A) contains the activities against 60 cell lines for 60,000 compounds that have been screened at NCI.
  • the similarity in the activity profile against the panel of cell lines can then be calculated for any two compounds, and is generally assessed by a pairwise correlation coefficient (PCC). The PCC is determined by an algorithm called COMPARE, which calculates the similarity of all of the compounds in the database to a user-supplied "seed" compound.
  • PCC: pairwise correlation coefficient
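The seed-based comparison described above can be sketched in a few lines. This is only an illustration: it assumes the PCC is an ordinary Pearson correlation over per-cell-line activity values and that ranking against a seed is a plain sort. The real COMPARE algorithm may differ, and the function names and toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length activity profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no correlation information
    return sxy / sqrt(sxx * syy)


def rank_against_seed(activity, seed):
    """Rank every other compound by its correlation with the seed profile."""
    seed_profile = activity[seed]
    scores = [
        (name, pearson(seed_profile, profile))
        for name, profile in activity.items()
        if name != seed
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical activities of three compounds against four cell lines.
activity = {
    "seed": [1.0, 2.0, 3.0, 4.0],
    "cpd_B": [2.0, 4.1, 6.2, 7.9],
    "cpd_C": [4.0, 3.0, 2.0, 1.0],
}
ranking = rank_against_seed(activity, "seed")
```

Compounds at the top of `ranking` share the seed's pattern of activity across the panel, which is the sense in which two compounds are called similar here.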
  • the "target" database (T) has been created for 100 proteins (targets) whose level of expression was determined in the same 60 cell lines. These expression levels were assessed by standard biological techniques that determine either the quantity of expressed protein (e.g., by Western blots or immunocytochemistry) or the quantity of messenger RNA (e.g., by quantitative PCR or Northern blots) for each protein in each cell line. Relation of the A and T databases then provides information on the molecular pharmacology of the compounds in A; inhibition of one of the heavily expressed proteins emerges as a possible mechanism for the activity of the compound.
  • a "structure" database (S) contains structural descriptors for a library of 460,000 compounds that includes the compounds in A. Similarities between the structural descriptors can be calculated for all of the compounds in S, so for a given active compound in A, unscreened, but structurally similar, compounds can be identified in S. These unscreened compounds have an increased likelihood of being active in the cell lines for which the screened compounds are active. The latter process therefore provides a means for "lead optimization" after a compound with a given biological activity has been identified.
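A minimal sketch of the structure-database lookup follows. The text does not say which similarity measure is used, so the Tanimoto coefficient over sets of structural feature identifiers is an assumption, as are the function names, the threshold, and the toy data.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets (assumed similarity measure)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def unscreened_neighbors(active, structures, screened, threshold=0.5):
    """Find unscreened compounds whose descriptors resemble an active compound."""
    reference = structures[active]
    hits = [
        (name, tanimoto(reference, features))
        for name, features in structures.items()
        if name not in screened
    ]
    hits = [hit for hit in hits if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical descriptor sets: each integer stands for one structural feature.
structures = {
    "A": frozenset({1, 2, 3, 4}),   # screened and active
    "U1": frozenset({1, 2, 3, 5}),  # unscreened, close analogue
    "U2": frozenset({7, 8}),        # unscreened, unrelated
}
neighbors = unscreened_neighbors("A", structures, screened={"A"})
```

Only compounds absent from the screened set are returned, so the result is a list of candidates for follow-up testing rather than a re-ranking of known actives.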
  • the NCI approach in defining the target database (T) is significantly different from that described here in that it relies solely on biological activity assays.
  • Kauvar et al., Chemistry & Biology, 2: 107-118 (1995) "fingerprinted" over 5,000 compounds by the binding potency (concentration needed to inhibit 50% of the protein's activity) of each compound to each member of a reference panel of eight proteins. (These proteins were selected on the basis of readily assayable activity, broad cross-reactivity with small organic molecules, and low correlation between each other in binding patterns.)
  • a screening library of 54 compounds was then selected based on the diversity in their "fingerprints" (inhibitory activity against the reference panel proteins).
  • This "training set" was used to evaluate the similarity of the ligand binding characteristics of a new protein to one of the reference panel proteins.
  • a computational surrogate (a weighted sum of two or more reference panel proteins) for the new protein is determined.
  • the ability of each fingerprinted compound to inhibit the new protein is predicted as the sum of its appropriately weighted inhibitory activities against the component reference proteins of the computational surrogate. Predictions may be improved by testing additional sets of compounds against the new protein. See also L. M. Kauvar, H. O. Villar. Method to identify binding partners. US Patent 5587293.
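The surrogate idea can be illustrated as a small least-squares fit. This is a sketch under stated assumptions, not Kauvar's actual procedure: it assumes the weights are obtained by ordinary least squares on the training set, and the function names and numbers are hypothetical.

```python
def solve_linear(matrix, vector):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(vector)
    aug = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("reference proteins are not independent")
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_surrogate_weights(ref_training, query_training):
    """Least-squares weights via the normal equations (A^T A) w = A^T y.

    ref_training: one row per training compound, activities vs. each reference protein.
    query_training: measured activity of each training compound vs. the new protein.
    """
    k = len(ref_training[0])
    ata = [[sum(r[i] * r[j] for r in ref_training) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(ref_training, query_training)) for i in range(k)]
    return solve_linear(ata, aty)


def predict(weights, fingerprint):
    """Predicted activity vs. the new protein: weighted sum of reference activities."""
    return sum(w * x for w, x in zip(weights, fingerprint))


# Hypothetical training set: three compounds, two reference proteins.
weights = fit_surrogate_weights([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.7, 0.3, 1.0])
predicted = predict(weights, [2.0, 4.0])
```

Once the weights are fitted from the small training set, every compound already fingerprinted against the reference panel gets a predicted activity without further assays.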
  • proteins are fingerprinted on the basis of their chemical reactivity, in the presence or absence of a binding partner, rather than on the basis of their biological activity. Therefore, we do not need to identify a large number of diverse affinity molecules for each reference protein.
  • proteins are fingerprinted on the basis of their affinity for peptides or nucleic acids in a high-diversity library.
  • This library provides a far greater range of conformational variation than is provided by Kauvar's training set.
  • Biomolecule reactivity. Chemical reactions that modify nucleotides have been extremely successful in probing the structures of complex DNA and RNA molecules (16, 17).
  • a reagent that oxidizes nucleic acids by a particular reaction pathway is used to create backbone lesions in the polyanion. The sites of modification are then determined by high-resolution gel electrophoresis.
  • a diplatinum compound, PtPop, has been used to footprint DNA. Breiner, K. M., M. A. Daugherty, T. G. Oas and H. H.
  • Levine et al. oxidized glutamine synthetase by a metal-catalyzed system, and found that a significant number of methionine residues could be oxidized without an increase in proteolytic susceptibility.
  • Levine et al. suggested engineering proteins to increase the number of surface methionines (for longer half-life).
  • a prospective "query" protein is characterized by a "reactivity descriptor", by which it is related to previously studied proteins (called “library” or “reference”, or “database” proteins) for which both a "reactivity descriptor” and one or more drug leads are known.
  • a combinatorial chemical library enriched in or even limited to chemical compounds similar to the drug leads previously identified for the related proteins in the database is then synthesized and screened for binding or other activity against the "query protein". This invention thereby reduces the number of chemical compounds that must be screened against an unknown protein target, and/or increases the likelihood of "hits".
  • the reactivity descriptors here contemplated relate to the reactivity of the target protein (the term "target protein" refers to both "query" and "reference" proteins, both being described by their chemical reactivity), especially in both a ligand-bound and a free state, with one or more chemical reagents.
  • the difference in the two reactivities is characteristic of the part of the protein which is occluded or otherwise shielded as a result of the ligand binding.
  • the protected portion of the protein will include the actual ligand binding site.
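The free-versus-bound comparison can be sketched as a per-site difference. The per-residue granularity, the fractional-drop measure, the 50% cutoff, and all names and values below are assumptions made for illustration; the text only states that the difference in reactivity marks the shielded region.

```python
def fractional_protection(free, bound):
    """Fractional loss of reactivity at each site when the ligand is bound."""
    profile = {}
    for site, free_rate in free.items():
        if free_rate <= 0:
            continue  # a site that never reacts tells us nothing
        bound_rate = bound.get(site, free_rate)  # missing data: assume no change
        profile[site] = (free_rate - bound_rate) / free_rate
    return profile


def protected_sites(free, bound, min_drop=0.5):
    """Sites whose reactivity falls by at least min_drop on ligand binding."""
    profile = fractional_protection(free, bound)
    return sorted(site for site, drop in profile.items() if drop >= min_drop)


# Hypothetical relative modification rates with and without a bound ligand.
free = {"Met12": 1.0, "Trp45": 0.8, "His90": 0.5}
bound = {"Met12": 0.2, "Trp45": 0.8, "His90": 0.45}
profile = fractional_protection(free, bound)
shielded = protected_sites(free, bound)
```

Sites in `shielded` are candidates for the ligand binding site, although occlusion by a conformational change elsewhere would give the same signal.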
  • the ligands used to define the binding site of the protein may be the natural ligands therefor, if known, or they may be "surrogate ligands".
  • a surrogate ligand is a molecule which binds the target protein, and has the potential of binding it at a site at which the target protein is bound by one of its natural ligands.
  • Surrogate ligands may be obtained by synthesizing and screening combinatorial libraries.
  • the library may be chosen from the point of view of obtaining the most structural diversity for the least synthetic effort, ignoring the suitability of the library members as drugs. For this reason, a preferred surrogate ligand library is a peptide ("BioKey") or oligonucleotide library.
  • a query protein is related to reference proteins by its ability to bind similar surrogate ligands.
  • a combinatorial oligomeric library of surrogate ligands, typically peptides or nucleic acids, is screened, and the oligomers which bind the target protein (and thus are called "aptamers") are characterized to yield "aptamer descriptors".
  • the aptamer descriptors (sequences and, possibly, additional information such as contact points and secondary structure) identified for the query protein are compared to those identified for the database proteins, and the drug leads previously identified for the database proteins characterized by the most similar surrogate ligands are favored.
  • when the aptamers are nucleic acids, the bases involved in protein binding are determined by footprinting techniques.
  • the epitope of the nucleic acid may be characterized as a sequence whose elements are unpaired G, A, T, or C, or any of the sixteen possible pairings (matched or mismatched) of those four bases. This sequence, for each surrogate ligand binding a reference protein, may be compared to that of the epitope of each aptamer binding the query protein.
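The epitope alphabet described above has 4 unpaired symbols plus 16 ordered pairings, 20 in all. The sketch below builds that alphabet and compares two epitopes position by position. The fraction-of-identical-positions score and the restriction to equal-length epitopes are simplifying assumptions, and the names are hypothetical.

```python
from itertools import product

BASES = "GATC"
# Four unpaired bases plus the sixteen ordered pairings (matched or mismatched).
ALPHABET = set(BASES) | {a + b for a, b in product(BASES, repeat=2)}


def epitope_identity(first, second):
    """Fraction of positions carrying the same symbol in two epitopes."""
    if len(first) != len(second):
        raise ValueError("this sketch only compares epitopes of equal length")
    for symbol in list(first) + list(second):
        if symbol not in ALPHABET:
            raise ValueError(f"unknown epitope symbol: {symbol}")
    matches = sum(a == b for a, b in zip(first, second))
    return matches / len(first)


# Hypothetical epitopes: "G" is an unpaired G, "AT" is an A paired with a T.
reference_epitope = ["G", "AT", "C", "GC"]
query_epitope = ["G", "AT", "T", "GC"]
identity = epitope_identity(reference_epitope, query_epitope)
```

A practical comparison would probably need alignment to handle epitopes of different lengths; that is left out to keep the encoding itself in view.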
  • characterization of the proteins by reactivity descriptors may be combined with their characterization by aptamer descriptors (surrogate ligand binding).
  • the work involved may be reduced by first screening a potential surrogate ligand library to obtain "aptamer descriptors" for the target protein, and then using the bound molecules (aptamers) from that library to modulate the chemical reactivity of the protein and thereby help characterize its binding site(s) by means of chemical reactivity descriptors.
  • our preference is to use peptides to alter reactivity and nucleic acids to generate aptamer descriptors.
  • the similarity of the query protein to each of the reference proteins is determined.
  • one or more drug leads are known, so these drug leads may be rated or ranked, as drug leads for modulators of the query protein, based on the similarity of their reference protein to the query protein, and, optionally, their own drug characteristics (e.g., potency, half-life, side effects).
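The ranking step can be sketched as follows. It assumes each protein's descriptor is a numeric vector and uses cosine similarity, which the text does not specify; the optional weighting by drug characteristics is omitted, and all names and data are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two descriptor vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_leads(query_descriptor, references):
    """Order known drug leads by how similar their reference protein is to the query.

    references maps protein name -> (descriptor vector, list of lead names).
    """
    ranked = []
    for protein, (descriptor, leads) in references.items():
        score = cosine(query_descriptor, descriptor)
        ranked.extend((lead, protein, score) for lead in leads)
    return sorted(ranked, key=lambda entry: entry[2], reverse=True)


# Hypothetical two-component descriptors for three reference proteins.
references = {
    "ref_A": ([1.0, 0.0], ["lead_a1"]),
    "ref_B": ([0.0, 1.0], ["lead_b1"]),
    "ref_C": ([1.0, 1.0], ["lead_c1"]),
}
ranked_leads = rank_leads([1.0, 0.0], references)
```

The leads at the head of `ranked_leads` would be the ones used to bias the next combinatorial library.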
  • a combinatorial library is then synthesized which is enriched for members which are structurally similar to the aforementioned drug leads.
  • Structurally similar members may be identified in a formal manner by use of the chemical structure descriptors available in the art, or more informally through a chemist's expert judgment of structural similarity.
  • the chemist may also develop the drug leads without resorting to a combinatorial library, i.e., by synthesizing lead analogues on an individual, noncombinatorial basis.
  • the lead analogues are then screened for the ability to modulate the target protein's activity, in vitro or in vivo.
  • Successful analogues are added to the database as leads associated with the target protein, which now becomes a reference protein.
  • the initial lead discovery occurs through querying a database of reference proteins and their associated descriptors and drugs.
  • the combinatorial library is screened primarily for purpose of optimization of these leads, although it is likely to be sufficiently different in structure from the lead so that there will be some secondary lead discovery as well.
  • Figure 1 Cartoon showing the reactivity of the transition-metal reagents toward nucleotides.
  • Ru(tpy)(bpy)O²⁺ (RuO) reacts by hydrogen abstraction at 1' and oxygen transfer at guanine C8.
  • Pt₂(pop)₄⁴⁻ reacts by hydrogen abstraction at 4' and 5' and by outer-sphere electron transfer at guanine.
  • Ru(bpy)₃³⁺ reacts only by outer-sphere electron transfer.
  • the reactivity of the complexes towards amino acids should therefore be different for all three reagents as well.
  • Figure 3 Scheme showing the solvent accessibility of reactive amino acid residues (*) as a function of folding and binding of the BioKey peptide.
  • Figure 4 Scheme showing the assay used to determine the relative rates for modification of a protein by a reagent. The presence of the protein decreases the yield of Form II DNA in a manner related to the rate constant for oxidation of the protein by the reagent.
  • FIG. 5A shows the secondary structure of an RNA sequence, with protein contact sites marked with an arrow.
  • FIG. 5B is a two-dimensional grid representation of the same information.
  • the present invention is directed to a method for the more efficient identification of small organic molecules, preferably molecules having a molecular weight of less than 500 daltons, which are pharmaceutically acceptable and which are potent modulators of the biological activity of a protein.
  • a pharmacologically active substance elicits a physiological response by interacting with a specialized portion, known as a receptor, of the target cell.
  • the substances which are able to elicit that response, by specific interaction with a receptor site, are known as agonists.
  • a substance able to elicit the maximum response is known as a full agonist, and one which elicits only, at most, a lesser (but discernible) response is a partial agonist.
  • a pharmacological antagonist is a compound which interacts with the receptor but without eliciting a response. By doing so, it inhibits the receptor from responding to agonists.
  • a competitive antagonist is one whose effect can be overcome by increasing the agonist concentration; a noncompetitive antagonist is one whose action is unaffected by agonist concentration.
  • Ligands are substances which bind to receptors, and thereby encompass both agonists and pharmacological antagonists. Ligands which activate (agonize) or inhibit (antagonize) the receptor are here termed modulators.
  • a "physiological" antagonist could be a substance which directly or indirectly inhibits the production or release of the natural agonist, or directly or indirectly facilitates its elimination from the receptor site.
  • a physiological antagonist of one receptor may be a pharmacological agonist of another receptor, such as one which activates an enzyme which degrades the natural ligand of the first receptor. If a disease state is the result of inappropriate activation of a receptor, the disease may be prevented or treated by means of a physiological or pharmacological antagonist. Other disease states may arise through inadequate activation of a receptor, in which case the disease may be prevented by means of a suitable agonist.
  • receptors proteins embedded in the phospholipid bilayer of cell membranes.
  • the binding of an agonist to the receptor can cause an allosteric change at an intracellular site, altering the receptor's interaction with other biomolecules.
  • the physiological response is initiated by the interaction with this "second messenger” (the agonist is the "first messenger") or "effector" molecule.
  • the peptides and nucleic acids used in the present invention can act as agonists (binding to the receptor and causing its activation), as antagonists (binding to the receptor without activating it, and blocking its activation by agonists), or as pharmacologically neutral species (binding to the receptor without either activating or blocking it).
  • Enzymes are special types of receptors. Receptors interact with agonists to form complexes which elicit a biological response. Ordinary receptors then release the agonist intact. With enzymes, the agonists are enzyme substrates, and the enzymes catalyze a chemical modification of the substrate. Enzymes are not necessarily integral membrane proteins; they may be secreted, or intracellular, proteins.
  • enzymes are activated by the action of a receptor's second messenger, or, more indirectly, by the product of an "upstream" enzymatic reaction.
  • drugs may also be useful because of their interaction with enzymes.
  • the drug may serve as a substrate for the enzyme, as a coenzyme, or as an enzyme inhibitor. (An irreversible inhibitor is an "inactivator".)
  • Drugs may also cause, directly or indirectly, the conversion of a proenzyme into an enzyme. Many disease states are associated with inappropriately low or high activity of particular enzymes.
  • the present invention may be used to identify both agonists and antagonists of receptors. It is not unusual for a relatively small structural change to convert an agonist into a pharmacological antagonist, or vice versa. Therefore, even if the drugs known to interact with a reference protein are all agonists, the drugs in question may serve as leads to the identification of both agonists and antagonists of the reference protein and of related proteins. Similarly, known antagonists may serve as drug leads, not only to additional antagonists, but to agonists as well.
  • nucleic acid aptamers and BioKey peptides used in developing descriptors are selected only for their ability to bind the target protein; it is not required that they activate the protein, or inhibit its activation. Some will bind at sites at which agonism/antagonism can occur, others will not. Their purpose is to help characterize the target protein surface, not to serve as drug leads themselves (although the practitioner is free to test the nucleic acids and peptides for agonist/antagonist activity and to use the active ones as leads in the design of active analogues which are more suitable than nucleic acids and peptides per se as drugs).
  • binding partners ligands
  • ligands proteins
  • given the binding partner of a protein, it is relatively straightforward to study how the interaction of the binding protein and its binding partner affects biological activity.
  • inhibitors are likely to affect the biological activity of the protein, at least if they can be delivered in vivo to the site of the interaction.
  • the binding protein is a receptor, and the binding partner an effector of the biological activity, then the inhibitor will antagonize the biological activity. If the binding partner is one which, through binding, blocks a biological activity, then an inhibitor of that interaction will, in effect, be an agonist of the biological activity in question.
  • binding sites are typically relatively small surface patches.
  • the binding characteristics of the protein may often be altered by local modifications at these sites, without denaturing the protein.
  • Electrostatic interactions include salt bridges, hydrogen bonds, and van der Waals forces.
  • hydrophobic interaction is actually the absence of hydrogen bonding between nonpolar groups and water, rather than a favorable interaction between the nonpolar groups themselves. Hydrophobic interactions are important in stabilizing the conformation of a protein and thus indirectly affect ligand binding, although hydrophobic residues are usually buried and thus not part of the binding site.
  • Peptides have been found to bind proteins at the same sites as those by which the proteins interact with other proteins, macromolecules and biologically significant substances e.g. nucleic acids, lipids and enzyme substrates.
  • the potency of an antagonist of a protein may be expressed as an IC50, the concentration of the antagonist which causes a 50% inhibition of a protein's binding or biological activity in an in vitro or in vivo assay system.
  • a pharmaceutically effective dosage of an antagonist depends on both the IC50 of the antagonist, and the effective concentrations of the protein and its clinically significant binding partner(s).
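As an illustration of the IC50 definition, the sketch below estimates the concentration giving 50% inhibition from a few dose-response points. Linear interpolation on a log-concentration scale is an assumption chosen for brevity (fitting a sigmoidal curve is the more usual practice), and the function name and data are hypothetical.

```python
from math import log10


def estimate_ic50(concentrations, inhibitions):
    """Interpolate the concentration at 50% inhibition on a log scale.

    concentrations must be positive and increasing; inhibitions are percentages.
    Returns None if the data never cross 50%.
    """
    points = list(zip(concentrations, inhibitions))
    for (c_low, i_low), (c_high, i_high) in zip(points, points[1:]):
        if i_low <= 50.0 <= i_high and i_high > i_low:
            fraction = (50.0 - i_low) / (i_high - i_low)
            log_ic50 = log10(c_low) + fraction * (log10(c_high) - log10(c_low))
            return 10 ** log_ic50
    return None


# Hypothetical assay: percent inhibition at 1, 10 and 100 (arbitrary units).
ic50 = estimate_ic50([1.0, 10.0, 100.0], [20.0, 40.0, 60.0])
no_crossing = estimate_ic50([1.0, 10.0], [10.0, 20.0])
```

Returning `None` when inhibition never reaches 50% avoids extrapolating beyond the tested range.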
  • Potencies may be categorized as follows:
  • the antagonists identified by the present invention are in one of the four higher categories identified above, and are in any event more potent than any antagonist known for the protein in question at the time of filing of this application.
  • the potency of an agonist may be quantified as the dosage resulting in 50% of its maximal effect on a receptor.
  • the term "drug lead" refers to a compound which is a member of a structural class which is generally suitable, in terms of physical characteristics (e.g., solubility), as a source of drugs, and which has at least some useful pharmacological activity, and which therefore could serve effectively as a starting point for the design of analogues and derivatives which are useful as drugs.
  • the "drug lead” may be a useful drug in its own right, or it may be a compound which is deficient as a drug because of inadequate potency or undesirable side effects. In the latter case, analogues and derivatives are sought which overcome these deficiencies. In the former case, one seeks to improve the already useful drug.
  • Such analogues and derivatives may be identified by rational drug design, or by screening of combinatorial or noncombinatorial libraries of analogues and derivatives.
  • a drug lead is a compound with a molecular weight of less than 1,000, more preferably, less than 750, still more preferably, less than 600, most preferably, less than 500.
  • it has a computed log octanol-water partition coefficient in the range of -4 to +14, more preferably, -2 to +7.5.
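The molecular weight and partition coefficient preferences above translate directly into a simple filter. The thresholds come from the text; treating the log P bounds as inclusive, the tier labels, and the function names are assumptions.

```python
# Molecular weight tiers from the text, tightest first.
MW_TIERS = [
    (500, "most preferred"),
    (600, "still more preferred"),
    (750, "more preferred"),
    (1000, "acceptable"),
]


def mw_tier(molecular_weight):
    """Return the tightest molecular weight tier the compound satisfies, or None."""
    for limit, label in MW_TIERS:
        if molecular_weight < limit:
            return label
    return None


def clogp_tier(clogp):
    """Classify a computed log octanol-water partition coefficient."""
    if -2.0 <= clogp <= 7.5:
        return "more preferred"
    if -4.0 <= clogp <= 14.0:
        return "acceptable"
    return None


def is_candidate_lead(molecular_weight, clogp):
    """True if both properties fall inside the broadest stated ranges."""
    return mw_tier(molecular_weight) is not None and clogp_tier(clogp) is not None
```

For example, a hypothetical compound of molecular weight 450 with a computed log P of 3.2 falls in the tightest tier on both properties.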
  • the target protein may be a naturally occurring protein, or a subunit or domain thereof, from any natural source, including a virus, a microorganism (including bacteria, fungi, algae, and protozoa), an invertebrate (including insects and worms), or the normal or cancerous cells of a vertebrate (especially a mammal, bird or fish and, among mammals, particularly humans, apes, monkeys, cows, pigs, goats, llamas, sheep, rats, mice, rabbits, guinea pigs, cats and dogs).
  • the target protein may be a mutant of a natural protein. Mutations may be introduced to facilitate the labeling or immobilization of the target protein, or to alter its biological activity. (An inhibitor of a mutant protein may be useful to selectively inhibit an undesired activity of the mutant protein and leave other activities substantially intact.)
  • the target protein may be, inter alia, a glyco-, lipo-, phospho-, or metalloprotein. It may be a nuclear, cytoplasmic, membrane, or secreted protein. It may, but need not, be an enzyme.
  • the known binding partners (if any) of the target protein may be, inter alia, other proteins, oligo- or polypeptides, nucleic acids, carbohydrates, lipids, or small organic or inorganic molecules or ions.
  • the biological activity or function of the target protein may be, but is not limited to, being a kinase, protein kinase, tyrosine kinase, protease, endoprotease, exoprotease, metalloprotease, serine endopeptidase, cysteine endopeptidase, nuclease, deoxyribonuclease, ribonuclease, endonuclease, exonuclease, polymerase, DNA-dependent RNA polymerase, DNA-dependent DNA polymerase, telomerase, or primase.
  • the binding protein may have more than one paratope and they may be the same or different. Different paratopes may interact with epitopes of different binding partners. An individual paratope may be specific to a particular binding partner, or it may interact with several different binding partners. A protein can bind a particular binding partner through several different binding sites. The binding sites may be continuous or discontinuous (vis-a-vis the primary sequence of the protein).
  • the target (query or reference) protein may be any protein of interest.
  • the information is added to the database.
  • a particularly preferred initial target protein is glutathione-S-transferase (GST), which is chosen because it has been crystallographically characterized with and without a large number of bound inhibitors (37), the level of expression has been measured in NCI's 60 cell lines (13), the activities of many known inhibitors are available (38), and macroscopic quantities of peptides that bind to the active site can be prepared.
  • GST: glutathione-S-transferase
  • Proteins for which peptide ligands exist and for which expression data are available at NCI include ras (39), src (40), and p53 (41); other promising proteins for which both kinds of data are likely to be available in the future are the UL44 protein from cytomegalovirus and the hMDM2 protein that binds to p53 (41). Crystallographically characterized targets are of particular interest and utility in the early stages.
  • the term "library" generally refers to a collection of chemical or biological entities which are related in origin, structure, and/or function, and which can be screened simultaneously for a property of interest.
  • the term "combinatorial library" refers to a library in which the individual members are either systematic or random combinations of a limited set of basic elements, the properties of each member being dependent on the choice and location of the elements incorporated into it. Typically, the members of the library are at least capable of being screened simultaneously. Randomization may be complete or partial; some positions may be randomized and others predetermined, and at random positions, the choices may be limited in a predetermined manner.
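Partial randomization can be made concrete with a small enumeration. In this sketch each position is given as a list of allowed building blocks, so a predetermined position is a one-element list and a restricted random position is a short list. The one-letter building blocks and the function name are hypothetical.

```python
from itertools import product


def enumerate_library(positions):
    """List every member of a library defined by the allowed choices per position."""
    return ["".join(combination) for combination in product(*positions)]


# Position 1: two choices; position 2: predetermined; position 3: three choices.
positions = [["A", "G"], ["W"], ["K", "R", "H"]]
library = enumerate_library(positions)
library_size = len(library)  # the product of the per-position choice counts
```

Exhaustive enumeration is only practical for small designs, since the number of members is the product of the choice counts across positions; large libraries are made and screened as mixtures instead.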
  • the members of a combinatorial library may be oligomers or polymers of some kind, in which the variation occurs through the choice of monomeric building block at one or more positions of the oligomer or polymer, and possibly in terms of the connecting linkage, or the length of the oligomer or polymer, too.
  • the members may be nonoligomeric molecules with a standard core structure, like the 1,4-benzodiazepine structure, with the variation being introduced by the choice of substituents at particular variable sites on the core structure.
  • the members may be nonoligomeric molecules assembled like a jigsaw puzzle, but wherein each piece has both one or more variable moieties (contributing to library diversity) and one or more constant moieties (providing the functionalities for coupling the piece in question to other pieces).
  • A composite combinatorial library is a mixture of two or more simple libraries, e.g., DNAs and peptides, or benzodiazepines and carbamates.
  • the number of component simple libraries in a composite library will, of course, normally be smaller than the average number of members in each simple library, as otherwise the advantage of a library over individual synthesis is small.
  • An oligonucleotide library is a combinatorial library, at least some of whose members are single-stranded oligonucleotides having three or more nucleotides connected by phosphodiester or analogous bonds.
  • the oligonucleotides may be linear, cyclic or branched, and may include non-nucleic acid moieties.
  • the nucleotides are not limited to the nucleotides normally found in DNA or RNA.
  • For nucleotides modified to increase nuclease resistance and chemical stability of aptamers, see Chart 1 in Osborne and Ellington, Chem. Rev., 97: 349-70 (1997).
  • For screening of RNA see Ellington and Szostak, Nature, 346: 818-22 (1990).
  • the libraries of the present invention are preferably composed of oligonucleotides having a length of 3 to 100 bases, more preferably 15 to 35 bases.
  • the oligonucleotides in a given library may be of the same or of different lengths.
  • Oligonucleotide libraries have the advantage that libraries of very high diversity (e.g., 10^15) are feasible, and binding molecules are readily amplified in vitro by polymerase chain reaction (PCR). Moreover, nucleic acid molecules can have very high specificity and affinity to targets.
  • PCR: polymerase chain reaction
  • this invention prepares and screens oligonucleotide libraries by the SELEX method, as described in King and Famulok, Molec. Biol. Repts., 20: 97-107 (1994); L. Gold, C. Tuerk. Methods of producing nucleic acid ligands, US Patent 5595877; Oliphant et al., Gene 44:177 (1986).
  • the term "aptamer" is conferred on those oligonucleotides which bind the target protein. Such aptamers may be used to characterize the target protein, both directly (through identification of the aptamer and the points of contact between the aptamer and the protein) and indirectly (by use of the aptamer as a ligand to modify the chemical reactivity of the protein).
  • a peptide library is a combinatorial library, at least some of whose members are peptides having three or more amino acids connected via peptide bonds.
  • the peptides may be linear, branched, or cyclic, and may include nonpeptidyl moieties.
  • the amino acids are not limited to the naturally occurring amino acids.
  • a biased peptide library is one in which one or more (but not all) residues of the peptides are constant residues.
  • the individual members are referred to as peptide ligands (PL) .
  • an internal residue is constant, so that the peptide sequence may be written as (Xaa)_m-AA_1-(Xaa)_n, where:
  • Xaa is either any naturally occurring amino acid, or any amino acid except cysteine
  • m and n are chosen independently from the range of 2 to 20
  • the Xaa may be the same or different
  • AA_1 is the same naturally occurring amino acid for all peptides in the library but may be any amino acid.
  • more preferably, m and n are chosen independently from the range of 4 to 9.
  • AA_1 is located at or near the center of the peptide. More specifically, it is desirable that m and n not differ by more than 2; more preferably m and n are equal. Even if the chosen AA_1 is required for (or at least permissive of) the target protein (TP) binding activity, one may need particular flanking residues to assure that it is properly positioned. If AA_1 is more or less centrally located, the library presents numerous alternative choices for the flanking residues. If AA_1 is at an end, this flexibility is diminished.
  • the most preferred libraries are those in which AA_1 is tryptophan, proline or tyrosine.
  • Second most preferred are those in which AA_1 is phenylalanine, histidine, arginine, aspartate, leucine or isoleucine.
  • Third most preferred are those in which AA_1 is asparagine, serine, alanine or methionine.
  • the least preferred choices are cysteine and glycine. These preferences are based on evaluation of the results of screening random peptide libraries for binding to many different TPs.
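  • As an illustration only, the biased library format with one constant internal residue flanked by m and n variable residues can be sketched in Python. The function name, the one-letter amino acid alphabet, and the default of excluding cysteine from the variable positions are assumptions made for this sketch, not details taken from the text.

```python
import random

# One-letter codes of the 20 naturally occurring amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def random_biased_peptide(constant="W", m=4, n=4, exclude="C", rng=random):
    """Return one random member of a library with m variable residues,
    one constant residue, and n variable residues.

    `exclude` lists residues barred from the variable positions
    (cysteine by default, an assumption for this sketch).
    """
    alphabet = [a for a in AMINO_ACIDS if a not in exclude]
    left = "".join(rng.choice(alphabet) for _ in range(m))
    right = "".join(rng.choice(alphabet) for _ in range(n))
    return left + constant + right

# Five hypothetical members with a central tryptophan and m = n = 4.
rng = random.Random(0)
library = [random_biased_peptide("W", 4, 4, rng=rng) for _ in range(5)]
```

  • With m = n the constant residue sits exactly at the center, which matches the stated preference for a centrally located constant residue.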
  • Ligands that bind to functional domains tend to have both constant as well as unique features. Therefore, by using “biased” peptide libraries, one can ease the burden of finding ligands. Either “biased” or “unbiased” libraries may be screened to identify "BioKey” peptides for use in developing reactivity descriptors, and, optionally, peptide aptamer descriptors and additional drug leads.
  • a “descriptor” (also known as a parameter, character, variable, or variate) is a numerically expressed characteristic of a compound (which may be a protein, or a protein ligand) , which helps to distinguish that compound from others.
  • a descriptor value need not be absolutely specific to a compound to be useful.
  • the characteristics may be pure structural characteristics (as in a "structural descriptor") or they may refer to the compound's interaction with other compounds, such as a binding interaction (as in an "aptamer descriptor") or a chemical reaction (as in a "reactivity descriptor”) .
  • “Paired Descriptors” are descriptors of the same property as measured in two different molecules.
  • a “descriptor array”, “list”, or “set” is an array, list or set whose elements are different descriptors for the same molecule.
  • a plurality of comparable descriptors for two compounds may be used to calculate a similarity for the two compounds.
  • the descriptors used in making this calculation in the present invention, for two proteins, will include (a) at least one chemical reactivity descriptor, and/or (b) at least one peptide or oligonucleotide affinity descriptor, and preferably both.
  • the similarity calculation may optionally consider other descriptors, such as structural descriptors of known ligands, as well.
  • a set of n-descriptors defines an n-dimensional descriptor space; each compound for which a descriptor set is available may be said to occupy a point in descriptor space.
  • the dissimilarity of two compounds may be expressed as a distance between the two points which they occupy in descriptor space.
  • a similarity measure or coefficient quantifies the relationship between two individuals (compounds) , given the values of a set of variates (descriptors) common to both. Similarity coefficients are usually defined to take values in the range of 0 to 1.
  • One commonly used measure of similarity is the product moment correlation coefficient. It is unity whenever two profiles are parallel, regardless of how far apart they are in level. Two profiles may have a correlation of +1 even if they are not parallel, provided that the two sets of scores are linearly related.
  • the Jaccard or Sneath coefficient modifies the simple matching coefficient by ignoring bits which are zero in both i and j, i.e., by ignoring negative matches (mutual absences). In other words, it is obtained by taking the number of bits which are set in both descriptor bit strings and dividing by the total number of bits set in either descriptor string. It is also called the unweighted Tanimoto coefficient.
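  • The Jaccard (unweighted Tanimoto) coefficient just described can be sketched for two equal-length bit strings as follows. The function name is hypothetical, and returning 1.0 when neither string has any bit set is a convention chosen for this sketch, since the text does not address that case.

```python
def jaccard(bits_i, bits_j):
    """Jaccard coefficient of two equal-length binary descriptor strings."""
    if len(bits_i) != len(bits_j):
        raise ValueError("bit strings must have the same length")
    # Bits set in both strings (positive matches).
    both = sum(1 for a, b in zip(bits_i, bits_j) if a and b)
    # Bits set in at least one string; mutual absences are ignored.
    either = sum(1 for a, b in zip(bits_i, bits_j) if a or b)
    # Assumption: two all-zero strings are treated as identical.
    return both / either if either else 1.0
```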
  • Gower has defined a general similarity coefficient which can be used for binary, qualitative, and quantitative data:
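  • The formula itself does not survive in this text. For reference, Gower's coefficient as it is usually stated in the statistical literature is given here; the symbols are the conventional ones and are an assumption with respect to the original notation.

```latex
S_{ij} = \frac{\sum_{k=1}^{n} w_{ijk}\, s_{ijk}}{\sum_{k=1}^{n} w_{ijk}},
\qquad
s_{ijk} =
\begin{cases}
1 - \dfrac{\lvert x_{ik} - x_{jk} \rvert}{R_k} & \text{quantitative variate } k \text{ with range } R_k,\\[1.5ex]
1 \text{ if } x_{ik} = x_{jk},\ \text{else } 0 & \text{qualitative or binary variate } k,
\end{cases}
```

  • Here w_ijk is 1 when compounds i and j can be compared on variate k and 0 otherwise; for binary variates, w_ijk is conventionally set to 0 for mutual absences.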
  • Descriptors may be quantitative or qualitative. Quantitative descriptors may be integers or real numbers. Qualitative descriptors divide the data into categories which may be, but need not be, expressible as having relative magnitudes. Binary descriptors are a special case of qualitative descriptors, in which there are just two categories, typically representing the presence or absence of a feature. Qualitative data for which the variates have several levels may be treated like binary data with each level of a variate being regarded as a single binary variable (i.e., an eight level variate expressed as eight bits) . Or the levels may be numbered sequentially (i.e., an eight level variable expressed as three bits) . The reactivity descriptors are preferably quantitative in form.
  • If aptamer descriptors are expressed as nucleic acid sequences, with or without secondary structure and protein contact information, they are qualitative in form. If they are expressed as 2D fingerprints, they are a string of binary data. Gower's coefficient, for qualitative data, only credits exact matches of the variate. For aptamers, it is more useful to evaluate the similarity of the sequences by a BLAST type analysis rather than to simply state whether aptamers are the same or different.
  • the greater the distance the less the similarity.
  • Descriptors may be weighted (or otherwise transformed) for any of several reasons, including:
  • the raw descriptor values may be, but need not be, transformed prior to use in calculating distances.
  • Typical transformations are (a) presence (1)/absence (0), (b) ln(x+1), (c) frequency in sample, (d) root, and (e) relative range, i.e., (value-min)/(max-min).
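  • A minimal sketch of four of the listed transformations follows. The function names are hypothetical, "root" is taken here to mean the square root, and mapping a constant descriptor to 0.0 under the relative-range transform is an assumption; transformation (c) is omitted because the text does not define it precisely.

```python
import math

def presence_absence(x):
    """(a) 1 if the raw value is positive, otherwise 0."""
    return 1 if x > 0 else 0

def log_transform(x):
    """(b) ln(x + 1), defined for x > -1."""
    return math.log(x + 1)

def root_transform(x):
    """(d) square root of a non-negative raw value (assumed meaning of 'root')."""
    return math.sqrt(x)

def relative_range(values):
    """(e) (value - min) / (max - min) for every value in the list."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a descriptor with no spread maps to 0.0 everywhere.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```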
  • Descriptor weights may be adjusted empirically on the basis of specially designed test sets.
  • a training set of proteins is identified. Descriptors are evaluated for each protein in the set.
  • A training set of compounds is also tested against each protein in the set. These compounds are chosen so that, for any protein in the set, there is at least one compound which is an agonist or antagonist for it.
  • A neural net, with the descriptor weights as inputs, is used to predict the activity of each compound against each protein, using the calculated protein similarities. For example, it will calculate the similarity of protein x to all other proteins, then treat the activities of the compounds against the other proteins as "knowns" and use them to predict the activity of the compounds against protein x. This is done repeatedly, with each protein taking on the role of protein x, in turn.
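  • The leave-one-out loop described above can be sketched as follows. Note that this sketch deliberately replaces the neural net with a plain similarity-weighted average, purely to show the hold-out structure; the function name and the dictionary layout are assumptions.

```python
def predict_leave_one_out(similarity, activity):
    """Predict a compound's activity against each protein from the others.

    similarity[x][y] is the similarity (0..1) between proteins x and y;
    activity[y] is the measured activity of one compound against protein y.
    Each protein takes the role of "protein x" in turn.
    """
    predictions = {}
    for x in activity:
        others = [y for y in activity if y != x]
        total = sum(similarity[x][y] for y in others)
        if total == 0:
            # No similar protein: no prediction can be made.
            predictions[x] = None
        else:
            weighted = sum(similarity[x][y] * activity[y] for y in others)
            predictions[x] = weighted / total
    return predictions
```

  • Comparing the predictions with the measured activities gives an error that can be used to adjust the descriptor weights empirically.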
  • the training set proteins may be chosen for high diversity, e.g., insignificant sequence similarities and/or unrelated biological activities. If the plan is to use the library for lead optimization, then the training set proteins might be members of a family of homologous proteins.
  • the coefficient of variation may be useful in comparing descriptors; it is the standard deviation divided by the mean. If there is no information available about the ultimate significance of a descriptor, one may give a greater weight to descriptors which have a larger CV and hence a more uniform distribution.
  • Standard mathematical methods such as cluster analysis, principal components analysis, or partial least squares analysis, may be used to determine which descriptors are strongly correlated and to replace them with a new descriptor which is a weighted sum of the original correlated descriptors.
  • One may alternatively choose (perhaps randomly) one of each pair of highly correlated descriptors and simply prune it, thereby reducing the amount of data which must be collected.
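  • The pruning alternative can be sketched with the product moment correlation coefficient. The 0.95 threshold, the function names, and the rule of keeping the first descriptor of each correlated pair are assumptions for illustration.

```python
import math

def pearson(x, y):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant descriptor carries no correlation information
    return sxy / math.sqrt(sxx * syy)

def prune_correlated(columns, threshold=0.95):
    """Keep a descriptor only if it is not highly correlated with one already kept.

    columns maps a descriptor name to its values across the compound set.
    """
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```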
  • Distances may be calculated on the basis of any of a variety of distance measures known in the statistical arts.
  • the "cosine theta" distance is the cosine of the angle between the vector from the origin to point X_ik and the vector from the origin to point X_jk.
  • the Mahalanobis distance allows for correlations between variables; if the variables are uncorrelated, D^2 is equivalent to the squared Euclidean distance measured using standardized variables.
  • the Calhoun distance uses only rank orders; for molecules i and j, the distance is the proportion of the entire set (excluding i and j) that have descriptor states intermediate between that for i and that for j for one or more of the descriptors k.
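  • Two of these measures, sketched for descriptor vectors held as plain lists: the Euclidean distance and the "cosine theta" measure. The function names are hypothetical. The Mahalanobis and Calhoun distances are omitted because they need the covariance matrix or the whole compound set.

```python
import math

def euclidean(x, y):
    """Straight-line distance between two points in descriptor space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine_theta(x, y):
    """Cosine of the angle between the vectors from the origin to x and to y."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norm == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / norm
```

  • Note that the cosine is 1 for parallel vectors and 0 for orthogonal ones, so it behaves as a similarity rather than as a distance.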
  • a distance measure may be transformed into a similarity measure by any of a variety of transformations that convert a non-negative number to the range 0..1, e.g.,
  • s_ij = 1 - (d_ij / d_max)
  • s_ij = the fraction of the pairs for which the distance is greater than or equal to d_ij. This is a measure of relative similarity.
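  • Both transformations are short enough to sketch directly. The function names are hypothetical; `all_distances` is assumed to hold the distances for every pair in the set, including the pair of interest.

```python
def similarity_from_max(d_ij, d_max):
    """1 - d_ij / d_max: maps a distance in 0..d_max onto a similarity in 0..1."""
    return 1.0 - d_ij / d_max

def rank_similarity(d_ij, all_distances):
    """Fraction of pairs whose distance is greater than or equal to d_ij."""
    return sum(1 for d in all_distances if d >= d_ij) / len(all_distances)
```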
  • the diversity of a set of compounds, as measured by a set of descriptors, may be calculated in several ways.
  • a purely geometric method involves assuming that each compound sweeps out a hypersphere in descriptor space, the hypersphere having a radius known as the similarity radius.
  • the total hypervolume in descriptor space of points within a unit similarity radius of one or more of the compounds is calculated. This is compared to the hypervolume achievable if none of the hyperspheres overlap, i.e., to n * volume of a single hypersphere, where n is the number of compounds in the set.
  • the swept hypervolume may be determined exactly, or by Monte Carlo methods.
  • the ratio of the swept hypervolume to the maximum hypervolume is a measure of compound set diversity, ranging from 1 (maximum) to 1/n (minimum) .
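  • A Monte Carlo estimate of this ratio can be sketched as follows. The sampling box, the sample count, the fixed seed, and the function names are all assumptions for this sketch; the result is an estimate with sampling error, not an exact hypervolume.

```python
import math
import random

def unit_ball_volume(dim):
    """Volume of a unit-radius hypersphere in `dim` dimensions."""
    return math.pi ** (dim / 2) / math.gamma(dim / 2 + 1)

def diversity_ratio(points, radius=1.0, samples=20000, seed=0):
    """Estimate swept hypervolume / (n * volume of one hypersphere)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Bounding box that contains every hypersphere.
    lows = [min(p[k] for p in points) - radius for k in range(dim)]
    highs = [max(p[k] for p in points) + radius for k in range(dim)]
    box_volume = math.prod(h - l for l, h in zip(lows, highs))
    hits = 0
    for _ in range(samples):
        q = [rng.uniform(l, h) for l, h in zip(lows, highs)]
        # A sample counts if it lies within the radius of any compound.
        if any(math.dist(q, p) <= radius for p in points):
            hits += 1
    swept = box_volume * hits / samples
    maximum = len(points) * unit_ball_volume(dim) * radius ** dim
    return swept / maximum
```

  • Two coincident compounds give a ratio near 1/2 (the minimum for n = 2), while two compounds far apart give a ratio near 1.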
  • Another approach is to calculate all of the pairwise distances between compounds in descriptor space.
  • the mean distance is a measure of diversity. If desired, this can be scaled by calculating the ratio of the mean distance to the maximum theoretical distance.
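  • The pairwise-distance measure can be sketched as follows, assuming Euclidean distance. The optional `d_max` argument stands in for the maximum theoretical distance, which the caller must supply; both names are hypothetical.

```python
import math
from itertools import combinations

def mean_pairwise_distance(points, d_max=None):
    """Mean Euclidean distance over all pairs; scaled by d_max if given."""
    pairs = list(combinations(points, 2))
    mean = sum(math.dist(p, q) for p, q in pairs) / len(pairs)
    return mean / d_max if d_max else mean
```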
  • a third approach is to apply cluster analysis to the set of compounds .
  • the method used should be one which does not set the number of clusters arbitrarily, but rather decides the number based on some goodness-of-fit criterion.
  • the resulting number of clusters is a measure of diversity, as is the ratio of the number of clusters to the number of compounds.
  • the sum of H(k) for all k is a measure of overall diversity. Standard techniques may be used to correct for correlation.
  • Binding sites recognize ligands based on complementarity of both shape and functionality.
  • the functionality in the binding site is an array of recognizable groups (hydrophobic, hydrogen bond donors, hydrogen bond acceptors, π stacking) that is complementary to the ligand (9).
  • Many of these functional groups are reactive towards common chemical modification reagents, such as hydroxyl radical, MnO4^-, NaBH4, and dimethyl sulfate (10), and towards "designer" reagents usually based on transition-metal complexes (11, 12).
  • reagents can react with functional groups via a wide range of pathways, including one-electron oxidation, oxygen-atom transfer, hydrogen-atom abstraction, hydrogenation, one-electron reduction, hydrolysis, and alkylation.
  • the array of functional groups in a binding site will therefore exhibit a unique array of rates of reactivity to each of these reagents. This array of rates will provide a set of descriptors for the chemical functionality of the binding site. Therefore, the set of descriptors can be obtained by measuring the relative rates of the functional groups in the binding site against a panel of chemical modification reagents.
  • the reactivity descriptors are binding site specific.
  • the structure of lysozyme shows that there are two very important tryptophan residues in the binding site, so it will be important that the measured rate is for those and not for the other Trp residues in the protein.
  • the binding site Trp residues are by far the most accessible and will probably dominate the measured reactivity. The easiest way to assess this point will be to measure the reaction rates with and without a ligand (such as a peptide BioKey) that blocks the binding site. The difference in rates is then the reactivity of the binding site.
  • the chemical modification reagent can be covalently attached to the ligand and delivered directly to the binding site.
  • the presently preferred reagents are transition-metal complexes -- e.g., Pt2(pop)4^4-, Ru(tpy)(bpy)O^2+, and Ru(bpy)3^3+ -- whose rate constants can be measured by optical spectroscopy or Stern-Volmer quenching.
  • the oxoruthenium (IV) system has been very useful in discerning the important kinetic and thermodynamic factors in DNA oxidation, because the precise quantity of oxidant is known and the fate of all oxidizing equivalents can be quantitated.
  • the reactivity pathways available to the oxoruthenium (IV) and Pt2(pop)4^4- complexes include inner-sphere reactions where there is significant bond-breaking or bond-making in the transition state that involves the metal complex. These are the reaction pathways described above that lead to hydrogen abstraction from nucleotides. Outer-sphere pathways are ones where only an electron is transferred by tunneling from one reactant to another. The advantage of these pathways is that there is a spherically symmetric distance dependence to the reaction probability while inner-sphere pathways generally require a specific approach of the reagent on the reactant. This difference will be important here in defining a diverse set of reaction pathways of the reagents with the protein targets.
  • Reactivity profiles. The amino acid "reactivity profiles" will be determined for each of the proposed reagents.
  • the rate constants for the reagents may be determined with any or with all 20 conventional amino acids by methods described below. These 20 rate constants will then provide a profile for each reagent.
  • the desired result is that there is a different profile for each reagent. For example, if Ru(tpy)(bpy)O^2+ reacts mostly with C-H donors like serine and threonine but Ru(bpy)3^3+ reacts mostly with one-electron donors like tyrosine and tryptophan, then each descriptor and the relationship between the two descriptors will be informative.
  • the rate constants will be measured for the free amino acids and a representative set of di- and tri-peptides to show that the individual rate constants add together linearly and can be discriminated by the three reagents.
  • the three preferred reagents and their reactivities are well suited to discriminating the 20 amino acids.
  • Pt2(pop)4^4- abstracts hydrogen atoms from weak C-H bonds in organic substrates. Shown in Figure 2 are some of the amino acids with the C-H bonds likely to be activated by Pt2(pop)4^4- highlighted.
  • the C-H activation chemistry of Ru(tpy)(bpy)O^2+ is distinct from that of Pt2(pop)4^4- in that inner-sphere adducts of the ruthenium-oxo linkage are often formed (19, 34), which favors activation of alcohols over aliphatic functionality.
  • Ru(tpy)(bpy)O^2+ is likely to prefer serine, threonine, and cysteine more than Pt2(pop)4^4- does.
  • Inner-sphere chemistry on the tryptophan and histidine rings or terminal amines of arginine is also likely with Ru(tpy)(bpy)O^2+.
  • the rate constants for a large array of amino acids will be approximately the average for all of the amino acids separately, weighted by the solvent accessibility of the individual groups in the folded protein. Therefore, the rate constant for a dipeptide should be the average of the rate constants for each individual amino acid, and similarly to larger peptides as long as no secondary structure develops to attenuate the reactivity of certain groups that are protected by the tertiary structure. It will be instructive to measure the rate constants for some representative random-coil peptides to show how simply linking the amino acids modulates the observed rate constants.
  • the rate constants may be measured for the three transition-metal complexes under three conditions: denatured protein, folded protein, and folded protein with bound surrogate (BioKey) peptide or nucleic acid.
  • the objective is for the difference in rate constants for folded and denatured protein to give a quantitative descriptor of the surface area of the folded protein (weighted by the solvent-exposed amino acids) and for the rate constants with and without the bound BioKey molecule to give a quantitative descriptor of the amino acids in the binding site.
  • a complication is that the bound BioKey may present new solvent-accessible residues that react with the transition-metal complex.
  • BioKey peptides with a scrambled amino acid sequence will be included in the reaction with the protein alone, so that the reactive functionality in the BioKey will be present in both reactions.
  • Unfolded proteins may be generated chemically by addition of a denaturing agent, such as urea or guanidinium hydrochloride; we have shown elsewhere that urea does not deactivate our reagents (24) .
  • In cases where the chemical denaturant is problematic, we will use thermal denaturation; in the case of Pt2(pop)4^4-, we have shown that thermal denaturation does not alter the selectivity of the reagent with biomolecules except as modulated by the change in biomolecular structure.
  • the rate constant for the peptide may be determined separately and used to correct for additional oxidation of peptide side chains not blocked by the protein.
  • the rate constant for a peptide with the same amino acids, but with a scrambled sequence, may also be of interest.
  • the protein oxidation rates are measured either by quantitating the disappearance of the reagent by standard analytical methods, quantitating the disappearance of the protein by mass spectrometry, amino acid analysis, changes in fluorescence of probes that bind to proteins, or competition with DNA plasmids that are cleaved by the reagent. Rates are compiled for all proteins for each analytical method and each protein state (folded, unfolded, BioKey) . Each of these rates is a new quantitative descriptor of the protein. If one measures the rate of disappearance of the reagent, this will reflect its action on all amino acids.
  • the measurement will be more amino acid-specific. Any or all of these approaches may be used to generate descriptors.
  • the reactivities may be weighted by the solvent accessibility of each residue as prescribed by the folded structure, so the difference in rates for the folded and unfolded proteins will give a quantitative description of the degree of folding and of the chemical functionality on the solvent-accessible surface.
  • When the BioKey peptide is bound to the active site, the active site residues will be blocked. The difference in the rates with and without the BioKey will therefore be a quantitative descriptor of the number and kind of residues in the active site.
  • the collected data are used to generate a new database containing in vitro descriptors of protein targets.
  • the database contains entries for relative rates in the unfolded, folded, and ligand-bound states.
  • the relative rates are normalized on a scale from zero to one and entered into the database in a three-dimensional matrix where each point corresponds to a particular reagent for a given protein in one of the three states. More dimensions may be added by including amino acid-specific rate measurements, such as surface hydrophobicity changes.
  • proteins that exhibit large changes in rate from unfolded to folded states are those with compact structures and a large number of buried residues. Likewise, large changes in rate between folded and BioKey-bound states indicate large active sites.
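  • The normalized protein/reagent/state matrix can be sketched as a dictionary keyed by (protein, reagent, state). Scaling each reagent's rates separately onto the zero-to-one range is an assumption of this sketch, since the text does not say over which set the normalization is taken; the function name is hypothetical.

```python
def normalize_rates(rates):
    """Scale raw rates onto 0..1, separately for each reagent.

    rates maps (protein, reagent, state) to a raw relative rate.
    """
    by_reagent = {}
    for (_protein, reagent, _state), value in rates.items():
        by_reagent.setdefault(reagent, []).append(value)
    normalized = {}
    for key, value in rates.items():
        values = by_reagent[key[1]]
        lo, hi = min(values), max(values)
        # A reagent with identical rates everywhere maps to 0.0.
        normalized[key] = 0.0 if hi == lo else (value - lo) / (hi - lo)
    return normalized
```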
  • the three reagents described thus far can be generalized by substitution and still be amenable to real-time characterization.
  • the Ru(tpy)(bpy)O^2+ complexes can be substituted with electron-donating and electron-withdrawing substituents that will increase or decrease the driving force for oxidation and hence the reactivity and selectivity for different groups in the polymer.
  • Ru(bpy)2(4-NMe2-py)O^2+ only abstracts 1' hydrogens from thymidine sugars and not from A, C, or G, whereas the parent Ru(tpy)(bpy)O^2+ complex oxidizes all four nucleotides (42).
  • these complexes can be substituted with sterically differentiated groups that will modulate the selectivity based on the solvent accessibility.
  • steric effects lower the reactivity towards guanine oxidation in duplex DNA without changing the reactivity toward sugar 1' hydrogens (20) .
  • Reactivity of one-electron oxidants based on Ru(bpy)3^3+ can be modulated by electron-releasing or electron-withdrawing groups or by making the complex significantly larger or smaller, which will change the electron-transfer distance (27).
  • these changes should modulate the reactivity profile and the effects of solvent accessibility. Such changes in profile will create reagents that return differentiated descriptors for the protein targets .
  • transition-metal reagents are attractive because of the facility with which they can be modified and the ease with which absolute rate constants can be measured ( 17) ; however, mining the potential information available from the wide range of known chemical modification reagents requires general methods for measuring the relative rates.
  • chemical modification reactions have been much more widely applied to study of nucleic acid structure than to study of protein structure because of the greater lability of the phosphodiester backbone. Indeed, reactions that modify DNA nucleotides lead to strand scission, if not immediately then almost always after base treatment (43).
  • a very sensitive method for measuring DNA modification is by plasmid isomerization ( 44 , 22) . This method is therefore also a sensitive method for measuring the instantaneous concentration of the modification reagent.
  • a parallel approach described below is to assess the extent of modification of different families of amino acids.
  • the change in surface hydrophobicity of a protein can be assessed by the change in fluorescence of 8-anilino-1-naphthalenesulfonic acid (ANSA) (45).
  • An increase in hydrophobicity measured this way is observed upon protein oxidation, which is very well correlated with formation of tyrosine dimers and oxidation of methionine ( 36) .
  • Hydrophobicity and other methods that monitor modification of specific sets or subsets of amino acids, such as changes in reactive carbonyl group or formation of oxidized methionine
  • the one-electron reagents based on Ru(bpy)3^3+ can also be analyzed by changes in optical spectra.
  • Plasmid Isomerization. The method for measuring the rate constants by competition with plasmid isomerization is shown in Figure 4. Reaction of the reagent of interest with supercoiled (Form I) DNA leads to nicking of the DNA to produce a nicked plasmid (Form II), which is readily separated from Form I on an agarose gel (44, 22). Because the plasmid is as much as 4 kb in length, the plasmid isomerization assay is very sensitive to small amounts of DNA damage.
  • The reaction is then repeated in the presence of the protein for which a relative rate constant is desired.
  • If the protein is modified instead of the plasmid, less isomerization is observed.
  • This assay can then be performed for any reagent that modifies both DNA and amino acids .
  • a drawback to the approach in Figure 4 is that the assay cannot be performed for DNA-binding proteins; however, the simplicity and low quantities of material required are attractive.
  • a more general strategy would be to analyze the oxidized protein by matrix-assisted laser-desorption ionization (MALDI) mass spectrometry ( 45) .
  • This method can be used to detect small changes in molecular weight of large proteins on 0.1 pmol of protein (for a recent example, see ( 45) ) .
  • the mass spectrometry will simply be used to detect the change in the concentration of the unmodified protein as a means for determining the rate of modification by the reagent.
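  • One way to turn such concentration measurements into a rate is sketched here, assuming pseudo-first-order kinetics (reagent in large excess), so that the fraction of unmodified protein decays as exp(-k t). That kinetic model, the fit through the origin, and the function name are assumptions of this sketch and are not stated in the text.

```python
import math

def first_order_rate(times, fractions_remaining):
    """Least-squares estimate of k from ln(fraction) = -k * t.

    The fit is forced through the origin because the fraction of
    unmodified protein is 1 at time zero.
    """
    num = sum(t * math.log(f) for t, f in zip(times, fractions_remaining))
    den = sum(t * t for t in times)
    return -num / den
```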
  • a final general approach to measuring the relative rates would be to measure the quantity of reagent before and during reaction with the protein. This method could involve traditional analytical chemistry techniques such as HPLC or GC for suitable substrates. The use of isotopically labeled reagents will provide the desired sensitivity. As discussed above, a separate strategy for measuring the rates would be rather than to measure the total rate for all of the amino acids, to use detection methods that sample a subset of the amino acids. Such assays would involve quantitation of reactive carbonyl in the oxidized protein following the reaction, analysis for individual oxidized amino acids such as methionine sulfoxide, or measuring the change in hydrophobicity by ANSA emission ( 36) . These methods would then provide further specificity in the descriptors that would be meaningful when the same detection methods were compared for different proteins.
  • the reactivity descriptors should be predictable from the three-dimensional structure and the reactivity profile, which describes the inherent chemical reactivity of each amino acid towards the reagent.
  • the three-dimensional structure can be used to weight each amino acid by its solvent accessibility in the folded protein, which can then be weighted by its inherent reactivity. The ability to perform these calculations carefully will depend on accurate reactivity profiles, which is why choosing transition-metal reagents where absolute rate constants can be measured is vital in the initial studies.
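  • The weighting just described can be sketched as a sum over residues of solvent accessibility times inherent reactivity. Using a plain sum (rather than an average), the 0..1 accessibility scale, and the function name are assumptions for illustration; the numbers in any call are hypothetical inputs, not measured values.

```python
def predicted_rate(residues, profile):
    """Predict a protein's reactivity toward one reagent.

    residues: list of (amino_acid, relative_accessibility in 0..1),
              taken from the folded three-dimensional structure.
    profile:  dict mapping amino acid to its rate constant for the reagent
              (the reagent's reactivity profile).
    """
    # Buried residues (accessibility 0) contribute nothing.
    return sum(acc * profile.get(aa, 0.0) for aa, acc in residues)
```

  • Setting the accessibility of the binding-site residues to zero gives a corresponding prediction for the ligand-bound state.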
  • When rates are measured in the presence of the BioKey peptide, again the reactivity should be predictable from the three-dimensional structure and the reactivity profile.
  • the amino acid composition of protein binding sites and surfaces has been determined in 50 proteins whose crystal structures with bound ligands are known ( 9) . These studies show that Trp, His, Arg, and Tyr are much more abundant in binding sites contacting bound ligands than in general in the protein. Gly and Ser are often found near the bound ligand as well; however, these residues are generally abundant throughout the protein. Therefore, reagents that are specific for Trp and His, which are found at very low frequencies outside the binding site, will be very informative. Attractive reagents include Ru(tpy) (bpy)0 2+ , which oxidizes ring nitrogens, reagents that alkylate ring nitrogens, or oxidants that form amine oxides at ring nitrogens.
  • the collected data will be used to generate a new database containing in vitro descriptors of protein targets, which we will call R (for reactivity or recognition) to distinguish it from NCI's S, A, and T databases.
  • the database preferably contains information regarding reaction rates in the unfolded, folded, and ligand-bound states.
  • Rate constants may be provided for several different ligand-bound states, i.e., with different ligands.
  • the rate information is expressed in relative terms, and normalized on a scale from zero to one.
  • the rate information may be expressed in difference form, e.g., by providing the (rate folded-rate unfolded), (rate ligand bound-rate folded) , (rate ligand bound-rate unfolded) , and/or (rate ligand 1 bound-rate ligand 2 bound) .
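  • The difference form can be sketched for a single protein/reagent pair as follows. The dictionary keys and the function name are hypothetical.

```python
def difference_descriptors(rate):
    """Build difference-form descriptors from the rates in three states.

    rate maps 'unfolded', 'folded', and 'bound' to a (normalized) rate.
    """
    return {
        "folded_minus_unfolded": rate["folded"] - rate["unfolded"],
        "bound_minus_folded": rate["bound"] - rate["folded"],
        "bound_minus_unfolded": rate["bound"] - rate["unfolded"],
    }
```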
  • the data may be conceptualized as a three-dimensional matrix where each point corresponds to a particular reagent for a given protein in one of the three states, although its most efficient representation is likely to be that of a relational database .
  • the various databases are normalized in accordance with standard database programming practice to minimize the duplication of information.
  • one database may be of reactions.
  • Each record in this database will contain a reaction ID, and will normally contain additional information about the reaction, such as the chemical name of the reagent, the reaction conditions (e.g., temperature, solvent), and assay method. If a reagent is used under many different reaction conditions, it may be appropriate to also create a reagent database, with a reagent ID field used to link the reaction and reagent databases, and information about the reagent placed in the other fields of the reagent database record.
  • Another database may tabulate target proteins.
  • Each record in this database will contain a target ID, and may optionally contain additional information about the target protein, such as its name, biological activity, sequence, etc.
  • a third database could provide the target protein state.
  • Each record of the "state" database will have a state ID field, and additional fields which identify whether the state is a folded or unfolded protein and, if folded, whether a ligand is bound to it.
  • a ligand could be identified by a ligand ID acting as a lookup field for retrieval of information from a ligand database .
  • the reaction, protein, and protein state databases are relationally linked to the reactivity database, with the reaction ID used for lookup of a reaction and the target protein ID used for lookup of a target protein.
  • each record in the reactivity database will contain fields for the reaction ID, the target protein ID, and the result of the reaction (e.g., a rate constant) .
  • There is a many-to-one relationship between the reaction database and the reactivity database, and a one-to-many relationship between the reactivity database and the protein database.
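  • One possible relational layout for these linked databases is sketched below using Python's built-in sqlite3 module. Every table and column name is an assumption made for illustration; the text specifies only which IDs link the records.

```python
import sqlite3

# Hypothetical schema: one table per "database" named in the text.
SCHEMA = """
CREATE TABLE reagent (
    reagent_id INTEGER PRIMARY KEY,
    chemical_name TEXT
);
CREATE TABLE reaction (
    reaction_id INTEGER PRIMARY KEY,
    reagent_id INTEGER REFERENCES reagent(reagent_id),
    temperature_c REAL,
    solvent TEXT,
    assay_method TEXT
);
CREATE TABLE target (
    target_id INTEGER PRIMARY KEY,
    name TEXT,
    sequence TEXT
);
CREATE TABLE ligand (
    ligand_id INTEGER PRIMARY KEY,
    name TEXT
);
CREATE TABLE state (
    state_id INTEGER PRIMARY KEY,
    folded INTEGER NOT NULL,          -- 1 = folded, 0 = unfolded
    ligand_id INTEGER REFERENCES ligand(ligand_id)  -- NULL if no ligand bound
);
CREATE TABLE reactivity (
    reaction_id INTEGER REFERENCES reaction(reaction_id),
    target_id INTEGER REFERENCES target(target_id),
    state_id INTEGER REFERENCES state(state_id),
    rate REAL,                        -- result of the reaction, e.g. a rate constant
    PRIMARY KEY (reaction_id, target_id, state_id)
);
"""

def build_database():
    """Create an empty in-memory reactivity database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn
```

  • Each row of the reactivity table is one target protein/state/reaction triplet, so the reagent and protein details are stored once and looked up by ID.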
  • each record identifies a different type of assay. Proteins may have more than one activity, and have a different spectrum of relevant drugs for each. Therefore, in a preferred interaction database, each record will include a drug ID, a protein ID, an assay ID, and an assay value or potency. Each ID field is a relational link to another database .
  • Once the R database is compiled, it will be used to improve the efficiency of lead generation as hereafter described. As discussed above, the initial database will be compiled for proteins that bind known inhibitors and have well understood biology and expression profiles. Once a sufficient number of these proteins are in the database, reactivity descriptors may be determined for proteins that have not been screened against combinatorial libraries, whose three-dimensional structures are not known, and/or whose biological profile has not yet been determined. It will be appreciated that there is no fixed minimum number of proteins which must be in the database before it can provide useful information about a protein of interest.
  • the database will provide data on at least 50, more preferably at least 200, still more preferably at least 1000, and most preferably every known protein. Another factor is the diversity of the proteins in the database. The more diverse the proteins, the more likely it is that at least one database protein will have a reasonably similar reactivity.
  • the reactivity descriptor for a protein is the set of rate constants (or rate constant differences) for all of the characterizing reactions to which that protein has been subjected.
  • the reactivity descriptors will be entered into R and the similarity between the unknown protein and the known proteins in the database will be determined as described in the section on "Descriptors".
  • the utility of a reactivity database is dependent, not only on the number of proteins in the database, but also on the number of reactions to which the protein was subjected, and the number of protein states examined. All else being equal, the more data points, the greater the degree of characterization. Preferably, there are at least 1, more preferably at least 3, still more preferably at least 10, most preferably at least 100, datapoints (target protein/state/reaction triplets). While less easily defined, the greater the diversity in the characterizing reactions, the more useful the database will be.
  • Relation to databases with information on activity (such as A) will predict what types of chemical compounds will most likely bind to the unknown target, because proteins that appear similar in R will bind similar compounds. This procedure will be analogous to calculating similarities between chemical compounds (such as in S) and predicting that compounds similar to a known inhibitor will also inhibit similar targets.
  • Relation to an expression database such as T will provide pharmacological information on the unknown target.
  • An aptamer-based descriptor is a description of a protein in terms of the aptamers which recognize it. While peptides could serve as aptamers, the preferred aptamers are nucleic acids.
  • Such a descriptor is not a simple numerical value. Rather, it is a list of sequences (and, preferably, secondary structures and contact points) for each of the aptamers identified as binding a particular protein. In this section, we will describe how nucleic acid aptamers are identified and characterized, and how the similarity of the aptamer-based descriptors for two different proteins may be calculated.
  • a single-stranded oligonucleotide library is screened to identify aptamers which bind a protein with a desired affinity.
  • This protein may be a reference protein with known drug antagonists, or the target protein for which such antagonists are to be identified.
  • the desired aptamers are amplified and sequenced.
  • the aptamers serve to characterize the protein in that only those oligonucleotides which can conform to the surface of the protein will bind to it. One may say that the aptamers take "impressions" of the protein surface. Some of the aptamers may be expected to bind the protein at a site corresponding to, overlapping, or otherwise occluding the functional site of the protein. (Such aptamers may, if desired, be identified by screening the aptamers for antagonist activity.) Others will bind at sites distal to the functional site. Some will bind to the same site, others to different sites. All contribute to a "picture" of the protein.
  • the contact sites (the bases within the aptamer) through which the aptamer contacts the protein are identified.
  • a preferred means for such identification is a footprinting reaction where chemical modification of the nucleic acid is performed with and without the bound protein; sites where bound protein blocks chemical modification are contact sites. Footprinting of nucleic acids in this manner can be achieved using enzymes such as DNase or chemical reagents such as copper-phenanthroline (Papavassiliou, A.G. Biochem. J. 1995, 305, 345-357), Fe(EDTA)²⁻ (Pogozelski, et al., J. Am. Chem. Soc.
  • the secondary structure of at least the contacting bases of the oligonucleotide is analyzed.
  • the secondary structures of the aptamers can be predicted using the approach of Zichi et al. (J.P. Davis, N. Janjic, D. Pribnow, D.A. Zichi, Nucleic Acids Res.
  • the descriptor for each aptamer preferably identifies not only the overall sequence of the aptamer, but also the contact site.
  • the bases of the contact site are preferably described not only by identifying the base itself, but also indicating whether in the secondary structure of the aptamer it is paired to any base, and if so what.
  • two-letter codes are used for each nucleotide.
  • the contact nucleotides are followed by the small letter "o", so the contacts are Ao, To, Go, and Co for a DNA aptamer.
  • the contact sites are followed by a small letter representing the nucleotide on the opposite strand.
  • At the entire list of double-stranded codes.
  • An alternative to comparing the footprinted sequence in a linear array would be to develop a two-dimensional projection of the secondary structure of the aptamer. For example, suppose the aptamer at the right is selected for a given target and the sites indicated with an arrow are determined to be protected by the target via Pt-pop footprinting. The aptamer can then be mapped onto a two-dimensional grid where the contact sites are coded as before and the remaining "placeholder" sites are coded as either Nn for a base pair or mismatch site or No for a single-stranded site (see Figure 5). There is no need to differentiate the base pair and mismatch placeholder sites because this information is already in the contact site codes.
  • the two-dimensional grid can now be analyzed for similarity to other two-dimensional representations by graph-theoretical approaches, such as those used for determining compound similarity (as in Patterson et al., J. Med. Chem., 1996).
  • the contact sites and secondary structure are used to develop a consensus "functionality sequence" (epitope), which represents both the primary sequence and the secondary structure of the nucleic acid moiety which contacts the target protein.
  • These functionality sequences are entered into a relational database that is used to analyze functionality sequences for unknown targets.
  • Targets with homologous functionality sequences bind small molecules of a similar nature.
  • Sequence identity among aptamer contact sites may be determined using an adaptation of BLAST from the National Center for Biotechnology Information.
  • the BLAST (Basic Local Alignment Search Tool) algorithm is described in S.F. Altschul, W. Gish, W. Miller, E.W. Myers, and D.J. Lipman, J. Mol. Biol. 215, 403-10 (1990).
  • a new version that allows the functionality (secondary structure) codes (At, Ta, Ao, etc.) to be compared can be adapted from art in the public domain.
  • an identity matrix is used. If the aligned functionality elements are the same, the score for that pair of elements is one; if they are different, the score is zero. The individual scores are summed and divided by the total number of elements.
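  • A minimal sketch of the identity-matrix score follows. It assumes the two functionality sequences have already been aligned to equal length and are represented as lists of two-letter codes; the function name and the example sequences are hypothetical.

```python
def identity_score(parent, other):
    """Fraction of aligned functionality elements that are identical."""
    if len(parent) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    # Identity matrix: 1 for identical elements, 0 otherwise.
    matches = sum(1 for a, b in zip(parent, other) if a == b)
    return matches / len(parent)


# Hypothetical aligned functionality sequences (two-letter codes).
query = ["At", "Ao", "Gc", "Co"]
reference = ["At", "To", "Gc", "Co"]
score = identity_score(query, reference)  # three of four elements match
```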
  • a second approach would be to weight more highly the alignments least likely to occur by chance alone. This would require tabulating predicted secondary structure information for a large number of DNA sequences. This could be done by (1) randomly generating DNA sequences of the length and base composition to be used in the library, (2) predicting the secondary structure of each sequence by standard methods, (3) converting the string of bases into a string of "functionality elements” and (4) calculating the probability of occurrence of each alignment.
  • this weighting scheme is not limited to protein contact sites; all bases are considered.
  • a base pair (Gc, Cg, At, Ta) is likely to be more common, and hence less informative, than a mismatch (Gt, Gg, Ga, Ag, etc.) or a single-stranded nucleotide (Go, Ao, To, Co) .
  • These elements could therefore be weighted appropriately to allow the elements that have the greatest information to be considered most heavily in determining similarity. For one possible weighting, see Ex. 3.
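  • One way to turn step (4) into weights is sketched below. It assumes that steps (1) to (3) have already produced functionality strings for a set of random sequences, and it uses the negative base-2 logarithm of each element's observed frequency as its weight. That particular formula is an assumption chosen for illustration; the specification only requires that rarer elements count more.

```python
import math
from collections import Counter


def information_weights(functionality_strings):
    """Weight each functionality element by -log2 of its observed frequency."""
    counts = Counter(el for s in functionality_strings for el in s)
    total = sum(counts.values())
    return {el: -math.log2(n / total) for el, n in counts.items()}


# Hypothetical functionality strings derived from random sequences.
background = [["At", "At"], ["At", "Go"]]
weights = information_weights(background)
# "Go" is rarer than "At" in this toy background, so it gets the larger weight.
```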
  • While nucleic acids are preferred because they can be amplified, peptides may also be used to generate aptamer-based descriptors. Preferably, these peptides are 5-10 amino acids in length.
  • the peptide library is screened for binding to the query protein, and the resulting peptide aptamers are compared to the peptide aptamers found to bind each of the reference proteins.
  • BLAST may be used in its standard form to compare the amino acid sequences. Any scoring matrix conventional in the art, e.g., BLOSUM, may be used to score aligned amino acid pairs. The resulting sequence similarity score may then be standardized.
  • aptamer similarity on the basis of sequence similarity
  • a 2-D fingerprint may be devised in which each bit represents the presence or absence of a particular secondary structure, such as an unbonded region, an unbonded region of a particular length (e.g., 2-6 bases, 7-20 bases, >20 bases), an interior loop, an interior loop of a particular length (e.g., 2-6 bases, 7-20 bases, >20 bases), a bulge loop, a bulge loop of a particular length (e.g.
  • a hairpin loop of any length and type, a hairpin loop closed by G:C, a hairpin loop closed by A:U, a particular type of hairpin loop of a particular length (e.g., 3 bases, 4-5 bases, 6-7 bases, 8-9 bases, 10-30 bases, >30 bases), a run of paired bases of a particular length, an overall base composition in a particular range (e.g., 40-60% GC), etc.
  • Tinoco, et al . Nature New Biol., 246:40-41 (1973) and Tinoco, et al .
  • Suppose that m aptamers bind the query protein, and n aptamers bind a reference protein.
  • the aptamer-based similarity of the target and reference protein may be calculated on the basis of any or all of the m x n possible comparisons. Many approaches are possible, including, but not limited to
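  • The m x n comparisons can be reduced to a single protein-level score in several ways. The sketch below shows three illustrative reductions (overall maximum, overall mean, and mean of each query aptamer's best match). These three are assumptions chosen for the example, and the similarity values are invented.

```python
def aggregate(sim_matrix):
    """Reduce an m x n aptamer similarity matrix to protein-level scores.

    sim_matrix[i][j] is the similarity between query aptamer i and
    reference aptamer j.
    """
    all_values = [v for row in sim_matrix for v in row]
    best_per_query = [max(row) for row in sim_matrix]
    return {
        "max": max(all_values),
        "mean": sum(all_values) / len(all_values),
        "mean_best": sum(best_per_query) / len(best_per_query),
    }


# Hypothetical similarities for m = 2 query aptamers and n = 2 reference aptamers.
scores = aggregate([[0.2, 0.8],
                    [0.4, 0.6]])
```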
  • aptamers are selected for new targets of unknown structure.
  • the ligands for these targets are footprinted, e.g., using Pt₂(pop)₄⁴⁻, and the functionality sequences are determined.
  • the functionality sequences are entered into the data base, and connections are drawn between the ligands that bind the unknown target and those already in the data base for known targets.
  • Initial screening of small molecule libraries is then directed towards compounds that have similar size and functionality to those that bind the known targets that exhibit high functionality sequence homology to the unknown target of interest.
  • this information is entered into a set of relationally linked databases.
  • the aptamers database has a many-to-many relationship with the protein database.
  • each record includes an aptamer ID field (a link to an aptamer database) and a protein ID field (a link to a protein database) .
  • the record may optionally identify the contact site. Secondary structure may be indicated in either the aptamer record (if invariant) or in the aptamer-protein interaction record (if affected by the protein binding).
  • the database may include additional descriptors for the target proteins. Possible descriptors include the following:
  • the compound library is a combinatorial library whose members are suitable for use as drugs if, indeed, they have the ability to mediate a biological activity of the target protein.
  • Peptides have certain disadvantages as drugs. These include susceptibility to degradation by serum proteases, and difficulty in penetrating cell membranes.
  • all or most of the compounds of the compound library avoid, or at least do not suffer to the same degree, one or more of the pharmaceutical disadvantages of peptides.
  • disjunction in which a lead drug is simplified to identify its component pharmacophoric moieties
  • conjunction in which two or more known pharmacophoric moieties, which may be the same or different, are associated, covalently or noncovalently, to form a new drug
  • alteration in which one moiety is replaced by another which may be similar or different, but which is not in effect a disjunction or conjunction.
  • the use of the terms "disjunction”, “conjunction” and “alteration” is intended only to connote the structural relationship of the end product to the original leads, and not how the new drugs are actually synth
  • Alterations may modify the size, polarity, or electron distribution of an original moiety. Alterations include ring closing or opening, formation of lower or higher homologues, introduction or saturation of double bonds, introduction of optically active centers, introduction, removal or replacement of bulky groups, isosteric or bioisosteric substitution, changes in the position or orientation of a group, introduction of alkylating groups, and introduction, removal or replacement of groups with a view toward inhibiting or promoting inductive (electrostatic) or conjugative (resonance) effects.
  • the substituents may include electron acceptors and/or electron donors.
  • Typical electron donors (+I) include -CH₃, -CH₂R, -CHR₂, -CR₃ and -COO⁻.
  • the substituents may also include those which increase or decrease electronic density in conjugated systems.
  • the former (+R) groups include -CH₃, -CR₃, -F, -Cl, -Br, -I, -OH, -OR,
  • the latter (-R) groups include -NO₂, -CN, -CHO, -COR, -COOH, -COOR, -CONH₂, -SO₂R and -CF₃.
  • a compound, or a family of compounds, having one or more pharmacological activities may be disjoined into two or more known or potential pharmacophoric moieties.
  • Analogues of each of these moieties may be identified, and mixtures of these analogues reacted so as to reassemble compounds which have some similarity to the original lead compound. It is not necessary that all members of the library possess moieties analogous to all of the moieties of the lead compound.
  • The design of a library may be illustrated by the example of the benzodiazepines.
  • Benzodiazepine drugs, including chlordiazepoxide, diazepam and oxazepam, have been used as anti-anxiety drugs.
  • Derivatives of benzodiazepines have widespread biological activities; derivatives have been reported to act not only as anxiolytics, but also as anticonvulsants, cholecystokinin (CCK) receptor subtype A or B, kappa opioid receptor, platelet activating factor, and HIV transactivator Tat antagonists, and GPIIbIIIa, reverse transcriptase and ras farnesyltransferase inhibitors.
  • the benzodiazepine structure has been disjoined into a 2-aminobenzophenone, an amino acid, and an alkylating agent. See Bunin, et al., Proc. Nat. Acad. Sci. USA, 91:4708 (1994). Since only a few 2-aminobenzophenone derivatives are commercially available, it was later disjoined into 2-aminoarylstannane, an acid chloride, an amino acid, and an alkylating agent. Bunin, et al., Meth. Enzymol., 267:448 (1996).
  • the arylstannane may be considered the core structure upon which the other moieties are substituted, or all four may be considered equals which are conjoined to make each library member.
  • a basic library synthesis plan and member structure is shown in Figure 1 of Fowlkes, et al., U.S. Serial No. 08/740,671, incorporated by reference in its entirety.
  • the acid chloride building block introduces variability at the R¹ site.
  • the R² site is introduced by the amino acid, and the R³ site by the alkylating agent.
  • the R⁴ site is inherent in the arylstannane.
  • Bunin, et al. generated a 1,4-benzodiazepine library of 11,200 different derivatives prepared from 20 acid chlorides, 35 amino acids, and 16 alkylating agents.
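  • The library size follows directly from the number of building blocks at each variable site: 20 x 35 x 16 = 11,200. The short enumeration below confirms the count; the integer labels simply stand in for the actual reagents, which are not listed here.

```python
from itertools import product

# Placeholder labels for the three building-block sets.
acid_chlorides = range(20)     # vary the R1 site
amino_acids = range(35)        # vary the R2 site
alkylating_agents = range(16)  # vary the R3 site

# Every library member is one choice from each set.
members = list(product(acid_chlorides, amino_acids, alkylating_agents))
library_size = len(members)
```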
  • variable elements included both aliphatic and aromatic groups.
  • aliphatic groups both acyclic and cyclic
  • the aromatic groups featured either single or multiple rings, fused or not, substituted or not, and with heteroatoms or not.
  • the secondary substituents included -NH₂, -OH, -OMe, -CN, -Cl, -F, and -COOH. While not used, spacer moieties, such as -O-, -S-, -CO-, -CS-, -NH-, and -NR-, could have been incorporated.
  • Bunin et al. suggest that instead of using a 1,4-benzodiazepine as a core structure, one may instead use a 1,4-benzodiazepine-2,5-dione structure.
  • DeWitt, et al., Proc. Nat. Acad. Sci. (USA), 90:6909-13 (1993) describe the simultaneous, but separate, synthesis of 40 discrete hydantoins and 40 discrete benzodiazepines. They carry out their synthesis on a solid support (inside a gas dispersion tube), in an array format, as opposed to other conventional simultaneous synthesis techniques (e.g., in a well, or on a pin).
  • the hydantoins were synthesized by first simultaneously deprotecting and then treating each of five amino acid resins with each of eight isocyanates.
  • the benzodiazepines were synthesized by treating each of five deprotected amino acid resins with each of eight 2-aminobenzophenone imines.
  • a polymer bead-bound aldehyde preparation was "split" into three aliquots, each reacted with one of three different ylide reagents.
  • the reaction products were combined, and then divided into three new aliquots, each of which was reacted with a different Michael donor.
  • Compound identity was found to be determinable on a single bead basis by gas chromatography/mass spectroscopy analysis.
  • hypnotics higher alcohols clomethiazole
  • aldehydes chloral hydrate
  • carbamates meprobamate
  • acetylcarbromal barbiturates
  • barbital benzodiazepine
  • narcotic analgesics morphines phenylpiperidines (meperidine) diphenylpropylamines (methadone) phenothiazines (methotrimeprazine)
  • analgesics analgesics, antipyretics, antirheumatics salicylates (acetylsalicylic acid) p-aminophenol (acetaminophen)
  • anxiolytics propanediol carbamates meprobamate
  • benzodiazepines chlordiazepoxide, diazepam, oxazepam
  • CNS stimulants xanthines (caffeine, theophylline) phenylalkylamines (amphetamine) (Fenetylline is a conjunction of theophylline and amphetamine) oxazolidinones (pemoline) cholinergics choline esters (acetylcholine) N,N-dimethylcarbamates
  • adrenergics aromatic amines epinephrine, isoproterenol, phenylephrine
  • alicyclic amines cyclopentamine
  • aliphatic amines methylhexaneamine
  • imidazolines naphazoline
  • antihistamines ethanolamines (diphenhydramine) ethylenediamines (tripelennamine) alkylamines (chlorpheniramine) piperazines (cyclizine) phenothiazines (promethazine)
  • vasodilators polyol nitrates (nitroglycerin)
  • antibiotics penicillins cephalosporins octahydronaphthacenes (tetracycline) sulfonamides nitrofurans cyclic amines naphthyridines xylenols
  • antitumor alkylating agents nitrogen mustards aziridines methanesulfonate esters epoxides amino acid antagonists folic acid antagonists pyrimidine antagonists purine antagonists antiviral adamantanes nucleosides thiosemicarbazones inosines amidines and guanidines isoquinolines benzimidazoles piperazines
  • Goth, Medical Pharmacology: Principles and Concepts (C.V. Mosby Co.: 8th ed. 1976); Korolkovas and Burckhalter, Essentials of Medicinal Chemistry (John Wiley & Sons, Inc.: 1976).
  • the library is preferably synthesized so that the individual members remain identifiable so that, if a member is shown to be active, it is not necessary to analyze it.
  • Several methods of identification have been proposed, including:
  • each member is synthesized only at a particular coordinate on or in a matrix, or in a particular chamber. This might be, for example, the location of a particular pin, or a particular well on a microtiter plate, or inside a "tea bag".
  • the present invention is not limited to any particular form of identification .
  • Solid phase synthesis permits greater control over which derivatives are formed. However, the solid phase could interfere with activity. To overcome this problem, some or all of the molecules of each member could be liberated, after synthesis but before screening.
  • the user receives a list of reference proteins. For each reference protein a similarity score, and a list of known antagonists (or other modulators) is given. These are the "drug leads" .
  • the antagonists are weighted by the similarity scores of their respective proteins. If available, and desired, they may also be weighted by their potency against their corresponding reference protein, and/or by other physicochemical characteristics of interest, e.g., lipophilicity.
  • This invention contemplates the construction of a composite combinatorial compound library which is biased in favor of compounds (both scaffoldings and substituents) which are structurally similar to the drug leads.
  • the composite library is based on these already available simple libraries .
  • Each drug lead is compared, using structural descriptors, to the basic scaffold (or to the most similar member) of each candidate simple combinatorial library.
  • the structural descriptors which may be used include, but are not limited to, those listed in Patterson, et al. (1996), Klebe and Abraham (1993), Cummins, et al. (1996), and Matter (1997). Conventional mathematical methods may be used to select or weight the descriptors.
  • the 2D fingerprint method described in Matter, et al. (1997) is of particular interest. In essence, the compound is analyzed for the presence or absence of particular molecular fragments, the results being encoded in a binary format.
  • each fragment was projected, using a pseudorandomization algorithm, into a bitstring of limited size (i.e., fewer bits than the total number of unique fragments in the compounds of the database).
  • the presence of 60 specific functional groups, rings or atoms was encoded in 60 of the total 988 bits. Details are given in UNITY Chemical Information Software, version 2.5. Reference Guide, pp. 45-58, Tripos Inc., 1699 Hanley Rd. , St. Louis, MO 63144.
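  • The idea of projecting fragments into a bitstring of limited size can be sketched as follows. This is not the UNITY algorithm: the use of SHA-256 as the pseudorandom projection, the 64-bit length, the fragment strings, and the Tanimoto comparison are all assumptions made for illustration.

```python
import hashlib


def fingerprint(fragments, n_bits=64):
    """Hash each fragment label to one bit position in an n_bits-wide integer."""
    bits = 0
    for frag in fragments:
        digest = hashlib.sha256(frag.encode("utf-8")).digest()
        position = int.from_bytes(digest[:8], "big") % n_bits
        bits |= 1 << position  # distinct fragments may collide on one bit
    return bits


def tanimoto(a, b):
    """Shared set bits divided by total set bits in either fingerprint."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 1.0


# Hypothetical fragment lists for a drug lead and a library scaffold.
lead = fingerprint(["c1ccccc1", "C=O", "N-C"])
scaffold = fingerprint(["c1ccccc1", "C=O", "C-Cl"])
similarity = tanimoto(lead, scaffold)
```

  • Because the bitstring is shorter than the number of possible fragments, two different fragments can set the same bit, which is the trade-off accepted in exchange for a fixed-size descriptor.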
  • the candidate library is assigned a weight which is a function of (1) the drug lead's query score (reflecting the similarity of its reference protein to the query protein and, optionally, the potency or other drug characteristics of the drug lead) and (2) the structural similarity score between the drug lead and the scaffold. These weights then determine the predominance of that candidate library in the ideal composite library. Thus, if benzodiazepines score twice as high as carbamates, the benzodiazepine library screened might be twice as big as the carbamate library.
  • Each drug lead may also be used to evaluate possible substituents to be conjugated to the scaffold.
  • Each candidate substituent is scored, on the basis of structural descriptors, for its similarity to the drug lead.
  • the proportion of the candidate substituent in the combinatorial reaction mix may then be governed by its similarity score.
  • Low-scoring candidate substituents may be omitted entirely, or merely reduced in concentration.
  • the mix may be limited to high scoring substituents, or their concentrations may merely be increased. The concentration changes need not be strictly proportional to the scores.
  • a reference protein is very similar to the query protein, it should have a strong influence on the library composition, and, if its similarity is only modest, its influence should be weak.
  • a drug lead which strongly resembles a candidate library component should strongly favor that library, and one which only weakly resembles it should imply a more modest enrichment.
  • W_L = original absolute weight of library type L in the composite library [0..1]
  • the new relative weights W'_L are converted to new absolute weights by dividing each by the sum of W'_L for all L.
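  • The normalization step can be illustrated with the benzodiazepine and carbamate example used earlier: if one library type scores twice as high as the other, the absolute weights become 2/3 and 1/3. The function name and the relative scores below are assumptions for the example; how the relative weights themselves are derived is not reproduced here.

```python
def normalize(relative_weights):
    """Convert relative library weights to absolute weights that sum to 1."""
    total = sum(relative_weights.values())
    if total <= 0:
        raise ValueError("at least one relative weight must be positive")
    return {lib: w / total for lib, w in relative_weights.items()}


# Hypothetical relative weights: benzodiazepines score twice as high as carbamates.
absolute = normalize({"benzodiazepines": 2.0, "carbamates": 1.0})
```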
  • the leads could be considered in descending order of structural similarity with a proposed library component, i.e., most similar lead first.
  • the user may decide to simply screen the most similar of the candidate simple combinatorial libraries. This is equivalent to giving it a weight of 1 and the others a weight of 0.
  • Drug leads are not equal in value.
  • the drugs will vary in potency, side effects, deliverability, residence time, ease of synthesis, cost of production, etc. Those factors which the chemist wishes to consider may be subsumed into a quality factor (q_d) ranging from 0 to 1, with unity being the most desirable sort of lead.
  • a quality factor may be assigned by any rational method. If only potency is considered, then the most potent drug in the database could be assigned a value of 1, and the other drugs assigned relative values based on the logarithms of their potencies. Thus, if the best drug has an IC50 of 10⁻¹², a drug with an IC50 of 10⁻² might have a q_d of 1/6. Of course, any function which converts potencies to a zero to one scale in which higher potencies yield higher values might be used.
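  • One function consistent with this description is the ratio of the logarithms of the two IC50 values. The sketch below assumes that formula and assumes the IC50 values are expressed in molar units below 1 M; neither is stated in the text. With a best IC50 of 1e-12, this formula yields a q_d of 1/6 for an IC50 of 1e-2.

```python
import math


def quality_from_potency(ic50, best_ic50):
    """Map an IC50 onto [0, 1] relative to the most potent drug (assumed log ratio)."""
    q = math.log10(ic50) / math.log10(best_ic50)
    return min(1.0, max(0.0, q))  # clamp in case ic50 >= 1 or ic50 < best_ic50


q_best = quality_from_potency(1e-12, 1e-12)
q_weak = quality_from_potency(1e-2, 1e-12)
```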
  • Examples of candidate simple libraries which might be evaluated include derivatives of the following. Cyclic Compounds Containing One Hetero Atom. Heteronitrogen: pyrroles, pentasubstituted pyrroles, pyrrolidines, pyrrolines, prolines, indoles, beta-carbolines, pyridines, dihydropyridines, 1,4-dihydropyridines, pyrido[2,3-d]pyrimidines, tetrahydro-3H-imidazo[4,5-c]pyridines, isoquinolines, tetrahydroisoquinolines, quinolones, beta-lactams, azabicyclo[4.3.0]nonen-8-one amino acid. Heterooxygen: furans, tetrahydrofurans
  • 1,2,3-triazoles, purines. Heteronitrogen and Heterooxygen: diketomorpholines, isoxazoles, isoxazolines. Heteronitrogen and Heterosulfur: thiazolidines
  • In designing the library, one may consider not only the drug leads retrieved for the high similarity reference proteins, but also structures which are analogues of the BioKey peptides or nucleic acid aptamers known to bind the query protein and/or the higher-ranked reference proteins.
  • the present invention is useful, not only in designing a library, but also in deciding which of several query proteins to target. It may be desirable to inhibit a biochemical pathway. Several different proteins may be known to mediate that pathway. To decide which one to develop drugs for, each protein might be used to query the database. The one with the most potent and specific drug leads, or with the drug lead having the greatest resemblance to a readily available combinatorial library, becomes the target of choice.
  • the protein is immobilized, e.g., on filter paper, and subjected to binding by a library of random oligonucleotides.
  • the oligonucleotides may be DNA, RNA, or a DNA or RNA analogue, e.g., in which the sugar functionality of the nucleotide is modified at the 2' position with substituents such as fluoro-, amino-, or methoxy-.
  • the members of the library that bind to the target protein are separated and amplified using the polymerase chain reaction or other amplification scheme. The amplified pool is bound to the immobilized protein again, and the strong binding fraction is selected.
  • the affinity of the pool may be determined after each cycle.
  • the oligonucleotides are cloned into suitable host cells, e.g., bacteria, and the sequences are determined by automated methods.
  • the nucleic acid aptamers for a specific target are resynthesized and 3'- or 5'-labeled with ³²P.
  • Each radiolabeled oligomer is then mixed with an excess of the target protein and 100-500 μM Pt₂(pop)₄⁴⁻ and photolyzed at wavelengths between 350 and 500 nm until the reaction with Pt₂(pop)₄⁴⁻ is complete.
  • the radiolabeled oligomer is then precipitated.
  • a parallel control reaction is run without the protein.
  • the aptamers photolyzed with and without the protein are then loaded onto a polyacrylamide sequencing gel.
  • the nucleotides where there is significant reaction without the protein but not with the protein are then classified as contact sites. This process is repeated for all of the aptamers for a given target and shows a high degree of similarity in the contact sites between aptamers .
  • a DNA library is screened for binding to lysozyme
  • This sequence is then synthesized and radiolabeled on the 5' end with ³²P.
  • the labeled oligomer is reacted with Pt₂(pop)₄⁴⁻ by photolysis at 400 nm in phosphate buffered saline and analyzed on a sequencing gel, which shows that Pt₂(pop)₄⁴⁻ induces scission of the oligonucleotide at every base in the sequence, giving a ladder of bands for the oligomer with approximately the same extent of scission at each nucleotide in the sequence.
  • the reaction is then repeated in the presence of enough protein to cause the majority of the oligonucleotide to be bound to the protein.
  • the Pt₂(pop)₄⁴⁻ exhibits some reaction with the protein, so the concentration of the Pt₂(pop)₄⁴⁻ must be higher than in the reaction of the oligonucleotide alone, and the reaction must be performed in a short time so that alteration of the structure of the nucleoprotein complex due to damage of the protein by Pt₂(pop)₄⁴⁻ is not a factor. Nevertheless, only the oligonucleotide is labeled, so only scission of the oligonucleotide is detected.
  • the scission pattern visualized on a sequencing gel then shows the same relative reactivity at the nucleotides that do not contact the protein as in the reaction of the oligomer alone, and greatly attenuated reactivity at sites protected by the protein.
  • Studies on crystallographically characterized DNA-protein complexes show that the experiment faithfully indicates the sites of contact of the protein on DNA (see Breiner already cited) .
  • the "footprint" of the protein on the DNA is determined by quantitating the extent of cleavage at each nucleotide in both reactions to form two histograms, normalizing the two histograms so that nucleotides that are clearly outside the binding site give the same intensity, and then assigning contact sites as those where addition of the protein attenuates the relative intensity.
  • cleavage without the protein will give approximately the same intensity at all nucleotides.
  • the cleavage intensity in the reaction with the protein is then normalized for the nucleotides on either end, and the sites where less cleavage occurs are counted as protein contact sites. For example, if relatively less cleavage is observed at the underlined sites: 5'-TAGCTGGCCAAAGTGCGAACACGGCCTTG, then the cleavage, for example, on the end nucleotides and in the CGAA loop would be relatively the same with and without the protein, while cleavage in the stem and AAA bulge would be protected by the protein. This result would imply that the protein recognizes the AAA bulge and flanking stem, and this structure would be used to search for proteins that bind similar functionality sequences.
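  • The normalization and assignment of contact sites can be sketched as below. The function name, the band intensities, the choice of reference positions, and the 0.5 protection threshold are all assumptions for illustration; in practice the cutoff would be set from the experimental data.

```python
def contact_sites(free, bound, reference_positions, threshold=0.5):
    """Return indices where protein binding attenuates cleavage.

    free / bound: per-nucleotide cleavage intensities without / with protein.
    reference_positions: indices assumed to lie outside the binding site,
    used to put the two histograms on the same scale.
    """
    ref_free = sum(free[i] for i in reference_positions)
    ref_bound = sum(bound[i] for i in reference_positions)
    scale = ref_free / ref_bound
    sites = []
    for i, (f, b) in enumerate(zip(free, bound)):
        # Protected if the normalized intensity drops below the threshold ratio.
        if f > 0 and (b * scale) / f < threshold:
            sites.append(i)
    return sites


# Hypothetical intensities for a six-nucleotide oligomer.
free = [10, 10, 10, 10, 10, 10]
bound = [5, 5, 1, 1, 5, 5]
sites = contact_sites(free, bound, reference_positions=[0, 1, 4, 5])
```

  • In this toy example the end nucleotides set the scale, and positions 2 and 3 are reported as protected.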
  • The RNA aptamers shown in Scheme 1A were isolated by SELEX for the reverse transcriptase from Feline Immunodeficiency Virus (Chen, H.; McBroom, D.G.; Zhu, Y.-Q.; Gold, L.; North, T.W. Biochemistry 1996, 35, 6923-6930) and from the ribosomal L22 protein associated with Epstein-Barr Virus (Dobbelstein, M.; Shenk, T. J. Virology 1995, 69, 8027-8034).
  • the homology score is determined by comparing the two sequences according to the (illustrative) scoring system shown in C. By this scheme, a single-stranded residue is given a maximum score of 2, a mismatch is given a maximum score of 1.8, and a base pair is given a maximum score of 1. The longer of the two sequences is then chosen as the "parent sequence", and the maximum score is calculated, which in this case is 26. This sequence is then aligned with the homolog to be compared, allowing for gaps if necessary; it is important to choose the longer sequence as the parent sequence because this imposes an implicit penalty for gaps. The score is then computed at each nucleotide. For a perfect match, the score is the same as the maximum score.
  • the score is 0.5 for transposition of a base pair (i.e., substitution of Ta for At), 0.5 for a base pair/mismatch pair where the same base contacts the protein, and 0.9 for two mismatches where the contacted base is the same.
  • the summed comparison score is then divided by the maximum total score.
  • the comparison of the two aptamers in Scheme 2 then gives a score of 11/26, or 0.423. (The score would have been 4/26 for an identity matrix.)
  • DNA aptamers would be scored in an identical manner except that T would replace U.
  • Related strategies might involve changing any of the maximum scores for single strands, mismatches, and base pairs, or any of the partial match scores. It also need not be the case that all four nucleotides (A, T/U, G, C) have the same maximum score or partial match scores. Alternatively, the individual scores could be combined as the sum of the squares of all the scores, divided by the sum of the squares of the maximum scores.
  • Ru(tpy)(bpy)O²⁺ has a rate constant with methionine of 15 M⁻¹ s⁻¹, and the rate constant with all of the other amino acids is negligible
  • Pt₂(pop)₄⁴⁻ has a Stern-Volmer rate constant for tryptophan of 1.0 × 10⁷ M⁻¹ s⁻¹, and all of the other amino acids give negligible rate constants.
  • glutamine synthetase and lysozyme are reacted with the two compounds in the folded, unfolded, and BioKey-bound states.
  • the BioKey peptides are engineered to avoid the inclusion of methionine and tryptophan so that cross-reaction with the BioKey is not an issue.
  • the rate constants are normalized to the number of the most reactive residue for each reagent, i.e., the Ru(tpy)(bpy)O²⁺ rate constant is expressed per mole of methionine and the Pt₂(pop)₄⁴⁻ rate constant is expressed per mole of tryptophan. The rate constants are then measured in the three states to give (hypothetically):
  • A Given as M⁻¹ s⁻¹ where the molar concentration is per total methionine in the two proteins.
  • B Given as M⁻¹ s⁻¹ where the molar concentration is per total tryptophan.
  • the rate constants are close to those for the amino acids themselves but slightly attenuated due to some steric constraints imposed by inclusion in the linear polymer.
  • the reactivity descriptors are then calculated as the difference between the folded and unfolded rate constants, or between the BioKey-bound and folded rate constants, normalized by the unfolded rate constant.
  • the protein concentration would be determined as the total protein concentration times ((0.40 times the fraction of the total amino acids which are isoleucine residues) plus (0.40 times the fraction which are leucine residues) plus (0.20 times the fraction which are alanine residues)).
  • aptamers and reactivity descriptors are determined for a new protein (protein X) .
  • Protein X is compared to the other proteins in the database and is determined to be 60% similar to lysozyme, 20% similar to ribosomal L22 protein, 8% similar to glutamine synthetase and <3% similar to all of the other proteins in the database.
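
The footprinting steps described in the bullets above (two cleavage histograms, normalization on nucleotides outside the binding site, contact sites where the protein attenuates cleavage) can be sketched roughly as follows. This is only an illustration: the function name, the `reference_idx` argument, and the 0.7 cutoff are assumptions that do not come from the patent text, and the intensities in the usage note are invented.

```python
from statistics import mean


def assign_contact_sites(free, bound, reference_idx, threshold=0.7):
    """Return the positions where the protein attenuates cleavage.

    free, bound   : per-nucleotide cleavage intensities without / with protein
    reference_idx : positions assumed to lie clearly outside the binding site
    threshold     : hypothetical cutoff on the normalized bound/free ratio
                    (an assumption, not a value given in the source)
    """
    if len(free) != len(bound):
        raise ValueError("both histograms must cover the same nucleotides")
    if any(f <= 0 for f in free):
        raise ValueError("free-lane intensities must be positive")

    # Scale the protein lane so the reference nucleotides match the free lane.
    scale = mean(free[i] for i in reference_idx) / mean(bound[i] for i in reference_idx)

    # Relative intensity at each nucleotide after normalization.
    ratios = [scale * b / f for f, b in zip(free, bound)]

    # A contact site is a position whose relative intensity drops below the cutoff.
    return [i for i, r in enumerate(ratios) if r < threshold]
```

For instance, with a flat free lane `[100] * 8`, a protein lane `[50, 50, 20, 15, 18, 50, 50, 50]`, and reference positions `[0, 1, 6, 7]`, the three central positions fall below the cutoff and are reported as contacts. In practice the cutoff would have to be chosen from the noise level of the gel or capillary data.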

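The homology score described in the bullets above (per-nucleotide scores summed and divided by the maximum total score of the parent sequence, with the sum-of-squares variant as an alternative) might be computed along these lines. The maximum scores of 2, 1.8, and 1 are the illustrative values quoted in the text; the function names and the composition of the example parent sequence are hypothetical, chosen only so that the totals reproduce the 11/26 arithmetic.

```python
# Illustrative per-nucleotide maximum scores quoted in the text.
MAX_SCORE = {"single": 2.0, "mismatch": 1.8, "pair": 1.0}


def max_total(parent_classes, squared=False):
    """Maximum achievable total for the parent (longer) sequence."""
    values = [MAX_SCORE[c] for c in parent_classes]
    return sum(v * v for v in values) if squared else sum(values)


def homology_score(parent_classes, position_scores, squared=False):
    """Summed comparison score divided by the maximum total score.

    parent_classes  : structural class of each parent nucleotide
                      ("single", "mismatch", or "pair")
    position_scores : score awarded at each aligned position (0 for a gap
                      or a non-match; partial values such as 0.5 or 0.9)
    squared         : use the sum-of-squares variant mentioned in the text
    """
    if len(parent_classes) != len(position_scores):
        raise ValueError("one score is needed per parent nucleotide")
    for cls, score in zip(parent_classes, position_scores):
        if not 0 <= score <= MAX_SCORE[cls]:
            raise ValueError("score exceeds the maximum for its class")
    if squared:
        total = sum(s * s for s in position_scores)
    else:
        total = sum(position_scores)
    return total / max_total(parent_classes, squared)


# Hypothetical parent: 8 single-stranded and 10 base-paired nucleotides,
# giving a maximum total of 8 * 2 + 10 * 1 = 26.
parent = ["single"] * 8 + ["pair"] * 10
scores = [2, 2, 2, 2, 0, 0, 0, 0] + [1, 1, 0.5, 0.5, 0, 0, 0, 0, 0, 0]
print(round(homology_score(parent, scores), 3))
```

Because the parent is always the longer sequence, unaligned positions simply contribute 0 to the numerator while still counting in the denominator, which is how the implicit gap penalty arises.
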
Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Urology & Nephrology (AREA)
  • Hematology (AREA)
  • Biomedical Technology (AREA)
  • Chemical & Material Sciences (AREA)
  • Immunology (AREA)
  • Medicinal Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Microbiology (AREA)
  • Pathology (AREA)
  • Biotechnology (AREA)
  • Food Science & Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Cell Biology (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Medicines That Contain Protein Lipid Enzymes And Other Medicines (AREA)

Abstract

A protein of interest is characterized by reactivity and/or aptamer descriptors as being similar to a database protein whose activity is mediated by a known drug. The screening for drugs that mediate the protein of interest gives prominence to compounds similar in structure to those that mediate the activity of the higher-scoring database proteins.
PCT/US1998/015943 1997-08-01 1998-07-30 Method of identifying and developing drug leads WO1999006839A1 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP98938201A EP1002235A1 (fr) 1997-08-01 1998-07-30 Method of identifying and developing drug leads
AU86781/98A AU8678198A (en) 1997-08-01 1998-07-30 Method of identifying and developing drug leads
CA002298629A CA2298629A1 (fr) 1997-08-01 1998-07-30 Method of identifying and developing drug leads

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US90484297A 1997-08-01 1997-08-01
US08/904,842 1997-08-01

Publications (1)

Publication Number Publication Date
WO1999006839A1 true WO1999006839A1 (fr) 1999-02-11

Family

ID=25419872

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1998/015943 WO1999006839A1 (fr) 1997-08-01 1998-07-30 Method of identifying and developing drug leads

Country Status (4)

Country Link
EP (1) EP1002235A1 (fr)
AU (1) AU8678198A (fr)
CA (1) CA2298629A1 (fr)
WO (1) WO1999006839A1 (fr)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001036980A2 (fr) * 1999-11-18 2001-05-25 Melacure Therapeutics Ab Method for identifying the active site in a biological target
EP1209610A1 (fr) * 2000-11-28 2002-05-29 Valentin Capital Management Device and method for determining affinity data between a target and a ligand
WO2002044990A2 (fr) * 2000-11-28 2002-06-06 Valentin Capital Management Apparatus and method for determining affinity data between a target and a ligand
US6475726B1 (en) 1998-01-09 2002-11-05 Cubist Pharmaceuticals, Inc. Method for identifying validated target and assay combinations for drug development
US6586190B2 (en) 2000-08-18 2003-07-01 Syngenta Participations Ag Parallel high throughput method and kit
US6589738B1 (en) 1999-11-09 2003-07-08 Elitra Pharmaceuticals, Inc. Genes essential for microbial proliferation and antisense thereto
US6720139B1 (en) 1999-01-27 2004-04-13 Elitra Pharmaceuticals, Inc. Genes identified as required for proliferation in Escherichia coli
US6846625B1 (en) 1998-01-09 2005-01-25 Cubist Pharmaceuticals, Inc. Method for identifying validated target and assay combination for drug development

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1993001484A1 (fr) * 1991-07-11 1993-01-21 The Regents Of The University Of California Method for identifying protein sequences that fold into a known three-dimensional structure
US5338659A (en) * 1991-04-02 1994-08-16 Terrapin Technologies, Inc. Method for determining analyte concentration by cross-reactivity profiling
WO1995032425A1 (fr) * 1994-05-23 1995-11-30 Smithkline Beecham Corporation Encoded combinatorial libraries
WO1997042500A1 (fr) * 1996-05-09 1997-11-13 3-Dimensional Pharmaceuticals, Inc. Microplate thermal shift assay method and apparatus for optimizing ligand development and multi-variable protein chemistry

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5338659A (en) * 1991-04-02 1994-08-16 Terrapin Technologies, Inc. Method for determining analyte concentration by cross-reactivity profiling
WO1993001484A1 (fr) * 1991-07-11 1993-01-21 The Regents Of The University Of California Method for identifying protein sequences that fold into a known three-dimensional structure
WO1995032425A1 (fr) * 1994-05-23 1995-11-30 Smithkline Beecham Corporation Encoded combinatorial libraries
WO1997042500A1 (fr) * 1996-05-09 1997-11-13 3-Dimensional Pharmaceuticals, Inc. Microplate thermal shift assay method and apparatus for optimizing ligand development and multi-variable protein chemistry

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ELOFSSON M ET AL: "Tightening the nuts and bolts", TRENDS IN BIOTECHNOLOGY, vol. 16, no. 4, April 1998 (1998-04-01), pages 147-149, XP004112297 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6475726B1 (en) 1998-01-09 2002-11-05 Cubist Pharmaceuticals, Inc. Method for identifying validated target and assay combinations for drug development
US6846625B1 (en) 1998-01-09 2005-01-25 Cubist Pharmaceuticals, Inc. Method for identifying validated target and assay combination for drug development
US6720139B1 (en) 1999-01-27 2004-04-13 Elitra Pharmaceuticals, Inc. Genes identified as required for proliferation in Escherichia coli
US6589738B1 (en) 1999-11-09 2003-07-08 Elitra Pharmaceuticals, Inc. Genes essential for microbial proliferation and antisense thereto
WO2001036980A2 (fr) * 1999-11-18 2001-05-25 Melacure Therapeutics Ab Method for identifying the active site in a biological target
WO2001036980A3 (fr) * 1999-11-18 2002-03-14 Melacure Therapeutics Ab Method for identifying the active site in a biological target
US6586190B2 (en) 2000-08-18 2003-07-01 Syngenta Participations Ag Parallel high throughput method and kit
EP1209610A1 (fr) * 2000-11-28 2002-05-29 Valentin Capital Management Device and method for determining affinity data between a target and a ligand
WO2002044990A2 (fr) * 2000-11-28 2002-06-06 Valentin Capital Management Apparatus and method for determining affinity data between a target and a ligand
WO2002044990A3 (fr) * 2000-11-28 2005-04-21 Valentin Capital Man Apparatus and method for determining affinity data between a target and a ligand

Also Published As

Publication number Publication date
CA2298629A1 (fr) 1999-02-11
EP1002235A1 (fr) 2000-05-24
AU8678198A (en) 1999-02-22

Similar Documents

Publication Publication Date Title
Wiedemann et al. Quantification of PDZ domain specificity, prediction of ligand affinity and rational design of super-binding peptides
Duffner et al. A pipeline for ligand discovery using small-molecule microarrays
Yap et al. Calmodulin target database
Lepre Library design for NMR-based screening
US20040132100A1 (en) Pharmacophore recombination for the identification of small molecule drug lead compounds
Layton et al. Large-scale, quantitative protein assays on a high-throughput DNA sequencing chip
US20120021967A1 (en) Synthetic antibodies
Hargrove Small molecule–RNA targeting: starting with the fundamentals
JP2007137887A (ja) Method of operating a computer system to perform independent substructure analysis
EP1073891B1 (fr) Method for predicting the ability of compounds to modulate the biological activity of receptors
US20120065123A1 (en) Synthetic Antibodies
Hubbard Can drugs be designed?
US20190194358A1 (en) Synthetic Antibodies
Li et al. Chemical proteomic profiling of bromodomains enables the wide-spectrum evaluation of bromodomain inhibitors in living cells
JP2009075116A (ja) Methods for analysis and labeling of protein-protein interactions
Peng Evaluation of proteomic strategies for analyzing ubiquitinated proteins
Kapoor et al. Exploring the specificity pockets of two homologous SH3 domains using structure-based, split-pool synthesis and affinity-based selection
JP2004523726A (ja) Method for identifying conformation-sensitive binding peptides and uses thereof
EP1002235A1 (fr) Method of identifying and developing drug leads
Riching et al. Translating PROTAC chemical series optimization into functional outcomes underlying BRD7 and BRD9 protein degradation
Kauvar et al. Protein affinity map of chemical space
US20020172967A1 (en) Identification of non-covalent complexes by mass spectrometry
Zwillinger et al. Isotope ratio encoding of sequence-defined oligomers
WO2000065421A2 (fr) Representation of receptor selectivity
Jenmalm Jensen et al. Affinity‐Based Chemoproteomics for Target Identification

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
ENP Entry into the national phase

Ref document number: 2298629

Country of ref document: CA

Ref country code: CA

Ref document number: 2298629

Kind code of ref document: A

Format of ref document f/p: F

NENP Non-entry into the national phase

Ref country code: KR

WWE Wipo information: entry into national phase

Ref document number: 1998938201

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 09485007

Country of ref document: US

WWP Wipo information: published in national office

Ref document number: 1998938201

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWW Wipo information: withdrawn in national office

Ref document number: 1998938201

Country of ref document: EP