WO2003044219A1 - Method for flexibly generating diverse reaction chemistries - Google Patents

Method for flexibly generating diverse reaction chemistries

Info

Publication number
WO2003044219A1
WO2003044219A1 PCT/US2002/037190 US0237190W WO03044219A1 WO 2003044219 A1 WO2003044219 A1 WO 2003044219A1 US 0237190 W US0237190 W US 0237190W WO 03044219 A1 WO03044219 A1 WO 03044219A1
Authority
WO
WIPO (PCT)
Prior art keywords
products
reaction
reactions
identifying
chemical
Prior art date
Application number
PCT/US2002/037190
Other languages
English (en)
Inventor
Barry A. Bunin
Timothy S. Powers
Guillermo Antonio Morales
Stephan C. Schurer
Steve M. Muskal
Oliver L. Saunders
Original Assignee
Libraria, Inc.
Priority date
Filing date
Publication date
Application filed by Libraria, Inc.
Priority to AU2002366093A
Publication of WO2003044219A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/10 Analysis or design of chemical reactions, syntheses or processes
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/60 In silico combinatorial chemistry
    • G16C20/64 Screening of libraries
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B35/00 ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/60 In silico combinatorial chemistry
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/90 Programming languages; Computing architectures; Database systems; Data warehousing

Definitions

  • The invention relates to database technology. More specifically, the invention relates to software tools for generating diverse reaction sets from particular precursors, classes of precursors, or reaction chemistries.
  • The modern organic chemist has numerous software tools at her disposal. These include tools for predicting activity from chemical structure (termed "structure activity relationship" tools or SAR tools), tools for ordering commercially available reagents, and databases for storing vast quantities of chemical information including links to literature. Many of these tools have appeared recently in order to take advantage of new electronic infrastructure and electronic commerce. Others have appeared because the computational power now exists to solve previously intractable problems (or reasonably approximate a solution to these problems).
  • Some of the most widely used on-line databases provide electronically indexed data that previously appeared in textual research tools on library shelves (e.g., Beilstein, Chemical Abstracts, and the like). While such databases include various modern electronic features, they are at their heart collections of traditional chemical information reformatted for electronic databases. These existing databases are essentially lists indexing the literature with information to help the chemist decide if she wishes to obtain a particular article. As such they are not optimized to facilitate the research of a modern chemist.
  • One set of problems that cannot be easily addressed using current chemical software pertains to constraints on the vast range of reaction conditions available to a chemist. Another important issue is access to detailed information on reactivity and chemical pathways in a database format, especially for high-throughput chemistry.
  • The present invention addresses these needs by providing improved software tools that employ databases and associated systems for storing, manipulating, and investigating chemical information organized by reaction chemistries and/or transformations. At least some reaction chemistries are organized as belonging to particular reaction protocols within the database. Each reaction represents a discrete step in a multi-step protocol for making a final product from a starting reactant.
  • The present invention uses chemical and reaction databases having information stored such that individual molecules, reactions, and protocols are tagged according to many criteria. Using these tags, software of the invention can not only logically retrieve information, but also use logic to extrapolate from known chemistries in order to provide the user with valuable synthetic information.
  • The invention provides software tools that help automate the suggestion and generation of diverse reaction sets for particular precursors, classes of precursors, or different reaction chemistries. This is accomplished by automatically generating a group of reaction chemistries for a particular precursor or class of precursors of relevance for diverse problems of commercial and scientific interest. Some of the reactions and/or products may be produced without reliance on reactions and products reported in available references, for example the chemical information databases mentioned above.
  • One aspect of the invention is a method of identifying, from a database, a collection of chemical compounds that can be synthesized from a common reactant or class of reactants.
  • Such methods may be characterized by the following sequence: (a) automatically generating a list of reactions that the class to which the reactant or reactants belong undergoes; and (b) for each reaction in the list, identifying a product or class of products, which products comprise the collection of chemical compounds.
  • At least some of the products are identified in or inferred from the database, which contains reactions reported in references, and at least some of the products are identified without reliance on reactions reported in references.
  • Such methods may further include: (c) treating at least one of the products as a new reactant; (d) automatically generating a new list of reactions that the class to which the new reactant belongs undergoes; and (e) for each reaction in the new list, identifying a new product or class of products produced by the reaction.
  • The new products are added to the collection of compounds.
  • Such methods may include filtering the products identified at (b) based on one or more of the following criteria: predicted activity and specified reaction conditions.
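Steps (a) through (e) above can be sketched as a breadth-first expansion over compound classes. This is only an illustrative sketch, not the patent's implementation: the rule table `REACTIONS_BY_CLASS`, the function name `expand`, and the class names are assumptions chosen to mirror the reaction types named elsewhere in this document.

```python
from collections import deque

# Hypothetical toy rule table: compound class -> [(reaction name, product class)].
# A real system would derive these entries from a reaction database.
REACTIONS_BY_CLASS = {
    "aldehyde": [("imine formation", "imine"), ("Wittig olefination", "alkene")],
    "imine": [("reduction", "amine")],
    "amine": [("amidation", "amide")],
}


def expand(start_class, max_depth):
    """Return {product class: number of steps from start_class}.

    Steps (a)-(b): list the reactions of the reactant class and record products.
    Steps (c)-(e): treat each new product as a reactant and repeat.
    """
    collection = {}
    queue = deque([(start_class, 0)])
    while queue:
        reactant, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested number of steps
        for _reaction, product in REACTIONS_BY_CLASS.get(reactant, []):
            if product != start_class and product not in collection:
                collection[product] = depth + 1
                queue.append((product, depth + 1))
    return collection
```

Calling `expand("aldehyde", 2)` with this toy table collects imines and alkenes at one step and amines at two steps; raising `max_depth` adds further generations of products.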
  • Another aspect of the invention is a method of identifying a collection of chemical compounds that can be synthesized from a common reactant or class of reactants.
  • Such methods can be characterized by the following sequence: (a) automatically generating a list of reactions that the class to which the reactant or reactants belong undergoes; (b) for each reaction in the list, identifying a product or class of products; and (c) filtering the products identified in (b) to yield a subset of the compounds identified in (b), which subset comprises the collection of chemical compounds.
  • The filtering is based on one or more of the following criteria: predicted activity, number of reaction steps required to produce the product, specified reaction conditions, and relevance of the final products enumerated.
  • At least some of the products are identified in a database containing reactions reported in references, and at least some of the products are identified without reliance on reactions reported in references. Generating the list of reactions is accomplished, at least in part, without reliance on reactions reported in references.
  • Such methods may further include: (d) treating at least one of the compounds in the subset as a new reactant; (e) automatically generating a new list of reactions that the class to which the new reactant belongs undergoes; and (f) for each reaction in the new list, identifying a new product or class of products produced by the reaction.
  • The new products are added to the collection of compounds.
  • Another aspect of the invention pertains to methods of identifying biologically active compounds based on the chemical structure of a known biologically active compound.
  • The method may be characterized by the following sequence: (a) retrosynthetically decomposing the known biologically active compound into two or more building blocks; (b) identifying multiple chemical reactions that a first one of the building blocks can undergo; (c) identifying, with the aid of a chemical database, products of said multiple chemical reactions; (d) identifying a potentially biologically active compound by linking at least one of said products with one or more of the other building blocks, whether transformed or not; and (e) conducting a computational screen to predict whether the potentially biologically active compound is likely to be biologically active.
  • The screen may be a pharmacophore screen or a docking algorithm.
  • Yet another aspect of the invention pertains to computer program products including machine-readable media on which are provided program instructions for implementing the methods described above, in whole or in part.
  • The program instructions are provided as code for performing certain method operations. Any of the methods of this invention may be represented, in whole or in part, as program instructions that can be provided on such machine-readable media.
  • The invention pertains to various combinations and arrangements of data generated and/or used as described herein.
  • Figure 1A is a block diagram depicting how logic of the invention can generate a novel reaction sequence.
  • Figure 1B is a synthetic scheme depicting how logic of the invention can provide the user with reaction sequences for maximizing diversity.
  • Figure 2 is a process flow diagram depicting a general methodology for using a database of chemical/biological information to facilitate design of chemical structures having a desired biological activity in accordance with an embodiment of this invention.
  • Figure 3A is a structural depiction of the Gleevec™ molecule.
  • Figures 3B-3C depict, structurally, a suitable retrosynthetic analysis performed on the Gleevec™ molecule.
  • Figures 3D-3E depict an exemplary Gleevec™ building block transformation in accordance with an example of this invention.
  • Figure 3F depicts, for the sake of illustration, various synthetic pathways that are available to a generic aldehyde constituent.
  • Figure 3G depicts an example Gleevec™ analog generated from transformed building blocks in accordance with an embodiment of this invention.
  • Figure 4A shows a sample screen shot from a database application used with an example of this invention.
  • Figure 4B depicts a process employed to generate a pharmacophore for use in an example of this invention.
  • Figure 5A presents an example subset of possible chemistry patterns that illustrate the flexible synthetic analysis that can be performed by the present invention.
  • Figure 5B shows how the invention may be used to replace the amide bond in Statine inhibitors.
  • Figure 6 illustrates, in simple block format, a typical computer system that, when appropriately configured or designed, can serve as a computational apparatus of this invention.
  • Reaction - A "reaction" is a fundamental chemical transformation of one or more reactants to one or more products.
  • "Reaction" and "reaction step" are synonymous. Examples of reactions include condensation reactions (e.g., esterification, amidation, imine formation), carbon-carbon couplings (e.g., Suzuki, Wittig, Heck), and reduction reactions (e.g., nitro reductions, hydrogenations, reductive amination). Multiple reactions may be concatenated to produce a "protocol" or synthetic pathway to a final product.
  • A reaction may require certain reaction conditions. Such conditions may include a reaction temperature, a reaction time, etc. Specialized laboratory instrumentation may be required to provide the needed reaction conditions. Commonly, a reaction will employ one or more reagents and/or solvents.
  • Protocol - A protocol is a group of chemical reactions, typically performed sequentially, to carry out an encompassing transformation from a starting reactant or reactants to a final product or products. Such sequential reactions may be carried out in parallel and converge at some point to a product or products.
  • "Reaction scheme" and "synthetic pathway" are often used in the art to mean "protocol," as that term is used herein.
  • A protocol may include not only its constituent reactions, but also any associated reaction conditions used to carry out each of the reactions or reaction steps.
  • An example of a multi-step protocol is a synthesis of a particular tripeptide using two sequential reactions. The first reaction couples a first amino acid to a second amino acid to form a dipeptide. The second reaction couples the dipeptide to a third amino acid to form the tripeptide.
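The tripeptide example can be sketched as a small data structure in which a protocol is an ordered list of reactions, each carrying its own conditions. The class names, field names, amino acid labels, and the solvent entry below are all assumptions made for illustration; the source does not specify a record layout.

```python
from dataclasses import dataclass, field


@dataclass
class Reaction:
    reaction_type: str
    reactants: list
    product: str
    conditions: dict = field(default_factory=dict)  # e.g. solvent, temperature


@dataclass
class Protocol:
    reactions: list  # ordered reaction steps

    def final_product(self):
        return self.reactions[-1].product

    def is_connected(self):
        """True if each step's product is a reactant of the following step."""
        pairs = zip(self.reactions, self.reactions[1:])
        return all(prev.product in nxt.reactants for prev, nxt in pairs)


# Hypothetical two-step tripeptide synthesis (labels and conditions are illustrative).
tripeptide = Protocol([
    Reaction("amidation", ["Ala", "Gly"], "Ala-Gly", {"solvent": "DMF"}),
    Reaction("amidation", ["Ala-Gly", "Phe"], "Ala-Gly-Phe", {"solvent": "DMF"}),
])
```

Storing each step as its own record, rather than the whole synthesis as one entry, is what lets individual reactions be indexed and recombined later.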
  • Reference - A reference is a document or other medium containing pertinent information.
  • The pertinent information is usually chemical information. This concept includes traditional published literature articles, published and unpublished patent documents (patent applications and issued patents), unpublished experimental results, books, monographs, abstracts, and the like.
  • Reactant - This term encompasses the compounds used in any particular reaction that are transformed or converted by the reaction to a product.
  • Reactants are those molecules that are modified in some way to become part of or are incorporated into the product molecule or molecules of a reaction, and thus are not "spectators" in the reaction.
  • Reagent - Reagents are those compounds used in a reaction that ultimately do not end up as part of the product molecules. Such molecules include solvents, catalysts, and other reaction mediators. Reagents are, overall, spectators in the reaction. Although they may be intimately involved with the reactants during the reaction, generally neither they nor subsets of their molecular structure become incorporated into the product's molecular structure.
  • Solution phase reactions refer to homogeneous reactions; however, some solution phase reactions referred to herein are heterogeneous reactions.
  • A reagent used in a solution phase reaction may mediate the reaction while immobilized on a solid phase support or may itself exist in the solid phase.
  • For example, an isocyanate scavenger bound to an inert polymer resin may be used to trap excess amine in a solution phase reaction, or a solid catalyst may itself remain as a solid in a solution phase reaction.
  • In solid phase reactions, generally one reactant is immobilized on a solid support medium, and other reactants and "reagents" used in the reaction are in the liquid phase.
  • A "scaffold" or "template" molecule is immobilized on a solid support.
  • A reaction with this molecule is then performed in a particular solvent (reagent) with one or more reactants, parts of which become integrated into the scaffold or template molecule to become part of a product molecule, itself still bound to the solid support.
  • The product molecule is then freed from the solid support using a "cleaving reagent," an intramolecular rearrangement, or other technique such as irradiative cleavage of a linker-product bond.
  • Solvent - A solvent is generally the liquid medium in which a reaction takes place.
  • In solid phase reactions, a solvent is the liquid medium in which solid phase supported reactants are suspended and reagents are dissolved.
  • Alternatively, a solvent is the liquid medium in which reactants are dissolved, but solid phase reagents are suspended.
  • A compound may serve multiple roles, for example as both solvent and reactant or as both solvent and reagent.
  • Reaction condition - Reaction condition refers to parameters under which a reaction takes place, for example, time, temperature, pressure, radiation, solvents or reagents used.
  • Laboratory instrumentation or equipment refers to the hardware used to carry out reactions: for traditional synthesis, any glassware, heating devices, pressure vessels, etc.; and for combinatorial or parallel synthesis, any hardware used to perform multiple reactions in parallel.
  • This term is used to define the minimal amount of hardware necessary to carry out a reaction; i.e., hardware that does not include peripheral devices or equipment not crucial to carrying out a reaction. In this context, the term might not encompass a particular robotic device, for example.
  • Product - Products are molecules that result from a reaction of reactants.
  • A chemist using a set of reactants and associated procedures converts the reactants into a product or set of products.
  • Ontology - Ontology in this application refers to the logical linkage of categories used to classify a type of chemical information such as a reference, a protocol, or a reaction. These categories are often arranged in levels of a hierarchy. For example, a reference may be categorized at a high level as pertaining to either solid-phase chemistry or solution-phase chemistry. Each of these categories may be further categorized as to the reaction type, for example condensation reaction, carbon-carbon bond forming reaction, substitution reaction, and the like. Still further, each of these categories is further categorized more definitively; for example, a condensation reaction category may contain amide-forming condensations, ester-forming condensations, imine-forming condensations, and the like. Each of references, protocols, and reactions possesses hierarchical components of a given ontology.
  • A compound or reaction may be represented in a generic format. That is, for a particular compound or reaction genus there may be multiple species.
  • A generic compound (reactant or product) is represented as a core structure having one or more substituents represented generically, as for example an "R-group." For a given R-group, there are a number of particular chemical moieties that define distinct species of the generic compound.
  • A generic compound is "enumerated" by displaying or otherwise identifying the species comprising the genus. Each species represents a specific compound containing the core structure and a specific chemical moiety at the location of each R-group. Stated another way, enumeration refers to electronically reconstructing representations of the actual structures (reactants and products) for each species reaction.
  • For example, a reaction employing 5 amines and 5 carboxylic acids under dehydrating conditions would generate 25 amides through enumeration.
  • The concept of enumeration can extend to groups of generic chemical compounds, as one might encounter in a generic representation of a reaction.
  • For example, a reference may identify 100 reactions that were carried out, each reaction being a species of a generic reaction. In one format, the reference may depict only the generic reaction. When enumerated, all 100 specific reactions are depicted. In some embodiments of this invention, it will be convenient to separately store in electronic format the actual R-group moieties used.
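The 5 x 5 = 25 amide count above is a Cartesian product over R-group lists, which can be sketched with `itertools.product`. The R-group fragments below are simple alkyl groups written as SMILES fragments and are purely illustrative; the template string and the function name `enumerate_amides` are assumptions, and no structure validation is attempted.

```python
from itertools import product

# Illustrative R-groups as SMILES fragments (simple alkyl chains only).
ACYL_GROUPS = ["C", "CC", "CCC", "CCCC", "CC(C)"]   # R1 of R1-C(=O)OH
AMINE_GROUPS = ["C", "CC", "CCC", "CCCC", "C(C)C"]  # R2 of R2-NH2


def enumerate_amides(acyl_groups, amine_groups):
    """Enumerate every species of the generic amide R1-C(=O)NH-R2."""
    return [f"{r1}C(=O)N{r2}" for r1, r2 in product(acyl_groups, amine_groups)]


amides = enumerate_amides(ACYL_GROUPS, AMINE_GROUPS)
```

Keeping the R-group lists separate from the generic template corresponds to storing the actual R-group moieties apart from the generic reaction, as the text suggests.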
  • Chemical Information includes all information in a reference, database, or other medium that pertains to a chemical compound, a chemical reaction, collections of compounds or reactions, and the like.
  • The chemical information may be provided in various textual, numerical, and/or structural formats. Often the information will include pertinent annotations such as reaction conditions, laboratory instrumentation, solvents, reagents, details about a reference, etc.
  • A filter refers to a constraint applied to a search in order to narrow or more fully define the search.
  • A filter can be any number of search constraints that are added to a search query or applied to a set of results from such a query to further narrow or define the result in terms of the particular filter or filters applied.
  • Filters used in embodiments of the invention can be applied at the reference, protocol, and/or reaction level as well as any fields that are contained in records of databases of the invention. Data can be searched and filtered in many combinations of ways.
  • Filters include, but are not limited to, the following: reaction condition, reaction type, library size, number of steps in a protocol, yield of reaction, molecular weight, reactant type, logP, ADMET/PK, Lipinski's rule of five, QSAR, pharmacophore, docking, binding, structure, substructure, reliability ratings, biological activity, reactivity, starting material, product, author, journal, keywords, vendor, leading references, and the like.
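Because filters may be combined freely, one way to sketch them is as independent predicates applied together to a result set. The example below combines Lipinski's rule of five (in its strict form, allowing no violations) with a limit on the number of protocol steps. The `Compound` record, its field names, and all property values are hypothetical; in practice the properties would be computed or looked up rather than typed in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    name: str
    mol_weight: float  # daltons
    log_p: float
    h_donors: int      # hydrogen bond donors
    h_acceptors: int   # hydrogen bond acceptors
    steps: int         # reaction steps in the protocol that makes it


def lipinski(c):
    """Strict rule of five: no violations allowed."""
    return (c.mol_weight <= 500 and c.log_p <= 5
            and c.h_donors <= 5 and c.h_acceptors <= 10)


def max_steps(limit):
    """Build a filter on the number of steps in a protocol."""
    return lambda c: c.steps <= limit


def apply_filters(compounds, *filters):
    """Keep only the compounds that pass every filter."""
    return [c for c in compounds if all(f(c) for f in filters)]


# Hypothetical records with made-up property values.
LIBRARY = [
    Compound("cmpd-1", 320.4, 2.1, 2, 5, 2),
    Compound("cmpd-2", 612.7, 4.0, 3, 8, 2),  # too heavy
    Compound("cmpd-3", 298.3, 1.5, 1, 4, 6),  # too many steps
]
```

For instance, `apply_filters(LIBRARY, lipinski, max_steps(3))` keeps only `cmpd-1`, while dropping either filter widens the result.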
  • Database - A set of related files or records that is created and managed by a database management system.
  • The records may include text, images, sound, video, etc.
  • A record is a group of related fields that store data about a subject or activity.
  • The invention provides databases of chemical information organized by chemical synthesis methods. Generally, this means organization by reaction type. Preferably, though not necessarily, these databases are relational databases. In the databases of this invention, chemical reactions are classified according to type, reaction information, specific aspects of procedures and methods used in the reaction, product yield, and reliability rating, and chemical reagents are classified according to functional group and compatible synthetic methods. In some examples, specific chemical reaction/process information is used as primary or foreign keys in relational database tables. In fact, the primary key of some database tables may be a combination of reaction type (e.g., reductive amination) and either a reactant or a product. Still further, the database keys may comprise particular reaction conditions (e.g.
  • reaction types in association with reaction type.
  • Chemical information can be organized by biological activity such as Ki or IC50 values for interactions with particular biological targets.
  • Reaction types, biological activity, and/or reaction conditions may be provided as attributes or columns of individual database records.
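A table keyed by the combination of reaction type and reactant, as described above, can be sketched with Python's built-in `sqlite3` module. The table name, column names, and demo rows (including the yield figures) are assumptions for illustration only and do not reflect the schema of the databases described here.

```python
import sqlite3

# Hypothetical demo rows: (reaction_type, reactant, product, yield_pct).
DEMO_ROWS = [
    ("reductive amination", "benzaldehyde", "N-methylbenzylamine", 82.0),
    ("Wittig olefination", "benzaldehyde", "styrene", 75.0),
    ("esterification", "benzoic acid", "methyl benzoate", 90.0),
]


def build_demo_db():
    """Create an in-memory table whose primary key is (reaction_type, reactant)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE reaction (
            reaction_type TEXT NOT NULL,
            reactant      TEXT NOT NULL,
            product       TEXT NOT NULL,
            yield_pct     REAL,
            PRIMARY KEY (reaction_type, reactant)
        )
        """
    )
    conn.executemany("INSERT INTO reaction VALUES (?, ?, ?, ?)", DEMO_ROWS)
    return conn


def reactions_for(conn, reactant):
    """List (reaction_type, product) pairs recorded for one reactant."""
    rows = conn.execute(
        "SELECT reaction_type, product FROM reaction "
        "WHERE reactant = ? ORDER BY reaction_type",
        (reactant,),
    )
    return rows.fetchall()
```

With the composite key, the same reactant can appear under several reaction types, but a second row with the same reaction type and reactant is rejected by the database.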
  • Substructures are fragments that define a core structural motif for which the user
  • The query specifies an aldehyde fragment added to an amine fragment to yield a product.
  • The chemical reaction is a reductive amination.
  • A query of this type would return every reaction in the database (sometimes several hundred or more) that conformed to the substructure fragments drawn. If a shorter list is desired, the user would have to submit a more constrained query in which the structures are more fully defined. Having used a more constrained structural query, the user is left with a more manageable list of reactions. However, this list describes only those particular literature reactions that have been loaded into the database. Thus, the user may be missing potentially valuable reaction data.
  • This invention can provide not only the aforementioned literature example reaction lists but can also generate examples based on literature precedent. This provides the user with variations (diversity) that perhaps were not considered, even if the user is an experienced chemist.
  • Precursors/building blocks can be taken from the products of any other reaction in the database or other reactions added to the database. This creates a relational database based on actual known chemistries, as opposed to a static hierarchical database.
  • Precursors/building blocks can be from files of molecules that can be imported from any other sources (for example, all secondary amines from the Available Chemicals Directory).
  • Precursors/building blocks can be drawn out by hand and imported into any reaction, limited only by the imagination of the chemist working with the invention.
  • The system can then evolve through enumeration to explore synthetically feasible molecules coupled to various customizable filters to rapidly identify desired functional molecules.
  • Methods of the invention are embodied in software tools for generating diverse reaction sets, based on inputs from a user.
  • The input may be a particular compound or class of compounds, or a particular reaction or class of reactions, for example.
  • The diverse reaction sets can provide a plurality of chemical compounds that are very likely to be synthetically accessible.
  • The invention uses databases in which complete synthetic pathways (sometimes referred to as reaction schemes), as represented in the literature, are broken into the individual reactions that comprise the larger pathway. These individual reactions are separately stored and intimately indexed in databases. This is unlike the situation with conventional chemical databases, where only complete syntheses, as reported in the literature, populate the databases.
  • The present invention provides a more granular representation of chemical reactions. In this manner, the databases of this invention facilitate mixing and matching of individual chemical reaction steps to create new synthetic pathways.
  • The logic of the invention facilitates generation of novel synthesis schemes (and thus distinct molecule products) from literature precedents.
  • Figure 1A is a block diagram 101 depicting how the logic of the invention may use literature precedent to generate a novel reaction sequence.
  • Reaction sequences 103 and 105 are two examples of synthesis procedures taken from the literature and characterized in the database of the invention by the discrete reaction steps of which each consists. Each step is characterized by a unique set of conditions used to carry out that step.
  • Sequence 103 consists of the individual steps 107 - 115 to give products 117.
  • Sequence 105 consists of the individual steps 119 - 129 to give products 131.
  • The database may be provided with the individual steps (reactions) of reaction sequences 103 and 105.
  • In conventional databases, reaction sequences are not provided in discrete steps, but rather with a reactant, a product, and a conglomeration of text over an arrow describing two or more steps and associated process conditions. Since sequences in the databases of the invention are characterized by discrete steps and the steps are classified according to reaction type, the logic of the invention can use the steps to extrapolate from known sequences to generate novel sequences. As depicted by the dashed arrows, the logic of the invention can generate for example a new sequence 133, consisting of steps
  • This new sequence is generated using a "mix and match" algorithm, providing novel products 135.
  • Many novel sequences can be generated from the many thousands of known chemical conversions in the literature.
  • A user can further massage and refine chemical information provided by the invention by application of filters, for example by specific process conditions, reliability ratings, pharmacokinetic parameters, and others.
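The "mix and match" idea can be sketched by pooling the discrete steps of known sequences and chaining any step whose reactant class equals the product class of the previous step. This is a minimal sketch under stated assumptions, not the algorithm of the invention: steps are reduced to hypothetical `(reactant class, reaction type, product class)` tuples, and a sequence counts as novel simply when it does not match a stored sequence exactly.

```python
# Two hypothetical literature sequences, each a list of discrete steps.
SEQ_A = [("aldehyde", "imine formation", "imine"),
         ("imine", "reduction", "amine")]
SEQ_B = [("ketone", "reductive amination", "amine"),
         ("amine", "amidation", "amide")]


def mix_and_match(sequences, start, length):
    """Return step sequences of the given length from `start` that are not stored."""
    steps = {step for seq in sequences for step in seq}  # pool of discrete steps
    known = {tuple(seq) for seq in sequences}
    results = []

    def extend(path, current):
        if len(path) == length:
            if tuple(path) not in known:
                results.append(tuple(path))
            return
        for step in sorted(steps):
            if step[0] == current:  # step consumes the current compound class
                extend(path + [step], step[2])

    extend([], start)
    return results


novel = mix_and_match([SEQ_A, SEQ_B], "aldehyde", 3)
```

Here the only three-step route from an aldehyde splices both steps of the first sequence onto the final step of the second, giving an aldehyde-to-amide pathway that appears in neither stored sequence.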
  • Figure 1B depicts a system of synthetic schemes 137.
  • The logic employed by this invention provides various reaction schemes to users automatically. Thus, the user gains access to numerous reaction sequences for maximizing diversity.
  • A generic aldehyde 138 is input as a starting reaction class.
  • The logic of the invention generates suitable synthetic pathways for reaction of 138 to make products.
  • For example, aldehyde 138 is reacted with amine 139 to give imine 141. This is but one reaction branch from aldehyde 138. As shown, however, multiple reactions may be generated from the starting aldehyde 138 to yield diverse products 149.
  • Each of these products (149 and 141) is one reaction level removed from aldehyde 138. Some or all of these compounds can be further reacted to produce even more products.
  • Imine product 141 is now used as a starting reactant for chemical reactions suitable to imines, 143. Further, imine 141 can be reduced to amine 145. Amine 145 represents a set of products two steps from aldehyde 138. Likewise, amine 145 is reacted further in chemical reactions suitable to amines, see 147.
  • Aldehyde 138 represents a class of aldehydes; that is, each member of that class will produce a unique product for each reaction pathway to which it is exposed. Moreover, all products resulting from and reactants used with 138 also represent classes of compounds.
  • Figure 2 depicts a computational method and associated data arrangement for identifying a relationship between biological activity and one or more chemical features by using a database of this invention.
  • A process flow of this invention begins at a process block 203 with provision of a database containing chemical and biological information organized by generic chemical transformations.
  • Chemical transformations may involve chemical reactions and/or chemical protocols.
  • Chemical transformations include reactants, products, and sometimes intermediates.
  • At least some compounds of the transformations are represented with generic Markush R groups.
  • The database in question provides information keyed to specific chemical compounds as well as generic chemical transformations. Note that within the database many of the specific chemical compounds may be associated with one or more of the chemical transformations.
  • The database associates at least some of the specific chemical compounds with biological information. Such information may take the form of an activity value representing interaction with one or more biological molecules. Still further, the database may associate chemical compounds with particular chemotypes or substructures contained therein.
  • The computation process performs a retrosynthetic analysis of a particular biologically active compound identified by the user. This analysis may be performed entirely by computation or together with a user's input.
  • The biologically active compound is identified in the database by one or more reported synthetic pathways.
  • The computational method can identify the reactants used in these pathways as part of the retrosynthetic analysis.
  • The reactants are treated as components or constituents of the starting compound and they are available for further flexible analysis and assembly into analogs of the starting compound.
  • Other mechanisms for identifying constituent compounds from the starting compound will be known to those of skill in the art. Some such mechanisms involve computationally parsing a structural representation of the compound, such as a mark up language representation.
  • each of the various constituent compounds identified by retrosynthetic analysis is optionally displayed for the user.
  • the display may present the compounds in fully elaborated format or in Markush format.
  • the system displays the compounds via a user interface.
  • the control logic identifies multiple available synthetic pathways for at least one of the constituents identified by the retrosynthetic analysis. This operation may be performed with the aid of user input to focus on certain synthetic pathways or automatic selection in computational apparatus. Automatic selection may be based on criteria such as ease of reaction, available reagents or reaction conditions, stability of products, etc. The above discussion presents examples of how a single constituent or class of constituents can undergo multiple reactions.
  • the computational system chemically links some or all of the various transformed building blocks (generated at 209) to create molecules comparable in size and/or overall layout to the original biologically active compound.
  • the chemical linkages should be chosen to present certain features of the original molecule. For example, they may be provided so as to present certain moieties, or more generically functional types, at orientations comparable to those found in the original compound.
  • the linkage may be conducted in a manner that presents a hydrophobic region at a first location, a hydrogen bond donor at a second location separated from the first location by a certain number of angstroms, and a nitrogen-containing aromatic group at a third location separated from the first and second locations by specified angular ranges and distances.
  • the transformed building blocks can be linked arbitrarily and then filtered by a pharmacophore filter, for example, to remove those products that are sufficiently dissimilar to the original compound.
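A pharmacophore filter of the kind mentioned above can be approximated by checking pairwise distances between labeled features. The sketch below is a simplification under stated assumptions: it checks distances only (not the angular ranges also mentioned), and the feature labels, coordinates, and distance windows are hypothetical.

```python
import math


def matches_pharmacophore(features, constraints):
    """Return True if every distance constraint is satisfied.

    features:    dict mapping a feature label to (x, y, z) in angstroms
    constraints: iterable of (label_a, label_b, min_dist, max_dist)
    """
    for a, b, lo, hi in constraints:
        if a not in features or b not in features:
            return False  # a required feature is missing
        d = math.dist(features[a], features[b])
        if not (lo <= d <= hi):
            return False
    return True


# Hypothetical distance windows derived from a reference compound.
CONSTRAINTS = [
    ("hydrophobe", "hbond_donor", 4.0, 6.0),
    ("hbond_donor", "aromatic_N", 3.0, 5.0),
]

candidate_ok = {"hydrophobe": (0, 0, 0), "hbond_donor": (5, 0, 0), "aromatic_N": (5, 4, 0)}
candidate_bad = {"hydrophobe": (0, 0, 0), "hbond_donor": (9, 0, 0), "aromatic_N": (9, 4, 0)}

kept = [c for c in (candidate_ok, candidate_bad) if matches_pharmacophore(c, CONSTRAINTS)]
```

Products linked arbitrarily can be passed through such a check, so that only those presenting the features at comparable separations are retained.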
  • computational screens are used to identify products generated at 211 that likely result in a biological activity of interest.
  • Various algorithms with various thresholds may be employed to screen the computationally derived products of interest.
  • the computational system may predict binding values (e.g., Ki) for targets of interest. These values may be predicted by pharmacophore matching calculations, for example.
  • binding values are predicted based upon calculations with a docking algorithm.
  • ADME screens are used to identify products generated at 211 that likely result in a biological activity of interest.
  • the biological activity values related to an interaction with at least one biological molecule such as a receptor or enzyme may represent binding with a target, a class of targets, a binding site on a target, or binding sites of a class of targets. Examples of such values include IC50, Ki, Km, and mean days of survival.
  • a final operation of interest in the exemplary process flow involves selecting one or more compounds for further investigation. These compounds are selected based upon the relationship identified or derived at block 213. They may be screened in vitro, and, if appropriate, investigated further as pharmaceutical candidates. In a related approach, the selected compounds comprise at least part of a primary or secondary library of chemical compounds.
  • the following example illustrates how one can employ a database and method of this invention.
  • the goal of this example is to identify new therapeutic equivalents to the leukemia drug Gleevec™ (imatinib mesylate), marketed by Novartis Pharmaceuticals Corporation of East Hanover, New Jersey.
  • the chemical structure of Gleevec™ is used as a starting point. This structure is shown in Figure 3A.
  • the initial goal is to perform a retrosynthetic analysis that identifies molecular components that might serve as starting points to develop analogs that could have activity similar to Gleevec™. Chemical space is explored by proposing various acceptable reactions of the Gleevec™ components. The resulting reaction products are enumerated and screened computationally. But initially, there must be some mechanism for identifying constituent parts of Gleevec™.
  • One approach uses a markup language such as SMILES (Daylight Chemical Information Systems, Inc.) to parse the molecule into its component parts.
  • the parsing follows a synthesis route or multiple synthesis routes reported in the literature or otherwise known and appearing in a database of this invention. Because the database stores information in the form of synthetic pathways, the reactants employed to synthesize Gleevec™ are also stored in the database. These are identified by appropriate database queries and serve as the results of a retrosynthetic analysis.
  • Figures 3B-3C depict, structurally, a suitable retrosynthetic analysis performed on the Gleevec™ molecule.
  • Figure 3D shows each of four Gleevec™ building blocks that can be elaborated using various synthetic pathways.
  • Figure 3E shows specific elaboration of one of these building blocks (a piperazine) to produce a number of transformed building blocks.
  • the software may select all or a subset of the synthetic pathways available to a starting component depending upon whether chemical or biological filters are set.
  • Figure 3F depicts, for the sake of illustration, the various synthetic pathways that are available to a generic aldehyde constituent.
  • the software (with or without aid from a user) must start with either a starting material building block or an intermediate that could be used as a precursor in a reaction.
  • the software can then consider various specific reactions to specific products. Every instance of that occurrence in the knowledgebase/database or an instance added by an end user can be used as a potential reaction transformation.
  • Another approach involves use of a Markush representation of the building block/precursors. In this case, confidence in the likelihood of success in the reaction can be increased by using the database to look for the patterns of the building blocks that do (and those that do not) successfully react.
  • the software (with or without aid from a user) can evaluate a number of different, complementary reactions that give rise to diverse products all originating from the same starting material or intermediate. Of course, these are products found in the knowledgebase/database with their associated synthetic reactions.
  • the software optionally searches the chemical database for molecular analogs matching the transformed building blocks identified by computationally applying various synthetic routes as previously discussed.
  • This operation may be implemented as an enumeration process as described above. That is, the generic moieties "R" are converted to specific moieties (e.g., -NH(Et)) to provide a list of specific molecules.
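Enumeration of generic "R" moieties amounts to taking the Cartesian product of the allowed substituents. The following is a minimal sketch, assuming a plain-text template with named placeholders; the template and substituent lists are hypothetical, and a real system would operate on structural representations rather than strings.

```python
from itertools import product


def enumerate_markush(template, r_groups):
    """Expand a template with named R-group placeholders into specific molecules.

    template: string with placeholders such as "{R1}" and "{R2}"
    r_groups: dict mapping each placeholder name to its allowed substituents
    """
    names = sorted(r_groups)
    molecules = []
    for combo in product(*(r_groups[name] for name in names)):
        molecules.append(template.format(**dict(zip(names, combo))))
    return molecules


# Hypothetical generic structure and substituent lists.
molecules = enumerate_markush(
    "{R1}-C(=O)-{R2}",
    {"R1": ["Me", "Ph"], "R2": ["NH(Et)", "NH(Me)"]},
)
```

The number of enumerated molecules is the product of the list lengths, which is why large virtual libraries arise quickly from a handful of building blocks.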
  • the transformed building blocks are then linked to one another at positions selected to produce analogs of the original Gleevec™ molecule. See Figure 3G for an example. Because the software contains a structural representation of the original Gleevec™ molecule, it can favor or require assembly of the enumerated components designed to preserve, to some degree, the three dimensional arrangement of moieties in the original molecule.
  • the software can identify both "exact hits" for synthesis and "related hits" that would have a reasonable probability of working.
  • the software does a Tanimoto similarity search of enumerated products (based on pharmacophore matching) and prioritizes/organizes them according to similar 3-D structures. See PCT application WO 00/25106, published April 4, 2000, which is incorporated herein by reference for all purposes.
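The Tanimoto coefficient between two feature sets A and B is |A ∩ B| / |A ∪ B|. A minimal sketch of ranking enumerated products against a reference is given below; representing each molecule as a set of integer feature keys is an assumption, as are the example feature sets, and returning 0.0 for two empty sets is a convention chosen here.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient of two feature collections."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


def rank_by_similarity(reference, candidates):
    """Return candidate names ordered from most to least similar to the reference."""
    return sorted(candidates,
                  key=lambda name: tanimoto(reference, candidates[name]),
                  reverse=True)


# Hypothetical pharmacophore feature keys.
reference = {1, 2, 3, 4}
candidates = {
    "analog_A": {1, 2, 3, 5},
    "analog_B": {1, 2, 3, 4},
    "analog_C": {7, 8},
}
ranking = rank_by_similarity(reference, candidates)
```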
  • the technology can use a variety of general reactions to build similar novel structures without regard to structural matching.
  • the software uses algorithms to prioritize pathways for analog construction. Examples of other such algorithms include algorithms based on reported reaction yield, reaction condition constraints (time, temperature, etc.), stereochemical constraints, and the like. Finally, the software may apply relevant biological filters to screen out therapeutically compromised molecules for later synthesis.
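Prioritizing pathways on reported yield subject to reaction-condition constraints could look roughly like the sketch below. The `Pathway` fields, the thresholds, and the example routes are hypothetical; stereochemical constraints and biological filters are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class Pathway:
    name: str
    yield_pct: float   # reported yield, percent
    temp_c: float      # reaction temperature, degrees Celsius
    hours: float       # reaction time


def prioritize(pathways, max_temp_c, max_hours):
    """Drop pathways that violate the condition limits, then sort by yield (highest first)."""
    feasible = [p for p in pathways
                if p.temp_c <= max_temp_c and p.hours <= max_hours]
    return sorted(feasible, key=lambda p: p.yield_pct, reverse=True)


# Hypothetical reported routes.
routes = [
    Pathway("reductive_amination", 72.0, 25.0, 12.0),
    Pathway("amide_coupling", 88.0, 25.0, 4.0),
    Pathway("high_temp_route", 95.0, 180.0, 2.0),
]
ranked = prioritize(routes, max_temp_c=100.0, max_hours=24.0)
```

Filtering before sorting keeps a high-yield but impractical route from outranking feasible ones.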
  • the chemical database employed in this example included data from numerous literature sources. The data was captured and abstracted into the database. Specifically, the data included reaction pathways, structures, and biological activities. In many cases, the data associated with a particular organic compound included synthesis information and/or biological information (e.g. IC50 values).
  • Figure 4A shows a sample screen shot from a database application used with this example.
  • the data used in this example contained information on approximately 2,060,000 compounds, about 1,000,000 of which were provided in a database of this invention.
  • Three separate scaffolds were chosen based on analysis of the Gleevec™ molecule. With the aid of the database, researchers identified various reaction chemistries available to those scaffolds.
  • the database was then used to enumerate 60,000 novel analogs as described above. Each of these compounds was screened using the electronic pharmacophore screen described above. The top 120 compounds identified from this screening were then selected for actual synthesis and wetlab bioassay. From these compounds, researchers identified a novel lead series with potency approaching Gleevec™. Specifically, ten of the compounds showed <10 µM activity, and six had single-digit µM activity.
  • Compounds discovered by this method are the subject of a pending provisional patent application: US provisional patent application number 60/400,828, filed August 2, 2002 by Powers et al., and incorporated herein by reference.
  • FIG. 5A illustrates how the invention uses databases (organized as described above), for example, to elaborate a generic group of building blocks around the common amine functional group. After exploring the chemical space accessible from the amine (which happens to be a disconnect in the retrosynthetic analysis of the Plasmepsin II inhibitors), one can see how the technology would be applied to the rapid identification of novel pharmacophores and chemotypes.
  • the invention makes use of a vast array of chemistry that can be done with a particular functional group. After examining the synthetically accessible routes in the computer, one can do a similarity search to known inhibitors but with a library derived from multiple different chemistries. This then allows one to explore all possibilities with maximal efficiency and creativity to discover fundamentally new chemotypes. In one example, the goal is simply to replace the amide bond with a functional group that is different.
  • the peptidomimetic structures with amide bonds could potentially suffer from the traditional liability of peptides as therapeutic agents (low half-life in the bloodstream, hydrolysis by proteases, and rapid excretion).
  • a primary goal of peptidomimetic design has been to replace the amide with another functional group while retaining activity.
  • the utility of the invention within the context of replacing the amide bond of the statine-based inhibitor is shown below in Figure 5B.
  • the amide has been replaced by an amino-alcohol by opening up a judiciously selected epoxide (only the nearest match shown).
  • two reactions are used in conjunction: imine formation with an aldehyde followed by the addition of a Grignard reagent in the presence of benzotriazole.
  • the database must avoid doing "unrealistic" chemistry that will not work in the laboratory due to functional group incompatibility or steric and electronic factors. Because the system is constantly enriched with experimental data based on chemistries reported to work with particular sets of precursors throughout the literature, the information to avoid "unrealistic" chemistry is embedded in the dataset. With enough data one can develop a predictability rating for certain chemistries in novel contexts based on a careful statistical analysis and grouping the reagents and transformation into similar sets.
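One simple way to derive a predictability rating from grouped experimental records is a smoothed success rate per (reaction class, reagent class) pair. The sketch below is an assumption for illustration, not the patent's statistical method: the grouping keys, the records, and the use of add-one (Laplace) smoothing are all choices made here.

```python
from collections import defaultdict


def predictability(records):
    """Build a rating function from (reaction_class, reagent_class, succeeded) records.

    The rating is (successes + 1) / (trials + 2), so an unseen pairing
    scores 0.5 rather than 0 or 1 (add-one smoothing, an assumption).
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [successes, trials]
    for reaction_class, reagent_class, succeeded in records:
        entry = counts[(reaction_class, reagent_class)]
        if succeeded:
            entry[0] += 1
        entry[1] += 1

    def rating(reaction_class, reagent_class):
        successes, trials = counts.get((reaction_class, reagent_class), (0, 0))
        return (successes + 1) / (trials + 2)

    return rating


# Hypothetical literature outcomes: 8 reported successes and 2 failures.
records = ([("epoxide_opening", "primary_amine", True)] * 8
           + [("epoxide_opening", "primary_amine", False)] * 2)
rate = predictability(records)
```

Pairings with a low rating can be flagged as likely "unrealistic" chemistry before any enumeration is attempted.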
  • embodiments of the present invention employ various processes or methods involving data stored in or transferred through one or more computing devices.
  • Embodiments of the present invention also relate to an apparatus for performing these operations.
  • This apparatus may be specially constructed for the required purposes, or it may be a general-purpose device (e.g., a computer) selectively activated or reconfigured by a set of instructions (e.g., a computer program) and/or data structure provided to the apparatus.
  • the processes presented herein are not inherently related to any particular computer or other apparatus.
  • various general-purpose machines may be used with programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required method steps. A particular structure generally representing a variety of these machines will be described below.
  • embodiments of the present invention relate to computer readable media or computer program products that include program instructions and/or data (including data structures) for performing various computer-implemented operations.
  • Examples of computer-readable media include, but are not limited to, magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM disks; magneto-optical media; semiconductor memory devices; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory devices (ROM) and random access memory (RAM).
  • the data and program instructions of this invention may also be embodied on a carrier wave or other transport medium (including electronic or optically conductive pathways).
  • Examples of program instructions include both machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter. Further, the program instructions include machine code, source code and any other code that directly or indirectly controls operation of a computing machine in accordance with this invention. The code may specify input, output, calculations, conditionals, branches, iterative loops, etc.
  • FIG. 6 illustrates, in simple block format, a typical computer system that, when appropriately configured or designed, can serve as a computational apparatus of this invention.
  • the computer system 600 includes any number of processors 602 (also referred to as central processing units, or CPUs) that are coupled to storage devices including primary storage 606 (typically a random access memory, or RAM) and primary storage 604 (typically a read only memory, or ROM).
  • processors 602 may be of various types including microcontrollers and microprocessors such as programmable devices (e.g., CPLDs and FPGAs) and unprogrammable devices such as gate array ASICs or general purpose microprocessors.
  • primary storage 604 acts to transfer data and instructions uni-directionally to the CPU and primary storage 606 is used typically to transfer data and instructions in a bidirectional manner. Both of these primary storage devices may include any suitable computer-readable media such as those described above.
  • a mass storage device 608 is also coupled bi-directionally to CPU 602 and provides additional data storage capacity and may include any of the computer-readable media described above. Mass storage device 608 may be used to store programs, data and the like and is typically a secondary storage medium such as a hard disk. It will be appreciated that the information retained within the mass storage device 608 may, in appropriate cases, be incorporated in standard fashion as part of primary storage 606 as virtual memory.
  • a specific mass storage device such as a CD-ROM 614 may also pass data uni-directionally to the CPU.
  • CPU 602 is also coupled to an interface 610 that connects to one or more input/output devices such as video monitors, track balls, mice, keyboards, microphones, touch-sensitive displays, transducer card readers, magnetic or paper tape readers, tablets, styluses, voice or handwriting recognizers, or other well-known input devices such as, of course, other computers.
  • CPU 602 optionally may be coupled to an external device such as a database or a computer or telecommunications network using an external connection as shown generally at 612. With such a connection, it is contemplated that the CPU might receive information from the network, or might output information to the network in the course of performing the method steps described herein.
  • the computer system 600 is configured as a database and database management system for chemical information organized as described herein.
  • the chemical information may derive from various sources. Remote sources of chemical information may provide the information to system 600 via interface 612.
  • a memory device such as primary storage 606 or mass storage 608 stores the chemical information.
  • the memory may also store various routines and/or programs for analyzing and presenting the data.
  • Such programs/routines may include database management systems, search engines, filtering programs (including QSAR programs, docking programs, ADME property prediction programs, etc.), programs for populating databases with new chemical information, tools for improving the performance of databases, etc.

Landscapes

  • Chemical & Material Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Library & Information Science (AREA)
  • Analytical Chemistry (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to computational tools for automatically suggesting/generating diverse sets of reactions for particular precursors, classes of precursors, or different reaction chemistries. These tools make it possible to automatically generate a group of reaction chemistries for a precursor or class of precursors. Some of these reactions and/or products may be generated independently of the reactions and products described in available references.
PCT/US2002/037190 2001-11-20 2002-11-19 Method of flexibly generating diverse reaction chemistries WO2003044219A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2002366093A AU2002366093A1 (en) 2001-11-20 2002-11-19 Method of flexibly generating diverse reaction chemistries

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US33223001P 2001-11-20 2001-11-20
US60/332,230 2001-11-20

Publications (1)

Publication Number Publication Date
WO2003044219A1 true WO2003044219A1 (fr) 2003-05-30

Family

ID=23297302

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2002/037190 WO2003044219A1 (fr) 2001-11-20 2002-11-19 Method of flexibly generating diverse reaction chemistries

Country Status (2)

Country Link
AU (1) AU2002366093A1 (fr)
WO (1) WO2003044219A1 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2653529A4 (fr) * 2010-12-17 2018-01-24 Mitsubishi Chemical Corporation Équipement d'élaboration d'une voie de synthèse, procédé d'élaboration d'une voie de synthèse, programme d'élaboration d'une voie de synthèse, et procédés de production d'acide 3-hydroxypropionique, d'alcool crotonylique et de butadiène
WO2020007962A3 (fr) * 2018-07-04 2020-03-19 The University Court Of The University Of Glasgow Apprentissage automatique
JP2021513177A (ja) * 2018-01-30 2021-05-20 ピーター マドリッド 化学合成経路および方法の計算生成
CN117548053A (zh) * 2024-01-12 2024-02-13 广东林工工业装备有限公司 一种可调式反应釜及其控制方法和相关设备

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3853774A (en) * 1972-12-20 1974-12-10 Chevron Res Process for preparing oil-soluble basic magnesium salts
US4670817A (en) * 1984-10-01 1987-06-02 Venus Scientific Inc. Heat sink and interconnection arrangement for series connected power diodes
US4764475A (en) * 1986-12-01 1988-08-16 The University Of British Columbia Pancreas dependant immunoassay for determining subpopulations of monoclonal antibodies to somatostatin.
US4868313A (en) * 1985-06-21 1989-09-19 I.S.F. Societa Per Azioni A process for making pyrrolidone derivatives
US5723289A (en) * 1990-06-11 1998-03-03 Nexstar Pharmaceuticals, Inc. Parallel selex
US5766842A (en) * 1994-09-16 1998-06-16 Sepracor, Inc. In vitro method for predicting the evolutionary response of a protein to a drug targeted thereagainst
US5871697A (en) * 1995-10-24 1999-02-16 Curagen Corporation Method and apparatus for identifying, classifying, or quantifying DNA sequences in a sample without sequencing
US6031228A (en) * 1997-03-14 2000-02-29 Abramson; Fred P. Device for continuous isotope ratio monitoring following fluorine based chemical reactions
US6051194A (en) * 1995-06-12 2000-04-18 California Institute Of Technology TI02-coated fiber optic cable reactor
US6061636A (en) * 1996-02-26 2000-05-09 Pharmacopeia, Inc. Technique for representing combinatorial chemistry libraries resulting from selective combination of synthons
US6127158A (en) * 1994-12-07 2000-10-03 President And Fellows Of Harvard College Ubiquitin conjugating enzymes
US6150488A (en) * 1998-12-30 2000-11-21 Wacker Silicones Corporation Process for preparing silanol-functional specifically branched organopolysiloxanes and products produced thereby

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2653529A4 (fr) * 2010-12-17 2018-01-24 Mitsubishi Chemical Corporation Équipement d'élaboration d'une voie de synthèse, procédé d'élaboration d'une voie de synthèse, programme d'élaboration d'une voie de synthèse, et procédés de production d'acide 3-hydroxypropionique, d'alcool crotonylique et de butadiène
JP2021513177A (ja) * 2018-01-30 2021-05-20 ピーター マドリッド 化学合成経路および方法の計算生成
JP7357183B2 (ja) 2018-01-30 2023-10-06 エスアールアイ インターナショナル 化学合成経路および方法の計算生成
US11961595B2 (en) 2018-01-30 2024-04-16 Sri International Computational generation of chemical synthesis routes and methods
WO2020007962A3 (fr) * 2018-07-04 2020-03-19 The University Court Of The University Of Glasgow Apprentissage automatique
CN117548053A (zh) * 2024-01-12 2024-02-13 广东林工工业装备有限公司 一种可调式反应釜及其控制方法和相关设备
CN117548053B (zh) * 2024-01-12 2024-03-26 广东林工工业装备有限公司 一种可调式反应釜及其控制方法和相关设备

Also Published As

Publication number Publication date
AU2002366093A1 (en) 2003-06-10

Similar Documents

Publication Publication Date Title
US20020049548A1 (en) Chemistry resource database
Warr Representation of chemical structures
Xu et al. Using molecular equivalence numbers to visually explore structural features that distinguish chemical libraries
NL1028923C2 (nl) Werkwijze, toestel en software voor het extraheren van chemische gegevens.
Hu et al. Molecular scaffolds with high propensity to form multi-target activity cliffs
Hu et al. Pfizer Global Virtual Library (PGVL): a chemistry design tool powered by experimentally validated parallel synthesis information
US20050177280A1 (en) Methods and systems for discovery of chemical compounds and their syntheses
Hartenfeller et al. Probing the bioactivity-relevant chemical space of robust reactions and common molecular building blocks
Castañón et al. Design and development of a technology platform for DNA-encoded library production and affinity selection
Pottel et al. Customizable generation of synthetically accessible, local chemical subspaces
Vainio et al. Automated recycling of chemistry for virtual screening and library design
Su et al. Predicting the feasibility of copper (i)-catalyzed alkyne–azide cycloaddition reactions using a recurrent neural network with a self-attention mechanism
US6678619B2 (en) Method, system, and computer program product for encoding and building products of a virtual combinatorial library
Pikalyova et al. Chemical library space: definition and DNA-encoded library comparison study case
US20030087334A1 (en) Method of flexibly generating diverse reaction chemistries
US20020077757A1 (en) Chemistry resource database
WO2003044219A1 (fr) Procede de generation souple de diverses compositions chimiques de reaction
Smalter Hall et al. An overview of computational life science databases & exchange formats of relevance to chemical biology research
Villar et al. Design of chemical libraries for screening
CN111696623B (zh) 一种基于dna编码化合物库的实验室信息管理系统
Prasanna et al. Chemical compound navigator: A web‐based chem‐BLAST, chemical taxonomy‐based search engine for browsing compounds
Gruter et al. R&D Intensification in Polymer Catalyst and Product Development by Using High‐Throughput Experimentation and Simulation
Lebl Centrifugation based automated synthesis technologies
Mahjour et al. RDCanon: A Python Package for Canonicalizing the Order of Tokens in SMARTS Queries
Leach et al. Representation and manipulation of 2D molecular structures

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SC SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LU MC NL PT SE SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP