CN1725222A - Combinatorial chemistry centralized repository design and optimization method

Publication number
CN1725222A
Authority
CN
China
Legal status
Granted
Application number
CN 200410053102
Other languages
Chinese (zh)
Other versions
CN100362519C (en)
Inventor
罗小民
蒋华良
陈刚
沈建华
郑苏欣
张健
柳红
沈旭
陈凯先
Current Assignee
Shanghai Institute of Materia Medica of CAS
Original Assignee
Shanghai Institute of Materia Medica of CAS
Application filed by Shanghai Institute of Materia Medica of CAS
Priority to CNB2004100531026A (CN100362519C)
Publication of CN1725222A
Application granted
Publication of CN100362519C
Current legal status: Expired - Fee Related

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A design and optimization method for combinatorial chemistry centralized repositories (that is, focused libraries), implemented as a software package that designs, builds, evaluates and optimizes virtual portfolio chemistry centralized repositories. The method comprises three steps: first, a virtual portfolio preliminary screening library is designed on the basis of the target; second, the library is built and the conformations of its molecules are optimized; third, the library is evaluated and optimized. The invention combines a genetic algorithm with combinatorial chemistry methods and can be used for the design and study of novel drugs based on the three-dimensional structure of a macromolecular target. The resulting repository has comparatively high affinity for the target, high diversity and high drug-likeness. The method is fast and inexpensive, consumes no laboratory equipment, and completes the design and optimization of the centralized repository entirely by computational simulation.

Description

Combinatorial chemistry centralized repository design and optimization method
Technical field
The present invention relates to a design and optimization method for combinatorial chemistry centralized repositories, and is suitable for a software package that designs, builds, evaluates and optimizes virtual portfolio chemistry centralized repositories.
Background art
Drug discovery is a complex, multidisciplinary process of research and exploration. Its main steps can be summarized as going from gene to protein, from protein to lead compound, and then repeatedly evaluating and optimizing the lead compound in an iterative, spiralling cycle. Computers have long been indispensable in this process: molecular simulation, combinatorial chemistry, structure-activity relationships and data mining are all fields in which computation plays a full part. Using computers to search for and optimize lead compounds by every available method, so as to improve their physicochemical properties and biological activity, has become one of the key steps in drug discovery.
Combinatorial chemistry (combichem) combines and connects building blocks to synthesize, in parallel and systematically, large numbers of structurally diverse compounds, which together form a combinatorial library. Its goal is to synthesize many varied compounds much as one plays with building blocks, to meet the demand of modern drug research for diverse compounds. Combinatorial chemistry can be traced back to the solid-phase peptide synthesis of the 1960s (Merrifield), which exploited the uniform and reliable reaction conditions of peptide synthesis and used a polymeric solid support so that products and reagents could be separated very easily; its inventor was awarded the Nobel Prize in Chemistry for this work. From the mid-1980s the technique developed rapidly: combinatorial libraries could be synthesized not only by a variety of solid-phase techniques but also by solution-phase synthesis. In terms of content, there are now peptide libraries, peptide-derivative libraries, peptidomimetic libraries, non-natural peptide libraries and non-peptide oligomer libraries (such as nucleopeptide, oligourea and oligosaccharide libraries), as well as libraries of small organic molecules. These have greatly enriched existing compound collections and supply large numbers of candidate structures for lead discovery. Combinatorial chemistry is now increasingly mature; its main topics include library strategy, that is, the methods and design principles used to reach a given goal; solid-phase or solution-phase synthesis; encoding and separation schemes; core-structure principles; and quality assurance and control.
In the early days of combinatorial chemistry, most libraries were constructed at random: building blocks were chosen arbitrarily and, after a series of syntheses, a library containing a large number of compounds was obtained. Because this way of building libraries had no specific aim, it wasted a great deal of manpower, material and time. In recent years people have begun to construct directed combinatorial libraries in order to increase the probability of obtaining the desired new drug. Constructing a directed library raises the problem of library design (see prior art <1>: Dolle RE, Comprehensive survey of combinatorial library synthesis: 2000, J. Comb. Chem. 2001, 3(6): 477-517). A further problem is that earlier library design did not make full use of the structural information of the target biomacromolecule. The libraries were only weakly directed at the target molecule, and the hope of finding a lead compound rested on the inherent molecular diversity of the library. If seeking a lead compound by combinatorial chemistry is like looking for a needle in the ocean, then seeking it by blindly enlarging the library is like looking for the needle by widening the area of sea searched; the difficulty and workload are easy to imagine. Combinatorial chemistry researchers recognized this problem and proposed the concept of the centralized repository (more commonly called a focused library): a combinatorial library aimed at a specific target. Although it is small, it is well directed, so the chance of finding a lead compound in it is not reduced. This raises a new question, namely how to design a centralized repository purposefully, particularly when the target is well defined. If the three-dimensional structural information of the biomacromolecule can be fully exploited, the size of the combinatorial library can be reduced effectively, saving manpower and material.
At present, a model that links genomics, combinatorial chemistry and high-throughput screening through bioinformatics has become a principal mode of new drug research and development. With the rapid development of molecular biology, more and more genes have been cloned and expressed. The crystal structures of many ideal drug target proteins (for example, enzymes that play key roles in the life cycle of pathogenic microorganisms) have been determined or modelled by computer-aided methods, and their substrate binding sites identified, providing the basis for target-directed drug development. Combinatorial chemistry now pervades every aspect of drug research and development (R&D), from lead discovery to target validation and from lead optimization to the enlargement of compound collections. New lead compounds are found by searching compound databases or by de novo design according to the properties of the active pocket at the binding site; large numbers of compounds are then generated rapidly by combinatorial chemistry and, with high-throughput screening, highly active compounds are quickly identified for drug development. Since the first half of the 1990s, as computing power has grown rapidly, computer-aided design of combinatorial libraries has gradually become a focus of drug research, concentrating mainly on molecular diversity or dissimilarity. The aim is to shrink the library as far as possible while preserving its molecular diversity, thereby reducing the chemical synthesis and pharmacological testing required. In the late 1990s many researchers proposed algorithms that combine structure-based drug design with combinatorial chemistry, attempting to use the three-dimensional structure of the target biomacromolecule in library design. Some of these studies also carried out the corresponding synthesis and pharmacological testing, with good results. In 1999 a clinical candidate was confirmed from an optimized library of 500 compounds; it was found by Agouron Pharmaceuticals in structure-based research on rhinovirus 3C protease inhibitors. With the sequencing of the human genome essentially complete and supplying many new molecular targets for drug research, the combination of structure-based drug design and combinatorial chemistry has become an effective route to drug discovery.
Analyzing the diversity of a combinatorial library is an important part of combinatorial chemistry. Combinatorial chemistry can use different building blocks to synthesize, in the same period of time, a large library of structurally varied compounds, and it is precisely this structural diversity that gives "needle in a haystack" high-throughput screening (HTS) its rich source of compounds and its basis for success. Weininger once estimated that about 10^180 compounds have drug-like molecular structures. Even if the correct figure were 10^50 (a figure that includes the compounds covered by many patents), it would still be an astonishingly large number, and synthesizing and screening so many compounds is impossible. It is therefore unnecessary to emphasize molecular diversity alone; only diversity design directed at a definite target has real value.
The purpose of evaluating the molecular diversity of a combinatorial library is to reduce cost and raise the probability of finding new active compounds. The hope is to use computational tools to select which compounds to buy or synthesize, thereby improving the molecular diversity of the compound library used in activity screening, and also to design diverse compound libraries against the three-dimensional structure of a particular target. In this way compounds that are safe, efficient, economical and active can be obtained.
Many commercial programs (software packages) for evaluating the molecular diversity of combinatorial libraries already exist (see prior art <2>: Matter H, Potter T, Comparing 3D Pharmacophore Triplets and 2D Fingerprints for Selecting Diverse Compound Subsets, J. Chem. Inf. Comput. Sci. 1999, 39: 1211-1225; prior art <3>: Jorgensen AM, Pedersen JT, Structural Diversity of Small Molecule Libraries, J. Chem. Inf. Comput. Sci. 2001, 41: 338-345; prior art <4>: Flower DR, On the Properties of Bit String-Based Measures of Chemical Similarity, J. Chem. Inf. Comput. Sci. 1998, 38: 379-386).
Traditional diversity software lets the user select a structurally diverse subset from a population of suitable size. DiverseSolutions, developed by the group led by Professor Robert Pearlman, addresses this problem and is also well suited to related diversity problems. It is a complement to other products of the Tripos Diversity Manager system. It offers two different algorithms for diversity-related problems: a distance-based algorithm, ideally suited to simple selection of diverse subsets, and a cell-based algorithm, which partitions a multidimensional chemical space into multidimensional volumes called "cells". The software also provides two different representations of chemical space: a high-dimensional representation based on "fingerprints", either its own or from other sources, and a low-dimensional representation based on molecular descriptors generated by the software or obtained from other sources.
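As a rough illustration of the cell-based idea described above, the following Python sketch bins compounds into the cells of a low-dimensional descriptor space and keeps one compound per occupied cell. The descriptor values, the bin count and the function names are assumptions made for illustration only; this is not the DiverseSolutions implementation.

```python
from collections import defaultdict

def cell_index(descriptors, lows, highs, bins):
    """Map a descriptor vector to the tuple of bin indices (its 'cell')."""
    index = []
    for value, low, high in zip(descriptors, lows, highs):
        frac = (value - low) / (high - low)
        # Clamp so that value == high falls in the last bin.
        index.append(min(bins - 1, max(0, int(frac * bins))))
    return tuple(index)

def cell_based_subset(compounds, bins=4):
    """Pick one representative per occupied cell.

    compounds: dict mapping a compound name to a descriptor vector.
    """
    dims = len(next(iter(compounds.values())))
    lows = [min(v[d] for v in compounds.values()) for d in range(dims)]
    highs = [max(v[d] for v in compounds.values()) for d in range(dims)]
    # Avoid division by zero when a descriptor is constant.
    highs = [h if h > l else l + 1.0 for l, h in zip(lows, highs)]
    cells = defaultdict(list)
    for name, vec in compounds.items():
        cells[cell_index(vec, lows, highs, bins)].append(name)
    # The first compound seen in each cell stands in for that cell.
    return sorted(members[0] for members in cells.values())

# Hypothetical two-dimensional descriptors (for example, scaled size and polarity).
demo = {
    "a": (0.0, 0.0),
    "b": (0.1, 0.1),   # falls in the same cell as "a"
    "c": (1.0, 1.0),
    "d": (0.0, 1.0),
}
subset = cell_based_subset(demo, bins=2)
```

With two bins per axis, "a" and "b" share a cell, so only one of them is kept. Finer binning keeps more compounds at the price of a larger subset.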
Research on molecular diversity now has two main aims: first, to quantify molecular diversity or similarity; second, to select compounds with maximum diversity or similarity. In computing molecular diversity, the most critical steps are choosing the method for describing molecular structure and choosing the method for computing diversity.
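The two aims can be sketched in a few lines of Python. The sketch assumes that each molecule is already described by a fingerprint held as a set of on-bit indices, uses the Tanimoto coefficient as the similarity measure, and uses a greedy MaxMin picker for selection. These are common choices made here for illustration; the source does not say which measure or selection rule is used.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 1.0

def mean_pairwise_dissimilarity(fps):
    """Quantify diversity as the average of 1 - Tanimoto over all pairs."""
    names = list(fps)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    if not pairs:
        return 0.0
    return sum(1.0 - tanimoto(fps[a], fps[b]) for a, b in pairs) / len(pairs)

def maxmin_select(fps, k, seed):
    """Greedy MaxMin: repeatedly add the compound farthest from those chosen."""
    chosen = [seed]
    while len(chosen) < min(k, len(fps)):
        def distance_to_chosen(name):
            return min(1.0 - tanimoto(fps[name], fps[c]) for c in chosen)
        candidates = [n for n in fps if n not in chosen]
        chosen.append(max(candidates, key=distance_to_chosen))
    return chosen

# Hypothetical fingerprints, written as sets of on-bit positions.
demo_fps = {
    "m1": {1, 2, 3, 4},
    "m2": {1, 2, 3, 5},
    "m3": {7, 8, 9},
    "m4": {1, 2, 8, 9},
}
picked = maxmin_select(demo_fps, k=3, seed="m1")
```

Here "m2" is left out because it is the closest analogue of the seed "m1", which is the behaviour one wants from a diversity-driven selection.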
The molecular structure describing method:
The structure of a molecule determines its properties, so both so-called chemical diversity and molecular diversity are ultimately determined by molecular structure. To compute the diversity of a combinatorial library containing more than 10^6 compounds in a reasonable time, the time spent computing each structural descriptor must be as short as possible. However, there is no single agreed definition of chemical structural diversity, and the appropriate one varies from case to case. More than three kinds of structural descriptor should therefore be used to describe structural diversity. The prior art introduces many structural descriptors, such as two-dimensional structural descriptors, feature trees, comparison graphs, two-dimensional pharmacophore autocorrelation vectors and maximum common substructure (MCS). Most methods for computing molecular diversity use only the two-dimensional structural information of the molecule. Once three-dimensional properties are considered, one must deal with the complex relationships among conformers, tautomers and ionization states. Brown and Martin carried out molecular similarity and diversity analyses with a variety of 2D and 3D descriptors and concluded that 2D descriptors are better than 3D descriptors at distinguishing active from inactive compounds. In addition, many structural and physicochemical descriptors can be applied in diversity calculations, and of these the 2D structural descriptors are comparatively more effective. Certain computationally expensive 3D descriptors, such as site descriptors (PPP) and field descriptors (Cramer), have nevertheless also performed well, at a higher computational cost.
Two-dimensional structural descriptors comprise molecular structure descriptions, receptor recognition descriptions and molecular topology descriptions.
Molecular structure description mainly comprises four methods: systematic nomenclature; fragment codes, for example structural keys and hashed fingerprints; line notation; and connection tables.
Structural keys are derived by looking up, in a predefined library of molecular fragments, how often each fragment occurs in the molecule, which yields a structural description of the molecule. However, when a molecule contains none of the predefined fragments, the encoding carries no information.
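A minimal Python sketch of a structural key follows. It assumes that the fragments present in each molecule have already been perceived by some upstream substructure search, which is not implemented here; the fragment dictionary and the example molecules are hypothetical.

```python
from collections import Counter

# Hypothetical predefined fragment dictionary; the order fixes the key positions.
FRAGMENT_LIBRARY = ["carboxylic_acid", "amide", "phenyl", "hydroxyl"]

def structural_key(fragments, library=FRAGMENT_LIBRARY):
    """Return the count of each library fragment found in a molecule.

    fragments: fragment labels already perceived in the molecule by an
    upstream substructure search (an assumption of this sketch).
    """
    counts = Counter(fragments)
    return [counts[name] for name in library]

def is_informative(key):
    """A key of all zeros means no library fragment occurs in the molecule."""
    return any(key)

aspirin_like = ["carboxylic_acid", "phenyl", "ester"]
alkane_like = ["methyl", "methylene", "methyl"]
key_a = structural_key(aspirin_like)   # "ester" is not in the library and is ignored
key_b = structural_key(alkane_like)    # no library fragment is present
```

The second molecule illustrates the weakness noted in the text: its key is all zeros, so it cannot be told apart from any other molecule that lacks the predefined fragments.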
Hashed fingerprints encode the molecule by indexing paths of predefined length through the molecule; each path sets one bit of the code. The whole molecule is thus described as a string of binary digits.
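The following Python sketch shows one way such a path-based hashed fingerprint can be built. The molecule is represented as a plain list of atom symbols and a list of bonds, bond orders and hydrogens are ignored, and CRC32 is used as the hash function; all of these are simplifying assumptions for illustration, not the scheme of any particular toolkit.

```python
import zlib

def enumerate_paths(atoms, bonds, max_len):
    """Yield atom-symbol strings for all simple paths of up to max_len atoms."""
    neighbours = {i: set() for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)

    def walk(path):
        symbols = "-".join(atoms[i] for i in path)
        # A path and its reverse describe the same fragment; keep one spelling.
        yield min(symbols, "-".join(atoms[i] for i in reversed(path)))
        if len(path) < max_len:
            for nxt in sorted(neighbours[path[-1]]):
                if nxt not in path:
                    yield from walk(path + [nxt])

    for start in range(len(atoms)):
        yield from walk([start])

def hashed_fingerprint(atoms, bonds, n_bits=64, max_len=3):
    """Set one bit per distinct path; collisions are possible by design."""
    bits = [0] * n_bits
    for path in set(enumerate_paths(atoms, bonds, max_len)):
        bits[zlib.crc32(path.encode("ascii")) % n_bits] = 1
    return "".join(map(str, bits))

# Ethanol heavy atoms: C-C-O (hydrogens omitted, bond orders ignored).
ethanol = hashed_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
```

Unlike a structural key, no fragment dictionary is needed, but the meaning of an individual bit is lost because several paths can hash to the same position.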
The discovery and optimization of lead compounds has always been both the focus and the difficulty of medicinal chemistry. In recent years biology and related technologies have advanced rapidly, producing high-throughput screening at the cellular and molecular level and making the synthesis of novel compounds the bottleneck of drug research. Combinatorial chemistry was an important breakthrough in organic synthesis. For a long time chemists synthesized, purified and characterized compounds one at a time and only then measured their biological activity. This approach is inefficient and slow, driving the cost of new drug development ever higher and the cycle ever longer. Combinatorial chemistry, in contrast, uses standard reactions to synthesize large numbers of compounds simultaneously; one combinatorial library can contain from tens to millions or even tens of millions of compounds. Combinatorial chemistry offers an efficient and feasible route to new drugs and has become an important platform in new drug research. As understanding of lead compounds and of the nature of drugs has deepened, however, the limitations of combinatorial chemistry have become increasingly prominent. Because of them, synthesizing a combinatorial library and then screening it against a target to obtain lead compounds has itself gradually become a bottleneck on the road to new drugs. At present these limitations show in three main respects:
1. The many compounds synthesized by combinatorial chemistry have greatly enlarged the compound databases available for screening, but they are only weakly directed at the target molecule, and the hope of finding a lead compound rests on the inherent molecular diversity of the library. This "needle in a haystack" mode of operation markedly lowers the proportion of lead compounds recovered and lengthens the time needed to produce one.
2. Because a single combinatorial synthesis produces a very large number of compounds, it carries a great deal of structure-related information. The growth in information leads directly to its serious under-use: the ability to integrate, analyze and continually convert the information into results is lacking.
3. Because the number of syntheses rises sharply, cost becomes a heavy burden, which discourages the pharmaceutical industry from running large-scale experiments in the current environment, where the probability of producing a new drug is low.
In recent years the emergence of combinatorial centralized repositories and of virtual screening has allowed combinatorial chemistry to play to its strengths while avoiding its weaknesses, making it a more practical drug design method. With the development of molecular and structural biology, the protein structure databases contain ever more three-dimensional structures of receptor biomacromolecules. Starting from such a three-dimensional structure, new lead compounds can be designed, or existing leads structurally modified, by methods such as database searching or de novo drug design. These methods are called structure-based drug design (SBDD).
The close integration of combinatorial chemistry with computational chemistry has given combinatorial chemistry greater vitality. Combinatorial chemistry has become a focus of research in chemistry, medicine and materials science. One trend in its development is integration with rational drug design: before a combinatorial library is synthesized, a virtual compound library is designed rationally by molecular simulation and theoretical calculation so as to increase the diversity and drug-likeness of the compounds and improve the quality of the library. The current focus of research is to design centralized repositories from the three-dimensional structure of the binding site of the receptor biomolecule, which will greatly improve the quality and screening efficiency of combinatorial libraries. At present no integrated software package can carry out the design and optimization of combinatorial chemistry centralized repositories for drug discovery. Many commercial packages contain some combinatorial chemistry library modules, for example the combinatorial library design module in Cerius2, virtual portfolio library generation modules (CombiLibMaker in Sybyl; Barnard Chemical Information, BCI) and molecular diversity analysis modules (Diversity Analysis Package, Daylight, Chemical Design Ltd. (CDL), ChemDiverse, Cerius2), but none of them offers complete centralized repository design and evaluation functions.
Summary of the invention
The present invention addresses the problems in the prior art described above by providing a design and optimization method for virtual portfolio chemistry centralized repositories. Its main purpose is, given the three-dimensional structure of a receptor macromolecule, to use structure-based drug design to design and assess a virtual portfolio preliminary screening library automatically. The virtual library is then evaluated by molecular docking, molecular diversity, drug-likeness, and unfavorable pharmacokinetic properties or toxicity (ADME/T), and a genetic algorithm is applied to optimize each virtual centralized repository repeatedly, producing a combinatorial chemistry centralized repository with high affinity for the target, statistically significant high diversity and high drug-likeness. Medicinal chemists who synthesize the compounds in the optimized library can greatly increase the probability that the synthesized compounds include a lead compound. The information provided by the library can also guide structural modification of compounds, making this a novel drug design method based on target structure. All of this relies on the computing power of computers and on sound methods of information analysis.
The technical solution adopted in the present invention is:
The virtual portfolio chemistry centralized repository of the present invention (virtual library, VL) is a combinatorial library generated and stored electronically; it is not a physically existing combinatorial library, but its members can be synthesized when needed from known chemical reactions and available building blocks. An ideal combinatorial library should contain compounds of varied structural types and molecular characteristics, that is, its molecules should differ from one another as much as possible. In drug design, for example, library design faces two main problems: first, determining which regions of chemical structure space are valuable for a specific library; second, determining a series of molecules that searches those regions efficiently. The design and optimization method of the present invention is therefore as follows:
<1> First, design a virtual portfolio preliminary screening library based on the target. This comprises: determining the target protein for the combinatorial synthesis; designing the combinatorial synthesis route and selecting the building blocks; and selecting the set of members for each building block used in the synthesis;
<2> Second, build the virtual portfolio preliminary screening library according to this design. This comprises: reading the specified members from the specified building blocks according to the combinatorial chemistry reaction and constructing the corresponding library; then optimizing the conformations of the molecules in the library;
<3> Third, evaluate and optimize the library built above. This comprises: first establishing a modular evaluation system; then optimizing the library with a genetic algorithm.
The software package of the present invention runs on multiple platforms, including Unix, Linux and Windows NT, and its code makes good use of inheritance and encapsulation. The invention is implemented mainly in C++ with the Standard Template Library (STL). C++ supports data abstraction and object-oriented programming. It is compatible with C and is compact, flexible, efficient and portable. Compared with C functions, C++ adds four mechanisms: overloading, inline, const and virtual. C++ also provides classes, namespaces and access control, which make it possible to localize design decisions. When separate modules are combined into a complete large program, namespaces and exception handling reduce the difficulty and complexity of integration, and they become more important as the program grows. Within the main framework, even modules that already exist and were written in other languages can be called through interface routines without affecting the overall framework or other modules, thanks to the high level of abstraction and object orientation of C++. The object-oriented C++ framework is also readily extensible, giving a solid foundation for the continued evolution of the program.
Compared with the purely experimental optimization of combinatorial chemistry in the prior art, the method of the present invention is fast (it can complete the selection of 20,000 compounds in one day) and inexpensive (it consumes no laboratory equipment and is carried out entirely by computer simulation). Its information analysis is powerful: the global optimization algorithm automatically yields the statistically best centralized repository. Favorable elements are extracted with a high success rate, and the fragments that make up the resulting centralized repository have a high probability of leading to new drugs.
The method of the present invention combines a genetic algorithm with combinatorial chemistry for the design and optimization of combinatorial chemistry centralized repositories. It can be used for de novo drug design based on the three-dimensional structure of a macromolecular target and offers a new approach to applying combinatorial chemistry and computer-aided drug design in drug research. The computer-aided centralized repository design program written for the invention takes the three-dimensional structure of the receptor macromolecule as input and uses structure-based drug design to design and assess the virtual portfolio preliminary screening library automatically. The library is then evaluated by molecular docking, molecular diversity, drug-likeness and ADME/T, and a genetic algorithm optimizes each virtual library repeatedly, producing a combinatorial centralized repository that has high affinity for the target, statistically significant high diversity and high drug-likeness, and that can guide drug synthesis. The optimization information obtained from the program can also guide medicinal chemists in further structural modification of existing lead compounds.
Description of drawings
Fig. 1 is a schematic diagram of the framework of the combinatorial chemistry centralized repository design and optimization method of the present invention.
Fig. 2 is a schematic diagram of the procedure for building the virtual portfolio preliminary screening library in the present invention.
Fig. 3 is a schematic diagram of the modular evaluation system established in the present invention.
Fig. 4 is a plot of the change in the binding energy of the first molecule in an embodiment of the present invention.
Embodiment
The design and optimization method of the present invention is further described below with reference to an embodiment and the accompanying drawings.
The design and optimization method of the present invention is a method for designing, building, evaluating and optimizing a software package for virtual portfolio chemistry centralized repositories. As shown in Fig. 1, it mainly comprises:
1. The first step in setting up the virtual portfolio chemistry centralized repository software package is to design a virtual portfolio preliminary screening library based on the target. The design can be divided into three steps:
A. Determine the target protein for the combinatorial synthesis;
B. According to the synthesis target, select the building blocks and design a suitable combinatorial synthesis route;
C. According to the combinatorial synthesis route and the building blocks that can be obtained, select the set of members (molecular fragments or pharmacophores) for each building block used in the synthesis.
2. The second step in setting up the software package is to build the virtual portfolio preliminary screening library.
A. According to the combinatorial chemistry reaction, read the specified members (molecular fragments or pharmacophores) from the specified building blocks and construct the corresponding virtual portfolio preliminary screening library.
B. Once the library has been obtained, optimize the conformations of the molecules in it to obtain the optimum conformation of each.
3. The third step in setting up the virtual combinatorial chemistry centralized repository software package is to evaluate and optimize the virtual combinatorial preliminary screening library.
A. Evaluate the virtual combinatorial preliminary screening library. First set up a modular evaluation system with evaluation modules for molecular activity (the interaction energy between small molecule and biomacromolecule obtained by molecular docking), drug-likeness, molecular diversity, and poor pharmacokinetic properties or adverse toxic reactions (ADME/T). The output of each module must be transformed and combined into one final value, which is the comprehensive evaluation of the combinatorial library. Because the physical meaning of each module differs, the software package adjusts the weight of each output as a parameter for the situation at hand and normalizes the outputs to a unified evaluation standard.
B. Optimize the virtual combinatorial preliminary screening library with a genetic algorithm. The number of compounds in one virtual combinatorial preliminary screening library can reach 10⁶ or even more, so optimization-based screening is the key to obtaining a small compound library with good prospects. The present invention performs the optimization with a genetic algorithm: first an initial population of centralized repositories is created at random according to the parameters; each centralized repository is then evaluated by molecular docking, drug-likeness, molecular diversity, ADME/T and so on; the genetic algorithm produces the next generation of centralized repositories; the preset termination condition decides whether the genetic algorithm continues; and finally the optimum is output.
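The genetic algorithm loop described in step 3B can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the program of the invention: a centralized repository is modeled as a fixed-size set of fragment indices, its fitness is the mean of hypothetical precomputed fragment scores (standing in for the combined docking, drug-likeness, diversity and ADME/T evaluation), and the termination condition is simply a fixed number of generations. The names `run_ga`, `lib_size`, `pop_size` and the crossover and mutation operators are all assumptions.

```python
import random


def run_ga(score, n_fragments, lib_size=5, pop_size=20, generations=40, seed=0):
    """Toy genetic algorithm over candidate libraries (hypothetical model)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    def random_library():
        # A library is a sorted tuple of distinct fragment indices.
        return tuple(sorted(rng.sample(range(n_fragments), lib_size)))

    def fitness(lib):
        # Assumed stand-in for the combined evaluation of a library.
        return sum(score[i] for i in lib) / lib_size

    def crossover(a, b):
        # Child draws its members from the union of both parents.
        pool = sorted(set(a) | set(b))
        return tuple(sorted(rng.sample(pool, lib_size)))

    def mutate(lib, rate=0.2):
        # With probability `rate`, swap one member for an outside fragment.
        members = set(lib)
        if rng.random() < rate:
            members.remove(rng.choice(sorted(members)))
            outside = [i for i in range(n_fragments) if i not in members]
            members.add(rng.choice(outside))
        return tuple(sorted(members))

    # Randomly created initial population of libraries.
    population = [random_library() for _ in range(pop_size)]
    history = []
    for _ in range(generations):  # assumed end condition: generation count
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]  # truncation selection
        children = [population[0]]             # elitism: keep the best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b)))
        population = children
    best = max(population, key=fitness)
    return best, fitness(best), history


# Hypothetical per-fragment scores, for demonstration only.
scores = [((i * 37) % 100) / 100 for i in range(100)]
best, best_fit, history = run_ga(scores, len(scores))
print(best, best_fit)
```

Because the best library of each generation is carried over unchanged, the best fitness in this sketch never decreases from one generation to the next.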
The design and optimization method of the present invention is further described below by way of example.
1. The first step, designing the target-based virtual combinatorial preliminary screening library, comprises:
A. Determining the target protein for the combinatorial synthesis
A complex biochemical network in the body regulates its various functions. Determining a suitable target protein according to the research needs is the prerequisite for obtaining a well-targeted combinatorial chemistry centralized repository. The choice of target protein follows two principles:
(1). The protein lies at an intersection of the network in the biochemical cycle and has the characteristics of a key enzyme, or it is a specific regulatory protein.
(2). The protein has a crystal structure, or other proteins of the family to which it belongs have crystal structures.
Once a suitable protein target is determined, the virtual combinatorial centralized repository can be designed for this protein.
The following is an example of determining PPAR γ as the target protein:
Peroxisome proliferator-activated receptor γ (PPAR γ) is mainly expressed in adipocytes and is an important transcription factor regulating adipocyte differentiation. Once activated, PPAR γ promotes the differentiation of white adipocytes into numerous small adipocytes and reduces the number of large adipocytes. Studies show that small adipocytes have higher insulin sensitivity than large ones and make better use of glucose. In addition, molecular biology experiments in recent years have established PPAR γ as the target molecule of the insulin-sensitizing thiazolidinedione drugs (Thiazolidinediones, TZDs). PPAR γ agonists are therefore very promising as a new class of type II diabetes drugs.
PPAR belongs to the nuclear hormone receptor superfamily and was first discovered in 1990 by the British scientists Issemann and Green. It is a class of transcription factors activated by ligands; because this new class of nuclear receptors can be activated by peroxisome proliferators, it was named PPAR. After activation by a specific ligand, PPAR interacts with a stretch of DNA on certain genes, called the peroxisome proliferator response element (PPRE), and thereby regulates the expression of downstream genes. Amphibians, rodents and humans all have three PPAR subtypes, namely PPAR α, PPAR γ and PPAR δ (also called PPAR β), of which PPAR γ is the most studied. The earliest reported synthetic PPAR γ agonists were a series of thiazolidinedione (TZD) compounds, including troglitazone (1), pioglitazone (2) and rosiglitazone (3). Their molecular structures are shown in structural formula 1:
Structural formula 1: structures of thiazolidinedione (TZD) compounds and representative drugs
Troglitazone, rosiglitazone and pioglitazone were approved by the FDA for marketing as type II diabetes drugs in 1997 and 1999 respectively. However, hepatotoxicity and other adverse effects of troglitazone were found some time after launch, and it was withdrawn by the FDA in 2000. Studies of the toxicity mechanism show that troglitazone, while binding to PPAR γ, also activates another nuclear receptor, PXR (pregnane X receptor), which causes its toxic side effects. (For reference: in 1998, scientists at Glaxo reported in the British journal Nature the crystal structure of the PPAR γ-rosiglitazone complex (PDB code 2PRG). Because the hydrogen bonds formed by the thiazolidinedione heterocycle in 2PRG are very important for the binding of TZD agonists to the PPAR γ active pocket, related work has concentrated on modifying the hydrophobic end, in the hope that introducing suitable groups will give stronger hydrophobic interactions. Following this idea, Glaxo synthesized a series of TZD tail derivatives, some of which reached nanomolar activity.)
PPAR γ is an important transcription factor receptor regulating adipocyte differentiation. Multiple crystal structures of PPAR γ have been solved and can be obtained from the Protein Data Bank (PDB) (http://www.rcsb.org/pdb/). Selecting PPAR γ as the target protein for the design of the combinatorial centralized repository is therefore not only meaningful for obtaining new type II diabetes drugs but also feasible.
B. Designing a suitable combinatorial synthesis route
According to the target protein selected for the combinatorial centralized repository design, a suitable combinatorial synthesis route is designed, that is, the building units, the number of synthesis steps and the route are selected. The choice of route is mainly based on two principles:
(1). The properties of the binding pocket of the target protein.
(2). The results of pharmacophore analysis of existing drugs for this protein.
According to the properties of the target protein pocket, the pocket can be divided into several regions, each corresponding to one building unit. The synthesis route is designed on the principle of stepwise connection: the building units are joined one after another, and the number of steps needed to join all units is taken as the number of combinatorial synthesis steps. When the properties of the binding pocket are poorly understood, the building units, number of steps and route can be obtained by analyzing the pharmacophores of the existing drugs, agonists or inhibitors of the protein and decomposing their structures.
The following is an example of determining the combinatorial synthesis route for PPAR γ:
Comparing the structures of the natural agonists of PPAR γ with that of rosiglitazone shows that all of them contain a polar head and a hydrophobic tail.
Structural formula 2: structure of rosiglitazone and its division
In structural formula 2 above, the PPAR agonist is divided, according to its structural features and properties, into a polar head A, a middle linker B and a hydrophobic tail C. On the basis of structural analysis of a large number of known agonists, the embodiment of the present invention takes the three parts A, B and C as the three building units. Following the principle of stepwise connection, joining A and B is the first synthesis step, and joining the A-B intermediate with C is the second. With the three building units A, B and C there are thus two reaction steps in total: first A+B, then the addition of C.
C. Selecting the unit member set (also called the molecular fragment set or pharmacophore set) of each building unit
After each building unit is determined, the molecular fragment set of the unit is selected according to the properties of that unit. The molecular fragments within one building unit should share the same properties, such as hydrophilicity, lipophilicity, presence of hydrogen-bonding sites, or similar electrostatic effects. A number of molecular fragments is selected for each building unit (for the sake of computing speed, preferably fewer than 150 per unit). Fragments can come from known basic synthetic groups, for example phenyl, phenol, heterocycle, amide and ketone groups, or from larger groups with a specific function found in known drugs.
The following is an example of selecting the molecular fragments of each building unit for the PPAR γ combinatorial synthesis:
The PPAR agonist can be divided into three building units: the polar head A, the middle linker B and the hydrophobic tail C. The fragments of each building unit are selected on the principle that they share the same properties. For part A, groups with polar character should be selected as far as possible; for part B, groups with a certain flexibility, suited to linking, should be selected as far as possible; for part C, strongly hydrophobic groups should be selected as far as possible. On these principles, molecular fragments were chosen for the three parts A, B and C, numbering 118, 88 and 98 respectively.
2. The second step in setting up the virtual combinatorial chemistry centralized repository software package is to build the virtual combinatorial preliminary screening library according to the design target, route and building units above
A. Building the virtual combinatorial preliminary screening library
As shown in Fig. 2, this part of the program, according to the combinatorial chemistry reactions, reads the specified unit members (molecular fragments or pharmacophores) from the specified building block library and builds the corresponding virtual combinatorial chemistry molecular library. The program can handle various types of chemical reactions as well as multistep reactions. It also adjusts the relevant bond lengths, bond angles and conformations when atom types change after a chemical reaction.
The combinatorial chemistry reactions are sixteen major classes of solid-phase synthesis reactions, namely:
● anchoring reactions (Anchoring Reaction)
● amide bond forming reactions (Amide Bond Forming Reactions)
● aromatic substitution reactions (Aromatic Substitution Reactions)
● condensation reactions (Condensation Reactions)
● cycloaddition reactions (Cycloaddition Reactions)
● Grignard reactions (Grignard Reactions)
● Michael addition reactions (Michael Addition Reactions)
● heterocycle forming reactions (Heterocycle Forming Reactions)
● multi-component reactions (Multi-component Reactions)
● olefin forming reactions (Olefin Forming Reactions)
● oxidation reactions (Oxidation Reactions)
● reduction reactions (Reduction Reactions)
● non-aromatic substitution reactions (Non-aromatic Substitution Reactions)
● protection and deprotection reactions (Protection/Deprotection Reactions)
● other solid-phase organic reactions (Other Solid Phase Organic Reactions)
● cleavage reactions (Cleavage Reactions)
Each major reaction class includes several reaction types. The present invention supports multistep reactions: a single run can carry out at most 9 reaction steps when generating the combinatorial library. Molecular fragments in the fragment library are stored in mol2 format, which contains reasonable three-dimensional structure information. The present invention marks the reaction sites, leaving groups and bonding changes of each molecular fragment and, after the corresponding processing, obtains a new molecular fragment library. After conformational adjustment the fragments are connected in turn according to the reaction steps, finally generating the virtual combinatorial chemistry molecular library. During this process the relevant bond lengths, bond angles and conformations are adjusted for the atom type changes that occur in the reaction, so that the new molecules have reasonable three-dimensional conformations, that is, low-energy conformations (see Fig. 2). Because the conformation of the reactants often changes considerably in such reactions, involving the bond angles and the corresponding dihedral angles of each flexible bond, conventional methods can hardly predict the conformation after reaction; only molecular mechanics and molecular dynamics methods handle this problem well. A conformation optimization step is therefore added after library generation, so that the combined molecules have low-energy conformations after optimization.
B. Optimizing the molecular conformations in the virtual combinatorial preliminary screening library obtained above
Because of the complexity of the spatial conformations of the combined molecules, some molecules have extremely distorted conformations and are unsuitable for the next evaluation step. When the virtual combinatorial preliminary screening library is built, the molecules produced must therefore undergo conformation optimization.
The present invention optimizes molecular conformations with a molecular force field (Tripos) and a space search algorithm that combines the simplex method with the conjugate direction method (Powell). The simplex method is a discontinuous spatial search algorithm. The present invention uses the atomic coordinates as the vector of independent variables: for a molecule with n atoms a vector of 3n components is set up, and in the 3n-dimensional space a geometric figure with nonzero volume is constructed from 3n+1 points. This figure is the simplex, and the 3n+1 points are its vertices. During the search, contraction, expansion and reflection of these 3n+1 vertices are used to find the optimal solution. The simplex method can jump out of local energy minima through reflection and is efficient at the start of the search, but once the trial points approach the global minimum its convergence slows markedly, and it may converge to a point that is not the exact minimum. To address this, the present invention adjusts the step size dynamically, reducing it gradually as the search space contracts until convergence. To find the global optimum faster and better, the software uses the simplex method only in the initial stage of the optimization and, once a certain criterion is reached, switches to the conjugate direction method (Powell) to complete the optimization. The conjugate direction method (Powell) is one of the most successful direct search methods. It is based on a quadratic objective function: experiments show that, for an objective function of quadratic form, using conjugate directions as search directions reaches the extremum after a finite number of one-dimensional searches. A method that reaches the extremum of a quadratic function in a finite number of iterations is also called a quadratically convergent algorithm. The Powell method does not compute derivatives of the objective function and therefore converges faster. The present invention takes the energy function of the molecular force field (Tripos) as the objective function, first converts it into a quadratic form in 3n variables, and then performs one-dimensional searches along each dimension of the 3n-dimensional quadratic form until the convergence criterion (energy convergence) is satisfied.
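The reflection, expansion and contraction moves of the simplex stage can be illustrated with a minimal Python sketch. It is not the program of the invention: the function name `simplex_minimize`, the step size, the iteration count and the toy quadratic `toy_energy` (standing in for the Tripos force-field energy over 3n atomic coordinates) are all assumptions made for illustration, and the subsequent Powell refinement stage is omitted.

```python
def simplex_minimize(energy, start, step=0.5, iterations=300):
    """Simplified simplex search: reflect, expand, contract, shrink."""
    n = len(start)
    # Build n + 1 vertices: the start point plus one offset per coordinate.
    simplex = [list(start)]
    for i in range(n):
        vertex = list(start)
        vertex[i] += step
        simplex.append(vertex)

    for _ in range(iterations):
        simplex.sort(key=energy)
        best, second_worst, worst = simplex[0], simplex[-2], simplex[-1]
        # Centroid of all vertices except the worst one.
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]

        def along(t):
            # Point on the line through the worst vertex and the centroid.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        if energy(reflected) < energy(best):
            # Reflection gave a new best point, so try going further.
            expanded = along(2.0)
            if energy(expanded) < energy(reflected):
                simplex[-1] = expanded
            else:
                simplex[-1] = reflected
        elif energy(reflected) < energy(second_worst):
            simplex[-1] = reflected
        else:
            # Contract the worst vertex toward the centroid.
            contracted = along(-0.5)
            if energy(contracted) < energy(worst):
                simplex[-1] = contracted
            else:
                # Shrink every vertex halfway toward the best one.
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, v)]
                    for v in simplex[1:]
                ]
    return min(simplex, key=energy)


def toy_energy(p):
    # Hypothetical energy surface with its minimum at (1, -2).
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2


print(simplex_minimize(toy_energy, [0.0, 0.0]))
```

On this toy surface the search should approach the point (1, -2). In a two-stage scheme like the one described, the vertex returned here would serve as the starting point for the conjugate direction stage.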
The spatial chirality of a compound is a necessary condition for many drugs to act. Preserving chirality during conformation optimization is therefore an important feature of the molecular conformation optimization part of the present invention. The present invention introduces the concept of a chirality field: a chirality field is generated for each chiral compound and converted into an energy term that is added to the molecule's own energy as a constraint in the conformation optimization. Through repeated experimental adjustment, the present invention has obtained optimal parameters for preserving chirality, which ensure that the chirality of each combined chiral molecule remains in its original state throughout conformation optimization.
In addition, the present invention sets up an automatic screening mechanism in the conformation optimization. Molecules that still cannot reach a better conformation under the current parameters are separated automatically and then optimized with more extreme, more time-consuming parameters. This keeps the optimization of most molecules fast while still ensuring that the extreme molecules are fully optimized.
The following is an example of building the virtual combinatorial preliminary screening library for PPAR γ and optimizing the molecular conformations:
From the building units obtained in the first step (118, 88 and 98 molecular fragments respectively), the molecule building step gives a virtual combinatorial preliminary screening library of 118*88*98=1017632 molecules in total. During building, the program automatically selects a suitable reaction type to connect two fragments according to the atom types at the connection point. After building, each molecule in the preliminary screening library undergoes conformation optimization, which is carried out by the molecular optimization part of the program.
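The size of the library follows directly from the two-step route (A+B, then +C). The short Python sketch below only illustrates this counting and enumeration; the function names `library_size` and `enumerate_library` are assumptions, and fragments are represented by plain indices rather than mol2 structures.

```python
from itertools import product


def library_size(*unit_sizes):
    """Number of products when every fragment combination is formed."""
    total = 1
    for size in unit_sizes:
        total *= size
    return total


def enumerate_library(heads, linkers, tails):
    """Yield (A, B, C) combinations in the order of the two-step route."""
    for a, b in product(heads, linkers):  # step 1: join A and B
        for c in tails:                   # step 2: add C to the intermediate
            yield (a, b, c)


# Fragment counts for A, B and C taken from the PPAR γ example.
print(library_size(118, 88, 98))
```

Enumerating lazily with a generator avoids holding all combinations in memory at once, which matters for a library of about one million molecules.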
3.1. Evaluating the virtual combinatorial preliminary screening library
A. First an evaluation system is set up for the preliminary screening library. As shown in Fig. 3, it comprises a molecular activity evaluation module (the interaction energy between small molecule and biomacromolecule obtained by molecular docking), a drug-likeness evaluation module, a molecular diversity evaluation module and an ADME/T evaluation module. The first three are the most basic prediction modules; they are used not only to evaluate the virtual combinatorial preliminary screening library but also in the optimization process that builds the centralized repository. The output of each module must be transformed and combined into one final value, the comprehensive evaluation of the combinatorial library, before it can be fed back to the genetic algorithm (GA). The outputs of the modules differ in physical meaning and vary widely in magnitude, and the weight of each output needs to be adjusted for the situation at hand. How to normalize each result is therefore an important issue. The evaluation methods of the individual modules are described below.
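One simple way to combine normalized module outputs into a single value is a weighted average. The Python sketch below is an assumption made for illustration: the text does not state the exact combination rule, and the function name `combined_score`, the module keys and the weights shown are hypothetical.

```python
def combined_score(scores, weights):
    """Weighted average of module scores that are already normalized."""
    total_weight = sum(weights[name] for name in scores)
    weighted_sum = sum(weights[name] * scores[name] for name in scores)
    return weighted_sum / total_weight


# Hypothetical normalized module outputs and user-adjustable weights.
module_scores = {"activity": 0.8, "drug_likeness": 0.6, "diversity": 0.4}
module_weights = {"activity": 2.0, "drug_likeness": 1.0, "diversity": 1.0}
print(combined_score(module_scores, module_weights))
```

Dividing by the total weight keeps the combined value in the same range as the individual scores, so libraries evaluated under different weight settings remain on one scale.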
A. Evaluation method of the molecular activity evaluation module (including the charge calculation module):
The molecular activity evaluation module uses molecular docking to assess the binding ability between ligand and receptor. Molecular docking places the small-molecule ligand at the active site of the receptor and searches for its reasonable orientation and conformation so that the shape and interactions of ligand and receptor match best. In drug design, molecular docking is mainly used to search small-molecule databases for molecules that bind well to the receptor biomacromolecule; these are then tested pharmacologically to find new lead compounds. Because molecular docking considers ligand-receptor binding as a whole, it better avoids the situation, common in other methods, where local interactions are good but overall binding is poor.
DOCK was the first molecular docking program and takes ligand flexibility into account, so the present invention uses DOCK as the docking program for molecular activity evaluation. Charge calculation is also an indispensable part of the program. To calculate the charge on each atom of a molecule, and in view of the requirements of the DOCK program, the Gasteiger-Marsili charge assignment method is used, and a charge calculation program was written in C++ (see Fig. 3).
The binding energy given by the DOCK 4.0 program usually lies between 0 and -60 kJ·mol⁻¹, but it has no upper or lower bound. Normalization therefore cannot be done in a simple way, and a nonlinear transformation is needed to map the value into a finite interval. A sigmoid function can be used here; its expression is as follows (formula 1):
$$y = \frac{1 - e^{ax}}{1 + e^{ax}}$$ ... formula 1
where a is a constant. For any real x, y lies between -1 and 1, which solves the problem of mapping an unbounded value one-to-one onto a finite interval. Next, the discrimination of the transformation function in the interval of interest must be considered. When the binding energy is below -60 kJ·mol⁻¹, the ligand-receptor binding constant calculated from it deviates from the true value, so discussing discrimination there has little meaning; when the binding energy is above 0 kJ·mol⁻¹, the receptor does not bind the ligand, and discrimination need not be discussed either. The present invention therefore focuses on the part of the curve where the binding energy lies between 0 kJ·mol⁻¹ and -60 kJ·mol⁻¹. For a=0.05 the transformation curve of the function is shown in Fig. 4.
As can be seen from Fig. 4, in the binding energy interval of -60 to 0 kJ·mol⁻¹ the energy score lies between 0.9096 and 0, which is exactly the section where the curve is steep, so binding energies are well discriminated.
In the result information file (*.info) output by the DOCK program, the docking binding energies are sorted automatically, and every positive result (>0) has its binding energy set to 0. When the present invention reads the binding energies of all molecules from this file, no binding energy is positive, so the corresponding molecular activity score necessarily lies between 0 and 1, which solves the normalization problem well. The better the binding between ligand and receptor, the larger the absolute value of the binding energy and the higher the score.
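The score transformation of formula 1, together with the clamping of positive binding energies to 0, can be sketched in Python as follows. The function name `activity_score` is an assumption; the constant a=0.05 is the value quoted for Fig. 4, and the clamp mirrors what the text says the DOCK output file already does.

```python
import math


def activity_score(binding_energy, a=0.05):
    """Map a binding energy (kJ/mol) to a score using formula 1."""
    # Positive energies are treated as 0, i.e. no binding.
    x = min(binding_energy, 0.0)
    return (1.0 - math.exp(a * x)) / (1.0 + math.exp(a * x))


for energy in (5.0, 0.0, -20.0, -40.0, -60.0):
    print(energy, activity_score(energy))
```

The score is 0 for non-binding molecules and rises monotonically toward 1 as the binding energy becomes more negative.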
B. Evaluation method of the molecular diversity evaluation module:
The molecular diversity evaluation module uses a structural diversity description with 40 descriptors, adding the molecular polar surface area parameter to the existing 39 descriptors. In the initial stage of drug discovery, new molecular scaffold structures need to be found in virtual libraries with many molecules, and diversity between molecules is emphasized; in the lead optimization stage, better substituents are sought while a good molecular scaffold is kept, and the similarity of molecules matters. The method for calculating molecular diversity should therefore give suitable discrimination whether the real differences between the molecular structures in the compound library are very large or very small. In the present invention the drug-likeness prediction concentrates mainly on physical properties, so the molecular diversity algorithm uses a structural diversity description. In the actual calculation, the molecular diversity part uses 40 descriptors (20 topological indices and 20 structural parameters) to calculate the molecular diversity of the combinatorial library. The specific steps are:
B1. First, select the descriptors
Previous research shows that two-dimensional structural descriptors are not only fast to compute but also give good results. The present invention therefore uses structural descriptors for molecular diversity.
In the prior art (see: Ashton MJ, Jaye MC, and Mason JS. New perspectives in lead generation. II. Evaluating molecular diversity. Drug Discovery Today 1996, 1:71-78), a total of 159 descriptor parameters, including structural parameters such as molecular weight, content of various common elements, numbers of hydrogen bond donors and acceptors and number of rotatable bonds, together with multiple topological indices, were calculated for a commercial database of 100000 compounds. Correlation analysis of the 159 parameters was then carried out by unrooted clustering, closely related parameters were removed according to a criterion, and a subset with minimal correlation between parameters was finally obtained, containing only 39 parameters (see Table 1). Of these, 19 are structural parameters and 20 are topological indices. In Table 1, the first column is the name of each parameter, the fifth column is its meaning, and the second and third columns are the mean and standard deviation of the values of 1000000 organic compounds for this parameter.
Also in the prior art, this parameter set was compared with a parameter set of 20 molecular orbital properties (for example dipole moment, highest occupied molecular orbital (HOMO), lowest unoccupied molecular orbital (LUMO), heat of formation, etc.). An optimization method was used to make a maximum-diversity selection from the same database, giving the weight of each parameter shown in the fourth column of Table 1, and the results showed that this parameter set is faster and more effective. The parameter weights are directly related to the molecular diversity calculation method.
The present invention adds the molecular polar surface area (PSA) parameter to these 39 parameters. PSA is closely related to properties such as drug absorption, so it strengthens the program's description of the structural diversity of drug-like small molecules. The molecular polar surface area calculation used in the present invention is the atom-additive method based on the two-dimensional topology of the molecule (TPSA), so it is also a two-dimensional topological descriptor. In selecting molecular diversity descriptors, two-dimensional parameters of the molecular structure are used in principle, while physicochemical properties of the molecule, for example the oil-water partition coefficient (logP), are considered in the drug-likeness study instead. The first 39 descriptor parameters are calculated with an existing academic diversity calculation program (ALTER), which is written in Fortran 77. The two-dimensional topological molecular polar surface area parameter (TPSA) of the present invention is calculated by a program written in C++.
Table 1. Molecular diversity descriptors and their weights in the prior art
Descriptor | Mean | Standard deviation | Weight | Meaning
MW | 317.456 | 117.343 | 5.0 | Molecular weight
Idon | 1.352 | 1.298 | 3.0 | Hydrogen bond donors
Iacc | 3.043 | 2.368 | 3.0 | Hydrogen bond acceptors
thyd | 0.923 | 0.281 | 0.2 | Percentage of hydrogen atoms
thet | 0.263 | 0.131 | 0.2 | Percentage of heteroatoms
thal | 0.018 | 0.039 | 0.2 | Percentage of halogen atoms
tf | 0.005 | 0.026 | 0.2 | Percentage of fluorine atoms
tcl | 0.012 | 0.028 | 0.2 | Percentage of chlorine atoms
tbr | 0.001 | 0.001 | 0.2 | Percentage of bromine atoms
ti | 0.001 | 0.013 | 0.2 | Percentage of iodine atoms
tcarbon | 0.737 | 0.131 | 0.2 | Percentage of carbon atoms
tphos | 0.001 | 0.010 | 0.2 | Percentage of phosphorus atoms
tsulph | 0.013 | 0.031 | 0.2 | Percentage of sulfur atoms
toxy | 0.132 | 0.093 | 0.2 | Percentage of oxygen atoms
tnitro | 0.099 | 0.084 | 0.2 | Percentage of nitrogen atoms
Nring | 2.631 | 1.442 | 2.0 | Number of rings
tiribo | 16.031 | 8.393 | 0.2 | Number of bonds
tirobo | 0.541 | 0.971 | 0.2 | Number of rotatable bonds
tiprbo | 0.675 | 0.273 | 1.2 | Ratio of rotatable bonds
tibab | 1.838 | 0.487 | 0.4 | Balaban index
ticent | 0.176 | 0.031 | 0.4 | Centric index
tizag1 | 47.529 | 18.358 | 0.4 | Zagreb M1 index
tizag2 | 137.155 | 60.756 | 0.4 | Zagreb M2 index
tiran0 | 16.045 | 5.739 | 0.4 | Randic zero-order index
tiran1 | 10.544 | 3.856 | 0.4 | Randic first-order index
tiesum | 66.513 | 24.047 | 0.4 | Electrotopological sum over all atoms
tiehet | 31.784 | 17.949 | 1.0 | Electrotopological sum over heteroatoms
tiehal | 2.136 | 4.758 | 1.0 | Electrotopological sum over halogen atoms
tiecar | 34.729 | 15.404 | 1.0 | Electrotopological sum over carbon atoms
tikap1 | 7.596 | 6.474 | 0.5 | Kier and Hall Kappa first index
tikap2 | 7.598 | 3.038 | 0.5 | Kier and Hall Kappa second index
tikap3 | 4.335 | 2.151 | 0.5 | Kier and Hall Kappa third index
tirad2 | 5.671 | 1.836 | 0.2 | PetitJohn R2 index
tidia2 | 10.710 | 3.635 | 0.2 | PetitJohn D2 index
tii2 | 0.881 | 0.125 | 0.2 | PetitJohn I2 index
tihar2 | 41.389 | 18.529 | 0.2 | Harary number
tischul | 6120.165 | 7991.119 | 0.2 | Schultz index
tisyml | 0.812 | 0.143 | 1.0 | Overall symmetry index
tisyml2 | 0.108 | 0.161 | 1.0 | Pairwise symmetry index
B2. Calculation method for the molecular polar surface area (PSA) parameter
The present invention uses the distance method to calculate the molecular polar surface area parameter, because the distance method has a clear physical meaning and is relatively simple to compute. The distance method first normalizes all descriptor variables and then builds a Euclidean distance space according to the weight of each descriptor variable. The similarity (or dissimilarity) between molecules can then be expressed as a distance in this space. The distance between two molecules is defined by formula 2:
$$d_{ij} = \sum_{k} \left( \hat{x}_{k}^{i} - \hat{x}_{k}^{j} \right)^{2}$$ ... formula 2
where
$$\hat{x}_{i}^{j} = w_{i} \left( \frac{x_{i}^{j} - \bar{x}_{i}}{\sigma_{i}} \right)$$ ... formula 3
Here $\hat{x}_{i}^{j}$ is the i-th descriptor variable of the j-th molecule after normalization, $x_{i}^{j}$ is the same descriptor variable before normalization, $\bar{x}_{i}$ is the mean of the i-th descriptor variable, $\sigma_{i}$ is its standard deviation, and $w_{i}$ is its weight. When the molecular diversity of each centralized repository is calculated, the descriptor variables are not normalized separately for each repository; instead, all molecules of the centralized repositories of the same generation are pooled and normalized together, which ensures that molecular diversity values are comparable between repositories.
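The normalization and distance of formulas 2 and 3 can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the population standard deviation is assumed (the text does not say which estimator is used), and a descriptor with zero spread is mapped to 0.

```python
from statistics import mean, pstdev


def normalize(descriptors, weights):
    """Formula 3 per descriptor column: w_i * (x - mean_i) / sigma_i.

    `descriptors` holds one row of raw descriptor values per molecule,
    pooled over all repositories of the same generation.
    """
    n_desc = len(weights)
    columns = [[mol[i] for mol in descriptors] for i in range(n_desc)]
    means = [mean(col) for col in columns]
    sigmas = [pstdev(col) for col in columns]  # assumption: population std
    return [
        [
            # Assumption: a constant descriptor carries no information -> 0.
            weights[i] * (mol[i] - means[i]) / sigmas[i] if sigmas[i] > 0 else 0.0
            for i in range(n_desc)
        ]
        for mol in descriptors
    ]


def distance(a, b):
    """Formula 2: sum of squared differences of normalized descriptors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))
```

Because the mean and standard deviation are taken over the pooled molecules, distances computed for different repositories share one scale.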
In the actual calculation, the molecular diversity value of each centralized repository is the sum of the weighted distances between every pair of molecules in the repository, see formula 4.
$$D_{k} = \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j=1}^{j<i} d_{ij}$$ ... formula 4
where n is the total number of molecules in the k-th centralized repository and $d_{ij}$ is the distance between molecules i and j.
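As a minimal sketch of formula 4, the function below takes a precomputed symmetric distance matrix for one repository and returns its diversity value. The function name and the choice to return 0 for repositories with fewer than two molecules are assumptions made for illustration.

```python
def library_diversity(dist):
    """Formula 4: sum d_ij over all pairs j < i, divided by n(n-1)."""
    n = len(dist)
    if n < 2:
        return 0.0  # assumption: a single molecule has no diversity
    total = sum(dist[i][j] for i in range(n) for j in range(i))
    return total / (n * (n - 1))
```

For three molecules with pairwise distances 2, 4 and 6 the sum is 12 and the divisor is 3 × 2, giving a diversity value of 2.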
Compared with other molecular diversity calculation methods suited to large combinatorial libraries, this method reduces bias. As noted earlier, during structural modification of a lead compound the overall molecular structure changes little, so the resulting molecular diversity values differ only slightly; and when virtual combinatorial chemistry centralized repositories are designed, only the relative molecular diversity of the repositories within the same generation needs to be distinguished. To address this, after the molecular diversity score of each centralized repository has been calculated, the scores are normalized again, as in formula 5:
$$D_{i,out} = \frac{D_{i} - D_{min}}{D_{max} - D_{min}}$$ ... formula 5
$D_{i,out}$ is the final molecular diversity score of a centralized repository; $D_{i}$ is the molecular diversity score of the i-th centralized repository, from formula 4; $D_{max}$ and $D_{min}$ are the maximum and minimum scores among the centralized repositories of this generation. This normalization avoids negative values. The centralized repository with the greatest molecular diversity therefore has a final score of 1, and the final scores of the other repositories lie between 0 and 1. In this way, whether the structural differences within the combinatorial libraries are large or small, the final molecular diversity score falls within a fixed range and still discriminates between repositories. This achieves the purpose of the present invention of describing molecular diversity quantitatively.
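Formula 5 is a min-max rescaling over the repositories of one generation. The sketch below illustrates it; the function name is hypothetical, and returning 0 for every repository when all scores are equal is an assumption, since the text does not cover that case.

```python
def normalize_diversity(scores):
    """Formula 5: (D_i - D_min) / (D_max - D_min) within one generation."""
    d_min, d_max = min(scores), max(scores)
    if d_max == d_min:
        # Assumption: identical scores give no discrimination, so return 0.
        return [0.0 for _ in scores]
    return [(d - d_min) / (d_max - d_min) for d in scores]
```

For example, raw scores 2.0, 3.0 and 4.0 map to 0.0, 0.5 and 1.0.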
B3. Setting the evaluation parameters for molecular diversity
When molecular diversity is calculated, the first 39 molecular descriptor parameters come from the prior art, and their relative weights have been established through computational experiments and discussion. After adding TPSA, the present invention, with reference to the weights in the table and by evaluating the currently known active compounds of EGFR and the data of the MDDR database (MDDR, MACCS-II Drug Data Report), sets its weight to 3.0, equal to the weights of the hydrogen-bond acceptor count and the hydrogen-bond donor count and lower than the weight of the molecular weight MW. After all 40 weights are normalized, the weights actually applied in the molecular diversity evaluation module program are obtained.
C. Evaluation method of the established molecular drug-likeness evaluation module:
The evaluation method of the molecular drug-likeness evaluation module uses 7 descriptors, including molecular structure ratio descriptors. The present invention notes that the larger the molecular weight of a molecule, the more pharmacophores, hydrogen-bond donors, hydrogen-bond acceptors and so on it contains. Existing drug-likeness studies generally use simple property counts as descriptors, which do not reflect well the differences in drug-likeness between databases.
The Comprehensive Medicinal Chemistry database (CMC) is more drug-like than the drug and drug-like molecule database MDDR (MACCS-II Drug Data Report). However, evaluation with existing drug-likeness criteria (for example those of Oprea) does not reflect this well. In judging drug-likeness, one should therefore minimize the misjudgments that arise because molecules of larger molecular weight contain more pharmacophores, hydrogen-bond donors and hydrogen-bond acceptors. A simple approach is to use the ratio of certain molecular structure descriptors to molecular weight as new descriptors for distinguishing drug-like compound libraries (such as MDDR and CMC) from non-drug-like compound libraries (such as ACD), thereby eliminating the influence of compound size on drug-likeness. The present invention calls these new descriptors molecular structure ratio descriptors. The present invention tested the drug-likeness of the CMC, MDDR and ACD databases with a series of molecular structure ratio descriptors and obtained good results. On this basis, and according to the characteristics of organic molecules, the present invention selects the descriptors most strongly correlated with molecular drug-likeness in order to further quantify the drug-likeness of compounds.
For the quantitative description of the drug-likeness of organic molecules, the present invention selects the molecular structure ratio descriptors that play an important role in judging drug-likeness and combines them with the molecular property descriptors of the "rule of five", finally settling on 7 descriptors for evaluating molecular drug-likeness. Table 2 lists the attributes used to evaluate molecular drug-likeness in the molecular drug-likeness evaluation module of the present invention.
Table 2. Molecular drug-likeness descriptors and their weights
Descriptor Range Weight Meaning
xlogP -0.5~5 0.1 n-Octanol/water partition coefficient
MW 78~500 0.1 Molecular weight
HBA <=10 0.1 Number of hydrogen-bond acceptors
HBD <=5 0.1 Number of hydrogen-bond donors
C3p 0.15~0.8 0.2 Ratio of the number of saturated carbon atoms to the number of heavy atoms excluding halogens
h_p 0.6~1.5 0.2 Ratio of the number of hydrogen atoms to the number of heavy atoms excluding halogens
unsat_p 0.05~0.45 0.2 Ratio of the molecular degree of unsaturation to the number of bonds between heavy atoms excluding halogens
The statistical comparison of drug-like and non-drug-like compound libraries carried out for the present invention shows that, contrary to the common belief that the more rigid a small organic molecule is the better, there is also a certain proportion requirement on the saturated carbon atoms of a small organic molecule. For drug-like molecules, the degree of unsaturation, the proportion of saturated carbon atoms, the ratio of unsaturated atoms to saturated carbon atoms, the ratio of unsaturated atoms to saturated atoms, the ratio of saturated atoms to carbon atoms, and the ratio of nitrogen and oxygen atoms to saturated carbon atoms must all lie within certain ranges. These conclusions provide the basis for choosing the drug-likeness descriptors of organic molecules. The 7 descriptors selected by the present invention take into account the physicochemical properties, saturation properties, degree-of-freedom properties and other intrinsic attributes of organic molecules. According to the literature, the most important physicochemical property for pharmacokinetic behavior is lipophilicity (or water solubility), and logP is currently the best molecular descriptor of this property; in the non-covalent interaction between ligand and receptor, hydrogen bonding is a key component of tight binding between the two, and HBA and HBD reflect this ability of the molecule; the degree of unsaturation (unsat_p) is related to the number of rings and unsaturated bonds and also to the aromaticity of the molecule; MW gives the molecular weight range of organic molecules, a range derived from statistics on the molecular weights of several thousand current drugs; the C3p descriptor relates to the rigidity of the molecule and characterizes the degrees of freedom of the organic molecule; the h_p descriptor is a reasonable descriptor of the degree of unsaturation.
By weighted normalization of the above 7 descriptors, the present invention obtains the drug-likeness score of an organic molecule in the molecular drug-likeness evaluation module.
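The text does not spell out how the seven ranges and weights of Table 2 are combined, so the following is only one plausible reading, stated as an assumption: each descriptor contributes its weight when the value lies inside its range, and the total is divided by the sum of the weights. The names `TABLE_2` and `drug_likeness` are hypothetical.

```python
# (lower bound, upper bound, weight) for each descriptor, from Table 2.
TABLE_2 = {
    "xlogP": (-0.5, 5.0, 0.1),
    "MW": (78.0, 500.0, 0.1),
    "HBA": (float("-inf"), 10.0, 0.1),
    "HBD": (float("-inf"), 5.0, 0.1),
    "C3p": (0.15, 0.8, 0.2),
    "h_p": (0.6, 1.5, 0.2),
    "unsat_p": (0.05, 0.45, 0.2),
}


def drug_likeness(values):
    """Weighted fraction of descriptors inside their Table 2 range.

    Assumption: an in-range descriptor scores its full weight and an
    out-of-range descriptor scores zero; the source gives no formula.
    """
    total_weight = sum(w for _, _, w in TABLE_2.values())
    passed = sum(
        w for name, (lo, hi, w) in TABLE_2.items() if lo <= values[name] <= hi
    )
    return passed / total_weight
```

Under this reading a molecule satisfying all seven ranges scores 1, and one that violates only the molecular weight range scores about 0.9.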
D. Screening method of the ADME/T (poor pharmacokinetic properties / adverse toxic reactions) evaluation module:
The main reason most drugs are eliminated after entering clinical trials is not lack of efficacy but poor pharmacokinetic properties (ADME) or adverse toxic reactions (T), which wastes the substantial funds already spent on synthesis and pharmacological experiments. Establishing an ADME/T evaluation module is therefore also important for virtual combinatorial centralized repositories.
The screening method of the ADME/T evaluation module may be built at the molecular level: a biological model is constructed at the molecular level and, by quantum mechanical calculation, the physical, chemical and biological descriptors of each candidate compound in the biological model are used to obtain its ADME/T properties. Alternatively, it may be a statistical method based on quantitative structure-activity relationships (QSAR) and quantitative structure-property relationships (QSPR); or a method based on the biochemical networks of the human body; or the ADME/T information may be merged into the drug-likeness evaluation module and normalized together with the drug-likeness information, see the 7 descriptor parameters in Table 2 of the drug-likeness evaluation module.
E. Normalizing the combined parameters of the above evaluation modules
Because each of the above evaluation modules uses a different calculation method, their results must be normalized into a single variable before a molecule can be evaluated comprehensively, which raises the problem of normalizing the combined parameters. On the basis of each module's own normalization, after the present invention obtains the module scores by evaluating the centralized repositories, the normalized weighted total score is fed back to the combinatorial centralized repository optimization part, which is optimized with a genetic algorithm. Initially, the weight ranges for molecular binding energy, molecular diversity and molecular drug-likeness were set to 0.5~0.9, 0.1~0.3 and 0.1~0.3 respectively, with the three weights summing to 1. After testing on different systems, the optimal weights given by several test systems are 0.7, 0.2 and 0.1.
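A weighted total of the three normalized module scores might look like the sketch below. The function name is hypothetical, and the default weights follow the assignment given in the robustness-test parameter list of this embodiment (activity 0.7, drug-likeness 0.2, diversity 0.1); treating that assignment as the default is an assumption.

```python
def total_score(activity, drug_likeness, diversity, weights=(0.7, 0.2, 0.1)):
    """Weighted sum of three module scores, each already normalized to [0, 1].

    `weights` is ordered (activity, drug-likeness, diversity) and must sum to 1.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("the three weights must sum to 1")
    w_act, w_drug, w_div = weights
    return w_act * activity + w_drug * drug_likeness + w_div * diversity
```

Because every input lies between 0 and 1 and the weights sum to 1, the total also lies between 0 and 1, so it can serve directly as the fitness value of the genetic algorithm.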
3.2. Using a genetic algorithm to optimize the virtual combinatorial preliminary screening library into a combinatorial centralized repository
The genetic algorithm is a highly parallel, randomized, adaptive search algorithm modeled on natural selection and the evolutionary mechanisms of the biological world. Briefly, it uses population search: a population represents a set of candidate solutions to a problem, and applying a series of genetic operations such as selection, crossover and mutation to the current population produces a new generation and gradually evolves the population toward a state that contains a near-optimal solution. In a genetic algorithm the gene encoding must be determined. The selection, crossover and mutation algorithms written in C++ for the present invention carry out each function of the genetic algorithm efficiently and stably. Efficient and stable methods were found for the interface between the genetic algorithm and the virtual combinatorial chemistry library generation program and for the encoding.
A. Selecting the encoding
In genetic algorithms the most common encoding is binary. Combinatorial library programs have represented combined fragments with binary codes, or converted molecular fingerprints into binary codes. Considering the diversity and unboundedness of molecules, and the fact that there is no necessary intrinsic relationship between fragments, the present invention uses dynamic, real-number encoding, that is, decimal encoding; in this embodiment the fragment number serves as the code. This makes the calculation simpler and the encoding intuitive, and places no limit on the number of molecules in the combinatorial library.
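To make the decimal encoding concrete, the sketch below represents one individual (a centralized repository) as the fragment numbers chosen at each reaction step and flattens it into a chromosome. All function names and the 1-based numbering are assumptions for illustration, not the C++ classes of the actual program.

```python
import random


def random_library(pool_sizes, picks, rng):
    """Pick `picks[s]` distinct fragment numbers (1-based) for each step s."""
    return [
        sorted(rng.sample(range(1, n + 1), k)) for n, k in zip(pool_sizes, picks)
    ]


def to_chromosome(library):
    """Flatten the per-step fragment numbers into one list of genes."""
    return [fragment for step in library for fragment in step]


def from_chromosome(chromosome, picks):
    """Split a flat gene list back into per-step fragment lists."""
    library, start = [], 0
    for k in picks:
        library.append(chromosome[start:start + k])
        start += k
    return library
```

With pools of 12, 16 and 4 fragments and 4, 4 and 2 picks, a chromosome has 10 genes, each of which is simply a fragment number.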
B. Selecting the reproduction operator
In the program of the present invention, the reproduction operator (that is, the selection operator) uses the most common method in the reproduction process, roulette-wheel selection. Its basic steps are:
(1) sum the fitness values of all strings in the population;
(2) generate a random number m between 0 and the sum;
(3) starting from string number 1 in the population, add its fitness value to those of the subsequent strings until the running total is equal to or greater than m. The last string added is the selected string.
Roulette-wheel selection returns a random string, but the probability of each string being selected is proportional to its fitness value. Although the randomness of selection means that a string with poor fitness may be chosen, the influence of such chance events is eliminated as evolution proceeds. To speed up evolution, the one or several individuals with the highest fitness are retained during selection and pass directly into the next generation, which reduces the number of generations required. The program adopts this method and exposes it as the user-adjustable setting "selection retention number K".
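The three roulette-wheel steps and the retention of the K best individuals can be sketched as follows. The function names are hypothetical, and filling the rest of the new population by repeated roulette draws is an assumption about how the two mechanisms are combined.

```python
import random


def roulette_pick(fitness, rng):
    """Steps (1)-(3): return the index of the string selected by the wheel."""
    total = sum(fitness)                  # step 1: sum of all fitness values
    m = rng.uniform(0, total)             # step 2: random number in [0, total]
    running = 0.0
    for index, value in enumerate(fitness):
        running += value                  # step 3: accumulate until >= m
        if running >= m:
            return index
    return len(fitness) - 1               # guard against round-off


def select(population, fitness, keep, rng):
    """Copy the `keep` fittest individuals, then fill up by roulette draws."""
    ranked = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    chosen = [population[i] for i in ranked[:keep]]
    while len(chosen) < len(population):
        chosen.append(population[roulette_pick(fitness, rng)])
    return chosen
```

Calling `select(population, fitness, keep=3, rng=random.Random())` corresponds to a selection retention number K of 3.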
C. Selecting the crossover operator
The crossover operator corresponds to sexual reproduction in biological heredity; it produces new individuals and thereby probes new points in the search space. In the program of the present invention an individual is a centralized repository and its genes are a series of molecular fragments. The present invention uses the better-performing two-point crossover method. First, two different random integers r1 and r2 are chosen between 1 and the total number of individuals (the total number of centralized repositories), that is, two individuals (centralized repositories) in the mating pool are chosen as the two parents. Then two further different random integers r3 and r4 determine the segment to be crossed, that is, the start and end positions of the exchange. Finally, each new individual is checked for repeated genes; if any are found, the previous step is repeated until two valid new individuals are produced. The program provides Pc as a user-adjustable parameter.
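A minimal sketch of the two-point crossover with the duplicate check is given below. It assumes individuals are flat gene lists whose first `picks[0]` genes belong to the first reaction step, and so on; the function names, the retry limit and the fallback of returning unchanged copies of the parents are all assumptions.

```python
import random


def has_duplicates(individual, picks):
    """True if any reaction step contains the same fragment number twice."""
    start = 0
    for k in picks:
        segment = individual[start:start + k]
        if len(set(segment)) != len(segment):
            return True
        start += k
    return False


def two_point_crossover(parent_a, parent_b, picks, rng, max_tries=100):
    """Swap the genes between two random positions r3 and r4 (inclusive)."""
    length = len(parent_a)
    for _ in range(max_tries):
        r3, r4 = sorted(rng.sample(range(length), 2))
        child_a = parent_a[:r3] + parent_b[r3:r4 + 1] + parent_a[r4 + 1:]
        child_b = parent_b[:r3] + parent_a[r3:r4 + 1] + parent_b[r4 + 1:]
        if not has_duplicates(child_a, picks) and not has_duplicates(child_b, picks):
            return child_a, child_b
    # Assumption: give up after max_tries and keep the parents unchanged.
    return list(parent_a), list(parent_b)
```

The duplicate check is applied per reaction step because the same fragment number can legitimately appear in different steps, which draw on different fragment pools.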
D. Selecting the mutation operator
The mutation operator randomly changes certain positions on the chromosome string with a small probability Pm; in binary encoding, the corresponding bit changes from 0 to 1 or from 1 to 0. In the program of the present invention, each position holds the decimal code of a building block of a given reaction step, so the mutation operator first selects a position at random and then, at that position, randomly selects a building block of the corresponding reaction step that has not already been chosen, which completes the mutation. The program provides Pm as a user-adjustable parameter. See Figure 3.
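The mutation step can be sketched as below, again on flat gene lists. Applying Pm independently to each gene is an assumption chosen so that the expected number of mutations per individual is Pm times the number of genes; the function name is hypothetical.

```python
import random


def mutate(individual, pool_sizes, picks, pm, rng):
    """Replace each gene, with probability pm, by an unused fragment number
    from the fragment pool of the same reaction step."""
    mutated = list(individual)
    start = 0
    for size, k in zip(pool_sizes, picks):
        for pos in range(start, start + k):
            if rng.random() < pm:
                in_use = set(mutated[start:start + k])
                unused = [f for f in range(1, size + 1) if f not in in_use]
                if unused:  # nothing to do if the whole pool is already used
                    mutated[pos] = rng.choice(unused)
        start += k
    return mutated
```

Drawing only from fragments not yet present in the step guarantees that a mutation never introduces a repeated gene, so no repair pass is needed afterwards.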
In the actual programming, because the C++ language adopted by the present invention is strongly typed, providing suitable class and template definitions and suitable function declarations is the largest part of the whole design work. Chemistry is the science whose objects are atoms and molecules, so many chemistry programs written in object-oriented languages use atoms and molecules as base classes. The base classes used by the centralized repository design of the present invention are likewise built on the important base classes atom, bond and molecule. Where needed, derived classes such as groups are added to these base classes according to the specific objects. The structure of the thiazolidinedione compounds and the molecular formulas of the representative drugs described above show in detail the classes used in the program of the present invention and the relationships between them.
In the program of the present invention, in addition to the conventional fault tolerance of the programming language itself, automatic correction has been added for errors in user-set parameters. Default values are set for most parameters, which both simplifies use of the program and improves its efficiency. For a program that must run for a long time, the ability to be interrupted and restarted is very important, because it avoids wasting computing resources and prolonging computing time in unexpected situations. While the program runs, the genetic algorithm outputs a log file at the end of each generation, in which all the basic information needed for the next step of the calculation is saved. Restarting after an interruption therefore requires only a single simple command.
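The per-generation log that allows a restart could be sketched as a small checkpoint file. The JSON layout, the file handling and the function names below are assumptions for illustration; the source does not describe the format of its log files.

```python
import json
import os


def save_checkpoint(path, generation, population, scores):
    """Write the state needed to resume after the given generation."""
    state = {"generation": generation, "population": population, "scores": scores}
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)  # swap in the new file only once fully written


def load_checkpoint(path):
    """Return the saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path) as handle:
        return json.load(handle)
```

Writing to a temporary file first means an interruption during the write leaves the previous checkpoint intact, which is the property a restartable long-running job needs.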
E. Setting the default parameters of the genetic algorithm
For a mature program design it is necessary to provide default parameter values that can solve most problems. This not only widens the range of use but also improves the efficiency with which the program is used.
The main parameters that affect a genetic algorithm are the population size N, the selection retention number K, the crossover probability Pc and the mutation probability Pm. To find the parameters best suited to the design of the present invention, repeated experiments were carried out to obtain the best default parameters. The program in this embodiment runs under the standard SGI Unix operating system (IRIX3), and all computation and testing of the program were carried out on an SGI computer (Origin 3800) using a single CPU.
The population size affects the final performance and efficiency of the genetic algorithm. When the population is too small, it provides too few samples for most hyperplanes, so generally only a locally optimal solution is obtained. A large population is more likely to contain representatives of many hyperplanes and has a greater probability of reaching the globally optimal solution, but it also takes more time, so choosing a suitable population size is the key to balancing run time against the optimality of the solution. A size of 20~50 is generally recommended. In the tests of this embodiment (see the next section), sizes from 10 to 60 were tested in increments of 10, and 30 was finally found to be the best population size.
The selection retention number means that, in the selection operation, the K individuals of the previous generation with the highest fitness values are copied directly into the next generation. This both guarantees that the best individuals pass directly into the next generation and speeds up the improvement of the population's fitness. If the retention number is too high, however, the genetic algorithm (GA) converges prematurely and becomes trapped in a local optimum. The value of K is therefore generally 5~15% of the population size. The test range was 0 to 6, in increments of 1. The tests (see the next section) show that in this embodiment, with a population size of 30, the best value of K is 3.
The crossover probability controls the exchange of gene segments between individuals and is the key to information exchange within the population. In each generation, Pc × N individuals take part in crossover. The higher the crossover rate, the faster the individuals of the population are renewed. If the crossover rate is too high, high-performing individuals are more likely to be destroyed; if it is too low, the search stagnates because too little is explored. In this embodiment the test range was 0.05 to 0.8, in increments of 0.05. The tests (see the next section) show that with a population size of 30 and a K value of 3, the optimum value of Pc is 0.25.
Mutation is the operator that increases the diversity of the population. After selection, each individual in the new population is changed randomly with probability Pm, so that about Pm × N × L mutations occur per generation (L is the number of genes per individual). A low mutation rate can prevent the whole population from converging completely, whereas a mutation rate above 0.5 amounts to random search. In this embodiment the test range was 0.005 to 0.05, in increments of 0.005. The tests (see the next section) show that with a population size of 30 and a K value of 3, the optimum value of Pm is 0.015.
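The four defaults and the two expected counts mentioned above (Pc × N and Pm × N × L) can be collected in a small parameter object. The class and method names are hypothetical; the default values are the optima reported in this embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GAParams:
    population_size: int = 30      # N
    keep: int = 3                  # K, selection retention number
    crossover_prob: float = 0.25   # Pc
    mutation_prob: float = 0.015   # Pm

    def expected_crossovers(self):
        """Expected number of individuals taking part in crossover, Pc * N."""
        return self.crossover_prob * self.population_size

    def expected_mutations(self, genes_per_individual):
        """Expected number of mutations per generation, Pm * N * L."""
        return self.mutation_prob * self.population_size * genes_per_individual

    def keep_fraction(self):
        """K as a fraction of the population size."""
        return self.keep / self.population_size
```

With these defaults, K is 10% of the population, about 7.5 individuals cross over per generation, and an individual with 10 genes (an illustrative value of L) gives about 4.5 mutations per generation.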
3.3. Testing the combinatorial chemistry centralized repository established above
A. Designing the test system; the test system is designed according to the relationship between activity and structure
The default optimal parameters of the combinatorial centralized repository design program were obtained by balanced testing on several drug screening systems. Here only the cyclooxygenase-2 (COX-2) inhibitor system is used as an example to describe in detail how the parameters were obtained through testing.
Inflammation is a basic pathological process in which the body reacts to the tissue damage caused by various inflammatory stimuli, and it is a common and frequently occurring condition. Non-steroidal anti-inflammatory drugs (NSAIDs) are a class of anti-inflammatory analgesics of significant practical value, widely used clinically to treat acute and chronic inflammation and pain. However, long-term use of existing NSAIDs readily causes adverse reactions in the gastrointestinal system, the kidneys and other tissues, so finding highly effective, low-toxicity NSAIDs has long been a goal of medicinal chemists.
The discovery of cyclooxygenase-2 opened a new line of research on NSAIDs. The cyclooxygenase-2 (COX-2) selective inhibitors discovered in recent years have lower gastrointestinal toxicity and fewer side effects than traditional NSAIDs and are a very promising class of new anti-inflammatory drugs. Research on COX-2 selective inhibitors is being actively pursued in the prior art, and some chemical entities with pharmaceutical prospects have been found. The COX-2 selective inhibitors currently under development can be divided by chemical structure into several major classes, such as diaryl-substituted heterocycles, methanesulfonanilides and di-tert-butyl-substituted phenols, of which the diaryl-substituted heterocycles are the most studied.
The structural feature of the diaryl-substituted heterocyclic compounds is a benzene ring, heterocycle or unsaturated aliphatic ring bearing two vicinal phenyl substituents. A methylsulfonyl or aminosulfonyl group in the para position of one of the phenyl rings is the pharmacophore essential for highly selective COX-2 inhibition. When the methylsulfonyl group is replaced by a sulfamoyl group, the in vitro selectivity of the compound decreases but its in vivo activity increases markedly. SC58635 is a highly effective COX-2 selective inhibitor developed in this way (shown in structural formula 3).
Figure A20041005310200251
Structural formula 3. Structure of SC58635 and its division
B. Selecting the molecular fragments (that is, the building-block sets) and parameters
This embodiment divides the diaryl-substituted heterocyclic compounds (taking SC58635 as an example, shown in structural formula 4) into three structural parts: (1) head A: a benzene ring bearing a methylsulfonyl or aminosulfonyl group in the para position; (2) middle part B: an aromatic ring or unsaturated aliphatic ring with two connection sites; (3) tail C: a six-membered ring or benzene ring that often carries a substituent in the para position. In addition, structure-activity studies of 1,5-diaryl-substituted pyrazole derivatives show that when the aromatic rings in positions 1 and 5 are exchanged, the activity of the compound changes little but its selectivity decreases. The present invention took full account of these findings when designing the molecular fragments of the combinatorial library.
From the analysis of the compounds of this system, the COX-2 inhibitors are regarded as synthesized in a two-step reaction, that is, they are divided into three molecular fragments A, B and C. According to the pharmacophores contained in the existing drugs of this system and the principle of molecular isosterism, this embodiment selected 16, 12 and 4 molecular fragments for parts A, B and C respectively, giving 16 × 12 × 4 = 768 molecules in total. The specific structures are shown in structural formula 4. In part B, the middle part, the case in which the connection positions are exchanged was also designed, so some molecular fragments are repeated but with different connection sites.
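The size of the full virtual library, and of the space of possible centralized repositories that the genetic algorithm has to search, can be checked with a short enumeration. The function names are hypothetical, fragments are represented only by their numbers, and the per-step pick counts passed to the second function are illustrative inputs.

```python
from itertools import product
from math import comb


def enumerate_full_library(sizes=(16, 12, 4)):
    """All fragment-number combinations, one fragment per part."""
    pools = [range(1, n + 1) for n in sizes]
    return list(product(*pools))


def count_sub_libraries(sizes, picks):
    """Number of ways to choose picks[s] distinct fragments from each pool."""
    total = 1
    for n, k in zip(sizes, picks):
        total *= comb(n, k)
    return total
```

Enumerating 16 × 12 × 4 fragments yields the 768 molecules quoted above. Choosing, for example, 4, 4 and 2 fragments per part already gives several million possible repositories, which is why a stochastic search is used rather than exhaustive evaluation.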
Figure A20041005310200261
Molecular fragments of part A
Figure A20041005310200262
Molecular fragments of part B
Figure A20041005310200272
Molecular fragments of part C
Structural formula 4. Fragment design for the COX-2 inhibitor centralized repository
The target structure for molecular docking was obtained from the crystal structure (PDB code 6COX), which contains the inhibitor SC-55820. In the molecular docking parameters, the minimum number of atoms in the "anchor" is 5, and the number of orientations of the "anchor" and the number of conformations selected for each newly added segment are 500 and 25 respectively.
C. Program robustness test
To obtain the optimal parameters, the robustness of the program must first be guaranteed. The present invention therefore first tested the robustness of the program. The parameters selected in this embodiment are as follows:
Number of reaction steps:
Number of reactant molecular fragments: 12, 16, 4
Number of molecular fragments chosen for the centralized repository: 4, 4, 2
Genetic algorithm:
Population size: 30
Selection retention number: 3
Crossover probability: 0.25
Mutation probability: 0.015
Maximum number of generations: 1000
Program termination condition: the top-scoring 70% of centralized repositories have identical molecular fragments, or the number of generations exceeds the maximum number of generations
Molecular activity score weight: 0.7
Molecular drug-likeness score weight: 0.2
Library diversity score weight: 0.1
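The termination condition in the list above can be sketched as a single check run after each generation. It assumes each individual is a list of per-step fragment lists and that two repositories are identical when they contain the same fragments regardless of order; the function names and the rounding of the 70% cut-off are assumptions.

```python
def canonical(library):
    """Order-independent form of a repository, for equality comparison."""
    return tuple(tuple(sorted(step)) for step in library)


def should_stop(population, scores, generation,
                max_generations=1000, top_fraction=0.7):
    """Stop when the generation limit is exceeded or the top-scoring
    fraction of repositories all have identical molecular fragments."""
    if generation > max_generations:
        return True
    ranked = sorted(zip(scores, population), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, round(len(ranked) * top_fraction))
    first = canonical(ranked[0][1])
    return all(canonical(ind) == first for _, ind in ranked[:top_n])
```

With a population of 30 the check compares the 21 best-scoring repositories, so the run ends once the population has largely converged on one composition.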
The program was run under the IRIX system of an SGI Origin 3800, using a single CPU for the whole test. The run was repeated 24 times in total; the results are shown in Table 3.
In all 24 runs the program reached its termination condition within 628 generations, and 54% of the runs finished within 200 generations, which shows that the genetic algorithm searches efficiently and can complete the program's task in a relatively short time. In terms of the run time of each part, the rate-determining step is molecular docking.
Table 3. Program run results (1)
Run  Generations  Highest score  Molecular fragments of the best centralized repository
                                 Part one  Part two  Part three
1  486  0.7832  1,2,5,9  1,9,8,10  1,4
2  412  0.7778  1,2,5,9  1,9,8,10  1,4
3  159  0.7803  1,2,5,9  1,9,8,10  1,4
4  405  0.7841  1,2,5,9  1,9,12,10  1,4
5  187  0.7832  1,2,5,9  1,9,8,10  1,4
6  197  0.7887  1,2,5,9  1,9,12,10  1,4
7  443  0.7882  1,2,3,9  1,9,12,10  1,4
8  25  0.7793  4,2,5,9  1,9,16,10  3,4
9  104  0.7734  1,2,4,9  1,9,12,10  1,4
10  103  0.7892  1,2,3,9  1,9,4,10  1,4
11  168  0.7786  1,2,5,9  1,9,8,10  1,4
12  118  0.7832  1,2,5,9  1,9,8,10  1,4
13  132  0.7846  1,4,8,9  1,9,16,10  3,4
14  211  0.7798  1,2,5,9  1,9,8,10  1,4
15  628  0.7906  1,2,5,9  1,9,12,10  1,4
16  114  0.7906  1,2,3,9  1,9,12,10  1,4
17  314  0.7906  1,2,5,9  1,9,12,10  1,4
18  89  0.7892  1,5,8,9  1,9,12,10  1,4
19  573  0.7881  1,2,5,9  1,9,12,10  1,4
20  227  0.7798  1,2,5,9  1,9,8,10  1,4
21  178  0.7827  1,2,5,9  1,9,12,10  1,4
22  230  0.7906  1,2,5,9  1,9,12,10  1,4
23  243  0.7799  1,2,5,9  1,9,8,10  1,4
24  59  0.7842  3,2,5,9  1,9,12,10  1,4
Composition analysis to the molecular fragment of best centralized repository.Maximum combination that first occurs is 1,2,5,9, totally 16 times; And 1,2,3,9, have 3 times.For the unimolecule fragment, 9 have occurred 24 times, and 1 and 2 occurred 22 times, and 5 have occurred 18 times, and 3 have occurred 4 times, and 4 have occurred 3 times, and 8 have occurred 1 time.Illustrate 1,2,9th, at the best molecular fragment of this part, and 5 take second place.
1,9 and 10 all occur at every turn in the second portion molecular fragment, and 12 have occurred 12 times, and 8 have occurred 9 times, and other 16 and 4 respectively occur 2 times and 1 time.Illustrate the 1,9, the 10th, best relatively molecular fragment, and 12,8 only take second place; In combination, 1,9,10,12 combinations occur 12 times, and 1,8,9,10 occur 9 times, and 1,9,10,16 occur 2 times, and the two score when getting one and 1,9,10 and making up of this explanation 12,8 differs very little.
Third part is 1,4 combinations basically, twice 3,4 combination have only occurred, and this all is to occur under there is the situation of molecular fragment 4 in first.
In general, first is 1,2,5 (or 3), 9, and second portion is 1,9,10,12 (or 8), and third part is that 1,4 situation has occurred 20 times.Though probability of occurrence has only 83%, because many molecular fragments differ very little on comprehensive grading, under the various combination environment, the score difference of molecular diversity.Therefore drawn the best of breed that a small amount of difference is arranged.As can be seen, even have identical centralized repository, score is also inequality on the score of centralized repository for this point, for example the 1st to 3 computing.This mainly is that other composition with the centralized repository in generation is different because except that best centralized repository, and the relative value that the molecular diversity evaluation was calculated by forming of all centralized repository in the same generation, so the score difference.
With only the scale of the centralized repository changed (the numbers of molecular fragments become 3 × 3 × 2) and all other parameters unchanged, the program was run 12 times; the results are shown in Table 4:
Table 4. Program run results (2)
Run   Generations   Highest score   Part one   Part two   Part three
1     83            0.7906          1,2,9      1,12,10    1,4
2     267           0.7873          1,5,9      1,9,10     3,4
3     80            0.7897          1,3,9      1,9,10     1,4
4     116           0.7875          1,5,9      1,9,10     1,4
5     358           0.7886          1,2,9      1,9,10     1,4
6     124           0.7832          4,2,9      1,9,10     1,4
7     107           0.7853          1,2,9      1,9,10     1,4
8     160           0.7874          1,5,9      1,9,10     1,4
9     211           0.7866          1,2,9      1,12,10    1,4
10    208           0.7877          1,2,9      1,12,10    1,4
11    199           0.7866          1,2,9      1,12,10    1,4
12    155           0.7888          1,2,9      1,9,10     1,4
(The last three columns give the molecular fragments of the best centralized repository for each part.)
The conclusions drawn from Table 4 are roughly the same as those from Table 3. The first part of the molecular fragments is mainly drawn from 1, 2, 5 and 9, with the combination 1,2,9 dominant; the second part is drawn from 1, 9, 10 and 12, with the combination 1,9,10 dominant; the third part is essentially the combination 1,4, with 3,4 occurring once.
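The frequency analysis above can be reproduced with a few lines of code. The sketch below counts the first-part selections of the 12 runs in Table 4; the data are transcribed from that table, while the variable names are illustrative.

```python
from collections import Counter

# Fragments selected for the first part in the 12 runs of Table 4.
part_one = [
    (1, 2, 9), (1, 5, 9), (1, 3, 9), (1, 5, 9), (1, 2, 9), (4, 2, 9),
    (1, 2, 9), (1, 5, 9), (1, 2, 9), (1, 2, 9), (1, 2, 9), (1, 2, 9),
]

# Count whole combinations (the order inside a run is irrelevant) ...
combo_counts = Counter(tuple(sorted(run)) for run in part_one)
# ... and how often each single fragment is selected.
fragment_counts = Counter(f for run in part_one for f in run)

print(combo_counts.most_common())
print(fragment_counts.most_common())
```

The same counting applies unchanged to the second and third parts, or to the runs of Table 3.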
It can therefore be concluded that, although the composition of the best combinatorial library differs slightly from run to run, every result is a good centralized repository. Consistent results were also obtained after the scale of the centralized repository was changed.
The scores of the centralized repositories in Table 3 also vary little, lying between 0.7734 and 0.7906. In the composition with the lowest score, 0.7734, the first part contains the only occurrence of the combination 1,2,4,9. The lowest score in Table 4 belongs to the 6th result, whose molecular fragments are also unusual: fragment 4 again appears in the first part.
The results of Tables 3 and 4 fully show that each run of the program can find the best subspace of the combinatorial space and that the program has good stability, which provides a sound basis for the correct operation of the whole program. Taking the above discussion together, the program can be considered to have realized the function of designing combinatorial centralized repositories, and its results are well reproducible.
D. Selecting optimal parameters
After the robustness of the program was confirmed, the key parameters of the program were tuned to find its best parameter set. Parameter selection concentrated on the parameters of the genetic algorithm. Parameter selection experiment one fixes the crossover rate Pc and the mutation rate Pm at 0.25 and 0.015 respectively and varies the population size N and the selection retention number K in order to find the optimal values of these two parameters. Because convergence may slow down as the population size grows, the maximum number of generations is set to 3000 here. Based on the earlier robustness tests, a parameter setting is judged reasonable when the first part of the building units is 1,2,5 (or 3),9, the second part is 1,9,10,12 (or 8), the third part is 1,4, and the run converges before the maximum number of generations is reached. Such runs are marked with V in the sixth column (T) of Table 5.
Table 5. Parameter selection experiment one
No. (run label) Gap (generation at convergence) N (population size) K (selection retention number) Score T (V if the correct centralized repository was obtained, otherwise blank) Selected blocks (four fragments for part one, four for part two, two for part three)
1 12 10 0 0.7773 1 5 6 9 6 10 13 15 1 4
2 9 10 1 0.7781 1 4 8 9 7 10 12 16 2 4
3 4 10 2 0.7767 1 3 5 6 3 10 15 16 1 4
4 3000 20 0 0.7543 6 7 8 9 7 9 10 15 1 3
5 372 20 1 0.7866 V 1 2 5 9 1 9 10 12 1 4
6 70 20 2 0.7797 V 1 2 5 9 1 8 9 10 1 4
7 67 20 3 0.7838 1 2 3 9 1 8 9 12 1 4
8 29 20 4 0.7802 1 2 3 9 1 4 8 9 1 4
9 3000 30 0 0.7324 1 3 4 12 3 5 7 11 3 4
10 2731 30 1 0.7864 V 1 2 5 9 1 9 10 12 1 4
11 108 30 2 0.7785 V 1 2 3 9 1 9 10 12 1 4
12 127 30 3 0.7879 V 1 2 5 9 1 9 10 12 1 4
13 20 30 4 0.7710 1 2 3 9 1 3 8 16 1 4
14 40 30 5 0.7870 V 1 2 3 9 1 9 10 12 1 4
15 48 30 6 0.7838 1 2 3 9 1 9 12 16 1 4
16 3000 40 0 0.7753 1 7 9 12 1 7 8 14 1 2
17 3000 40 1 0.7767 1 4 8 9 1 9 10 16 3 4
18 391 40 2 0.7741 V 1 2 5 9 1 8 9 10 1 4
19 676 40 3 0.7751 V 1 2 3 9 1 9 10 12 1 4
20 231 40 4 0.7762 V 1 2 5 9 1 8 9 10 1 4
21 205 40 5 0.7792 V 1 2 3 9 1 9 10 12 1 4
22 78 40 6 0.7786 1 2 3 9 1 10 12 16 1 4
23 15 40 7 0.7722 1 2 9 10 1 10 12 14 1 4
24 37 40 8 0.7785 1 2 3 9 1 9 10 16 3 4
25 3000 50 0 0.7618 1 2 7 10 4 6 9 13 3 4
26 3000 50 1 0.7818 1 2 3 9 2 3 9 10 1 4
27 1116 50 2 0.7767 1 2 8 9 1 9 10 16 1 4
28 1266 50 3 0.7764 V 1 2 5 9 1 8 9 10 1 4
29 427 50 4 0.7813 V 1 2 5 9 1 9 10 12 1 4
30 109 50 5 0.7749 V 1 2 5 9 1 9 10 12 1 4
31 93 50 6 0.7797 V 1 2 5 9 1 8 9 10 1 4
32 50 50 7 0.7755 1 2 5 9 1 9 10 16 1 4
33 73 50 8 0.7736 1 2 8 9 1 9 10 16 3 4
34 118 50 9 0.7771 V 1 2 5 9 1 8 9 10 1 4
35 46 50 10 0.7790 1 2 5 9 1 9 10 16 1 4
36 3000 60 0 0.7719 4 7 8 9 1 7 9 10 1 4
37 3000 60 1 0.7759 1 2 5 9 1 8 9 16 1 4
38 1107 60 2 0.7792 1 2 8 9 1 9 10 16 3 4
39 3000 60 3 0.7775 1 2 8 9 1 9 10 16 3 4
40 2741 60 4 0.7774 1 2 8 9 1 9 10 16 3 4
41 273 60 5 0.7766 V 1 2 5 9 1 8 9 10 1 4
42 347 60 6 0.7733 V 1 2 5 9 1 8 9 10 1 4
43 176 60 7 0.7791 V 1 2 3 9 1 9 10 12 1 4
44 231 60 8 0.7762 V 1 2 5 9 1 8 9 10 1 4
45 158 60 9 0.7763 V 1 2 5 9 1 8 9 10 1 4
46 53 60 10 0.7841 V 1 2 3 9 1 9 10 12 1 4
47 47 60 11 0.7826 V 1 2 3 9 1 9 10 12 1 4
48 130 60 12 0.7795 V 1 2 5 9 1 8 9 10 1 4
49 3000 70 0 0.7620 3 5 7 8 1 4 9 14 2 4
50 3000 70 1 0.7854 1 2 5 9 1 10 12 14 1 4
51 3000 70 2 0.7795 1 2 5 9 1 8 9 10 1 3
52 1677 70 3 0.7745 V 1 2 5 9 1 8 9 10 1 4
53 3000 70 4 0.7762 1 2 5 9 1 8 9 10 1 4
54 3000 70 5 0.7764 1 2 5 9 1 8 9 10 1 4
55 219 70 6 0.7757 1 2 8 9 1 9 10 16 3 4
56 73 70 7 0.7762 V 1 2 5 9 1 8 9 10 1 4
57 67 70 8 0.7762 V 1 2 5 9 1 8 9 10 1 4
58 89 70 9 0.7772 1 2 8 9 1 9 10 16 3 4
59 56 70 10 0.7768 V 1 2 5 9 1 8 9 10 1 4
60 58 70 11 0.7752 1 2 8 9 1 9 10 16 3 4
61 53 70 12 0.7761 1 2 8 9 1 9 10 16 3 4
62 161 70 13 0.7755 1 2 5 9 1 9 10 16 1 4
63 48 70 14 0.7835 1 2 3 9 1 9 10 16 1 4
64 3000 80 0 0.7634 2 5 7 12 1 4 6 11 3 4
65 3000 80 1 0.7761 1 2 5 9 1 8 9 10 1 4
66 3000 80 2 0.7764 1 2 5 9 1 5 8 9 1 4
67 3000 80 3 0.7764 1 2 5 9 1 8 9 10 1 4
68 3000 80 4 0.7760 1 2 8 9 1 9 10 16 3 4
69 3000 80 5 0.7761 1 2 5 9 1 8 9 10 1 4
70 3000 80 6 0.7764 2 5 9 12 1 8 9 10 1 2
71 1690 80 7 0.7762 V 1 2 5 9 1 8 9 10 1 4
72 1483 80 8 0.7762 V 1 2 5 9 1 8 9 10 1 4
73 832 80 9 0.7761 V 1 2 5 9 1 8 9 10 1 4
74 253 80 10 0.7763 V 1 2 5 9 1 8 9 10 1 4
75 443 80 11 0.7762 V 1 2 5 9 1 8 9 10 1 4
76 88 80 12 0.7748 1 2 8 9 1 9 10 16 3 4
77 71 80 13 0.7771 V 1 2 5 9 1 8 9 10 1 4
78 89 80 14 0.7760 1 2 8 9 1 9 10 16 3 4
79 117 80 15 0.7764 V 1 2 5 9 1 8 9 10 1 4
80 44 80 16 0.7748 1 2 8 9 1 9 10 16 3 4
The test results in Table 5 show:
(1) The runs that obtain the correct centralized repository (those marked "V") mostly fall in the range where K is 5% to 20% of N. There are only two exceptions among the 34.
(2) The larger the population size N, the larger the average number of generations to convergence.
(3) For constant N, the larger K is, the more easily the run converges; when K is greater than 20% of N, it is difficult to find the global optimum.
(4) When N is less than 20, the population is too small, the sampling is insufficient and there are too few search points, so the result is unreliable.
From the above results, a population size of N = 30 with a selection retention number of K = 1 to 5 is a good parameter region and gives good results. Good parameter regions also exist for larger populations, but the computing time is too long and the memory consumption too large. The invention takes population size N = 30 and selection retention number K = 3 as the optimal parameter values.
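Finding (1) can be checked directly against the table. The sketch below lists the (N, K) pairs of the 34 runs marked "V" in Table 5, transcribed from the rows above, and tests each against the 5% to 20% band; the function and variable names are illustrative.

```python
# (N, K) pairs of the 34 runs marked "V" in Table 5.
correct_runs = [
    (20, 1), (20, 2), (30, 1), (30, 2), (30, 3), (30, 5),
    (40, 2), (40, 3), (40, 4), (40, 5),
    (50, 3), (50, 4), (50, 5), (50, 6), (50, 9),
    (60, 5), (60, 6), (60, 7), (60, 8), (60, 9), (60, 10), (60, 11), (60, 12),
    (70, 3), (70, 7), (70, 8), (70, 10),
    (80, 7), (80, 8), (80, 9), (80, 10), (80, 11), (80, 13), (80, 15),
]

def within_band(n, k, low_pct=5, high_pct=20):
    """True if K lies between low_pct% and high_pct% of N (integer arithmetic)."""
    return low_pct * n <= 100 * k <= high_pct * n

exceptions = [(n, k) for n, k in correct_runs if not within_band(n, k)]
print(len(correct_runs), exceptions)
```

Both exceptions have K slightly below 5% of N, so none of the correct runs exceeds the 20% limit.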
Parameter selection experiment two builds on the conclusion of parameter selection experiment one: the population size N and the selection retention number K are fixed at 30 and 3 respectively, and the crossover rate Pc and the mutation rate Pm are varied in order to find the optimal values of these two parameters. The maximum number of generations is set to 2000 here.
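The parameter grid of experiment two can be generated programmatically. The rule used in this sketch (Pc from 0.10 to 0.80 in steps of 0.05, Pm in steps of 0.005 up to roughly one tenth of Pc and at most 0.045) is not stated in the text; it is inferred from the rows of Table 6, and the function name is illustrative.

```python
def experiment_two_grid():
    """Return the (Pc, Pm) pairs in the row order of Table 6 (inferred rule)."""
    grid = []
    for i in range(2, 17):                       # Pc = 0.10, 0.15, ..., 0.80
        pc = round(i * 0.05, 2)
        for j in range(1, min(i - 1, 9) + 1):    # Pm = 0.005, 0.010, ..., at most 0.045
            pm = round(j * 0.005, 3)
            grid.append((pc, pm))
    return grid

grid = experiment_two_grid()
print(len(grid), grid[0], grid[-1])
```

Each pair would then be run with N = 30, K = 3 and a limit of 2000 generations, as described above.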
Table 6. Parameter selection experiment two
No. Gap Pc Pm Score T Selected blocks (column meanings as in Table 5)
1 10 0.1 0.005 0.7545 1 5 9 11 1 2 12 16 1 4
2 20 0.15 0.005 0.7775 3 4 8 9 3 9 10 12 1 4
3 39 0.15 0.01 0.7799 1 3 5 9 1 9 12 16 1 4
4 58 0.2 0.005 0.7687 1 5 8 9 1 8 10 14 3 4
5 40 0.2 0.01 0.7783 V 1 2 5 9 1 9 10 12 1 4
6 45 0.2 0.015 0.7769 2 3 5 9 6 9 10 12 1 4
7 22 0.25 0.005 0.7810 2 3 8 9 1 9 10 16 3 4
8 113 0.25 0.01 0.7754 V 1 2 5 9 1 9 10 12 1 4
9 50 0.25 0.015 0.7826 V 1 2 5 9 9 10 12 14 1 4
10 601 0.25 0.02 0.7766 V 1 2 5 9 1 9 10 12 1 4
11 40 0.3 0.005 0.7797 1 2 6 9 1 10 12 14 1 4
12 48 0.3 0.01 0.7743 1 2 5 9 6 9 10 16 3 4
13 60 0.3 0.015 0.7902 V 1 2 3 9 1 9 10 12 1 4
14 815 0.3 0.02 0.7763 V 1 2 5 9 1 8 9 10 1 4
15 1782 0.3 0.025 0.7816 V 1 2 3 9 1 9 10 12 1 4
16 32 0.35 0.005 0.7687 4 8 9 10 1 10 15 16 2 4
17 33 0.35 0.01 0.7765 V 1 2 3 9 1 9 10 12 1 4
18 59 0.35 0.015 0.7781 1 2 3 9 9 10 12 16 3 4
19 1161 0.35 0.02 0.7797 V 1 2 5 9 1 8 9 10 1 4
20 2000 0.35 0.025 0.7815 1 2 3 9 1 9 10 12 1 4
21 2000 0.35 0.03 0.7797 1 2 5 9 1 9 10 12 1 3
22 50 0.4 0.005 0.7737 1 4 8 9 1 8 9 16 3 4
23 83 0.4 0.01 0.7771 1 3 8 9 1 6 12 16 1 4
24 23 0.4 0.015 0.7875 1 3 8 9 1 3 9 10 1 4
25 997 0.4 0.02 0.7767 V 1 2 3 9 1 9 10 12 1 4
26 2000 0.4 0.025 0.7760 1 2 8 9 1 9 10 16 1 4
27 936 0.4 0.03 0.7813 V 1 2 3 9 1 9 10 12 1 4
28 2000 0.4 0.035 0.7725 1 2 3 9 1 9 10 12 1 4
29 19 0.45 0.005 0.7691 1 7 8 9 3 9 14 16 1 4
30 145 0.45 0.01 0.7816 V 1 2 3 9 1 9 10 12 1 4
31 28 0.45 0.015 0.7762 1 2 5 9 1 6 8 10 1 4
32 310 0.45 0.02 0.7752 V 1 2 3 9 1 9 10 12 1 4
33 173 0.45 0.025 0.7770 V 1 2 3 9 1 9 10 12 1 4
34 2000 0.45 0.03 0.7765 1 2 5 9 1 8 9 10 1 4
35 2000 0.45 0.035 0.7759 1 2 3 9 1 9 10 12 1 4
36 2000 0.45 0.04 0.7787 1 2 3 9 1 9 10 16 3 4
37 77 0.5 0.005 0.7803 1 5 8 9 1 9 10 12 1 4
38 51 0.5 0.01 0.7865 1 2 4 9 1 9 12 16 1 4
39 144 0.5 0.015 0.7827 1 2 3 9 1 9 10 16 1 4
40 762 0.5 0.02 0.7816 1 2 8 9 1 9 10 16 3 4
41 1296 0.5 0.025 0.7795 V 1 2 5 9 1 8 9 10 1 4
42 2000 0.5 0.03 0.7801 1 2 3 9 1 9 10 12 1 4
43 2000 0.5 0.035 0.7797 2 3 5 9 1 8 9 10 2 4
44 2000 0.5 0.04 0.7756 1 2 5 9 1 9 10 14 3 4
45 2000 0.5 0.045 0.7787 1 2 3 9 1 9 10 16 3 4
46 32 0.55 0.005 0.7789 3 5 8 9 6 9 10 14 1 4
47 97 0.55 0.01 0.7831 V 1 2 3 9 1 8 9 12 1 4
48 33 0.55 0.015 0.7770 1 3 8 9 9 10 11 12 1 4
49 209 0.55 0.02 0.7816 V 1 2 3 9 1 9 10 12 1 4
50 754 0.55 0.025 0.7816 V 1 2 3 9 1 9 10 12 1 4
51 2000 0.55 0.03 0.7718 1 2 5 10 1 9 10 16 1 4
52 2000 0.55 0.035 0.7791 5 6 8 9 1 9 10 16 2 3
53 2000 0.55 0.04 0.7854 1 2 3 9 1 9 10 12 1 4
54 2000 0.55 0.045 0.7722 1 2 3 9 1 9 10 15 3 4
55 11 0.6 0.005 0.7751 1 4 5 9 1 7 9 10 1 4
56 131 0.6 0.01 0.7832 V 1 2 5 9 1 9 10 12 1 4
57 191 0.6 0.015 0.7778 V 1 2 3 9 1 8 9 10 1 4
58 84 0.6 0.02 0.7815 2 4 5 9 1 9 10 14 1 4
59 421 0.6 0.025 0.7797 V 1 2 5 9 1 8 9 10 1 4
60 2000 0.6 0.03 0.7755 1 2 5 9 1 9 10 16 1 4
61 2000 0.6 0.035 0.7819 1 2 3 9 1 8 9 12 1 4
62 2000 0.6 0.04 0.7757 1 4 8 9 1 9 10 15 2 4
63 2000 0.6 0.045 0.7851 1 2 5 9 6 9 10 12 1 4
64 36 0.65 0.005 0.7800 2 5 7 9 1 6 10 14 1 4
65 125 0.65 0.01 0.7767 1 4 8 9 1 9 10 16 3 4
66 150 0.65 0.015 0.7756 1 2 5 9 1 9 10 16 1 4
67 394 0.65 0.02 0.7789 V 1 2 5 9 1 8 9 10 1 4
68 2000 0.65 0.025 0.7761 1 2 5 9 1 8 9 10 1 4
69 2000 0.65 0.03 0.7783 2 5 6 9 1 2 9 10 2 4
70 2000 0.65 0.035 0.7760 1 2 8 9 1 9 10 16 3 4
71 2000 0.65 0.04 0.7765 1 4 8 9 10 12 13 16 3 4
72 2000 0.65 0.045 0.7765 2 3 4 9 1 10 12 16 3 4
73 48 0.7 0.005 0.7784 1 2 5 9 1 10 12 16 1 4
74 41 0.7 0.01 0.7741 1 3 4 9 1 4 8 10 1 4
75 54 0.7 0.015 0.7801 1 2 5 9 1 10 12 14 1 4
76 398 0.7 0.02 0.7796 V 1 2 5 9 1 8 9 10 1 4
77 453 0.7 0.025 0.7734 V 1 2 3 9 1 9 10 12 1 4
78 2000 0.7 0.03 0.7760 1 2 5 9 2 7 8 10 1 4
79 2000 0.7 0.035 0.7764 1 5 6 9 1 3 9 10 1 4
80 2000 0.7 0.04 0.7756 1 2 3 9 1 6 9 10 1 4
81 2000 0.7 0.045 0.7760 1 2 5 9 2 9 15 16 1 4
82 26 0.75 0.005 0.7694 1 4 6 9 1 8 14 16 1 4
83 262 0.75 0.01 0.7870 V 1 2 3 9 1 9 10 12 1 4
84 314 0.75 0.015 0.7797 V 1 2 5 9 1 8 9 10 1 4
85 612 0.75 0.02 0.7816 V 1 2 5 9 1 9 10 12 1 4
86 2000 0.75 0.025 0.7748 1 2 3 9 1 9 10 12 1 4
87 2000 0.75 0.03 0.7746 1 2 7 8 1 9 10 16 3 4
88 2000 0.75 0.035 0.7773 1 4 8 9 1 9 10 16 3 4
89 2000 0.75 0.04 0.7746 1 2 8 9 9 10 12 15 1 4
90 2000 0.75 0.045 0.7719 2 3 8 11 9 10 12 16 1 4
91 28 0.8 0.005 0.7805 2 3 4 9 1 9 10 16 3 4
92 46 0.8 0.01 0.7763 V 1 2 5 9 1 8 9 10 1 4
93 86 0.8 0.015 0.7826 V 1 2 5 9 1 9 10 12 1 4
94 1989 0.8 0.02 0.7738 V 1 2 5 9 1 8 9 10 1 4
95 1471 0.8 0.025 0.7760 V 1 2 5 9 1 8 9 10 1 4
96 2000 0.8 0.03 0.7769 1 2 5 9 1 9 10 12 3 4
97 2000 0.8 0.035 0.7792 1 9 10 12 1 9 10 16 2 3
98 2000 0.8 0.04 0.7774 1 2 7 9 1 9 10 16 1 4
99 2000 0.8 0.045 0.7815 1 2 5 9 1 9 10 15 1 4
The test results in Table 6 show:
(1) The runs that obtain the correct centralized repository (those marked "V") all have Pm in the range 0.01 to 0.03; values that are too large or too small make it difficult to converge to the optimum. For Pm = 0.01, 0.015, 0.02, 0.025 and 0.03, the numbers of such runs are 7, 5, 10, 8 and 1 respectively.
(2) There is no obvious relationship between Pc and Pm.
(3) When Pc is less than 0.25, the crossover probability is too small and it is difficult to obtain the optimal result.
(4) When Pm is greater than 0.03, the program has difficulty converging.
From this analysis, Pm = 0.01 to 0.025 with Pc = 0.3 to 0.5 is a good parameter region. Looking back at the earlier robustness tests, the values used there, population size N = 30, selection retention number K = 3, crossover rate Pc = 0.25 and mutation rate Pm = 0.015, form a good parameter set.
3.4 Example
The following is an example of obtaining a centralized repository with PPARγ as the protein target:
As described in the two preceding parts on the design of the combinatorial route and the selection of molecular fragments for the molecular building units, the invention selected three molecular building units; the number of reaction steps is two, and the route is first A+B, then +C. The numbers of molecular fragments for the building units are 118, 88 and 98 respectively. The target structure for molecular docking is based on the PPARγ crystal structure (PDB ID 2PRG). Among the molecular docking parameters, the number of "anchor" orientations and the number of conformations selected for each newly added segment are both 50. All other program parameters use their default values.
Results and discussion
The combinatorial space of PPARγ agonists is very large, so the scale of the combinatorial centralized repository was correspondingly enlarged to 10 × 10 × 10. The program finished after 434 generations; the results are shown in Table 7.
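The size of this search problem follows from the figures given above. The short calculation below uses only the fragment counts (118, 88, 98) and the repository scale (10 × 10 × 10) from the text; the variable names are illustrative.

```python
import math

pool_sizes = {"A": 118, "B": 88, "C": 98}        # fragments available per building unit
repository_scale = {"A": 10, "B": 10, "C": 10}   # fragments selected per building unit

# Number of distinct products A-B-C in the full combinatorial space.
full_space = math.prod(pool_sizes.values())
# Number of compounds in one 10 x 10 x 10 centralized repository.
repository_size = math.prod(repository_scale.values())
# Number of different ways to choose such a repository.
candidate_repositories = math.prod(
    math.comb(pool_sizes[unit], repository_scale[unit]) for unit in pool_sizes
)

print(full_space, repository_size)
print(f"{candidate_repositories:.3e}")
```

The number of candidate repositories is far too large to enumerate, which is why a stochastic search such as a genetic algorithm is used to select the fragments.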
Table 7. Composition of the optimal centralized repository for PPARγ agonists
Building blocks of the optimal centralized repository (selected structural units)
A 1 4 6 40 48 50 51 63 64 114
B 6 8 14 14 19 20 23 38 60 74
C 6 12 13 14 21 33 43 58 71 75
Analysis of the composition of the centralized repository shows that, in the polar head part A, the most important feature is a hydrophilic group that can form hydrogen bonds. Molecular fragment 1 is the TZD head (thiazolidinedione), whose two oxygen atoms and nitrogen atom can all form hydrogen bonds with the receptor. The other fragments also have this structural feature. Nitro heads, carboxyl heads and structures similar to TZD all appear here. "A" in structural formula 5 denotes the connection site between the head and the middle part (the same below).
Structural formula 5
The middle part B mainly serves to connect the head and the tail. The crystal structure shows that this part of the receptor is a flat channel, suited to holding nearly planar rigid structures such as a benzene ring. Some of the molecular fragments selected for the centralized repository contain a benzene ring; others are sulfur-containing five-membered heterocycles. In structural formula 6, molecular fragments 38 and 60 are a pair of enantiomers, the carbon atom at the connection site being chiral. Both were selected, which shows that the requirement on chirality here is not strict. Both carry a long-chain alkyl group; the docked structures show that it extends into a long, narrow cavity of the receptor located opposite the hydrophobic channel, increasing the binding energy with the receptor. "Q" in structural formula 6 denotes the connection site between the tail and the middle part (the same below).
Structural formula 6
The hydrophobic tail part C shows the greatest variation; the common feature is a hydrophobic aromatic ring. There are, however, also hydrogen bond acceptors such as molecular fragment 58 in structural formula 7, which may cause the conformation to flip during molecular docking, that is, the tail enters the hydrophilic pocket.
Structural formula 7
By analysing the polar head part of the molecular fragments in the optimal centralized repository, the invention found a polar head (48) with a good effect. The molecule DC-14, which contains this polar head, shows relatively high activity in molecular-level tests (see Table 8). In the analysis of the hydrophobic tail, compounds containing three entirely new tails were also found to have very high activity; the activities of DC-E15 and DC-E57 are comparable to that of the positive control GI262570. The detailed test data are given in Table 8. Cell-level tests also show that these compounds have hypoglycemic activity and the ability to promote adipocyte differentiation.
Compound   Activity
DC-14      Inhibition constant (Ki) 4.7×10⁻⁷
DC-E15     Effective concentration (EC) ≈ 5×10⁻⁸
DC-E57     Effective concentration (EC) ≈ 5×10⁻⁸
DC-E86     5.8×10⁻⁷
The above results fully show that the combinatorial centralized repository is effective and reliable in the design of PPARγ agonists. By synthesizing the PPARγ agonists with novel structures found using the combinatorial centralized repository, it may be possible to find PPARγ agonists with novel structures and better activity.
4. Overall assessment
System testing and the worked application example show that the method of the invention, which combines a genetic algorithm with combinatorial chemistry for the design and optimization of combinatorial centralized repositories, can be used for de novo drug design based on the three-dimensional structure of macromolecular targets, and provides a new method for applying combinatorial chemistry and computer-aided drug design in drug research.
The computer-aided design and optimization method for combinatorial chemistry centralized repositories based on the three-dimensional structure of biological macromolecules, together with the results of the worked example, shows that the program developed by the invention essentially realizes the design function for combinatorial centralized repositories, completes the main functions in the program's schematic block diagram, and can accurately and efficiently search out the optimal subspace from the whole combinatorial chemistry space. The results also have good reproducibility and reliability. Using the virtual combinatorial chemistry centralized repository of the invention, the selection of the molecular fragments used for combination can be well optimized, the scale of the combinatorial chemistry centralized repository that is actually synthesized is greatly reduced, the probability of finding good lead compounds is raised, and cost is reduced.
The invention adopts an object-oriented approach and a modular design, and is readily extensible.
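One way such an object-oriented, modular evaluation system could be organized is sketched below. The module types (activity, drug-likeness, diversity) follow the evaluation modules named in the claims; the class names, the precomputed per-compound values, the weights and the weighted-average combination are assumptions made for illustration (the patented program itself is stated to be written in C++).

```python
from abc import ABC, abstractmethod

class Evaluator(ABC):
    """One evaluation module; score() returns a value in [0, 1] for a repository."""

    def __init__(self, weight):
        self.weight = weight  # hypothetical relative importance of the module

    @abstractmethod
    def score(self, library):
        ...

class ActivityEvaluator(Evaluator):
    def score(self, library):
        # Mean of precomputed, normalized activity values (for example from docking).
        return sum(c["activity"] for c in library) / len(library)

class DrugLikenessEvaluator(Evaluator):
    def score(self, library):
        return sum(c["drug_likeness"] for c in library) / len(library)

class DiversityEvaluator(Evaluator):
    def score(self, library):
        # Crude stand-in for a descriptor-based measure: fraction of distinct scaffolds.
        return len({c["scaffold"] for c in library}) / len(library)

def composite_score(library, evaluators):
    """Weighted average of all module scores (assumed combination rule)."""
    total_weight = sum(e.weight for e in evaluators)
    return sum(e.weight * e.score(library) for e in evaluators) / total_weight

library = [
    {"activity": 0.8, "drug_likeness": 0.6, "scaffold": "x"},
    {"activity": 0.6, "drug_likeness": 1.0, "scaffold": "y"},
]
evaluators = [ActivityEvaluator(0.5), DrugLikenessEvaluator(0.25), DiversityEvaluator(0.25)]
result = composite_score(library, evaluators)
print(round(result, 3))
```

With this layout, a further module, such as one for unfavourable pharmacokinetic or toxicity properties, can be added by defining another subclass without touching the existing ones.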

Claims (9)

1. A combinatorial chemistry centralized repository design and optimization method, being a method for the design, construction, and evaluation and optimization of a virtual combinatorial chemistry centralized repository software package, characterized in that the method comprises the following steps:
(1) first, designing a target-based virtual combinatorial preliminary screening library, comprising: determining the target protein for the combinatorial synthesis; designing the combinatorial synthetic route and selecting the combinatorial building units; and then selecting the set of member units for each basic building unit in the combinatorial synthesis;
(2) second, constructing the virtual combinatorial preliminary screening library according to the above design, comprising: according to the combinatorial chemistry reactions, reading the specified units from the specified building units and constructing the corresponding virtual combinatorial preliminary screening library; and then optimizing the molecular conformations in the resulting virtual combinatorial preliminary screening library;
(3) third, evaluating and optimizing the virtual combinatorial preliminary screening library constructed above, comprising: first establishing an evaluation system with a modular structure; and then optimizing the compounds of the virtual combinatorial preliminary screening library using a genetic algorithm.
2. The combinatorial chemistry centralized repository design and optimization method according to claim 1, characterized in that said combinatorial chemistry reactions are sixteen major classes of solid-phase synthesis reactions, comprising: anchoring reactions, amino coupling reactions, aromatic substitution reactions, condensation reactions, cycloaddition reactions, Grignard reactions, Michael addition reactions, heterocycle-forming reactions, multi-component reactions, olefin-forming reactions, oxidation reactions, reduction reactions, non-aromatic substitution reactions, protection and deprotection reactions, solid-phase organic synthesis, and cleavage reactions.
3. The combinatorial chemistry centralized repository design and optimization method according to claim 1, characterized in that the molecular conformations in the constructed virtual combinatorial preliminary screening library are optimized using a molecular force field and a space search algorithm that combines the simplex method with the conjugate direction method.
4. The combinatorial chemistry centralized repository design and optimization method according to claim 1, characterized in that establishing the evaluation system with a modular structure comprises establishing a molecular activity evaluation module, a molecular drug-likeness evaluation module, a molecular diversity evaluation module, and an evaluation module for poor pharmacokinetic properties or adverse toxic reactions.
5. The combinatorial chemistry centralized repository design and optimization method according to claim 4, characterized in that the evaluation method of the molecular activity evaluation module is molecular docking.
6. The combinatorial chemistry centralized repository design and optimization method according to claim 4, characterized in that the evaluation method of the molecular diversity evaluation module uses a description of structural diversity, selecting 40 descriptors that include the molecular polar surface area parameter.
7. The combinatorial chemistry centralized repository design and optimization method according to claim 4, characterized in that the evaluation method of the molecular drug-likeness evaluation module selects 7 descriptors, comprising the n-octanol/water partition coefficient, the molecular weight, the hydrogen bond acceptors, the hydrogen bond donors, and the molecular structure ratio descriptors, namely the ratio of the number of saturated carbon atoms to the number of heavy atoms excluding halogen atoms, the ratio of the number of hydrogen atoms to the number of heavy atoms excluding halogen atoms, and the ratio of the molecular degree of unsaturation to the number of bonds between heavy atoms excluding halogen atoms.
8. The combinatorial chemistry centralized repository design and optimization method according to claim 1, characterized in that said genetic algorithm uses decimal encoding.
9. The combinatorial chemistry centralized repository design and optimization method according to claim 1, characterized in that the design, construction, and evaluation and optimization of the virtual combinatorial chemistry centralized repository software package are implemented in the C++ language.
CNB2004100531026A 2004-07-23 2004-07-23 Design and optimization method of combined chemical central base Expired - Fee Related CN100362519C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2004100531026A CN100362519C (en) 2004-07-23 2004-07-23 Design and optimization method of combined chemical central base

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2004100531026A CN100362519C (en) 2004-07-23 2004-07-23 Design and optimization method of combined chemical central base

Publications (2)

Publication Number Publication Date
CN1725222A true CN1725222A (en) 2006-01-25
CN100362519C CN100362519C (en) 2008-01-16

Family

ID=35924693

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2004100531026A Expired - Fee Related CN100362519C (en) 2004-07-23 2004-07-23 Design and optimization method of combined chemical central base

Country Status (1)

Country Link
CN (1) CN100362519C (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101329698B (en) * 2008-07-31 2010-06-16 四川大学 Novel medicament molecule construction method based on pharmacophore model
CN101916330A (en) * 2010-08-06 2010-12-15 辽宁大学 Virtual screening method for novel cancer-preventing or anti-cancer medicament by taking Keap1 as target point
CN102117370A (en) * 2011-03-25 2011-07-06 西安近代化学研究所 Method for virtually synthesizing azacyclo-energetic compound based on MOL (machine-oriented language) file format
CN101131391B (en) * 2006-08-24 2011-07-20 中国科学院上海药物研究所 Gene toxicity probability forecasting method based on molecule electrophilic vector and extend supporting vector machine
CN102646171A (en) * 2011-04-11 2012-08-22 闫京波 Application of multidimensional matrix used for molecular design of drug-like compounds and method of molecular design of drug-like compounds
CN102663214A (en) * 2012-05-09 2012-09-12 四川大学 Construction and prediction method of integrated drug target prediction system
CN102663249A (en) * 2011-04-11 2012-09-12 闫京波 Method for designing medicine building block by referring to target compound via multi-dimensional matrix and application thereof
CN102693356A (en) * 2011-04-11 2012-09-26 闫京波 Application of multidimensional matrix used for medical molecule design and medical molecule design method
CN104965998A (en) * 2015-05-29 2015-10-07 华中农业大学 Screening method for multi-target drugs and/or pharmaceutical combinations
CN104021265B (en) * 2013-03-01 2017-02-22 上海交通大学 Complex system reaction access calculating system and implementing method thereof
CN109712685A (en) * 2019-01-24 2019-05-03 湘潭大学 A kind of prescription medicament construction method and system based on multi-objective Evolutionary Algorithm
CN110379468A (en) * 2019-07-17 2019-10-25 成都火石创造科技有限公司 A kind of improved chemical molecular formula cutting method
CN110875085A (en) * 2018-09-03 2020-03-10 中国石油化工股份有限公司 Method for efficiently optimizing molecular structures in batches
CN111402966A (en) * 2020-03-06 2020-07-10 华东师范大学 Fingerprint design method for describing properties of small molecule fragments based on small molecule three-dimensional structure
WO2021103516A1 (en) * 2020-06-28 2021-06-03 深圳晶泰科技有限公司 System and method for virtual drug screening for crystalline complexes
WO2021103469A1 (en) * 2020-05-29 2021-06-03 深圳晶泰科技有限公司 Atom sequence rearrangement method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2001292740A1 (en) * 2000-09-20 2002-04-02 Dimitris K. Agrafiotis Method, system, and computer program product for encoding and building products of a virtual combinatorial library

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101131391B (en) * 2006-08-24 2011-07-20 中国科学院上海药物研究所 Gene toxicity probability forecasting method based on molecule electrophilic vector and extend supporting vector machine
CN101329698B (en) * 2008-07-31 2010-06-16 四川大学 Novel medicament molecule construction method based on pharmacophore model
CN101916330A (en) * 2010-08-06 2010-12-15 辽宁大学 Virtual screening method for novel cancer-preventing or anti-cancer medicament by taking Keap1 as target point
CN101916330B (en) * 2010-08-06 2012-06-20 辽宁大学 Virtual screening method for novel cancer-preventing or anti-cancer medicament by taking Keap1 as target point
CN102117370A (en) * 2011-03-25 2011-07-06 西安近代化学研究所 Method for virtually synthesizing azacyclo-energetic compound based on MOL (machine-oriented language) file format
CN102117370B (en) * 2011-03-25 2012-05-30 西安近代化学研究所 Method for virtually synthesizing azacyclo-energetic compound based on MOL (machine-oriented language) file format
CN102693356B (en) * 2011-04-11 2015-05-27 闫京波 Application of multidimensional matrix used for medical molecule design and medical molecule design method
CN102663249A (en) * 2011-04-11 2012-09-12 闫京波 Method for designing medicine building block by referring to target compound via multi-dimensional matrix and application thereof
CN102693356A (en) * 2011-04-11 2012-09-26 闫京波 Application of multidimensional matrix used for medical molecule design and medical molecule design method
CN102646171B (en) * 2011-04-11 2014-12-10 闫京波 Application of multidimensional matrix used for molecular design of drug-like compounds and method of molecular design of drug-like compounds
CN102646171A (en) * 2011-04-11 2012-08-22 闫京波 Application of multidimensional matrix used for molecular design of drug-like compounds and method of molecular design of drug-like compounds
CN102663249B (en) * 2011-04-11 2015-11-25 闫京波 Multi-dimensional matrix reference object compound is adopted to carry out method and the application thereof of medicine framework compound design
CN102663214B (en) * 2012-05-09 2013-11-06 四川大学 Construction and prediction method of integrated drug target prediction system
CN102663214A (en) * 2012-05-09 2012-09-12 四川大学 Construction and prediction method of integrated drug target prediction system
CN104021265B (en) * 2013-03-01 2017-02-22 上海交通大学 Complex system reaction access calculating system and implementing method thereof
CN104965998A (en) * 2015-05-29 2015-10-07 华中农业大学 Screening method for multi-target drugs and/or pharmaceutical combinations
CN104965998B (en) * 2015-05-29 2017-09-15 华中农业大学 The screening technique of many target agents and/or drug regimen
CN110875085A (en) * 2018-09-03 2020-03-10 中国石油化工股份有限公司 Method for efficiently optimizing molecular structures in batches
CN110875085B (en) * 2018-09-03 2022-07-29 中国石油化工股份有限公司 Method for efficiently optimizing molecular structure in batches
CN109712685A (en) * 2019-01-24 2019-05-03 湘潭大学 A kind of prescription medicament construction method and system based on multi-objective Evolutionary Algorithm
CN109712685B (en) * 2019-01-24 2020-11-06 湘潭大学 Prescription and medicament construction method and system based on multi-objective evolutionary algorithm
CN110379468A (en) * 2019-07-17 2019-10-25 成都火石创造科技有限公司 A kind of improved chemical molecular formula cutting method
CN110379468B (en) * 2019-07-17 2022-08-23 成都火石创造科技有限公司 Improved chemical molecular formula segmentation method
CN111402966A (en) * 2020-03-06 2020-07-10 华东师范大学 Fingerprint design method for describing properties of small molecule fragments based on small molecule three-dimensional structure
CN111402966B (en) * 2020-03-06 2022-08-19 华东师范大学 Fingerprint design method for describing properties of small molecule fragments based on small molecule three-dimensional structure
WO2021103469A1 (en) * 2020-05-29 2021-06-03 深圳晶泰科技有限公司 Atom sequence rearrangement method
WO2021103516A1 (en) * 2020-06-28 2021-06-03 深圳晶泰科技有限公司 System and method for virtual drug screening for crystalline complexes

Also Published As

Publication number Publication date
CN100362519C (en) 2008-01-16

Similar Documents

Publication Publication Date Title
CN1725222A (en) Combinatorial chemistry centralized repository design and optimization method
JP4776146B2 (en) Method and system for modeling cellular metabolism
Croes et al. Inferring meaningful pathways in weighted metabolic networks
Croitoru et al. Additive CHARMM36 force field for nonstandard amino acids
Marques et al. Web-based tools for computational enzyme design
Cole et al. Exploiting models of molecular evolution to efficiently direct protein engineering
Norin et al. Structural proteomics: developments in structure-to-function predictions
US20060177865A1 (en) Computational method for designing enzymes for incorporation of amino acid analogs into proteins
CN1592852A (en) Biological discovery using gene regulatory networks generated from multiple-disruption expression libraries
CN1533400A (en) Probes, system and methods for drug discovery
Prywes et al. Rubisco function, evolution, and engineering
Tang et al. Metabolic flux analysis of Shewanella spp. reveals evolutionary robustness in central carbon metabolism
CN1493051A (en) Method for operating a computer system to perform discrete substructural analysis
WO2005001736A2 (en) Intracellular metabolic flux analysis method using substrate labeled with isotope
Linial et al. Methodologies for target selection in structural genomics
Gmelch et al. Molecular dynamics analysis of a rationally designed aldehyde dehydrogenase gives insights into improved activity for the non-native cofactor NAD+
Yan et al. IntEnzyDB: an Integrated Structure–Kinetics Enzymology Database
US20230073351A1 (en) Selecting biological sequences for screening to identify sequences that perform a desired function
Sveshnikova et al. ARBRE: Computational resource to predict pathways towards industrially important aromatic compounds
Yang et al. Mutexa: a computational ecosystem for intelligent protein engineering
Vankayala et al. Elucidating a chemical defense mechanism of Antarctic sponges: a computational study
Klapa et al. The quest for the mechanisms of life
Boojari et al. Developing a metabolic model‐based fed‐batch feeding strategy for Pichia pastoris fermentation through fine‐tuning of the methanol utilization pathway
Nilmeier et al. 3D Motifs
KRISHNARAJ Enzyme-substrate interaction based approach for screening electroactive microorganisms for Microbial Fuel Cell applications

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20080116

Termination date: 20130723