CN101789047A - Method for evaluating synthesization of organic small-molecule compounds based on reverse synthesis - Google Patents


Info

Publication number
CN101789047A
Authority
CN
China
Prior art keywords
reaction
molecule
synthetic route
route
database
Legal status
Granted
Application number
CN201010106648A
Other languages
Chinese (zh)
Other versions
CN101789047B (en)
Inventor
Yang Shengyong (杨胜勇)
Huang Qi (黄奇)
Li Linli (李琳丽)
Zheng Renlin (郑仁林)
Wei Yuquan (魏于全)
Current Assignee
Sichuan University
Original Assignee
Sichuan University
Application filed by Sichuan University
Priority to CN2010101066489A (patent CN101789047B)
Publication of CN101789047A
Application granted
Publication of CN101789047B
Legal status: Expired - Fee Related

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a retrosynthesis-based method for evaluating the synthesizability of organic small-molecule compounds, in the field of computer-aided drug design. The method builds a starting-material database, a transformation-rule database and a fixed-route database, and uses these three databases to perform retrosynthetic analysis on target compounds and generate their synthetic routes automatically. During disconnection, generation of the retrosynthetic analysis tree is optimized and nodes of the tree are pruned early. After the routes are generated, both the difficulty of disconnecting the compound and the difficulty of carrying out the synthetic route are evaluated. The program flow comprises identifying the input molecule; reading the starting-material, transformation-rule and fixed-route databases; generating synthetic routes with an optimized retrosynthetic analysis tree; and scoring synthesizability. The invention removes a bottleneck in the development of traditional de novo compound design methods, effectively shortens computation time, and provides an effective and accurate evaluation of compound synthesizability.

Description

Method for evaluating the synthesizability of organic small-molecule compounds based on retrosynthesis
Technical field
The present invention relates to the field of computer-aided drug design, and in particular to a retrosynthesis-based method for evaluating the synthesizability of organic small-molecule compounds.
Background
Since the 1990s, advances in computer technology, chemistry, molecular biology, bioinformatics and related disciplines have greatly promoted the development of computer-aided drug design. Many new computer-aided drug design methods have emerged, and the area has become a relatively mature research field.
Computer-aided drug design methods mainly include molecular docking, de novo design, structure-activity relationships and pharmacophore models. When the three-dimensional structure of the target molecule is known, molecular docking and de novo design are generally used. Molecular docking searches databases of known compounds for organic small molecules whose geometry and physicochemical properties best match the target; its drawback is that the compounds found are already known, so they lack novelty or are protected by patents. De novo design assembles compound molecules directly in the active site of the target by fragment growing or fragment linking, so the designed compounds are novel. Although de novo design can produce entirely new drug molecules, these molecules do not yet exist, so they must first be synthesized before subsequent pharmacological testing can be carried out. Because the molecules are generated by combining different fragments, the number of molecules obtained is very large. Synthesizing and testing all of them would consume enormous manpower, money and time and is practically impossible. In practice, therefore, a small fraction of easily synthesized molecules is selected for synthesis. However, picking out a few dozen easily synthesized molecules from thousands of compounds is itself a difficult and very time-consuming task. A computational method that rapidly evaluates the synthesizability of compounds would therefore be of great value.
Summary of the invention
The objective of the invention is to use computer-aided design to provide a retrosynthesis-based method that evaluates the synthesizability of organic small-molecule compounds quickly and accurately, thereby removing a bottleneck in the development of existing de novo compound design methods.
The basic idea of the invention is as follows. The structure of the target molecule is analyzed and disconnected into simpler, more readily available precursors. Each precursor is then treated as a new target molecule and disconnected further, until the precursors of the last step can be purchased directly. The precursors of the final step are called starting materials, and the intermediate precursors are called intermediates. After the retrosynthetic analysis is complete, the precursors obtained by disconnection, or their equivalents, are recombined in reverse through synthetic reactions. Working back step by step from the disconnection results, a synthetic route from the starting materials to the target compound is obtained, and the synthesizability of the compound is evaluated on this basis. The invention generates synthetic routes automatically under computer program control. While the retrosynthetic analysis generates routes, the retrosynthetic analysis tree is optimized at the same time: useless synthesis steps are discarded as early as possible, so that their precursors are not disconnected further as new target compounds. After the routes are generated, both the disconnection difficulty of the compound and the difficulty of carrying out the synthetic route are evaluated, so that synthesizability is assessed accurately.
The invention rests on the following four points. (1) When chemists assess the synthesizability of a compound, they disconnect the target compound, design a synthetic route and score it according to how difficult the disconnections are. Compared with previously used methods based on molecular complexity or on starting materials, a retrosynthesis-based evaluation gives the most accurate results. (2) With the development of chemoinformatics, large compound databases and organic reaction databases have become available, providing a richer and more practical foundation for computer-aided design. (3) When synthetic routes are generated automatically by retrosynthesis, a retrosynthetic analysis tree is obtained; apart from the root, every node in the tree represents an intermediate or starting material obtained during the analysis. Because the target molecule and its intermediates may have several disconnection points, and several transformation rules may apply to a single disconnection point, automatic disconnection by computer causes a combinatorial explosion; this is why retrosynthesis-based synthesizability evaluation has long been time-consuming. However, the vast majority of the many routes generated are useless. The retrosynthetic analysis tree is therefore pruned during disconnection to reduce computation time. (4) The synthetic difficulty of a compound lies not only in the difficulty of disconnection but also in the difficulty of carrying out the synthetic route. Evaluating both gives a more accurate result.
The objective of the invention is achieved as follows. Common chemical reagents are collected to build a starting-material database; commonly used organic synthesis reactions and classic named organic reactions are collected to build a transformation-rule database; and reported synthetic routes of compound molecules are collected to build a route database. Using these three databases, retrosynthetic analysis is performed on the target compound to generate synthetic routes for organic small-molecule compounds automatically. During disconnection, generation of the retrosynthetic analysis tree is optimized and nodes of the tree are pruned early. After the routes are generated, the disconnection difficulty of the compound and the difficulty of carrying out the route are evaluated together. The program flow comprises the following four steps: 1) Identify the input molecule: the two-dimensional structure of the molecule is converted into a unique one-dimensional 64-bit hash code that the program can recognize. 2) Read the starting-material, transformation-rule and route databases: collected chemical reagents are represented by hash codes; for collected organic reactions, a reaction center is extracted from each reaction to form the transformation-rule database; for synthetic routes of known compound molecules, each route is first simplified and its reaction center is then extracted; all data are converted into data structures the program can use before being read. 3) Generate synthetic routes with an optimized retrosynthetic analysis tree: the chemical structure of each compound is treated as a graph and processed with a graph-matching algorithm. 4) Score synthesizability: a combined score is computed from the number of valid synthetic routes and the difficulty of carrying out each route.
The specific steps for identifying the input molecule are as follows. The computer first reads the file of the target molecule to be evaluated. The basic information of the molecule is provided as a MOL2 or SD file and includes the number of atoms, the number of bonds, the element type and three-dimensional coordinates of each atom, and the bond type and bonded atoms of each bond. After reading this information, the program checks whether each atom is in a saturated valence state and automatically adds hydrogens to unsaturated atoms. After hydrogen addition, the topology, connectivity, functional groups and rings of the target molecule are identified from the basic information. Finally, using all of this information, the two-dimensional structure of the molecule is converted into a one-dimensional 64-bit hash code. The code uniquely represents the two-dimensional structure of the molecule, and each two-dimensional structure maps to exactly one code.
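The text does not say how the 64-bit code is computed. As an illustration only, the sketch below assumes a Weisfeiler-Lehman-style label refinement followed by an 8-byte BLAKE2b digest; the function names `hash64` and `_short` and the `(atoms, bonds)` input layout are hypothetical. Unlike the unique code the text describes, this refinement is not a true canonicalization: some non-isomorphic graphs receive identical labels, and any 64-bit hash can collide.

```python
import hashlib

def _short(text):
    """Compress a label to 16 hex characters so labels stay short."""
    return hashlib.sha1(text.encode()).hexdigest()[:16]

def hash64(atoms, bonds, rounds=3):
    """Return a 64-bit integer code for a 2D molecular graph (sketch).

    atoms: element symbols, e.g. ["C", "C", "O"]
    bonds: (i, j, order) tuples with 0-based atom indices
    """
    neighbors = [[] for _ in atoms]
    for i, j, order in bonds:
        neighbors[i].append((j, order))
        neighbors[j].append((i, order))
    # Initial label: element plus number of bonded neighbors.
    labels = [f"{el}:{len(neighbors[i])}" for i, el in enumerate(atoms)]
    for _ in range(rounds):
        # Each new label combines the old label with the sorted
        # (bond order, neighbor label) pairs, so atom numbering does not matter.
        labels = [
            _short(labels[i] + "|" + ",".join(sorted(f"{o}{labels[j]}" for j, o in neighbors[i])))
            for i in range(len(atoms))
        ]
    # Sorting the multiset of atom labels removes the dependence on atom order.
    canonical = ";".join(sorted(labels))
    digest = hashlib.blake2b(canonical.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

ethanol = hash64(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
ethanol_renumbered = hash64(["O", "C", "C"], [(0, 1, 1), (1, 2, 1)])
dimethyl_ether = hash64(["C", "O", "C"], [(0, 1, 1), (1, 2, 1)])
print(ethanol == ethanol_renumbered, ethanol == dimethyl_ether)  # True False
```

Renumbering the atoms of ethanol leaves the code unchanged, while dimethyl ether, which has the same formula but different connectivity, gets a different code.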
The specific steps for reading the starting-material, transformation-rule and route databases are as follows. The starting-material database of collected common chemical reagents, the transformation-rule database of collected common organic synthesis reactions and classic named organic reactions, and the route database of reported synthetic routes of known compound molecules are stored in the computer in advance. Each database is converted into a data structure the program can use, as follows, before its data are read:
For the collected chemical reagent data, each entry is converted into a 64-bit hash code by the same method used to identify the input molecule, and the price and CAS number of the starting material are added.
For the collected organic reaction data, a reaction center is extracted from each reaction to build the transformation-rule database. The reaction center is extracted as follows:
(1) Identify the reaction site: the reaction site consists only of the chemical bonds that change and the atoms directly attached to those bonds. These bonds and atoms are found by comparing the chemical structures of the products and reactants of the reaction.
(2) Extend the basic reaction center: the basic reaction center obtained in step (1) is extended so that it also includes its chemical environment, that is, the functional groups attached to the atoms of the basic reaction center.
(3) Abstract the reaction center: reactions with the same reaction essence are abstracted into one. The abstraction criteria include:
(a) if a reaction center contains a halogen but the reaction does not depend on which halogen it is, the specific halogen atom is replaced by a generic halogen;
(b) if reaction centers represent organic reactions with the same mechanism, these reactions are represented by a single reaction center. After abstraction, duplicate reaction centers are deleted.
For the synthetic route data of known compound molecules, each route is first simplified into a one-step reaction, and its reaction center is then extracted.
After the reaction center has been extracted, it serves as the core of a retrosynthetic transformation rule, supplemented with the reaction's starting-material information, reaction conditions and yield. In addition, the ease of carrying out each reaction is scored manually and the score is stored in the transformation rule; this information is used in the subsequent disconnection process and in the final synthesizability scoring.
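Steps (1) and (2) of reaction-center extraction can be sketched as a comparison of bond sets. The sketch assumes atom-mapped reactions, which the text does not mention: each atom keeps the same index in reactants and products, so a bond is keyed by its unordered pair of indices. One shell of bonded neighbors stands in for the "attached functional groups" of step (2), which is a simplification. The names `reaction_site` and `extend_site` are hypothetical.

```python
def bond_map(bonds):
    """Map each unordered atom pair to its bond order."""
    return {frozenset((a, b)): order for a, b, order in bonds}

def reaction_site(reactant_bonds, product_bonds):
    """Step (1): bonds formed, broken or changed in order, plus their atoms."""
    r, p = bond_map(reactant_bonds), bond_map(product_bonds)
    # A missing pair counts as bond order 0 (no bond).
    changed = {pair for pair in r.keys() | p.keys() if r.get(pair, 0) != p.get(pair, 0)}
    atoms = set().union(*changed)
    return changed, atoms

def extend_site(site_atoms, reactant_bonds):
    """Step (2): add atoms directly bonded to the site (one shell of environment)."""
    extended = set(site_atoms)
    for a, b, _order in reactant_bonds:
        if a in site_atoms or b in site_atoms:
            extended.update((a, b))
    return extended

# Esterification of propanoic acid with methanol (heavy atoms only):
# C0-C1-C2(=O3)-O4 acid, O5-C6 methanol; O4 leaves as water.
reactants = [(0, 1, 1), (1, 2, 1), (2, 3, 2), (2, 4, 1), (5, 6, 1)]
products = [(0, 1, 1), (1, 2, 1), (2, 3, 2), (2, 5, 1), (5, 6, 1)]
changed, site = reaction_site(reactants, products)
print(sorted(site))                            # [2, 4, 5]
print(sorted(extend_site(site, reactants)))    # [1, 2, 3, 4, 5, 6]
```

The terminal methyl carbon (atom 0) stays outside the extended center, which is what lets the same rule apply to other carboxylic acids.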
The specific steps for generating synthetic routes with an optimized retrosynthetic analysis tree are as follows. The chemical structure of a compound is treated as a graph and processed with a graph-matching algorithm. First, each reaction center is matched in turn against the target molecule to find which functional groups or substructures can serve as disconnection sites. If a match succeeds, the matched core structure is taken as one possible synthesis step, completing one conversion from the target compound to its precursors. Next, the precursors obtained in the previous step become the target compounds of the next disconnection, and disconnection continues until it terminates. Finally, the disconnection results of all steps are connected as a tree to give the retrosynthetic analysis tree, whose root is the target compound being evaluated and whose leaves are the final starting materials; this completes the disconnection process. Read in reverse, each path from a leaf node to the root constitutes a complete synthetic route.
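The tree expansion can be sketched as a depth-limited recursion over an AND/OR tree: each applicable disconnection is an OR branch, and every precursor of a chosen disconnection must itself be resolved (AND). To keep the sketch short, reaction-center matching is replaced by a hypothetical lookup table, `disconnections`, that maps each molecule key to its applicable `(rule_name, precursors)` pairs; `max_steps` is assumed to bound the depth of the tree, and a molecule found in `stock` ends a branch.

```python
from itertools import product

def find_routes(target, disconnections, stock, max_steps):
    """Enumerate routes from `target` down to purchasable materials.

    disconnections: dict molecule -> list of (rule_name, precursors tuple);
                    a hypothetical stand-in for reaction-center matching.
    stock: set of purchasable starting materials.
    max_steps: maximum depth of the retrosynthetic tree below `target`.
    Each route is a list of (rule_name, product, precursors) steps, root first.
    """
    if target in stock:
        return [[]]            # leaf: already a starting material
    if max_steps == 0:
        return []              # depth limit reached without a starting material
    routes = []
    for rule, precursors in disconnections.get(target, []):
        # Every precursor must be resolvable (AND); each rule is an OR branch.
        sub_routes = [find_routes(p, disconnections, stock, max_steps - 1)
                      for p in precursors]
        for combo in product(*sub_routes):
            steps = [(rule, target, precursors)]
            for sub in combo:
                steps.extend(sub)
            routes.append(steps)
    return routes

disconnections = {
    "T": [("amide coupling", ("acid", "amine")),
          ("acylation", ("acyl chloride", "amine"))],
    "acid": [("ester hydrolysis", ("ester",))],
    "acyl chloride": [("chlorination", ("acid",))],
}
stock = {"amine", "ester"}
print(len(find_routes("T", disconnections, stock, 2)))  # 1
print(len(find_routes("T", disconnections, stock, 3)))  # 2
```

With `max_steps=2` only the amide-coupling route reaches purchasable materials; raising the limit to 3 also admits the route through the acyl chloride.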
Scoring synthesizability means that, once synthetic routes to the target compound have been generated, a score is computed from them based on the number of valid synthetic routes and the difficulty of carrying out each route. A valid synthetic route is one that reaches starting materials within the user-defined maximum number of disconnection steps, n. The difficulty of a route is scored from its reaction conditions and the difficulty of separating the products.
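The text does not give a formula for combining the number of valid routes with route difficulty. One hypothetical choice, sketched below, scores each route as the product of per-step ease values in [0, 1] (for example, the manual scores stored with each transformation rule) and reports the route count together with the ease of the best route. Routes are assumed to be lists of `(rule_name, product, precursors)` steps already limited to n disconnection steps; `route_ease` and `synthesizability` are hypothetical names.

```python
import math

def route_ease(route, ease):
    """Hypothetical route ease: product of per-step ease values in [0, 1]."""
    return math.prod(ease[rule] for rule, _product, _precursors in route)

def synthesizability(routes, ease):
    """Hypothetical score: (number of valid routes, ease of the best route)."""
    if not routes:
        return 0, 0.0
    return len(routes), max(route_ease(r, ease) for r in routes)

routes = [
    [("amide coupling", "T", ("acid", "amine")),
     ("ester hydrolysis", "acid", ("ester",))],
    [("acylation", "T", ("acyl chloride", "amine")),
     ("chlorination", "acyl chloride", ("acid",)),
     ("ester hydrolysis", "acid", ("ester",))],
]
ease = {"amide coupling": 0.9, "ester hydrolysis": 0.8,
        "acylation": 0.95, "chlorination": 0.7}
count, best = synthesizability(routes, ease)
print(count, round(best, 3))  # 2 0.72
```

Returning a tuple lets candidate molecules be ranked first by how many routes exist and then by how easy the best one is; a weighted sum would be an equally plausible design.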
Generating synthetic routes with an optimized retrosynthetic analysis tree means optimizing while disconnecting, using three methods: a rule for terminating retrosynthetic analysis directly, simplification of retrosynthetic analysis subtrees, and a parameter, rate, attached to each reaction to represent the likelihood that the step can be carried out.
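The three optimization methods can be sketched as small filters applied to the candidate disconnections of one node. Everything here is an assumption: set membership of structure codes replaces graph matching against the starting-material database, exact key equality replaces substructure matching against the route database, and the multiplicative rate factors and threshold are invented for illustration, since the text only says that rate increases or decreases. Method 1 is read as keeping the first disconnection whose precursors are all starting materials. All function names are hypothetical.

```python
def direct_termination(disconnections, stock):
    """Method 1: if one disconnection gives only starting materials, keep it alone.

    disconnections: list of (rule_name, precursors) for one target
    stock: set of starting-material keys (e.g. 64-bit structure codes)
    Returns (kept_disconnections, branch_terminated).
    """
    for rule, precursors in disconnections:
        if all(p in stock for p in precursors):
            return [(rule, precursors)], True
    return list(disconnections), False

def fixed_route_shortcuts(target, route_db):
    """Method 2: collapse each known route A -> B -> ... -> D into A -> D.
    Exact key equality stands in for substructure matching (assumption)."""
    return [("fixed route", (route[0],)) for route in route_db
            if len(route) >= 2 and route[-1] == target]

# Hypothetical factors for the rate adjustments (1)-(5) of the embodiment.
RATE_FACTORS = {"electronic_match": 1.2, "electronic_mismatch": 0.8,
                "steric": 0.7, "selectivity": 0.7,
                "unstable": 0.6, "complexity_up": 0.8}

def keep_by_rate(initial_rate, flags, threshold=0.3):
    """Method 3: adjust rate multiplicatively; prune the branch below threshold."""
    rate = initial_rate
    for flag in flags:
        rate *= RATE_FACTORS[flag]
    return rate, rate >= threshold

disc = [("r1", ("a", "b")), ("r2", ("c",)), ("r3", ("d", "e"))]
print(direct_termination(disc, {"c", "d"}))   # ([('r2', ('c',))], True)
print(keep_by_rate(0.5, ["steric", "selectivity"])[1])  # False
```

Method 2 returns extra options to append to the ordinary rule-based disconnections rather than replacing them, which mirrors its non-exclusive role.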
The beneficial effects of the invention are as follows. It removes a bottleneck in the development of existing de novo compound design methods by generating synthetic routes automatically under computer program control. While retrosynthetic analysis generates the routes, the retrosynthetic analysis tree is optimized at the same time: useless synthesis steps are discarded as early as possible, and their precursors are not disconnected further as target compounds. This effectively solves the combinatorial explosion that occurs when a computer disconnects compounds automatically and shortens computation time. After the routes are generated, the disconnection difficulty of the compound and the difficulty of carrying out the route are evaluated together, providing an effective and accurate evaluation of compound synthesizability.
Description of the drawings
Fig. 1 is the program flowchart of the retrosynthesis-based method for evaluating the synthesizability of organic small-molecule compounds.
Fig. 2 is a schematic diagram of a retrosynthetic analysis tree.
Fig. 3 is a schematic diagram of a fixed synthetic route.
Fig. 4 is a schematic diagram of the one-step reaction obtained after simplification.
Detailed description of embodiments
Refer to the accompanying drawings.
In step 1), identifying the input molecule, the basic information of the molecule is read from a MOL2 or SD file; these are the file formats most commonly used to represent molecular structures in chemistry, molecular biology and bioinformatics. The basic information read includes the number of atoms, the number of bonds, the element type and three-dimensional coordinates of each atom, and the bond type and bonded atoms of each bond. After reading this information, the program checks whether each atom is in a saturated valence state and automatically adds hydrogens to unsaturated atoms. After hydrogen addition, the topology, connectivity, functional groups and rings of the target molecule are identified from the basic information. Finally, using all of this information, the two-dimensional structure of the molecule is converted into a one-dimensional 64-bit hash code. The code uniquely represents the two-dimensional structure of the molecule, and each two-dimensional structure maps to exactly one code.
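The valence check and automatic hydrogen addition can be sketched as follows, assuming neutral atoms, Kekulé bond orders (1, 2 or 3) and a small hypothetical table of default valences; charges, radicals and hypervalent sulfur or phosphorus are ignored. The names `DEFAULT_VALENCE`, `implicit_hydrogens` and `add_hydrogens` are hypothetical.

```python
# Hypothetical default valences for common neutral organic elements.
DEFAULT_VALENCE = {"C": 4, "N": 3, "O": 2, "S": 2, "F": 1, "Cl": 1, "Br": 1, "I": 1}

def implicit_hydrogens(atoms, bonds):
    """Return how many hydrogens each atom needs to reach its default valence."""
    used = [0] * len(atoms)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    counts = []
    for el, u in zip(atoms, used):
        valence = DEFAULT_VALENCE.get(el)
        # Unknown elements and over-valent atoms get no added hydrogens.
        counts.append(max(valence - u, 0) if valence is not None else 0)
    return counts

def add_hydrogens(atoms, bonds):
    """Return new (atoms, bonds) lists with explicit H atoms appended."""
    new_atoms, new_bonds = list(atoms), list(bonds)
    for i, n_h in enumerate(implicit_hydrogens(atoms, bonds)):
        for _ in range(n_h):
            new_atoms.append("H")
            new_bonds.append((i, len(new_atoms) - 1, 1))
    return new_atoms, new_bonds

print(implicit_hydrogens(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)]))  # [3, 2, 1]
print(implicit_hydrogens(["C", "C", "O"], [(0, 1, 1), (1, 2, 2)]))  # [3, 1, 0]
```

The first call is ethanol and the second acetaldehyde; the double bond to oxygen leaves the carbonyl carbon with one hydrogen and the oxygen with none.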
In step 2), reading the starting-material, transformation-rule and route databases, the data of the three databases are first converted into data structures the program can use, as follows:
For the collected chemical reagent data, each entry is converted into a 64-bit hash code by the method used in the input-molecule identification step, and the price and CAS number (a unique numeric identifier) of the starting material are added.
For the collected organic reaction data, a reaction center is extracted from each reaction to build the transformation-rule database. Any organic reaction can be written in the form A → B, but in essence a reaction is a process in which chemical bonds are broken, formed or changed. The reaction center proposed here is a substructure formed by several atoms or functional groups. It contains not only information about the bonds broken, formed or changed during the reaction, but also information about the surrounding chemical environment that influences bond breaking and formation, so it captures the essence of an organic reaction effectively. The reaction center is extracted as follows:
(1) Identify the reaction site. The reaction site consists only of the chemical bonds that change and the atoms directly attached to those bonds. These bonds and atoms can be found by comparing the chemical structures of the products and reactants of the reaction.
(2) Extend the basic reaction center. The basic reaction center obtained in step (1) is extended so that it also includes the chemical environment mentioned above, that is, the functional groups attached to the atoms of the basic reaction center.
(3) Abstract the reaction center. Steps (1) and (2) extract one reaction center from each reaction, but several reactions may share the same reaction essence. In this step such reactions are abstracted into one. The abstraction criteria include: (a) if a reaction center contains a halogen but the reaction does not depend on which halogen it is, the specific halogen atom is replaced by a generic halogen; (b) if reaction centers represent organic reactions with the same mechanism, these reactions are represented by a single reaction center. After abstraction, duplicate reaction centers are deleted.
After the reaction center has been extracted, it serves as the core of a retrosynthetic transformation rule, supplemented with the reaction's starting-material information, reaction conditions and yield. In addition, the ease of carrying out each reaction is scored manually and the score is stored in the transformation rule; this information is used in the subsequent disconnection process and in the final synthesizability scoring.
Refer to Figs. 3 and 4. For the synthetic route data of known drug molecules, each route is first simplified and its reaction center is then extracted. These data describe multistep reactions that can be written as A → B → C → D, where D is the target drug molecule, B and C are intermediates of the route, and A is the starting material. Simplification reduces the route to a single-step reaction, A → D, from which the reaction center is then extracted.
A program built in advance as described above can recognize the databases and read the data of all three.
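Step (3) and the resulting rule records can be sketched together. The sketch implements only criterion (a), halogen generalization, plus removal of exact duplicates; criterion (b), merging by reaction mechanism, needs chemical knowledge that a short example cannot encode. It also assumes each reaction center lists its atoms in a consistent order, so no canonical renumbering is done. The record layout, the field names of `TransformationRule` and the 0-to-1 ease scale are hypothetical; when two centers coincide, the first reaction's conditions and score are kept.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

HALOGENS = {"F", "Cl", "Br", "I"}
Center = Tuple[Tuple[str, ...], Tuple[Tuple[int, int, int], ...]]

def abstract_center(atoms, bonds, halogen_specific=False) -> Center:
    """Criterion (a): replace specific halogens by a generic 'X' unless the
    reaction depends on the halogen type. Returns a hashable center."""
    if not halogen_specific:
        atoms = ["X" if el in HALOGENS else el for el in atoms]
    return tuple(atoms), tuple(sorted(tuple(b) for b in bonds))

@dataclass
class TransformationRule:
    center: Center
    conditions: str = ""
    yield_percent: Optional[float] = None
    ease_score: float = 1.0          # manual score; 1.0 = easy (assumed scale)
    sources: List[str] = field(default_factory=list)

def build_rules(reactions):
    """reactions: dicts with keys name, atoms, bonds and optionally conditions,
    yield_percent, ease_score, halogen_specific (hypothetical layout).
    Reactions whose abstracted centers coincide are merged into one rule."""
    rules = {}
    for rxn in reactions:
        center = abstract_center(rxn["atoms"], rxn["bonds"],
                                 rxn.get("halogen_specific", False))
        if center in rules:
            rules[center].sources.append(rxn["name"])  # duplicate center
            continue
        rules[center] = TransformationRule(
            center=center,
            conditions=rxn.get("conditions", ""),
            yield_percent=rxn.get("yield_percent"),
            ease_score=rxn.get("ease_score", 1.0),
            sources=[rxn["name"]],
        )
    return list(rules.values())

rxns = [
    {"name": "SN2 chloride", "atoms": ["C", "Cl"], "bonds": [(0, 1, 1)], "ease_score": 0.8},
    {"name": "SN2 bromide", "atoms": ["C", "Br"], "bonds": [(0, 1, 1)], "ease_score": 0.9},
    {"name": "ester hydrolysis", "atoms": ["C", "O", "O"], "bonds": [(0, 1, 2), (0, 2, 1)]},
]
rules = build_rules(rxns)
print(len(rules), rules[0].sources)  # 2 ['SN2 chloride', 'SN2 bromide']
```

Setting `halogen_specific=True` on a reaction keeps its halogen, so halogen-dependent reactions stay as separate rules.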
In step 3), generating synthetic routes with an optimized retrosynthetic analysis tree, retrosynthetic analysis is performed on the target compound using the starting-material, transformation-rule and route databases. In chemoinformatics and bioinformatics, the chemical structure of a compound is treated as a graph. Basic graph-matching algorithms from computer science can therefore determine accurately whether the target compound contains a given substructure; the graph-matching algorithm used in the invention is the Hungarian algorithm. First, each reaction center is matched in turn against the target molecule to find which functional groups or substructures can serve as disconnection sites. If a match succeeds, the matched core structure is taken as one possible synthesis step, completing one conversion from the target compound to its precursors. Next, the precursors obtained in the previous step become the target compounds of the next disconnection, and disconnection continues until it terminates. The disconnection results of all steps are connected as a tree to give the retrosynthetic analysis tree, whose root is the target compound being evaluated and whose leaves are the final starting materials; this completes the disconnection process. Read in reverse, each path from a leaf node to the root constitutes a complete synthetic route.
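The text names the Hungarian algorithm for graph matching. The sketch below instead uses a plain backtracking search, a simplified relative of Ullmann's subgraph-isomorphism algorithm and a different technique, to show what a substructure test involves: each pattern atom is mapped to an unused target atom of the same element, and every pattern bond to an already-mapped atom must exist in the target with the same bond order. The `(atoms, bonds)` layout and the name `substructure_matches` are hypothetical.

```python
def substructure_matches(pattern, target):
    """Yield dicts mapping pattern-atom index -> target-atom index.

    pattern, target: (atoms, bonds); atoms are element symbols and
    bonds are (i, j, order) tuples. Simple backtracking, not the
    Hungarian algorithm named in the text.
    """
    p_atoms, p_bonds = pattern
    t_atoms, t_bonds = target
    t_order = {}
    for i, j, o in t_bonds:
        t_order[(i, j)] = o
        t_order[(j, i)] = o
    p_adj = [[] for _ in p_atoms]
    for i, j, o in p_bonds:
        p_adj[i].append((j, o))
        p_adj[j].append((i, o))

    def extend(mapping):
        k = len(mapping)                 # next pattern atom to place
        if k == len(p_atoms):
            yield dict(enumerate(mapping))
            return
        used = set(mapping)
        for t in range(len(t_atoms)):
            if t in used or t_atoms[t] != p_atoms[k]:
                continue
            # Bonds to already-placed pattern atoms must exist with equal order.
            if all(t_order.get((mapping[j], t)) == o for j, o in p_adj[k] if j < k):
                mapping.append(t)
                yield from extend(mapping)
                mapping.pop()

    yield from extend([])

carboxyl = (["C", "O", "O"], [(0, 1, 2), (0, 2, 1)])
acetic_acid = (["C", "C", "O", "O"], [(0, 1, 1), (1, 2, 2), (1, 3, 1)])
ethanol = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
print(list(substructure_matches(carboxyl, acetic_acid)))  # [{0: 1, 1: 2, 2: 3}]
print(list(substructure_matches(carboxyl, ethanol)))      # []
```

The worst case is exponential, which is one reason practical tools add pruning heuristics; for small reaction-center patterns a plain search is usually adequate.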
In the above process, the method builds the retrosynthetic tree in an optimized way, optimizing while disconnecting, as follows:
Optimization of the retrosynthetic analysis tree comprises three methods: the rule for terminating retrosynthetic analysis directly, the simplification of retrosynthetic analysis subtrees, and a parameter, rate, set for each reaction to represent the likelihood that the step can be carried out. The three methods are described below:
1. Rule for terminating retrosynthetic analysis directly: if, among the possible disconnections of a target compound, one disconnection yields precursors that are starting materials, that disconnection becomes the only one retained. Whether a precursor is a starting material is determined by comparing it with each molecule in the starting-material database using the graph-matching algorithm. Once this is established, the other possible disconnections of the target compound, whose precursors are not starting materials in the same way, are discarded; the qualifying disconnection is kept as the unique disconnection of the target compound, and analysis on this branch stops.
2. Simplification of retrosynthetic analysis subtrees: some specific substructures already have relatively fixed synthetic routes, namely the synthetic route data of known drug molecules mentioned above. These routes consist of multistep reactions. During retrosynthetic analysis, one of the possible disconnections follows such a fixed multistep route. A chemist disconnecting the molecule would automatically skip disconnecting the intermediates, but a computer disconnecting the target compound would also disconnect the intermediates of these multistep reactions, wasting a great deal of time. Therefore, using the route database, the graph-matching algorithm checks whether the target molecule contains such a substructure; if it does, the molecule is disconnected directly to the starting materials according to this transformation rule. Unlike the rule for terminating retrosynthetic analysis directly, this method is not exclusive: other possible transformation rules for the target compound are still recorded in the retrosynthetic analysis tree.
3. A parameter, rate, is set for each reaction to represent the likelihood that the step can be carried out. The initial value of rate depends on the difficulty of the organic reaction represented by the transformation rule: the more difficult the reaction, the smaller the rate value. The rate value also depends on the precursor molecules of the reaction. When the rate value falls below a set threshold, the step is considered too difficult to carry out; the possible route is abandoned and not disconnected further, that is, the branch is deleted from the retrosynthetic analysis tree. The rate value is adjusted according to the precursor molecules as follows:
(1) Checking whether the electronic effect of the reaction-site environment matches that required by the disconnection step: according to basic organic chemistry, electronic effects strongly influence the activity of the reaction center and ultimately determine how easily the reaction proceeds. If the electronic effect of the chemical environment of the reaction site in the precursor matches the electronic effect required by the reaction of the disconnection step, the reaction proceeds more easily; otherwise it becomes more difficult. The chemical environment of the reaction site refers to the functional groups at the alpha position of the reaction site, so the electronic effect of the chemical environment is represented by specific functional groups: nitro groups and halogens, for example, are electron-withdrawing, whereas alkyl groups are generally electron-donating. The Diels-Alder reaction is the most typical example of the influence of electronic effects on reactivity. For each disconnection step, the chemical environment of the reaction site in the precursors of that step is checked automatically. If the electronic effect of the alpha-position functional groups is the same as the electronic effect that increases reactivity, the rate value is increased; otherwise it is decreased.
(2) Checking steric hindrance in the disconnection step: like electronic effects, steric effects influence the activity of the reaction center and determine how easily the reaction proceeds. Atoms or groups near the reaction site occupy space and affect the reactivity of the molecule; a steric effect that lowers reactivity is called steric hindrance. Steric hindrance is likewise represented by functional groups: bulky groups such as the tert-butyl group produce steric hindrance. For some reactions, reactivity is closely related to steric effects, and steric hindrance around the reaction center greatly increases the difficulty of the reaction. For each conversion step that must take steric hindrance into account, the precursors obtained are checked automatically; if a functional group near the reaction site of a precursor causes steric hindrance, the rate value is decreased.
(3) Checking factors that affect reaction selectivity: organic reactions may be selective. If a chemical reaction can give several products at once, of which the target compound is the desired one, the yield of the target compound reflects the selectivity of the reaction. Although selectivity does not affect the difficulty of the reaction itself, it affects the yield and, more importantly, increases the difficulty of separating and purifying the product from by-products; in practice, separation and purification can be much harder than the reaction itself. This method focuses on two classes of selectivity factors. The first is a precursor molecule that contains several identical functional groups: when disconnected in this way, the intended reaction site may react as planned, but other sites can react in the same way, leading to low yields and difficult separation and purification. The second is the presence of similar reaction sites, such as carboxyl and amino groups: although they are different functional groups, both contain active hydrogen, and some reactions actually take place at the active hydrogen. For such reactions, carboxyl and amino groups still interfere with each other despite being different functional groups, causing side reactions with the same result as above. Therefore, if either class of factor that may reduce selectivity is present in the precursors of a conversion step, the rate value is decreased.
(4) Method for detecting unstable chemical structures: organic molecules may contain unstable structures that react spontaneously at room temperature in the presence of oxygen and convert into other, more stable structures. Reactions involving such structures, as well as their purification and transfer, generally have to be carried out at low temperature under anhydrous, oxygen-free conditions. These are very demanding conditions, which means such reactions are difficult to carry out. We have compiled a list of common unstable chemical structures. After each disconnection the precursors are checked; if a precursor contains such an unstable structure, the reaction represented by the transformation is difficult to carry out in practice, so the rate value of that transformation step is reduced.
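The compiled list of unstable structures is not reproduced in the text. The sketch below assumes structures are checked as bonded element pairs and uses carbon-lithium and carbon-magnesium bonds (organolithium and Grignard reagents, which must be handled dry and oxygen-free) purely as illustrative entries; the motif list, the function names and the 0.5 penalty are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Atoms = Dict[int, str]
Bonds = List[Tuple[int, int]]

# Illustrative motif list (not the patent's): unordered element pairs joined by a bond.
UNSTABLE_BOND_MOTIFS: List[FrozenSet[str]] = [
    frozenset({"C", "Li"}),  # organolithium reagents
    frozenset({"C", "Mg"}),  # Grignard reagents
]


def has_unstable_motif(atoms: Atoms, bonds: Bonds) -> bool:
    """True if any bond joins an element pair listed in UNSTABLE_BOND_MOTIFS."""
    return any(frozenset({atoms[a], atoms[b]}) in UNSTABLE_BOND_MOTIFS for a, b in bonds)


def apply_stability_check(precursors: List[Tuple[Atoms, Bonds]], rate: float,
                          penalty: float = 0.5) -> float:
    """Reduce `rate` once if any precursor of the step carries an unstable motif."""
    if any(has_unstable_motif(atoms, bonds) for atoms, bonds in precursors):
        return rate * penalty
    return rate


methyllithium = ({0: "C", 1: "Li"}, [(0, 1)])
ethane = ({0: "C", 1: "C"}, [(0, 1)])
```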
(5) Method for evaluating molecular complexity: retrosynthesis breaks a complex target molecule down step by step into simpler ones, and "complex" and "simple" here can be quantified by molecular complexity. Molecular complexity alone can also be used to assess the synthesizability of compounds, but because its definition and algorithms remain vague, the accuracy of that approach is limited. In this method, molecular complexity serves as an auxiliary criterion: for each transformation step, if a precursor is more complex than the target molecule, the step may be an unreasonable transformation and the rate value is reduced. The molecular complexity cpxtx is computed on the hydrogen-suppressed graph with the following algorithm:
A). Rings: for each ring i, cpxtx = cpxtx + size(i) * k, where size(i) is the number of atoms in the ring and k is an empirical constant; k = 6 in this method.
B). Connectivity: for each atom, cpxtx = cpxtx + i, where the increment i depends on the atom's connectivity. The connectivity of an atom is the number of heavy atoms bonded to it, with a double bond counting as 2 and a triple bond as 3.
If the connectivity is 4, then i = 24;
If the connectivity is 3, then i = 12;
If the connectivity is 2, then i = 6;
If the connectivity is 1, then i = 3.
C). Atom type: for each atom, cpxtx = cpxtx + k, where k = 3 if the atom is carbon and k = 6 for any other element.
D). The accumulated cpxtx is the final value.
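The complexity rules A) to D) can be written out as follows. This sketch assumes integer bond orders (aromatic rings in Kekulé form), takes ring sizes as input because the text does not say which ring set is counted, and adds nothing for connectivity values outside 1 to 4, which the text does not define. The final helper applies the auxiliary check from (5) with a hypothetical penalty; all function names are illustrative.

```python
from typing import List, Tuple

RING_K = 6  # empirical constant k in step A)
CONNECTIVITY_INCREMENT = {4: 24, 3: 12, 2: 6, 1: 3}  # step B)


def molecular_complexity(elements: List[str], bonds: List[Tuple[int, int, int]],
                         ring_sizes: List[int]) -> int:
    """cpxtx on a hydrogen-suppressed graph.

    elements:   element symbol per heavy atom
    bonds:      (atom_a, atom_b, bond_order) with integer orders 1, 2 or 3
    ring_sizes: number of atoms in each ring (ring perception assumed done elsewhere)
    """
    cpxtx = 0
    # A) rings: each ring adds size * k
    for size in ring_sizes:
        cpxtx += size * RING_K
    # B) connectivity: heavy-atom neighbours, a double bond counting 2 and a triple 3
    connectivity = [0] * len(elements)
    for a, b, order in bonds:
        connectivity[a] += order
        connectivity[b] += order
    for cnt in connectivity:
        cpxtx += CONNECTIVITY_INCREMENT.get(cnt, 0)  # assumption: other values add 0
    # C) atom type: 3 for carbon, 6 for any other element
    cpxtx += sum(3 if el == "C" else 6 for el in elements)
    # D) the accumulated value is the result
    return cpxtx


def complexity_check(target_cpx: int, precursor_cpx: int, rate: float,
                     penalty: float = 0.5) -> float:
    """Reduce `rate` when a precursor is more complex than its target (hypothetical penalty)."""
    return rate * penalty if precursor_cpx > target_cpx else rate


ethanol = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)], [])
benzene = (["C"] * 6, [(0, 1, 2), (1, 2, 1), (2, 3, 2), (3, 4, 1), (4, 5, 2), (5, 0, 1)], [6])
cyclohexane = (["C"] * 6, [(i, (i + 1) % 6, 1) for i in range(6)], [6])
```

Under these assumptions ethanol scores 24, cyclohexane 90 and benzene (Kekulé form) 126; the aromatic ring counts as more complex than the saturated one only through its higher connectivity.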
In step 4), synthesizability is scored after the synthetic routes of the target compound have been generated, on the basis of those routes. The final score SA has two parts, the number of effective synthetic routes and the difficulty of carrying out the synthetic routes: $SA = S_a + S_r$, where SA is the final score, $S_a$ is the score for the number of effective synthetic routes, and $S_r$ is the score for the difficulty of carrying out the synthetic routes. Specifically:
1. An effective synthetic route is one that ends in starting materials within the N disconnection steps set by the user. Because the ultimate purpose of disconnection is to synthesize the target from purchasable starting materials, routes that reach starting materials are the most useful. Other routes, which do not reach starting materials but still convert the complex target into simpler molecules within N steps, are also informative. The larger the number n of effective synthetic routes, the more options are available for the actual synthesis and the easier the target compound is to make. Depending on n, $S_a$ takes the following values:
(1) $S_a = -4.25n + 38.25$, for $1 \le n \le 5$
(2) $S_a = -0.95\ln n + 18.7$, for $n > 5$
(3) $S_a = 0.87\ln X + 30$, for $n = 0$
where X is the number of nodes in the retrosynthetic analysis tree.
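The piecewise definition of $S_a$ translates directly into code. This is a sketch: the function name and the input validation are assumptions, and X (the node count of the retrosynthetic analysis tree) is only consulted when no effective route exists.

```python
import math


def route_count_score(n: int, tree_nodes: int) -> float:
    """S_a from the number n of effective routes; tree_nodes is X, used only when n == 0."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        if tree_nodes < 1:
            raise ValueError("X must be at least 1 for ln X")
        return 0.87 * math.log(tree_nodes) + 30
    if n <= 5:
        return -4.25 * n + 38.25
    return -0.95 * math.log(n) + 18.7
```

With these coefficients $S_a$ falls from 34 at n = 1 to 17 at n = 5, and the logarithmic branch continues from about 17.0 at n = 6, so a larger number of effective routes gives a lower $S_a$.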
2. Synthetic routes differ in how difficult they are to carry out; a simple, easily executed route means the compound is less difficult to synthesize. In general, a route made of reactions that run at room temperature and normal pressure, need no anhydrous or oxygen-free conditions, use stable and easily handled reagents and catalysts, and give easily separated products is easier to carry out. The difficulty of carrying out a route is scored from the difficulty of its reactions; for each reaction step, the score $S_p$ is computed as follows:
(1) Each reaction step corresponds to a transformation rule. When the transformation rule database is built, the ease of carrying out each reaction is scored manually with a score d, so initially $S_p = d$.
(2) A term for the difficulty of separating the product from the reaction mixture is then added. Separation difficulty is represented by the logP difference $\Delta\log P$ between product and starting material, and $S_p$ is updated as $S_p = S_p + \ln \Delta\log P$. logP is calculated by a known method: an atom-additive scheme in which the atoms of organic small-molecule compounds are divided into 76 basic types according to their hybridization state, bonding and neighboring atoms. In addition, the four terminal groups cyano, isothiocyanate, nitro and nitroso are each treated as a whole and defined as four "pseudo-atom" types, giving 80 atom types in total. Each atom type has a specific contribution, and the logP of a molecule is the sum of the contributions of its atoms.
The difficulty $S_y$ of carrying out a synthetic route is the sum of the difficulty scores of its reaction steps: $S_y = \sum S_p$.
The score $S_r$ for the synthetic routes as a whole is the lowest route score: $S_r = \min S_y$.
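The step and route scores combine as below. This is a minimal sketch: atom typing is assumed to be done elsewhere, and the contribution table holds made-up values, not the parameters of the 80-type logP scheme. Taking the absolute value of $\Delta\log P$, and rejecting a zero difference because ln 0 is undefined, are assumptions; the text gives the plain difference. Which routes enter the minimum (all routes or only effective ones) is not stated, so the caller supplies them. Function names are illustrative.

```python
import math
from typing import Dict, List


def atom_additive_logp(atom_types: List[str], contributions: Dict[str, float]) -> float:
    """logP as the sum of per-atom-type contributions (atom typing done elsewhere)."""
    return sum(contributions[t] for t in atom_types)


def step_score(d: float, logp_product: float, logp_precursor: float) -> float:
    """S_p = d + ln(delta logP), with d the manual score stored in the rule."""
    delta = abs(logp_product - logp_precursor)  # assumption: absolute difference
    if delta == 0:
        raise ValueError("ln is undefined for a zero logP difference")
    return d + math.log(delta)


def route_difficulty(step_scores: List[float]) -> float:
    """S_y: sum of the step scores S_p along one route."""
    return sum(step_scores)


def best_route_score(routes: List[List[float]]) -> float:
    """S_r: the minimum S_y over the supplied routes."""
    if not routes:
        raise ValueError("no synthetic routes to score")
    return min(route_difficulty(r) for r in routes)


def final_score(s_a: float, routes: List[List[float]]) -> float:
    """SA = S_a + S_r."""
    return s_a + best_route_score(routes)


# Made-up contribution table for illustration only.
TOY_CONTRIBUTIONS = {"C.sp3": 0.5, "O.hydroxyl": -0.4}
```

For $|\Delta\log P| < 1$ the logarithm is negative, so the separation term lowers $S_p$ below d.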

Claims (8)

1. A retrosynthesis-based method for evaluating the synthesizability of organic small-molecule compounds, characterized in that: common chemical reagents are collected to build a starting-material database; commonly used organic synthesis reactions and classic named organic reactions are collected to build a transformation rule database; and the synthetic routes of reported compounds are collected to build a route database. Using these three databases, retrosynthetic analysis is performed on the target compound to generate synthetic routes for organic small-molecule compounds automatically; during disconnection, generation of the retrosynthetic analysis tree is optimized and nodes of the tree are pruned in advance. After the synthetic routes are generated, both the difficulty of disconnecting the compound and the difficulty of carrying out the synthetic routes are evaluated. The program flow comprises the following 4 steps:
1) Recognize the input molecule: the two-dimensional structure of the molecule is converted into a unique, computer-recognizable one-dimensional 64-bit hash code, which the program then uses for recognition;
2) Read the starting-material database, the transformation rule database and the route database: collected chemical reagents are represented by hash codes; for the collected organic reactions, a reaction center is extracted from each reaction to form the transformation rule database; the synthetic routes of known compounds are first simplified and their reaction centers then extracted; all data are converted into data structures the program can use before being read;
3) Generate synthetic routes via an optimized retrosynthetic analysis tree: using a graph matching algorithm, the chemical structure of each compound is handled as a graph data structure;
4) Score synthesizability: a combined score of the number of effective synthetic routes and the difficulty of carrying out the routes.
2. The retrosynthesis-based method for evaluating the synthesizability of organic small-molecule compounds according to claim 1, characterized in that recognizing the input molecule specifically comprises: the computer first reads the file of the target molecule to be evaluated, a MOL2 or SD file describing the molecule's basic information, including the number of atoms, the number of bonds, the element type and three-dimensional coordinates of each atom, and the type and bonded atoms of each bond. After reading this information, it checks whether each atom has a saturated valence and automatically adds hydrogens to unsaturated atoms. After hydrogen addition, it uses the above information to identify the topology, connectivity, functional groups and rings of the target molecule. Finally, from all of this information, it converts the two-dimensional structure of the molecule into a one-dimensional 64-bit hash code; the code uniquely represents the molecule's two-dimensional structure, and each two-dimensional structure maps to exactly one code.
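The patent does not describe how the 64-bit code is computed. The sketch below swaps in a different, well-known technique: Weisfeiler-Lehman-style (Morgan-like) iterative neighborhood refinement on the hydrogen-suppressed graph, folded to 64 bits with BLAKE2b from Python's `hashlib`. It skips file parsing and hydrogen addition, and unlike the claimed code it does not guarantee uniqueness: refinement of this kind cannot separate every pair of non-isomorphic graphs, and any 64-bit hash can collide. The function names are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple


def _digest(text: str) -> str:
    """Short stable digest of a label string (BLAKE2b, 8-byte output)."""
    return hashlib.blake2b(text.encode("utf-8"), digest_size=8).hexdigest()


def molecule_hash64(elements: List[str], bonds: List[Tuple[int, int, int]],
                    rounds: int = 3) -> int:
    """64-bit code for a 2D structure via iterative neighbourhood refinement.

    Atom numbering does not affect the result; distinct structures are
    expected, but not guaranteed, to receive distinct codes.
    """
    adj: Dict[int, List[Tuple[int, int]]] = {i: [] for i in range(len(elements))}
    for a, b, order in bonds:
        adj[a].append((b, order))
        adj[b].append((a, order))
    # Initial label: element plus heavy-atom degree.
    labels = [_digest(f"{el}|{len(adj[i])}") for i, el in enumerate(elements)]
    for _ in range(rounds):
        # New label = own label + sorted (bond order, neighbour label) pairs.
        labels = [
            _digest(labels[i] + "|" + ",".join(sorted(f"{o}:{labels[j]}" for j, o in adj[i])))
            for i in range(len(elements))
        ]
    # Sorting the atom labels makes the code independent of atom numbering.
    final = hashlib.blake2b("|".join(sorted(labels)).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(final, "big")


ethanol_a = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
ethanol_b = (["O", "C", "C"], [(0, 1, 1), (1, 2, 1)])  # same molecule, atoms renumbered
dimethyl_ether = (["C", "O", "C"], [(0, 1, 1), (1, 2, 1)])
```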
3. The retrosynthesis-based method for evaluating the synthesizability of organic small-molecule compounds according to claim 1, characterized in that reading the starting-material database, the transformation rule database and the route database specifically comprises: the computer stores in advance the starting-material data of the collected common chemical reagents, the transformation rule data of the collected common organic synthesis reactions and classic named organic reactions, and the route data built from the reported synthetic routes of known compounds; each of these three datasets is converted as follows into data structures the program can use and is then read:
For the collected chemical reagents, each entry is converted into a 64-bit hash code by the same method used to recognize the input molecule, and the price and CAS number of the starting material are added;
For the collected organic reactions, a reaction center is extracted from each reaction to form the transformation rule database. The reaction center is extracted as follows:
(1) Identify the reaction site: the reaction site comprises only the chemical bonds that change and the atoms at either end of those bonds; these bonds and atoms are found by comparing the chemical structures of the products and the reactants;
(2) Extend the basic reaction center: the basic reaction center obtained in step (1) is extended so that it also includes the chemical environment mentioned above, namely the functional groups attached to the atoms of the basic reaction center;
(3) Abstract the reaction center: reactions with the same essential chemistry are abstracted into one, according to the following criteria:
(a) if a reaction center contains a halogen but the reaction does not depend on which halogen it is, the specific halogen atom is replaced by a generic halogen;
(b) if several reaction centers represent organic reactions with the same mechanism, they are represented by a single reaction center; after abstraction, duplicate reaction centers are deleted.
For the synthetic route data of known compounds, each synthetic route is first simplified into a one-step reaction, and the reaction center is then extracted.
After a reaction center is extracted, it is used as the core information and supplemented with the reaction's starting materials, reaction conditions and yield to form one retrosynthetic transformation rule. At the same time, the ease of carrying out each reaction is scored manually and the score is stored in the transformation rule; this information is used later in the disconnection process and in the final synthesizability score.
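The extraction steps (1) to (3) above can be sketched on atom-mapped reactions, assuming bonds are stored as unordered atom-map pairs with bond orders. The extension in step (2) is simplified to one bond around the site, and the duplicate check for (3)(b) compares only the abstracted changed bonds, which approximates, but is not the same as, comparing reaction mechanisms. All names are hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Bonds keyed by an unordered pair of atom-map numbers -> bond order.
BondMap = Dict[FrozenSet[int], int]
HALOGENS = {"F", "Cl", "Br", "I"}


def changed_bonds(reactant: BondMap, product: BondMap) -> Set[FrozenSet[int]]:
    """Bonds formed, broken, or changed in order between reactants and products."""
    return {pair for pair in set(reactant) | set(product)
            if reactant.get(pair, 0) != product.get(pair, 0)}


def reaction_site(reactant: BondMap, product: BondMap) -> Set[int]:
    """Step (1): atoms at either end of a changed bond."""
    return {atom for pair in changed_bonds(reactant, product) for atom in pair}


def extend_site(site: Set[int], bonds: BondMap) -> Set[int]:
    """Step (2), simplified: add atoms one bond away from the site."""
    extended = set(site)
    for pair in bonds:
        if pair & site:
            extended |= pair
    return extended


def abstract_labels(atoms: Set[int], elements: Dict[int, str],
                    halogen_specific: bool = False) -> Dict[int, str]:
    """Step (3a): replace specific halogens by a generic 'X' unless the rule needs them."""
    return {a: ("X" if elements[a] in HALOGENS and not halogen_specific else elements[a])
            for a in atoms}


def center_key(reactant: BondMap, product: BondMap, labels: Dict[int, str]) -> Tuple:
    """Step (3b), approximate: order-independent description of the changed bonds;
    rules with equal keys are treated as duplicates."""
    return tuple(sorted(
        (tuple(sorted(labels[a] for a in pair)), reactant.get(pair, 0), product.get(pair, 0))
        for pair in changed_bonds(reactant, product)))


# SN2-type substitution CH3-CH2-Br + OH- -> CH3-CH2-OH + Br- (atom maps 1..4).
r_bonds = {frozenset({1, 2}): 1, frozenset({2, 3}): 1}
p_bonds = {frozenset({1, 2}): 1, frozenset({2, 4}): 1}
elements_br = {1: "C", 2: "C", 3: "Br", 4: "O"}
elements_cl = {1: "C", 2: "C", 3: "Cl", 4: "O"}
```

After halogen abstraction, the bromide and chloride versions of the substitution produce the same key, so only one rule would be kept.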
4. The retrosynthesis-based method for evaluating the synthesizability of organic small-molecule compounds according to claim 1, characterized in that generating synthetic routes via an optimized retrosynthetic analysis tree specifically comprises: using a graph matching algorithm, the chemical structure of each compound is handled as a graph data structure. First, each reaction center is matched against the target molecule in turn to detect which functional groups or substructures of the molecule can serve as disconnection sites. If a match succeeds, that core structure is taken as one possible synthetic step, completing one transformation from the target compound to its precursors. Next, the precursors obtained in the previous step become the target compounds of the next disconnection, and disconnection continues until it terminates. Finally, the disconnection results of all steps are connected as a tree to obtain the retrosynthetic analysis tree, whose top is the target compound being evaluated and whose bottom nodes are the final starting materials, which completes the disconnection process. Read in the opposite direction, from the bottom nodes up to the top, the tree gives complete synthetic routes.
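The tree construction can be sketched as an AND/OR tree: each disconnection of a molecule is one alternative, and all precursors of one disconnection must be made. Rule matching is abstracted into hypothetical callables that return precursor tuples, molecules are plain string identifiers, and counting effective routes by multiplying over precursors is one possible reading of the text, not a stated formula. The depth limit plays the role of the user-set step count N.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical rule interface: molecule id -> alternative precursor tuples
# (an empty list means the rule's reaction center does not match).
Rule = Callable[[str], List[Tuple[str, ...]]]


@dataclass
class Node:
    molecule: str
    # OR over disconnections; within one disconnection every precursor is needed (AND).
    disconnections: List[Tuple[str, List["Node"]]] = field(default_factory=list)


def expand(molecule: str, rules: Dict[str, Rule], stock: Set[str], depth: int) -> Node:
    """Recursively disconnect `molecule` for at most `depth` steps."""
    node = Node(molecule)
    if molecule in stock or depth == 0:
        return node  # starting material reached, or step limit used up
    for name, rule in rules.items():
        for precursors in rule(molecule):
            children = [expand(p, rules, stock, depth - 1) for p in precursors]
            node.disconnections.append((name, children))
    return node


def count_routes(node: Node, stock: Set[str]) -> int:
    """Number of complete routes whose leaves are all starting materials."""
    if node.molecule in stock:
        return 1
    total = 0
    for _, children in node.disconnections:
        ways = 1
        for child in children:
            ways *= count_routes(child, stock)
        total += ways
    return total


def count_nodes(node: Node) -> int:
    """Number of molecule nodes in the tree."""
    return 1 + sum(count_nodes(c) for _, children in node.disconnections for c in children)


TOY_RULES = {"amide": [("acid_chloride", "amine"), ("acid", "amine")],
             "acid_chloride": [("acid",)]}
rules = {"toy": lambda m: TOY_RULES.get(m, [])}
stock = {"acid", "amine"}
tree = expand("amide", rules, stock, depth=3)
```

In this toy example the amide is reachable via the acid chloride and directly from the acid, so two effective routes are counted over six tree nodes.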
5. The retrosynthesis-based method for evaluating the synthesizability of organic small-molecule compounds according to claim 1, characterized in that scoring synthesizability means that, after the synthetic routes of the target compound have been generated, the number of effective synthetic routes and the difficulty of carrying out the routes are scored on the basis of those routes. An effective synthetic route is one that ends in starting materials within the N disconnection steps set by the user; the difficulty of carrying out a route is scored from the reaction conditions and the difficulty of separating the products.
6. The retrosynthesis-based method for evaluating the synthesizability of organic small-molecule compounds according to claim 4, characterized in that generating synthetic routes via an optimized retrosynthetic analysis tree optimizes the tree during disconnection by: directly terminating the retrosynthetic analysis, simplifying retrosynthetic analysis subtrees, and assigning each reaction a parameter rate that represents how feasible the step is:
Directly terminating the retrosynthetic analysis: for a target compound, each precursor is compared with every molecule in the starting-material database using the graph matching algorithm. If, among its possible disconnections, one yields precursors that are starting materials, all other disconnections of the target compound are discarded, this disconnection is taken as the only disconnection of the target, and analysis of this branch stops;
Simplifying retrosynthetic analysis subtrees: some specific substructures already have relatively fixed synthetic routes, namely the synthetic route data of known compounds. Using the route database and the graph matching algorithm, the method detects whether the target molecule contains such a substructure; if it does, the target is disconnected directly to starting materials according to that transformation rule, while the other possible transformation rules for the target compound are still recorded in the retrosynthetic analysis tree;
Assigning each reaction a parameter rate that represents how feasible the step is: the initial rate value depends on the difficulty of the organic reaction represented by the transformation rule; the harder the reaction, the smaller the rate value. The rate value also depends on the precursor molecules of the reaction. When the rate value falls below a set threshold, the reaction step is considered too difficult to carry out, so this possible route is abandoned without further disconnection, i.e. the branch is deleted from the retrosynthetic analysis tree.
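The three optimizations can be folded into the expansion step. In this sketch the rate values are assumed to be already adjusted for the precursors, the threshold 0.3 is hypothetical, an exact lookup of the target stands in for the substructure match against the route database, and, unlike the text, the alternative rules skipped by a fixed route are not recorded. Checking the fixed route before direct termination is also an assumption; all names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical rule interface: molecule -> list of (precursors, rate) alternatives,
# where rate is the rule's initial rate already adjusted for the precursors.
Rule = Callable[[str], List[Tuple[Tuple[str, ...], float]]]


@dataclass
class Node:
    molecule: str
    disconnections: List[Tuple[str, List["Node"]]] = field(default_factory=list)


def expand_pruned(molecule: str, rules: Dict[str, Rule], stock: Set[str],
                  fixed_routes: Dict[str, Tuple[str, ...]], depth: int,
                  min_rate: float = 0.3) -> Node:
    node = Node(molecule)
    if molecule in stock or depth == 0:
        return node
    # Subtree simplification: a known fixed route goes straight to its starting materials.
    if molecule in fixed_routes:
        node.disconnections.append(("fixed_route", [Node(p) for p in fixed_routes[molecule]]))
        return node
    # Rate threshold: drop disconnections judged too hard before expanding them.
    candidates = [(name, precursors)
                  for name, rule in rules.items()
                  for precursors, rate in rule(molecule)
                  if rate >= min_rate]
    # Direct termination: a disconnection whose precursors are all in stock wins outright.
    for name, precursors in candidates:
        if all(p in stock for p in precursors):
            node.disconnections.append((name, [Node(p) for p in precursors]))
            return node
    for name, precursors in candidates:
        children = [expand_pruned(p, rules, stock, fixed_routes, depth - 1, min_rate)
                    for p in precursors]
        node.disconnections.append((name, children))
    return node


TOY = {"amide": [(("acid_chloride", "amine"), 0.8), (("acid", "amine"), 0.6),
                 (("nitrile", "amine"), 0.1)],
       "acid_chloride": [(("acid",), 0.9)]}
rules = {"toy": lambda m: TOY.get(m, [])}
stock = {"acid", "amine", "sulfonyl_chloride"}
fixed = {"sulfonamide": ("sulfonyl_chloride", "amine")}
```

Because the acid and the amine are both in stock, the amide keeps only that single disconnection; the nitrile alternative (rate 0.1) is removed by the threshold in any case.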
7. The retrosynthesis-based method for evaluating the synthesizability of organic small-molecule compounds according to claim 6, characterized in that, in assigning each reaction a parameter rate representing how feasible the step is, the rate value changes with the precursor molecules as follows:
Matching the electronic effect of the chemical environment of the reaction site against the electronic effect required by the disconnection step:
For each disconnection step, the chemical environment of the reaction site of the precursor, i.e. the functional groups alpha to the reaction site, is detected automatically. If the electronic effect of the alpha functional group is the same as the electronic effect that increases reactivity, the rate value increases; otherwise, the rate value decreases;
Steric hindrance of the disconnection step: for each transformation step that needs to take steric hindrance into account, the resulting precursor is checked automatically; if a functional group near the reaction site of that precursor causes steric hindrance, the rate value decreases;
Factors affecting reaction selectivity: two classes of factors that may lower reaction selectivity are detected, the first being several identical functional groups on the precursor molecule and the second being similar reaction sites; if either is present in the precursor of the transformation step, the rate value decreases;
Presence of unstable chemical structures: the method detects whether a precursor contains chemical structures that react spontaneously at room temperature in the presence of oxygen. Each precursor is checked after disconnection; if it contains such an unstable structure, the reaction represented by the transformation is difficult to carry out in practice, and the rate value of that step decreases;
Evaluation of molecular complexity cpxtx: for each transformation step, if a precursor is more complex than the target molecule, the step may be unreasonable and the rate value decreases; the molecular complexity cpxtx is computed on the hydrogen-suppressed graph with the following algorithm:
A). Rings: cpxtx = cpxtx + size(i) * k,
where size(i) is the number of atoms in ring i and k is an empirical constant, k = 6 in this method;
B). Connectivity: cpxtx = cpxtx + i,
where the increment i depends on the connectivity of the atom; the connectivity is the number of heavy atoms bonded to each atom in the molecule, a double bond counting as 2 and a triple bond as 3;
If the connectivity is 4, then i = 24;
If the connectivity is 3, then i = 12;
If the connectivity is 2, then i = 6;
If the connectivity is 1, then i = 3;
C). Atom type: cpxtx = cpxtx + k,
where k = 3 if the atom is carbon and k = 6 for any other element;
D). The accumulated cpxtx is the final value.
8. The retrosynthesis-based method for evaluating the synthesizability of organic small-molecule compounds according to claim 5, characterized in that scoring, on the basis of the synthetic routes, the number of effective synthetic routes and the difficulty of carrying out the synthetic routes means:
The final score $SA = S_a + S_r$, where SA is the final score, $S_a$ is the score for the number of effective synthetic routes, and $S_r$ is the score for the difficulty of carrying out the synthetic routes;
The larger the number n of effective synthetic routes, the more options are available for the actual synthesis and the easier the target compound is to make; depending on n, $S_a$ takes the following values:
(1) $S_a = -4.25n + 38.25$, for $1 \le n \le 5$;
(2) $S_a = -0.95\ln n + 18.7$, for $n > 5$;
(3) $S_a = 0.87\ln X + 30$, for $n = 0$, where X is the number of nodes in the retrosynthetic analysis tree;
For each synthetic route, the difficulty of carrying it out is expressed through the step scores $S_p$:
(1) Each reaction step corresponds to a transformation rule; when the transformation rule database is built, the ease of carrying out the reaction is scored manually with a score d, so initially $S_p = d$;
(2) A term for the difficulty of separating the product is then added: separation difficulty is represented by the logP difference $\Delta\log P$ between product and starting material, and $S_p = S_p + \ln \Delta\log P$;
The difficulty $S_y$ of carrying out a synthetic route is the sum of its step scores: $S_y = \sum S_p$;
The score $S_r$ for the synthetic routes as a whole is the lowest route score: $S_r = \min S_y$.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010101066489A CN101789047B (en) 2010-02-05 2010-02-05 Method for evaluating synthesization of organic small-molecule compounds based on reverse synthesis

Publications (2)

Publication Number Publication Date
CN101789047A true CN101789047A (en) 2010-07-28
CN101789047B CN101789047B (en) 2011-10-26

Family

ID=42532257

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010101066489A Expired - Fee Related CN101789047B (en) 2010-02-05 2010-02-05 Method for evaluating synthesization of organic small-molecule compounds based on reverse synthesis

Country Status (1)

Country Link
CN (1) CN101789047B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106156405A (en) * 2016-06-24 2016-11-23 上海网化化工科技有限公司 Organic synthetic route design method based on chemical reaction data storehouse

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112272764A (en) * 2018-01-30 2021-01-26 斯坦福国际研究院 Computational generation of chemical synthetic routes and methods
CN109872780A (en) * 2019-03-14 2019-06-11 北京深度制耀科技有限公司 A kind of determination method and device of chemical synthesis route
CN114144110A (en) * 2019-07-30 2022-03-04 Emd密理博公司 Method for synthesizing compound
CN114144110B (en) * 2019-07-30 2023-02-03 Emd密理博公司 Method for synthesizing compound
CN113140260A (en) * 2020-01-20 2021-07-20 腾讯科技(深圳)有限公司 Method and device for predicting reactant molecular composition data of composition
CN113140260B (en) * 2020-01-20 2023-09-08 腾讯科技(深圳)有限公司 Method and device for predicting reactant molecular composition data of composition
CN111524557A (en) * 2020-04-24 2020-08-11 腾讯科技(深圳)有限公司 Inverse synthesis prediction method, device, equipment and storage medium based on artificial intelligence
CN111524557B (en) * 2020-04-24 2024-04-05 腾讯科技(深圳)有限公司 Inverse synthesis prediction method, device, equipment and storage medium based on artificial intelligence
CN112397155B (en) * 2020-12-01 2023-07-28 中山大学 Single-step reverse synthesis method and system
CN112397155A (en) * 2020-12-01 2021-02-23 中山大学 Single-step reverse synthesis method and system
CN114613446A (en) * 2022-03-11 2022-06-10 冰洲石生物科技(上海)有限公司 Interactive/chemical synthesis route design method, system, medium, and electronic device
CN115831248B (en) * 2023-02-20 2023-06-06 新疆独山子石油化工有限公司 Method and device for determining reaction rules, electronic equipment and storage medium
CN115831248A (en) * 2023-02-20 2023-03-21 新疆独山子石油化工有限公司 Method and device for determining reaction rule, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN101789047B (en) 2011-10-26

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20111026

Termination date: 20120205