CN114664388A - Rapid screening method for electrophilic substitution reaction of aromatic system - Google Patents

Publication number
CN114664388A
Authority
CN
China
Legal status
Pending
Application number
CN202011533369.0A
Other languages
Chinese (zh)
Inventor
夏宁
沈国文
Current Assignee
Wuhan Zhihua Technology Co ltd
Original Assignee
Wuhan Zhihua Technology Co ltd

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/10 Analysis or design of chemical reactions, syntheses or processes
    • G16C10/00 Computational theoretical chemistry, i.e. ICT specially adapted for theoretical aspects of quantum chemistry, molecular mechanics, molecular dynamics or the like

Abstract

The invention provides a method for rapidly screening electrophilic substitution reactions of aromatic systems, comprising the following steps: a) establishing a database of chemical reactions from the literature and extracting from it all literature reactions involving electrophilic aromatic substitution to obtain an electrophilic aromatic substitution reaction library, then preprocessing this library to generate three data tables: an aromatic ring skeleton library, a substituent fingerprint library and a reaction site information library; b) screening the predicted reaction against the three data tables generated in step a) to obtain the target electrophilic substitution reaction. The method judges how reasonable a reaction is by mining reaction information from large volumes of literature data. It can screen predicted single-step reactions involving electrophilic substitution quickly and reasonably within an acceptable time and identify the optimal substitution site, so that the most reliable synthetic route, or the route of most interest to the user, can be found to guide laboratory practice.

Description

Rapid screening method for electrophilic substitution reaction of aromatic system
Technical Field
The invention relates to the technical field of data processing, in particular to a method for rapidly screening electrophilic substitution reaction of an aromatic system.
Background
Aromatic and heteroaromatic ring systems are ubiquitous in the molecular structures of marketed drugs and natural products, and electrophilic substitution at different positions on these rings can greatly alter the biological activity of a compound. In molecular design, new structures are often built around a small number of existing starting molecules, usually taken from databases of known compounds. Ideally, a chemist selects a suitable intermediate and uses a series of effective synthetic reactions to substitute it at the specific sites of interest, functionalizing the intermediate until the final target molecule is obtained. Because substitution reactions of aromatic systems are so important, many substitution patterns are explored during drug development in the hope of establishing structure-activity relationships. When designing synthetic routes to drug molecules, chemists and synthesis experts generally prefer highly selective reaction pathways; for lack of reliable supporting data, routes whose site selectivity toward the target molecule is ambiguous tend to be discarded wherever possible, since ambiguous substitution sites may imply unfavorable outcomes such as harsh reaction conditions or low yields. How well site-substitution reactions on different aromatic systems can be screened in modern synthetic route planning therefore directly determines the reliability of a route prediction system.
Computer-aided synthetic route design frequently requires a rational treatment of electrophilic substitution on aromatic and heteroaromatic systems, particularly in pharmaceutical applications where substitution at key positions helps modulate the biological activity of a compound. One prior-art approach uses empirical parameters, such as Hammett substituent constants or ¹H and ¹³C NMR chemical shifts, to estimate the nucleophilicity of the aromatic system and thus the likely substitution sites; it reaches an accuracy of 80% on a set of 130 experimental electrophilic substitution reactions, but its accuracy drops considerably for complex electron-rich aromatic systems. Another approach improves on the empirical parameters by adding more accurate density functional theory (DFT) calculations; given sufficient computing resources, it achieves more than 95% accuracy on the same electrophilic substitution data set. With sound analysis, the all-electron wave function obtained from DFT calculations can indeed yield more reliable qualitative and even quantitative results. However, the calculations are time-consuming, and different heterocyclic systems may require different levels of wave-function approximation to give reasonable results, so this approach cannot quickly and efficiently evaluate the reactions encountered in computer-aided synthetic route planning.
In summary, the prior art suffers from shortcomings at two different levels. First, existing empirical parameters for estimating substitution sites on aromatic systems often have low accuracy: they are mostly approximations derived from limited experimental data, or they rest on the idealized assumptions of the theory's originators and are limited by the current state of theory and by the complexity of chemical systems, so manually set parameters alone cannot meet ever-increasing accuracy requirements. Second, quantum chemical calculations, pursued for maximum accuracy, can give the best site evaluation when the molecule is reasonably well understood, but at current computing capacity their financial and time costs are far from negligible; for more complex heterocyclic systems the time cost becomes very large, and inaccurate predictions still occur.
Disclosure of Invention
In view of the above, the present invention provides a method for rapidly screening electrophilic substitution reactions of aromatic systems that can screen predicted single-step reactions involving electrophilic substitution quickly and reasonably within an acceptable time and identify the optimal substitution site, so as to find the most reliable synthetic route, or the route of most interest to the user, and thereby guide laboratory practice.
The invention provides a method for rapidly screening electrophilic substitution reactions of aromatic systems, comprising the following steps:
a) establishing a database of chemical reactions from the literature, and extracting from it all literature reactions involving electrophilic aromatic substitution to obtain an electrophilic aromatic substitution reaction library; then preprocessing this library to generate three data tables: an aromatic ring skeleton library, a substituent fingerprint library and a reaction site information library;
b) screening the predicted reaction against the three data tables generated in step a) to obtain the target electrophilic substitution reaction.
Preferably, the database of chemical reactions from the literature in step a) is established as follows:
a1) storing the collected chemical reactions on a computer in advance, converting them into a machine-readable storage format, and processing the data to obtain a one-to-one correspondence (atom mapping) between the atoms of the reactants and those of the products;
a2) identifying the reaction sites from this correspondence, extracting as the identifying information of the reaction the atoms, chemical bonds and groups directly connected to or conjugated with the reaction sites, together with the indirectly connected groups that influence the reaction, and storing this information in the database as the reaction rule.
Preferably, the chemical reactions in step a1) include conventional chemical reactions, classical named organic reactions, and chemical reactions reported in academic journals and in patents.
Preferably, the reaction sites in step a2) comprise the altered chemical bonds and the atoms directly attached to these bonds;
the identification process is as follows:
the altered chemical bonds and the atoms directly attached to them are found by comparing the chemical structures of the starting materials and the products of the reaction.
Preferably, the preprocessing in step a) specifically comprises:
identifying the skeleton of the main reactant to obtain a corresponding table of aromatic ring skeleton structures, forming the aromatic ring skeleton library;
simplifying the structures of the original reactant molecules containing aromatic rings to generate the corresponding simplified molecular structures, then evaluating the substituents on the simplified rings with a group electronegativity method and combining the results according to the numbers of the skeleton atoms to which the substituents are attached, thereby generating a substituent fingerprint for each molecule and forming the substituent fingerprint library;
analyzing all reactions involving electrophilic substitution together under a unified template, assigning each a unique number, and recording the specific reaction site and example number, thereby generating a substitution reaction information table that forms the reaction site information library.
Preferably, the structure of the original reactant molecule containing an aromatic ring is simplified as follows:
starting from the atoms of the skeleton ring, three layers of atoms and bonds are expanded outward into each substituent group, and the remaining group atoms are discarded to obtain the simplified molecule.
Preferably, the screening in step b) specifically comprises:
b1) after reducing the molecular structures of the predicted reaction to be screened, matching it in turn against the aromatic ring skeleton library, the substituent fingerprint library and the reaction site information library;
if no matching data is found, no reference data is considered available, and the reaction is marked as unreasonable and screened out;
if similar data exists, the data is retrieved from the database for further comparison;
b2) calculating the reaction similarity between the similar data retrieved in step b1) and the corresponding data of the predicted reaction to be screened;
if the similarity is less than or equal to the standard threshold, no usable reference literature data is considered found, and the reaction is marked as unreasonable and screened out;
if the similarity is greater than the standard threshold, literature reactions with similar reaction sites are selected to form a candidate group, the candidate reactions are ranked, and the target electrophilic substitution reaction is selected.
Preferably, the standard threshold in step b2) is 70% or greater.
Preferably, the candidate reactions in step b2) are ranked as follows:
the candidates are sorted by similarity score; where several sites on an aromatic ring have reasonable literature support, the data distribution across the different sites is mined, reactions with more literature support, higher yields and better conditions are preferred as the finally selected literature reference, and the selected reactions are marked as reasonable.
The invention provides a method for rapidly screening electrophilic substitution reactions of aromatic systems, comprising the following steps: a) establishing a database of chemical reactions from the literature and extracting from it all literature reactions involving electrophilic aromatic substitution to obtain an electrophilic aromatic substitution reaction library, then preprocessing this library to generate three data tables: an aromatic ring skeleton library, a substituent fingerprint library and a reaction site information library; b) screening the predicted reaction against the three data tables generated in step a) to obtain the target electrophilic substitution reaction. Compared with the prior art, the method is based on data mining: it judges how reasonable a reaction is by mining reaction information from large volumes of data, screens predicted single-step reactions involving electrophilic substitution quickly and reasonably within an acceptable time, and identifies the optimal substitution site, so that the most reliable synthetic route, or the route of most interest to the user, can be found to guide laboratory practice.
In addition, the rapid screening method can be customized in more detail to meet customer requirements: users can define their own criteria, such as preferred reaction conditions, for selecting reference literature, so that the literature support of most interest to the user is ultimately retrieved from the database. This also helps refine the data over time and gives the method broad application prospects.
Drawings
FIG. 1 is a flow chart for generating the aromatic ring substitution reaction information library in the rapid screening method provided by the present invention;
FIG. 2 is a flow chart for obtaining the target electrophilic substitution reaction from a predicted reaction to be screened, using the rapid screening method for electrophilic substitution reactions of aromatic systems according to an embodiment of the present invention;
FIG. 3 shows the electrophilic substitution reaction evaluated in Example 1 of the present invention;
FIG. 4 shows the rapid screening results of Example 1 of the present invention;
FIG. 5 shows the electrophilic substitution reaction evaluated in Example 2 of the present invention;
FIG. 6 shows the rapid screening results of Example 2 of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the following embodiments of the present invention, and it should be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The invention provides a method for rapidly screening electrophilic substitution reactions of aromatic systems, comprising the following steps:
a) establishing a database of chemical reactions from the literature, and extracting from it all literature reactions involving electrophilic aromatic substitution to obtain an electrophilic aromatic substitution reaction library; then preprocessing this library to generate three data tables: an aromatic ring skeleton library, a substituent fingerprint library and a reaction site information library;
b) screening the predicted reaction against the three data tables generated in step a) to obtain the target electrophilic substitution reaction.
The invention first establishes a database of chemical reactions from the literature, i.e., a chemical reaction database. In the present invention, this database is preferably established as follows:
a1) storing the collected chemical reactions on a computer in advance, converting them into a machine-readable storage format, and processing the data to obtain a one-to-one correspondence (atom mapping) between the atoms of the reactants and those of the products;
a2) identifying the reaction sites from this correspondence, taking as the identifying information of the reaction the atoms, chemical bonds and groups directly connected to or conjugated with the reaction sites, together with the indirectly connected groups that influence the reaction, and extracting and storing this information in the database as the reaction rule.
In the present invention, the chemical reactions preferably include conventional chemical reactions, classical named organic reactions, and chemical reactions reported in academic journals and in patents. On this basis, the collection can also be expanded and updated in real time as new research results appear; the data in the database are the results of single-step reactions actually carried out by chemists.
In the present invention, the reaction sites preferably comprise the altered chemical bonds and the atoms directly attached to these bonds; on this basis, the identification is preferably carried out as follows:
the altered chemical bonds and the atoms directly attached to them are found by comparing the chemical structures of the starting materials and the products of the reaction.
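The bond comparison described above can be sketched in Python as follows, assuming the atom-to-atom mapping from step a1) is already available and each molecule's bonds are stored as a dictionary keyed by the pair of atom-map numbers. The data layout and the function name `find_reaction_center` are hypothetical illustrations, not the patent's implementation; aromatic bonds are given order 1.5 purely for convenience.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical layout: unordered pair of atom-map numbers -> bond order.
BondTable = Dict[FrozenSet[int], float]


def find_reaction_center(reactant_bonds: BondTable,
                         product_bonds: BondTable
                         ) -> Tuple[Set[FrozenSet[int]], Set[int]]:
    """Return the altered bonds and the atoms directly attached to them."""
    changed: Set[FrozenSet[int]] = set()
    for bond in set(reactant_bonds) | set(product_bonds):
        # A bond is altered if it is formed, broken, or changes order;
        # a missing entry means "no bond" (order 0).
        if reactant_bonds.get(bond, 0.0) != product_bonds.get(bond, 0.0):
            changed.add(bond)
    atoms = {atom for bond in changed for atom in bond}
    return changed, atoms


# Example: bromination of benzene. Atoms 1-6 are ring carbons, 7 is the
# hydrogen on C1, and 8-9 are the two bromine atoms of Br2.
ring = {frozenset({i, i % 6 + 1}): 1.5 for i in range(1, 7)}
reactants = {**ring, frozenset({1, 7}): 1.0, frozenset({8, 9}): 1.0}
products = {**ring, frozenset({1, 8}): 1.0, frozenset({7, 9}): 1.0}

bonds, site_atoms = find_reaction_center(reactants, products)
print(sorted(site_atoms))  # [1, 7, 8, 9]
```

The unchanged ring bonds drop out, so only the broken C-H and Br-Br bonds and the newly formed C-Br and H-Br bonds are reported, together with the atoms they join.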
After the literature reaction database has been established, all literature reactions involving electrophilic aromatic substitution are extracted from it to obtain the electrophilic aromatic substitution reaction library. The reactions in this library provide the supporting data for the subsequent predictions.
The electrophilic aromatic substitution reaction library is then preprocessed to generate an aromatic ring substitution reaction information library, which consists of three data tables: the aromatic ring skeleton library, the substituent fingerprint library and the reaction site information library. This dedicated preprocessing extracts additional information from the literature reactions so that the predicted electrophilic substitution reactions generated later during route planning can be compared against them quickly.
In the present invention, the preprocessing preferably comprises:
identifying the skeleton of the main reactant to obtain a corresponding table of aromatic ring skeleton structures, forming the aromatic ring skeleton library;
simplifying the structures of the original reactant molecules containing aromatic rings to generate the corresponding simplified molecular structures, then evaluating the substituents on the simplified rings with a group electronegativity method and combining the results according to the numbers of the skeleton atoms to which the substituents are attached, thereby generating a substituent fingerprint for each molecule and forming the substituent fingerprint library;
analyzing all reactions involving electrophilic substitution together under a unified template, assigning each a unique number, and recording the specific reaction site and example number, thereby generating a substitution reaction information table that forms the reaction site information library.
First, because the same distribution of substituents can have different effects on different aromatic rings, a preliminary screening by ring is needed. Specifically, the skeleton of the main reactant in each reaction record is identified to obtain a corresponding table of aromatic ring skeleton structures; this table makes it possible to select exactly those aromatic parent rings that are identical to the skeleton involved in the reaction to be predicted.
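A minimal sketch of the skeleton library as an index for this preliminary screening, assuming each reaction's aromatic parent ring has already been reduced to a canonical string key. The SMILES-like keys, the reaction IDs and the class name `SkeletonLibrary` are hypothetical placeholders.

```python
from collections import defaultdict
from typing import DefaultDict, List


class SkeletonLibrary:
    """Hypothetical skeleton table: canonical ring key -> literature reaction IDs."""

    def __init__(self) -> None:
        self._index: DefaultDict[str, List[int]] = defaultdict(list)

    def add(self, skeleton_key: str, reaction_id: int) -> None:
        self._index[skeleton_key].append(reaction_id)

    def candidates(self, skeleton_key: str) -> List[int]:
        # Only reactions on an identical parent ring pass the preliminary screen.
        # dict.get does not insert missing keys into the defaultdict.
        return list(self._index.get(skeleton_key, []))


lib = SkeletonLibrary()
lib.add("c1ccccc1", 101)   # benzene skeleton
lib.add("c1ccncc1", 102)   # pyridine skeleton
lib.add("c1ccccc1", 103)
print(lib.candidates("c1ccccc1"))  # [101, 103]
```

An exact-key lookup keeps this first filter cheap, so the more expensive fingerprint comparison only runs on reactions sharing the same parent ring.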
Second, different substituents exert different directing effects, and when several different substituents act together on a parent ring, their combined effect differs again. Therefore, the structure of each original reactant molecule containing an aromatic ring is first simplified to generate a corresponding simplified molecular structure; the substituents on the simplified ring are then evaluated by a group electronegativity method (the group electronegativity is calculated from the electronegativity of the group's central atom bearing the free valence and the electronegativities of the atoms or secondary groups attached to it), and the results are combined according to the numbers of the skeleton atoms to which the substituents are attached, generating a substituent fingerprint for the molecule and forming the substituent fingerprint library. The substituent fingerprint library thus forms a new information table of substituted aromatic molecular structures that pairs each simplified structure with its substituent fingerprint; the structural information in this table can be used to find electrophilic substitution examples of the same type, providing strong supporting evidence.
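The fingerprint construction can be sketched as a vector indexed by skeleton atom number. The patent does not give its group electronegativity formula, so `group_electronegativity` below is a deliberately simple placeholder (the mean of the central atom's Pauling electronegativity and the mean value of its attached atoms), not the method described above; only the Pauling values themselves are standard data. Positions without a substituent are set to 0.0 by assumption.

```python
from typing import Dict, List, Sequence, Tuple

# Standard Pauling electronegativities for a few common elements.
PAULING = {"H": 2.20, "C": 2.55, "N": 3.04, "O": 3.44,
           "F": 3.98, "S": 2.58, "Cl": 3.16, "Br": 2.96}


def group_electronegativity(center: str, attached: Sequence[str]) -> float:
    """Placeholder only: average of the central atom's EN and the mean EN of
    the atoms bonded to it (ring atom excluded). NOT the patent's formula."""
    if not attached:
        return PAULING[center]
    mean_attached = sum(PAULING[a] for a in attached) / len(attached)
    return (PAULING[center] + mean_attached) / 2


def substituent_fingerprint(
        n_positions: int,
        substituents: Dict[int, Tuple[str, Sequence[str]]]) -> List[float]:
    """Entry i-1 holds the group EN of the substituent on skeleton atom i."""
    fp = [0.0] * n_positions  # 0.0 marks an unsubstituted position
    for position, (center, attached) in substituents.items():
        fp[position - 1] = group_electronegativity(center, attached)
    return fp


# 4-Chloroanisole on a benzene skeleton: OCH3 on atom 1, Cl on atom 4.
fp = substituent_fingerprint(6, {1: ("O", ["C"]), 4: ("Cl", [])})
print([round(x, 3) for x in fp])  # [2.995, 0.0, 0.0, 3.16, 0.0, 0.0]
```

Because each entry is tied to a skeleton position, two molecules on the same parent ring produce directly comparable vectors, which is what the later similarity step relies on.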
Finally, all reactions involving electrophilic substitution are analyzed together under a unified template and assigned unique numbers, and the specific reaction site and example number are recorded, generating a substitution reaction information table that forms the reaction site information library. When a particular reaction is screened, this table is used to vote for the best reaction example by mining the distribution of the data.
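The reaction site information table, and the per-site distribution later used for voting, can be sketched as follows, assuming each literature reaction has already been assigned a template label and a skeleton key. The class names, the template labels such as `"bromination"` and the sequential numbering scheme are hypothetical illustrations.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SiteRecord:
    record_id: int     # unique number of the literature example
    template_id: int   # number of the unified electrophilic-substitution template
    skeleton_key: str  # parent ring key, as in the skeleton library
    site: int          # skeleton atom number where substitution occurred


class SiteInfoTable:
    def __init__(self) -> None:
        self._templates: Dict[str, int] = {}
        self.records: List[SiteRecord] = []

    def template_id(self, template: str) -> int:
        # Assign a unique sequential number the first time a template is seen.
        if template not in self._templates:
            self._templates[template] = len(self._templates) + 1
        return self._templates[template]

    def add(self, template: str, skeleton_key: str, site: int) -> SiteRecord:
        record = SiteRecord(len(self.records) + 1, self.template_id(template),
                            skeleton_key, site)
        self.records.append(record)
        return record

    def site_distribution(self, template: str, skeleton_key: str) -> Counter:
        """Count literature examples per site for one template and skeleton."""
        tid = self._templates.get(template)  # None for unknown templates
        return Counter(r.site for r in self.records
                       if r.template_id == tid and r.skeleton_key == skeleton_key)


table = SiteInfoTable()
for site in (4, 4, 4, 2):
    table.add("bromination", "c1ccccc1", site)
table.add("nitration", "c1ccccc1", 4)
print(table.site_distribution("bromination", "c1ccccc1"))  # Counter({4: 3, 2: 1})
```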
In the present invention, the structure of the original reactant molecule containing an aromatic ring is preferably simplified as follows:
starting from the atoms of the skeleton ring, three layers of atoms and bonds are expanded outward into each substituent group, and the remaining group atoms are discarded to obtain the simplified molecule. The simplified molecule thus omits the redundant, more distant group atoms, which have comparatively little influence on the reaction.
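The three-layer simplification amounts to a breadth-first search outward from the ring atoms that stops at a depth of three bonds. The sketch below assumes the molecule is given as a plain adjacency list of atom indices (a hypothetical representation with hydrogens omitted) and interprets "three layers" as all atoms at most three bonds away from the nearest ring atom.

```python
from collections import deque
from typing import Dict, List, Set


def simplify(adjacency: Dict[int, List[int]], ring_atoms: Set[int],
             depth: int = 3) -> Set[int]:
    """Return the ring atoms plus every atom within `depth` bonds of the ring."""
    dist = {atom: 0 for atom in ring_atoms}
    queue = deque(ring_atoms)
    while queue:
        atom = queue.popleft()
        if dist[atom] == depth:
            continue  # do not expand beyond the last retained layer
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return set(dist)


# Pentylbenzene skeleton: ring atoms 1-6, side chain 7-11 attached to atom 1.
adjacency = {i: [i % 6 + 1, (i - 2) % 6 + 1] for i in range(1, 7)}
adjacency[1].append(7)
adjacency.update({7: [1, 8], 8: [7, 9], 9: [8, 10], 10: [9, 11], 11: [10]})

kept = simplify(adjacency, set(range(1, 7)))
print(sorted(kept))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Atoms 10 and 11 lie four and five bonds from the ring and are discarded, so two molecules that differ only far out on a side chain reduce to the same simplified structure.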
FIG. 1 shows the flow chart for generating the aromatic ring substitution reaction information library in the rapid screening method provided by the invention. The above steps complete the preprocessing of the literature reaction database; the screening stage for the target electrophilic substitution reaction then follows, specifically:
b) screening the predicted reaction against the three generated data tables to obtain the target electrophilic substitution reaction.
In the present invention, the screening preferably comprises:
b1) after reducing the molecular structures of the predicted reaction to be screened, matching it in turn against the aromatic ring skeleton library, the substituent fingerprint library and the reaction site information library;
if no matching data is found, no reference data is considered available, and the reaction is marked as unreasonable and screened out;
if similar data exists, the data is retrieved from the database for further comparison;
b2) calculating the reaction similarity between the similar data retrieved in step b1) and the corresponding data of the predicted reaction to be screened;
if the similarity is less than or equal to the standard threshold, no usable reference literature data is considered found, and the reaction is marked as unreasonable and screened out;
if the similarity is greater than the standard threshold, literature reactions with similar reaction sites are selected to form a candidate group, the candidate reactions are ranked, and the target electrophilic substitution reaction is selected.
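Steps b1) and b2) can be sketched as a filter followed by a similarity cut-off. In this sketch, matching in b1) is simplified to requiring the same skeleton key and template number, similarity in b2) is the cosine similarity of substituent fingerprints (the measure named in the ranking step), and the default threshold of 0.75 follows the preferred value; the `Reaction` record and the function names are hypothetical, and site handling is left to the ranking step.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Reaction:
    ref_id: str
    skeleton_key: str
    template_id: int
    fingerprint: Tuple[float, ...]
    site: int


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norm if norm else 0.0


def screen(query: Reaction, library: List[Reaction],
           threshold: float = 0.75) -> List[Tuple[float, Reaction]]:
    """Return scored candidates, best first; an empty list means 'unreasonable'."""
    # b1) match skeleton and template; no match means no reference data.
    matched = [r for r in library
               if r.skeleton_key == query.skeleton_key
               and r.template_id == query.template_id]
    # b2) keep only literature reactions whose similarity exceeds the threshold.
    scored = [(cosine(query.fingerprint, r.fingerprint), r) for r in matched]
    candidates = [(s, r) for s, r in scored if s > threshold]
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return candidates


library = [
    Reaction("ref-A", "c1ccccc1", 1, (3.0, 0.0, 0.0, 3.2, 0.0, 0.0), 2),
    Reaction("ref-B", "c1ccccc1", 1, (0.0, 2.4, 0.0, 0.0, 0.0, 0.0), 4),
    Reaction("ref-C", "c1ccncc1", 1, (3.0, 0.0, 0.0, 3.2, 0.0, 0.0), 2),
]
query = Reaction("query", "c1ccccc1", 1, (2.995, 0.0, 0.0, 3.16, 0.0, 0.0), 2)
result = screen(query, library)
print([r.ref_id for _, r in result])  # ['ref-A']
```

Here ref-C is rejected in b1) because its parent ring differs, and ref-B passes b1) but fails b2) because its fingerprint shares no substituted position with the query.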
In the invention, this core algorithm for rapidly screening electrophilic substitution reactions is mainly intended for computer-aided prediction of organic synthesis routes, but it can also be used on its own to evaluate an unknown reaction. The molecular structures of the predicted reaction to be screened are first reduced, which includes extracting information such as molecular fragments and reaction sites, in the same way as the simplification used when preprocessing the literature reaction library. The electrophilic substitution reaction information library generated in the previous stage is then queried using elements such as the skeleton and the reaction template number. If no matching data is found, no reference literature data is considered available, and the reaction is marked as unreasonable and screened out; if similar data exists, it is retrieved from the database for further comparison.
Next, the reaction similarity between the similar data retrieved above and the corresponding data of the predicted reaction to be screened is calculated. In the invention, the data retrieved from the electrophilic substitution reaction database should contain the skeleton, substituent fingerprint, reaction site and other information of the literature reaction, and the substituent fingerprint weights of the retrieved data are compared with the fingerprint weights extracted from the reaction to be screened. If the similarity is less than or equal to the standard threshold, no usable reference literature data is considered found, and the reaction is marked as unreasonable and screened out; if the similarity is greater than the standard threshold, literature reactions with similar reaction sites are selected to form a candidate group, the candidate reactions are ranked, and the target electrophilic substitution reaction is selected.
In the present invention, the standard threshold is preferably 70% or more, and more preferably 75%.
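Taking the cosine similarity named in the ranking step below as the similarity measure, the acceptance test of step b2) can be written as follows, where $\mathbf{f}_q$ and $\mathbf{f}_r$ denote the substituent fingerprint vectors of the reaction to be screened and of a retrieved literature reaction, and $\tau$ is the standard threshold (notation introduced here for illustration):

```latex
$$
\operatorname{sim}(\mathbf{f}_q, \mathbf{f}_r)
  = \frac{\mathbf{f}_q \cdot \mathbf{f}_r}{\lVert \mathbf{f}_q \rVert \, \lVert \mathbf{f}_r \rVert},
\qquad
r \text{ is retained} \iff \operatorname{sim}(\mathbf{f}_q, \mathbf{f}_r) > \tau,
\quad \tau \ge 0.70 \ (\text{preferably } \tau = 0.75).
$$
```

The strict inequality matches the rule that a similarity less than or equal to the threshold leads to the reaction being screened out.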
In the present invention, the candidate reactions are preferably ranked as follows:
the candidates are sorted by similarity (the cosine similarity of the substituent fingerprint vectors); where several sites on an aromatic ring have reasonable literature support, the data distribution across the different sites is mined, reactions with more literature support, higher yields and better conditions are preferred as the finally selected literature reference, and the selected reactions are marked as reasonable.
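The ranking and site voting can be sketched as below. The ordering key (number of supporting references, then mean reported yield, then best similarity) is an assumption, since the text names these criteria without fixing their priority, and "better conditions" is not modelled. `site_support` computes the literature-support ratio quoted in the examples (references supporting the site divided by all candidate references). All names and sample values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Candidate:
    ref_id: str
    site: int                   # skeleton atom number substituted in the reference
    similarity: float           # cosine similarity from step b2)
    yield_pct: Optional[float]  # reported yield in %, None if not reported


def vote_sites(candidates: List[Candidate]) -> List[Tuple[int, int, float, float]]:
    """Rank sites as (site, n_refs, mean_yield, best_similarity), best first."""
    by_site: Dict[int, List[Candidate]] = defaultdict(list)
    for c in candidates:
        by_site[c.site].append(c)
    ranking = []
    for site, group in by_site.items():
        yields = [c.yield_pct for c in group if c.yield_pct is not None]
        mean_yield = sum(yields) / len(yields) if yields else 0.0
        best_similarity = max(c.similarity for c in group)
        ranking.append((site, len(group), mean_yield, best_similarity))
    # More literature support first, then higher yield, then higher similarity.
    ranking.sort(key=lambda row: (row[1], row[2], row[3]), reverse=True)
    return ranking


def site_support(candidates: List[Candidate], site: int) -> float:
    """Fraction of candidate references that support substitution at `site`."""
    if not candidates:
        return 0.0
    return sum(1 for c in candidates if c.site == site) / len(candidates)


cands = [
    Candidate("ref-1", 4, 0.98, 80.0),
    Candidate("ref-2", 4, 0.91, 90.0),
    Candidate("ref-3", 4, 0.88, None),
    Candidate("ref-4", 2, 0.95, 95.0),
]
print(vote_sites(cands)[0])    # (4, 3, 85.0, 0.98)
print(site_support(cands, 4))  # 0.75
```

Under this ordering, site 4 wins on literature count even though the single site-2 reference reports a higher yield; a different priority would be a one-line change to the sort key.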
By keeping up with the latest literature data, the rapid screening method can mine information in depth to judge how reasonable a reaction is and provide relevant references, while keeping the time needed to screen a single-step reaction on the millisecond scale (523 ms and 245 ms in Examples 1 and 2 below). It thus addresses two technical problems: the low accuracy of evaluating the plausibility of electrophilic substitution reactions with empirical parameters, whose false and missed cases keep growing as new reaction techniques and theories emerge, and the heavy financial and computational cost of approaches that incorporate quantum mechanics.
The invention provides a method for rapidly screening electrophilic substitution reactions of aromatic systems, comprising the following steps: a) establishing a database of chemical reactions from the literature and extracting from it all literature reactions involving electrophilic aromatic substitution to obtain an electrophilic aromatic substitution reaction library, then preprocessing this library to generate three data tables: an aromatic ring skeleton library, a substituent fingerprint library and a reaction site information library; b) screening the predicted reaction against the three data tables generated in step a) to obtain the target electrophilic substitution reaction. Compared with the prior art, the method is based on data mining: it judges how reasonable a reaction is by mining reaction information from large volumes of data, screens predicted single-step reactions involving electrophilic substitution quickly and reasonably within an acceptable time, and identifies the optimal substitution site, so that the most reliable synthetic route, or the route of most interest to the user, is found and laboratory practice is guided.
In addition, the rapid screening method can be customized in more detail according to customer requirements; for example, specific conditions can be set as preferences when selecting reference documents, so that the literature support of most interest to the user is ultimately retrieved from the database and the data can be further refined. The method therefore has broad application prospects.
The following examples further illustrate the present invention.
Fig. 2 shows the flow chart for screening a predicted reaction with the rapid screening method described above to obtain the target electrophilic substitution reaction.
Example 1
Determine whether the electrophilic substitution reaction obtained by retrosynthetic derivation in Fig. 3 is reasonable.
The technical scheme of the invention screens out the optimal candidate reference reaction, as shown in Fig. 4.
The candidate reference reaction is 98.6% similar to the reaction to be screened and has a similar reaction site. For the same molecular fragment, 74.5% of the literature supports this site as reasonable (literature support = number of references supporting the reaction at this site / number of all candidate references). The screening took 523 ms.
Source of the reference: 8689168; Article; Desai, Lopa V.; Stowers, Kara J.; Sanford, Melanie S.; Journal of the American Chemical Society; vol. 130; no. 40; (2008); p. 13285-13293.
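The literature-support percentage quoted in the examples is a simple ratio. The function below is a minimal sketch of that definition; the name `site_support` and the input format (one reported site per candidate reference) are assumptions, and the numbers in the usage line are illustrative, not the data behind Example 1.

```python
from collections import Counter


def site_support(candidate_sites, site):
    """Share of candidate references whose reported reaction site equals `site`.

    candidate_sites: one reported substitution site per candidate reference
    (a hypothetical input format).
    """
    if not candidate_sites:
        return 0.0
    return Counter(candidate_sites)[site] / len(candidate_sites)


# Illustrative only: 3 of 4 candidate references report substitution at site 2.
print(site_support([2, 2, 2, 4], 2))  # 0.75
```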
Example 2
Determine whether the electrophilic substitution reaction obtained by retrosynthetic derivation in Fig. 5 is reasonable.
The technical scheme of the invention screens out the optimal candidate reference reaction, as shown in Fig. 6.
The candidate reference reaction is 92.3% similar to the reaction to be screened and has a similar reaction site. For the same molecular fragment, 100% of the literature supports this site as reasonable. The screening took 245 ms.
Source of the reference: 2069351; Article; Huebner; Ulrich; Justus Liebigs Annalen der Chemie; vol. 222; (1884); p. 95.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. A method for rapidly screening electrophilic substitution reactions of aromatic systems, characterized by comprising the following steps:
a) establishing a database of chemical reaction literature, and extracting from it all literature reactions involving electrophilic aromatic substitution to obtain an electrophilic aromatic substitution reaction library; then preprocessing this library to generate three data tables: an aromatic ring skeleton library, a substituent fingerprint library and a reaction site information library;
b) screening the predicted reaction to be screened against the three data tables generated in step a) to obtain the target electrophilic substitution reaction.
2. The rapid screening method according to claim 1, wherein establishing the database of chemical reaction literature in step a) specifically comprises:
a1) storing the collected chemical reactions on a computer in advance, converting them into a computer-readable storage format, and then processing the data to obtain a one-to-one correspondence between the atoms in the reactants and the atoms in the products;
a2) identifying the reaction site from this correspondence, extracting the atoms, chemical bonds and groups directly connected to or conjugated with the reaction site, as well as groups indirectly connected to it that influence the reaction, as the information identifying the chemical reaction, and storing this information in the database as a reaction rule.
3. The rapid screening method according to claim 2, wherein the chemical reactions in step a1) include conventional chemical reactions, classical named organic reactions, chemical reactions reported in academic journals, and chemical reactions reported in patents.
4. The rapid screening method according to claim 2, wherein the reaction site in step a2) comprises the changed chemical bonds and the atoms directly connected to those bonds;
the identification process specifically comprises:
comparing the chemical structures of the starting materials and the products of the reaction to find the changed chemical bonds and the atoms directly attached to them.
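The comparison in claim 4 amounts to a set difference over atom-mapped bonds. The sketch below assumes each side of the reaction is already atom-mapped (step a1) and is given as a dictionary from a bond, keyed by the frozenset of its two atom-map numbers, to a bond order (1.5 for aromatic); hydrogens are assumed to be explicit and mapped. The function name `reaction_center` is hypothetical.

```python
def reaction_center(reactant_bonds, product_bonds):
    """Return (changed bonds, atoms directly attached to them).

    Both arguments map frozenset({map_a, map_b}) -> bond order. A bond counts
    as changed if it is formed, broken, or changes order between the two sides.
    """
    changed = {
        bond
        for bond in reactant_bonds.keys() | product_bonds.keys()
        if reactant_bonds.get(bond) != product_bonds.get(bond)
    }
    atoms = set().union(*changed)  # endpoints of every changed bond
    return changed, atoms


# Toy nitration-like case: ring carbon 1 loses H 7 and gains N 8.
reactants = {frozenset({1, 2}): 1.5, frozenset({1, 6}): 1.5, frozenset({1, 7}): 1}
products = {frozenset({1, 2}): 1.5, frozenset({1, 6}): 1.5, frozenset({1, 8}): 1}
bonds, atoms = reaction_center(reactants, products)
print(sorted(atoms))  # [1, 7, 8]
```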
5. The rapid screening method according to claim 1, wherein the preprocessing in step a) specifically comprises:
identifying the skeleton of the main reactant of each reaction to obtain the corresponding aromatic ring skeleton structure table, forming the aromatic ring skeleton library;
simplifying the structure of each original reactant molecule containing an aromatic ring to generate a corresponding simplified molecular structure, then computing values for the substituents on the simplified ring with a group electronegativity calculation method, and combining them according to the numbers of the skeleton atoms to which the substituents are attached, thereby generating a substituent fingerprint for each molecule and forming the substituent fingerprint library;
analyzing all electrophilic substitution reactions collectively against a unified template, assigning each a unique number, and recording information on the specific reaction site and the example number to generate a substitution reaction information table, forming the reaction site information library.
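Claim 5 combines per-substituent electronegativity values in the order of the skeleton atoms they are attached to. The sketch below shows only that combination step. The group electronegativity method itself is not specified in the text, so `PLACEHOLDER_GROUP_EN` holds made-up placeholder numbers, not values from any published scale; `substituent_fingerprint` and its `ndigits` rounding parameter are hypothetical.

```python
# Placeholder values for illustration only; substitute the output of the
# group electronegativity method actually used.
PLACEHOLDER_GROUP_EN = {"H": 2.2, "CH3": 2.3, "OCH3": 2.7, "Cl": 3.0, "NO2": 3.4}


def substituent_fingerprint(substituents, en_table=PLACEHOLDER_GROUP_EN, ndigits=1):
    """Build a fingerprint from {skeleton atom number: substituent label}.

    The result is a tuple of (atom number, rounded electronegativity) pairs
    ordered by skeleton atom number, so it can be used as a dictionary key.
    """
    return tuple(
        (atom, round(en_table[group], ndigits))
        for atom, group in sorted(substituents.items())
    )


print(substituent_fingerprint({3: "NO2", 1: "CH3"}))  # ((1, 2.3), (3, 3.4))
```

Rounding is one assumed way of letting near-identical substituents share a fingerprint; the ordering by atom number makes the result independent of the order in which substituents were found.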
6. The rapid screening method according to claim 5, wherein simplifying the structure of the original reactant molecule containing an aromatic ring specifically comprises:
starting from the atoms of the skeleton ring, expanding outward into each substituent group by three layers of atoms and bonds, and discarding the remaining group atoms to obtain the simplified molecule.
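The three-layer truncation of claim 6 can be read as a breadth-first search that starts from every skeleton ring atom at once and stops three bonds out. The sketch below assumes a molecule stored as an adjacency list (atom index -> neighbouring atom indices); `simplify` is a hypothetical name, and keeping only bonds between retained atoms is an interpretation of "discarding the remaining group atoms".

```python
from collections import deque


def simplify(adjacency, ring_atoms, depth=3):
    """Keep ring atoms plus substituent atoms within `depth` bonds of the ring.

    adjacency: {atom: iterable of neighbouring atoms}; ring_atoms: a list or
    range of skeleton ring atoms. Returns the retained atoms and the bonds
    (as frozensets) between retained atoms.
    """
    dist = {atom: 0 for atom in ring_atoms}
    queue = deque(ring_atoms)
    while queue:
        atom = queue.popleft()
        if dist[atom] == depth:
            continue  # do not expand past the third layer
        for nb in adjacency[atom]:
            if nb not in dist:
                dist[nb] = dist[atom] + 1
                queue.append(nb)
    kept = set(dist)
    bonds = {frozenset((a, b)) for a in kept for b in adjacency[a] if b in kept}
    return kept, bonds


# Benzene ring 0-5 carrying a five-atom chain 6-7-8-9-10 on atom 0.
adj = {0: [1, 5, 6], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 0],
       6: [0, 7], 7: [6, 8], 8: [7, 9], 9: [8, 10], 10: [9]}
kept, bonds = simplify(adj, range(6))
print(sorted(kept))  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```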
7. The rapid screening method according to claim 1, wherein the screening in step b) specifically comprises:
b1) after simplifying the molecular structure of the predicted reaction to be screened, matching it sequentially against the aromatic ring skeleton library, the substituent fingerprint library and the reaction site information library;
if no matching data is found, it is considered that no usable reference data exists, and the reaction is marked as unreasonable and screened out;
if similar data exists, retrieving it from the database for further comparison;
b2) calculating the reaction similarity between the similar data retrieved in step b1) and the corresponding data of the predicted reaction to be screened;
if the similarity is less than or equal to the standard threshold, it is considered that no usable literature data exists, and the reaction is marked as unreasonable and screened out;
if the similarity is greater than the standard threshold, selecting literature reactions with similar reaction sites to form a candidate group, ranking the candidate reactions, and screening out the target electrophilic substitution reaction.
8. The rapid screening method according to claim 7, wherein the standard threshold in step b2) is 70% or more.
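Steps b1) and b2) with the 70% threshold of claim 8 can be sketched as follows. This is a simplified stand-in, not the claimed procedure: only the skeleton lookup is performed as an exact match, reaction similarity is approximated by the Tanimoto coefficient over hypothetical feature sets (the text does not name a similarity measure), a "similar reaction site" is reduced to an identical site number, and the record layout and the names `tanimoto` and `screen` are assumptions.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient of two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def screen(query, library, threshold=0.7):
    """Return (similarity, reaction id) candidates, best first.

    query and library entries are dicts with keys 'id', 'skeleton',
    'features' (a set) and 'site'. An empty result means the predicted
    reaction is marked unreasonable and screened out.
    """
    # b1) exact match on the aromatic ring skeleton
    matches = [r for r in library if r["skeleton"] == query["skeleton"]]
    if not matches:
        return []  # no reference data found
    # b2) keep references strictly above the threshold that react at the same site
    candidates = []
    for ref in matches:
        score = tanimoto(query["features"], ref["features"])
        if score > threshold and ref["site"] == query["site"]:
            candidates.append((score, ref["id"]))
    return sorted(candidates, reverse=True)


library = [
    {"id": "r1", "skeleton": "benzene", "features": {"a", "b", "c", "d", "e"}, "site": 2},
    {"id": "r2", "skeleton": "benzene", "features": {"a", "b"}, "site": 2},
    {"id": "r3", "skeleton": "naphthalene", "features": {"a", "b", "c", "d"}, "site": 1},
]
query = {"id": "q", "skeleton": "benzene", "features": {"a", "b", "c", "d"}, "site": 2}
print(screen(query, library))  # [(0.8, 'r1')]
```

The strict `>` comparison mirrors claim 7, where a similarity equal to the threshold already counts as "no usable literature data".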
9. The rapid screening method according to claim 7, wherein ranking the candidate reactions in step b2) specifically comprises:
ranking by similarity score; where several sites on the aromatic ring have reasonable supporting literature, mining the data distribution across the different sites and preferentially selecting the reactions with more literature support, higher yield and better conditions as the final reference literature reactions, and marking the selected reactions as reasonable reactions.
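One way to read the ranking of claim 9 is as a lexicographic comparison: prefer the site with the most supporting references, then the highest similarity, then the highest yield. That ordering of criteria is an assumption (the text lists the criteria without fixing their priority), "better conditions" is left out because no measure is given, and `pick_reference` and its dictionary keys are hypothetical.

```python
from collections import Counter


def pick_reference(candidates):
    """Choose the reference reaction from a candidate group.

    candidates: dicts with keys 'id', 'site', 'similarity' and 'yield'
    (percent, or None if not reported). Returns None for an empty group.
    """
    if not candidates:
        return None
    support = Counter(c["site"] for c in candidates)  # references per site

    def rank(c):
        yield_pct = c["yield"] if c["yield"] is not None else 0.0
        return (support[c["site"]], c["similarity"], yield_pct)

    return max(candidates, key=rank)


group = [
    {"id": "A", "site": 2, "similarity": 0.90, "yield": 50.0},
    {"id": "B", "site": 4, "similarity": 0.95, "yield": 80.0},
    {"id": "C", "site": 2, "similarity": 0.85, "yield": 90.0},
]
print(pick_reference(group)["id"])  # A
```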
CN202011533369.0A 2020-12-23 2020-12-23 Rapid screening method for electrophilic substitution reaction of aromatic system Pending CN114664388A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011533369.0A CN114664388A (en) 2020-12-23 2020-12-23 Rapid screening method for electrophilic substitution reaction of aromatic system

Publications (1)

Publication Number Publication Date
CN114664388A true CN114664388A (en) 2022-06-24

Family

ID=82024147

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011533369.0A Pending CN114664388A (en) 2020-12-23 2020-12-23 Rapid screening method for electrophilic substitution reaction of aromatic system

Country Status (1)

Country Link
CN (1) CN114664388A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115206450A (en) * 2022-09-15 2022-10-18 药融云数字科技(成都)有限公司 (Yaorongyun Digital Technology (Chengdu) Co., Ltd.) Synthetic route recommendation method and terminal

Similar Documents

Publication Publication Date Title
Heinonen et al. Metabolite identification and molecular fingerprint prediction through machine learning
Hufsky et al. Computational mass spectrometry for small-molecule fragmentation
Munk Computer-based structure determination: Then and now
JP2007287531A (en) Mass spectrometry data analysis method
WO2008058923A2 (en) A system and method to identify the metabolites of a drug
Richard Application of SAR methods to non-congeneric data bases associated with carcinogenicity and mutagenicity: Issues and approaches
Manikandan et al. Sequential pattern mining on chemical bonding database in the bioinformatics field
Godzien et al. Metabolite annotation and identification
CN114664388A (en) Rapid screening method for electrophilic substitution reaction of aromatic system
Stepišnik et al. A comprehensive comparison of molecular feature representations for use in predictive modeling
CN108416034A (en) Information acquisition system and its control method based on financial isomery big data
CN115576999A (en) Task data processing method, device and equipment based on cloud platform and storage medium
Portugaly et al. EVEREST: automatic identification and classification of protein domains in all protein sequences
Kreutter et al. Multistep retrosynthesis combining a disconnection aware triple transformer loop with a route penalty score guided tree search
Li et al. Retro-BLEU: quantifying chemical plausibility of retrosynthesis routes through reaction template sequence analysis
CN113707239A (en) Lead compound optimization method based on medicinal chemical transformation rule
Zhang et al. Semi-SGD: Semi-supervised learning based spammer group detection in product reviews
Tüysüzoğlu et al. Temporal bagging: a new method for time-based ensemble learning
JP6027436B2 (en) Mass spectrometry data analysis method
Yang et al. A novel approach to improving C-Tree for feature selection
Müller et al. MultiClust special issue on discovering, summarizing and using multiple clusterings
Xing et al. Molecular formula discovery via bottom-up MS/MS interrogation
Kutuzova et al. Bi-modal variational autoencoders for metabolite identification using tandem mass spectrometry
Loukil et al. Impact of a priori MS/MS intensity distributions on database search for peptide identification
Mrzic et al. Automated recommendation of metabolite substructures from mass spectra using frequent pattern mining

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination