CN115762661A

CN115762661A - Molecular design and structure optimization method, system, device and storage medium

Info

Publication number: CN115762661A
Application number: CN202211453614.6A
Authority: CN
Inventors: 王辉; 吴静巍; 徐景鑫; 曾琢; 田晓晖; 王斌; 刘真甫
Original assignee: Suzhou Woshi Digital Technology Co ltd
Current assignee: Suzhou Woshi Digital Technology Co ltd
Priority date: 2022-11-21
Filing date: 2022-11-21
Publication date: 2023-03-07

Abstract

The invention discloses a method, a system, a device and a storage medium for molecular design and structure optimization. The method comprises the following steps: acquiring molecular synthesis building block data and establishing a molecular synthesis building block database; establishing chemical reaction rules based on prior knowledge; performing structural segmentation on an input molecular structure according to the chemical reaction rules to obtain decomposition fragments; based on the decomposition fragments, obtaining a plurality of target structures from the data of the molecular synthesis building block database; training a variational autoencoder and performing structural recombination based on the plurality of target structures and the chemical reaction rules to obtain a plurality of target molecular structures; and performing molecular docking and pharmacodynamic parameter evaluation on the plurality of target molecular structures to determine target druggable molecules. The invention ensures that the generated molecules have better synthesizability and can be rapidly prepared through similar chemical reaction operations, enables efficient structure optimization of compounds, and can be widely applied in the technical field of pharmaceutical chemistry.

Description

Molecular design and structure optimization method, system, device and storage medium

Technical Field

The invention relates to the technical field of medicinal chemistry, in particular to a molecular design and structure optimization method, a system, a device and a storage medium.

Background

Drug discovery often faces the problems of high cost and long development cycles. A successful candidate must not only meet biological activity criteria but also have good pharmacokinetic and other properties, show no significant toxic side effects, and at the same time be readily synthesizable. The lead optimization stage improves these properties by structurally modifying the lead compound, and its basic logic is the analysis of the structure-activity relationship (SAR). Without changing the molecular parent nucleus (core scaffold) of the lead compound, an accurate SAR can be obtained by systematically comparing side-chain substituents, which are then adjusted step by step to obtain a compound structure with better properties. If a database of structures similar to a given molecular structure could be obtained quickly, the efficiency of lead optimization would be greatly improved. Several methods can rapidly generate such a database of structures similar to a given structure, for example deep-learning-based generative models, analysis methods based on matched molecular pairs (MMPs), and some molecular-graph-based methods.

However, although current deep-learning-based molecular generation methods can rapidly explore a vast chemical space, they also face several problems: the properties of the generated structures are often unsatisfactory; their synthesizability is poor; because of the sequential nature of SMILES strings, it is difficult to generate molecules that retain a given structural parent nucleus; structure-activity relationships are difficult to analyze; and it is difficult to prepare the generated molecules rapidly by automated batch synthesis.

Disclosure of Invention

In view of this, embodiments of the present invention provide a method, a system, an apparatus, and a storage medium for molecular design and structure optimization, which can efficiently implement structure optimization of a compound.

In one aspect, an embodiment of the present invention provides a molecular design and structure optimization method, including:

acquiring molecular synthesis building block data, and establishing a molecular synthesis building block database;

establishing a chemical reaction rule based on prior knowledge;

performing structural segmentation on the input molecular structure according to the chemical reaction rule to obtain decomposition fragments, the decomposition fragments comprising a molecular parent nucleus and substituent fragments;

based on the decomposition fragments, obtaining a plurality of target structures through the data of the molecular synthesis building block database;

based on the target structures and the chemical reaction rule, training a variational autoencoder and carrying out structural recombination to obtain target molecular structures;

and performing molecular docking and pharmacodynamic parameter evaluation on the plurality of target molecular structures to determine target druggable molecules.

Optionally, the method further comprises:

generating a data analysis result report through a data analysis library according to the target druggable molecules;

the data analysis result report covers the molecular structures, molecular descriptors, physicochemical properties and structure-activity relationships.

Optionally, the obtaining of molecular synthesis building block data and building of a molecular synthesis building block database include:

based on public structure databases, collecting synthetic building block structure data, and converting the building block structure data into standardized SMILES strings through a cheminformatics toolkit;

carrying out structure preparation on all molecules in the synthetic building block structure data, and storing in a preset format;

wherein the structure preparation comprises adding hydrogens, protonation, and generation of a three-dimensional conformation.

Optionally, the establishing a chemical reaction rule based on the a priori knowledge comprises:

establishing chemical reaction rules for drug synthesis based on organic chemistry textbooks and literature;

wherein the chemical reaction rule is represented by a SMARTS expression.

Optionally, the method further comprises:

constructing a variational autoencoder from an encoder, a latent space and a decoder;

wherein the encoder comprises a recurrent neural network layer and a fully-connected layer; the decoder includes a recurrent neural network layer and a time-dependent fully-connected layer.

Optionally, the method further comprises:

training the parameters of the variational autoencoder through a loss function;

wherein the loss function includes a reconstruction loss and a regularization loss; during training, Adam stochastic gradient descent is used as the optimizer, weights are initialized with the Xavier initialization method, and an early stopping strategy is adopted.

Optionally, performing molecular docking and pharmacodynamic parameter evaluation on the plurality of target molecular structures to determine target druggable molecules includes:

carrying out structure preparation on a plurality of target molecular structures to obtain corresponding three-dimensional conformations;

performing molecular docking and ranking by affinity and pharmacodynamic properties according to the three-dimensional conformations to obtain evaluation parameters;

wherein the evaluation parameters comprise docking scores, physicochemical properties, drug-like properties and synthesizability;

based on the evaluation parameters, a target druggable molecule is determined.

In another aspect, an embodiment of the present invention provides a molecular design and structure optimization system, including:

the first module is used for acquiring molecular synthesis building block data and establishing a molecular synthesis building block database;

a second module for establishing a chemical reaction rule based on the prior knowledge;

the third module is used for carrying out structural segmentation on the input molecular structure according to the chemical reaction rule to obtain a decomposition fragment; the decomposition fragment comprises a molecular parent nucleus and a substituent fragment;

a fourth module for obtaining a plurality of target structures from the data of the molecular synthesis building block database based on the decomposition fragments;

the fifth module is used for training a variational autoencoder and carrying out structural recombination based on the plurality of target structures and the chemical reaction rule to obtain a plurality of target molecular structures;

and the sixth module is used for carrying out molecular docking and pharmacodynamic parameter evaluation on the plurality of target molecular structures and determining target druggable molecules.

In another aspect, an embodiment of the present invention provides a molecular design and structure optimization apparatus, including a processor and a memory;

the memory is used for storing programs;

the processor executes the program to implement the method as described above.

In another aspect, an embodiment of the present invention provides a computer-readable storage medium, which stores a program, which is executed by a processor to implement the method as described above.

The embodiment of the invention also discloses a computer program product or a computer program, which comprises computer instructions, and the computer instructions are stored in a computer readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and the computer instructions executed by the processor cause the computer device to perform the foregoing method.

In the embodiments of the invention, molecular synthesis building block data are first acquired and a molecular synthesis building block database is established; chemical reaction rules are established based on prior knowledge; the input molecular structure is segmented according to the chemical reaction rules to obtain decomposition fragments comprising a molecular parent nucleus and substituent fragments; based on the decomposition fragments, a plurality of target structures are obtained from the data of the molecular synthesis building block database; a variational autoencoder is trained and structural recombination is carried out based on the target structures and the chemical reaction rules to obtain a plurality of target molecular structures; and molecular docking and pharmacodynamic parameter evaluation are performed on the target molecular structures to determine target druggable molecules. Because the input molecular structure is segmented according to established chemical reaction rules and then recombined using the building block database and the variational autoencoder, the generated molecules have better synthesizability and can be rapidly prepared through similar chemical reaction operations. Molecular docking and pharmacodynamic parameter evaluation then provide an optimized screen that yields the target druggable molecules. The invention can therefore efficiently achieve structural optimization of compounds.

Drawings

To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic flow chart of a molecular design and structure optimization method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a molecular design and structure optimization method according to an embodiment of the present invention;

FIG. 3 is a schematic overall flow chart of a molecular design and structure optimization method according to an embodiment of the present invention;

FIG. 4 is a schematic diagram illustrating the principles of a chemical reaction provided by an embodiment of the present invention;

FIG. 5 is a schematic diagram of the principle and process of measuring chemical similarity provided by an embodiment of the present invention;

FIG. 6 is a schematic illustration of a molecular design structure resulting from molecular design and structural optimization provided by embodiments of the present invention;

FIG. 7 is a diagram illustrating a result of quantitative structure-activity relationship based on a statistical learning method according to an embodiment of the present invention;

FIG. 8 is a graph of chemical structure and physicochemical property data generated by molecular design and structure optimization provided by an embodiment of the present invention;

FIG. 9 is a diagram of the docking procedure and affinity frequency distribution histogram for molecular design and structure optimization provided by an embodiment of the present invention;

FIG. 10 is a diagram of three-dimensional conformations obtained by molecular design and structure optimization provided by an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

In one aspect, referring to fig. 1, an embodiment of the present invention provides a molecular design and structure optimization method, including:

S100, obtaining molecular synthesis building block data and establishing a molecular synthesis building block database;

It should be noted that, in some embodiments, synthetic building block structure data are collected from public structure databases and converted into standardized SMILES strings through a cheminformatics toolkit; all molecules in the synthetic building block structure data undergo structure preparation and are stored in a preset format, where the structure preparation comprises adding hydrogens, protonation, and generation of a three-dimensional conformation.

Specifically, high-quality molecular synthesis building block (BBs) structures are obtained from databases. The detailed steps are as follows: building block structure data are collected from each public structure database and converted into standardized SMILES strings (Simplified Molecular Input Line Entry System, a specification that explicitly describes a molecular structure as an ASCII character string) using RDKit, a widely used cheminformatics Python toolkit that provides a large number of 2D and 3D operations on chemical molecules and can generate molecular descriptors for machine learning. At the same time, all molecules undergo structure preparation through an in-house molecule preparation workflow, including adding hydrogens, protonation, and generation of three-dimensional conformations, which are saved in SDF format. The public data sources include databases such as Reaxys, PubChem, ChEMBL and ZINC.
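
As an illustration of the merging step only, the following minimal Python sketch combines building block records from several databases and removes duplicates. It assumes the SMILES strings have already been canonicalized upstream (for example with RDKit's `Chem.MolToSmiles`), and the function name `merge_building_blocks` is hypothetical; structure preparation and SDF export are not shown.

```python
from collections import defaultdict

def merge_building_blocks(sources):
    """Merge building-block SMILES from several databases, tracking provenance.

    `sources` maps a database name to an iterable of SMILES strings that are
    ASSUMED to be canonical already (e.g. produced by RDKit's Chem.MolToSmiles),
    so identical strings are treated as the same building block.
    """
    library = defaultdict(set)
    for db_name, smiles_list in sources.items():
        for smi in smiles_list:
            smi = smi.strip()
            if smi:  # skip blank entries left over from raw exports
                library[smi].add(db_name)
    # Sorted provenance lists give a stable, JSON-friendly result.
    return {smi: sorted(dbs) for smi, dbs in library.items()}


# Hypothetical toy input: acetic acid appears in two sources.
lib = merge_building_blocks({
    "ChEMBL": ["CC(=O)O", "NCc1ccccc1"],
    "ZINC": ["CC(=O)O", "OB(O)c1ccccc1", ""],
})
print(lib["CC(=O)O"])  # ['ChEMBL', 'ZINC']
```

Keying the library on the canonical string is what makes the deduplication valid; without canonicalization, two spellings of the same molecule would be stored twice.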

S200, establishing a chemical reaction rule based on prior knowledge;

It should be noted that, in some examples, chemical reaction rules for drug synthesis are established based on organic chemistry textbooks and literature, wherein the chemical reaction rules are expressed as SMARTS expressions.

Specifically, to give the resulting molecules good synthesizability, common drug synthesis reaction rules were compiled from organic chemistry textbooks and literature, yielding a total of 85 chemical reaction rules, which are expressed as SMARTS expressions and stored in a dictionary. The rules cover common drug synthesis reaction types, including coupling reactions, esterification reactions, cycloaddition reactions, cyclization reactions and the like. Because each reaction rule is encoded in SMARTS, molecular structures can be decomposed and recombined according to the reaction rules.
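
The rule dictionary can be pictured roughly as below. This is a sketch, not the patent's rule set: the two SMARTS templates, the dictionary keys and the helper functions are hypothetical, and only the standard library is used. In practice such strings could be loaded with RDKit's `AllChem.ReactionFromSmarts` and applied with `RunReactants`; here the helpers only check that every atom-map number in the product also appears among the reactants, and show how swapping the two sides gives a retrosynthetic template for splitting a product (a reversed template may still need adjustment before use).

```python
import re

# Hypothetical excerpt of a reaction-rule dictionary. The description reports
# 85 rules but does not list them; these two templates are illustrative only.
REACTION_RULES = {
    # carboxylic acid + primary/secondary amine -> amide
    "amide_coupling": "[C:1](=[O:2])[OX2H1].[NX3;H2,H1:3]>>[C:1](=[O:2])[N:3]",
    # carboxylic acid + alcohol -> ester
    "esterification": "[C:1](=[O:2])[OX2H1].[OX2H1:3][C:4]>>[C:1](=[O:2])[O:3][C:4]",
}

# Atom-map numbers appear as ":<n>]" at the end of a bracket atom.
_MAP_NUMBER = re.compile(r":(\d+)\]")

def mapped_atoms(smarts_side):
    """Return the set of atom-map numbers used in one side of a template."""
    return {int(n) for n in _MAP_NUMBER.findall(smarts_side)}

def is_consistent(rule):
    """True if every mapped product atom also appears among the reactants.

    Assumes a plain 'reactants>>products' template without agents.
    """
    reactants, products = rule.split(">>")
    return mapped_atoms(products) <= mapped_atoms(reactants)

def retro_template(rule):
    """Swap the two sides so the template splits a product into reactants."""
    reactants, products = rule.split(">>")
    return f"{products}>>{reactants}"


for name, rule in REACTION_RULES.items():
    print(name, is_consistent(rule))
print(retro_template(REACTION_RULES["esterification"]))
```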

S300, carrying out structural segmentation on the input molecular structure according to a chemical reaction rule to obtain a decomposition fragment;

It should be noted that the decomposition fragments include a molecular parent nucleus and substituent fragments;

Specifically, the prepared chemical reaction rules are used to segment the lead compound (i.e., the input molecular structure), decomposing it into the molecular parent nucleus and substituent fragments that play a key role in drug efficacy.
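
Once a rule has matched a bond in the lead compound, the split itself reduces to removing that bond from the molecular graph and collecting the connected components. The pure-Python sketch below shows this on a toy adjacency list. Matching the rule is assumed to have happened upstream, treating the largest fragment as the parent nucleus is a simplifying assumption, and `cut_bond` is a hypothetical name. With RDKit, `Chem.FragmentOnBonds` performs the corresponding operation on real molecules.

```python
from collections import deque

def cut_bond(adjacency, bond):
    """Remove one bond and return the resulting fragments as atom-index sets.

    `adjacency` maps atom index -> set of neighbour indices (a toy molecular
    graph). `bond` is the (i, j) pair ASSUMED to have been matched by a
    reaction rule upstream.
    """
    i, j = bond
    graph = {a: set(nbrs) for a, nbrs in adjacency.items()}  # work on a copy
    graph[i].discard(j)
    graph[j].discard(i)

    fragments, seen = [], set()
    for start in graph:
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            atom = queue.popleft()
            component.add(atom)
            for nbr in graph[atom]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        fragments.append(component)
    # Largest fragment first: treated here as the parent nucleus.
    fragments.sort(key=len, reverse=True)
    return fragments


# Toy graph: a six-membered ring (atoms 0-5) with a two-atom side chain on atom 2.
toy = {0: {1, 5}, 1: {0, 2}, 2: {1, 3, 6}, 3: {2, 4},
       4: {3, 5}, 5: {4, 0}, 6: {2, 7}, 7: {6}}
print(cut_bond(toy, (2, 6)))  # [{0, 1, 2, 3, 4, 5}, {6, 7}]
```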

S400, based on the decomposition fragments, obtaining a plurality of target structures through the data of the molecular synthesis building block database;

Specifically, based on the molecular parent nucleus and the substituent fragments, structures similar to the decomposition fragments (i.e., target structures) are searched for in the prepared molecular synthesis building block library.
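
A similarity search over the building block library can be sketched as a ranked Tanimoto comparison. In this hypothetical example each fingerprint is a Python set of on-bit indices (for RDKit bit vectors these could come from `GetOnBits()`), and the `k` and `threshold` parameters are illustrative choices that the description does not specify.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)

def most_similar(query_bits, library, k=3, threshold=0.0):
    """Return up to k (name, score) pairs from `library`, most similar first."""
    scored = [(name, tanimoto(query_bits, bits)) for name, bits in library.items()]
    scored = [item for item in scored if item[1] >= threshold]
    # Sort by descending score; ties are broken by name for reproducibility.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


# Hypothetical fingerprints for a query fragment and three building blocks.
query = {1, 2, 3, 4}
blocks = {"bb1": {1, 2, 3, 4}, "bb2": {1, 2, 9}, "bb3": {7, 8}}
print(most_similar(query, blocks, k=2))  # [('bb1', 1.0), ('bb2', 0.4)]
```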

S500, training a variational autoencoder and carrying out structural recombination based on the plurality of target structures and the chemical reaction rules to obtain a plurality of target molecular structures;

It should be noted that, in some embodiments, the method further includes: constructing a variational autoencoder from an encoder, a latent space and a decoder, wherein the encoder comprises recurrent neural network layers and a fully-connected layer, and the decoder comprises recurrent neural network layers and a time-dependent fully-connected layer.

Specifically, a variational autoencoder (VAE) architecture is designed to construct the molecular generative model; the VAE comprises an encoder, a latent space and a decoder. The encoder contains two recurrent neural network (RNN) layers and one fully-connected layer. After a given structure has been segmented by the chemical reaction rules, the molecular parent nucleus and the substituent fragments that play a key role in biological activity are obtained. The input of the encoder is a tensor encoding the chemical structure of the parent nucleus and substituent fragments, and its output is a multidimensional probability distribution over the latent space. The latent space is a highly structured, continuous, high-dimensional space in which particular directions can represent meaningful axes of variation in the original data; it captures the key statistics of the original data. The decoder of the VAE likewise contains two RNN layers and a time-dependent fully-connected layer, i.e., a dense layer applied at every time step. The input of the decoder is a sample from the latent space, and its output is the probability of each character at each position of the SMILES string. Based on the target structures, the molecular generative model built with the variational autoencoder produces a plurality of molecular parent nuclei and substituent fragments similar to the given structure (i.e., target molecular structures); the generated fragments are then recombined according to the chemical reaction rules to obtain structures similar to the initially input molecule.
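
The character-level representation at both ends of the model can be sketched without a deep learning framework. The sketch below, under stated assumptions, one-hot encodes a SMILES string into a fixed-length matrix (a possible encoder input) and greedily decodes a matrix of per-position character probabilities (the decoder output described above) back into a string. The padding symbol, the maximum length and all function names are hypothetical; the RNN and fully-connected layers themselves are not shown, and greedy argmax decoding is only one option (sampling is another).

```python
PAD = " "  # padding character; assumed never to occur inside a SMILES string

def build_vocab(smiles_list):
    """Sorted character vocabulary with the padding symbol at index 0."""
    chars = sorted({ch for smi in smiles_list for ch in smi})
    return [PAD] + chars

def one_hot(smiles, vocab, max_len):
    """Encode a SMILES string as a max_len x len(vocab) one-hot matrix.

    Characters outside the vocabulary raise KeyError.
    """
    if len(smiles) > max_len:
        raise ValueError("SMILES longer than max_len")
    index = {ch: i for i, ch in enumerate(vocab)}
    matrix = []
    for ch in smiles.ljust(max_len, PAD):
        row = [0.0] * len(vocab)
        row[index[ch]] = 1.0
        matrix.append(row)
    return matrix

def greedy_decode(prob_rows, vocab):
    """Pick the most probable character at each position, then drop padding."""
    chars = [vocab[max(range(len(row)), key=row.__getitem__)] for row in prob_rows]
    return "".join(chars).rstrip(PAD)


vocab = build_vocab(["CCO", "c1ccccc1"])
encoded = one_hot("CCO", vocab, max_len=8)
print(greedy_decode(encoded, vocab))  # CCO
```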

It should be noted that the parent nucleus and the substituent fragments are structurally recombined using the prepared chemical reaction rules. Because the chemical reaction rules are taken into account, the recombined molecules have better synthesizability and higher similarity to the given structure, and therefore tend to have similar functional properties. Similarity is measured by the Tanimoto similarity coefficient computed on ECFP4 molecular fingerprints generated with RDKit. Because the molecules are structurally similar, a quantitative structure-activity relationship (QSAR) model can more readily be established, and the merits of different substituents can be compared by analyzing the structure-activity relationship. In addition, since the molecules are designed through similar chemical reactions, they can be prepared by similar chemical reaction operations, which facilitates automated synthesis and rapid acquisition of physical samples.
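
For reference, when two fingerprints are viewed as sets of on-bits A and B, the standard Tanimoto coefficient is:

```latex
T(A,B) = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}, \qquad 0 \le T(A,B) \le 1
```

As a hypothetical worked example, fingerprints with 40 and 50 on-bits that share 30 bits give T = 30 / (40 + 50 - 30) = 30 / 60 = 0.5.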

It should be further noted that the molecular generation model, the VAE, is a modern type of autoencoder that combines deep learning with Bayesian inference: it encodes input molecules into a low-dimensional latent vector space and then reconstructs them with a decoder. The VAE converts the molecular data into the parameters of a statistical distribution, namely a mean and a variance, randomly samples a point from that distribution using these parameters, and decodes this point back into the original input. The process is therefore statistical and contains some randomness. This randomness improves robustness and forces every location in the latent space to correspond to a meaningful representation, so that points sampled from the latent space can be decoded into valid outputs.
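
The sampling step described above is commonly implemented with the reparameterization trick. The sketch below assumes, as is conventional but not stated in the description, that the encoder outputs a log-variance rather than the variance itself; `reparameterize` is a hypothetical name, and plain Python lists stand in for tensors.

```python
import math
import random

def reparameterize(mu, log_var, eps=None, rng=random):
    """Sample z = mu + sigma * eps, with sigma = exp(0.5 * log_var) and eps ~ N(0, 1).

    In an autodiff framework this form keeps z differentiable with respect to
    mu and log_var; passing `eps` explicitly makes the draw deterministic.
    """
    if eps is None:
        eps = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + math.exp(0.5 * lv) * e for m, lv, e in zip(mu, log_var, eps)]


random.seed(0)
z = reparameterize([0.0, 1.0], [0.0, -2.0])  # one random point in a 2-D latent space
print(z)
```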

In some embodiments, the method further comprises: training the parameters of the variational autoencoder through a loss function, wherein the loss function comprises a reconstruction loss and a regularization loss; during training, Adam stochastic gradient descent is used as the optimizer, weights are initialized with the Xavier initialization method, and an early stopping strategy is adopted.

Specifically, the parameters of the VAE are trained with two loss terms: (1) a reconstruction loss, which forces the decoded molecules to match the initial input; and (2) a regularization loss, which helps learn a well-structured latent space and reduces overfitting to the training data.
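
A minimal sketch of the two terms follows, under the assumption that the regularization loss is the usual Kullback-Leibler divergence between the encoder's diagonal Gaussian and a standard normal prior (the description names the term but does not give its formula). The reconstruction term is written as a per-character categorical cross-entropy, matching a decoder that outputs character probabilities. The weighting factor `beta` is a hypothetical knob; `beta=1` gives the standard VAE objective.

```python
import math

def reconstruction_loss(prob_rows, target_indices, eps=1e-12):
    """Categorical cross-entropy: sum over positions of -log p(target character)."""
    return -sum(math.log(max(row[t], eps)) for row, t in zip(prob_rows, target_indices))

def kl_divergence(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)) for a diagonal Gaussian, summed over dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))

def vae_loss(prob_rows, target_indices, mu, log_var, beta=1.0):
    """Reconstruction term plus beta-weighted KL regularization term."""
    return reconstruction_loss(prob_rows, target_indices) + beta * kl_divergence(mu, log_var)


# Two decoding positions over a three-character vocabulary (hypothetical numbers).
probs = [[0.1, 0.8, 0.1], [0.5, 0.25, 0.25]]
print(vae_loss(probs, [1, 0], mu=[0.2, -0.1], log_var=[0.0, 0.1]))
```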

The VAE is trained with the Adam stochastic gradient descent optimizer, with the learning rate set to 0.0001. Dropout is applied to each RNN layer to prevent overfitting, with the dropout rate set to 0.001. The ReLU activation function is used for all layers, weights are initialized with the Xavier initialization method, and the batch size is set to 1024. To further prevent overfitting, an early stopping strategy is used.
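
The hyperparameters listed above can be collected in one place, as in the sketch below. Only the learning rate, dropout rate and batch size come from the description; the `patience` and `min_delta` values of the early-stopping helper, the weight-matrix shape convention and all names are assumptions. In a PyTorch implementation the corresponding built-ins would be `torch.optim.Adam(params, lr=1e-4)` and `torch.nn.init.xavier_uniform_`; the standard-library versions here only illustrate the logic.

```python
import math
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingConfig:
    # Values stated in the description; everything else is left to the implementer.
    learning_rate: float = 1e-4   # Adam
    dropout_rate: float = 0.001   # applied to each RNN layer
    batch_size: int = 1024

def xavier_uniform(fan_in, fan_out, rng=random):
    """Glorot/Xavier uniform init: U(-a, a) with a = sqrt(6 / (fan_in + fan_out)).

    Returns a fan_out x fan_in weight matrix as nested lists.
    """
    bound = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-bound, bound) for _ in range(fan_in)] for _ in range(fan_out)]

class EarlyStopping:
    """Signal a stop once the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=3)
decisions = [stopper.step(loss) for loss in [1.0, 0.9, 0.95, 0.96, 0.97]]
print(decisions)  # [False, False, False, False, True]
```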

S600, carrying out molecular docking and pharmacodynamic parameter evaluation on the plurality of target molecular structures, and determining target druggable molecules.

It should be noted that, in some embodiments, the plurality of target molecular structures undergo structure preparation to obtain the corresponding three-dimensional conformations; molecular docking and ranking by affinity and pharmacodynamic properties are performed on the three-dimensional conformations to obtain evaluation parameters, which comprise docking scores, physicochemical properties, drug-like properties and synthesizability; and the target druggable molecules are determined based on the evaluation parameters.

Specifically, the large number of molecules structurally similar to the given compound obtained in S500 undergo structure preparation through an in-house molecule preparation workflow, including adding hydrogens, protonation and so on, to generate three-dimensional conformations; they are then docked against the biological target and ranked by affinity and pharmacodynamic properties. The final molecules with potential druggability are selected by jointly considering parameters such as docking score, physicochemical properties, drug-like properties and synthesizability. In other words, the in-house docking and pharmacodynamic parameter evaluation methods rapidly rank the generated molecules by affinity and pharmacodynamic parameters, candidate molecules with potential druggability are screened from them, and molecular structures with stronger affinity and better pharmacodynamic properties than the input reference compound are obtained.
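
The final multi-criteria selection can be sketched as hard filters followed by a ranking. The description does not say how the criteria are combined, so everything below is an assumption: the field names, the use of QED as the drug-likeness measure and a 1 to 10 synthetic accessibility score as the synthesizability measure, and every threshold except the docking cut-off of -9, which is borrowed from Example 2.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    # Hypothetical fields; the description names the criteria, not the schema.
    name: str
    docking_score: float  # more negative = stronger predicted binding
    qed: float            # drug-likeness, 0 (poor) to 1 (good)
    sa_score: float       # synthetic accessibility, 1 (easy) to 10 (hard)
    mol_weight: float     # g/mol

def screen(candidates, max_docking=-9.0, min_qed=0.5, max_sa=4.0, max_mw=500.0):
    """Apply hard filters, then rank by docking score with QED as tie-breaker."""
    kept = [
        c for c in candidates
        if c.docking_score < max_docking
        and c.qed >= min_qed
        and c.sa_score <= max_sa
        and c.mol_weight <= max_mw
    ]
    # Ascending docking score puts the strongest predicted binders first.
    return sorted(kept, key=lambda c: (c.docking_score, -c.qed))


pool = [
    Candidate("A", -10.2, 0.7, 3.0, 420.0),
    Candidate("B", -9.5, 0.8, 2.5, 380.0),
    Candidate("C", -8.0, 0.9, 2.0, 300.0),   # fails the docking cut-off
    Candidate("D", -11.0, 0.3, 3.0, 450.0),  # fails the QED filter
    Candidate("E", -10.2, 0.9, 3.5, 410.0),
]
print([c.name for c in screen(pool)])  # ['E', 'A', 'B']
```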

It should be noted that, in some embodiments, the method further includes: generating a data analysis result report through a data analysis library according to the target druggable molecules, wherein the report covers molecular structures, molecular descriptors, physicochemical properties and structure-activity relationships.

Specifically, after all tasks are completed, data analysis result reports on molecular structures, molecular descriptors, physicochemical properties, structure-activity relationships and the like are generated directly in Jupyter by calling data analysis libraries such as RDKit, pandas and matplotlib.
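
The description draws its reports with RDKit, pandas and matplotlib. As a dependency-free stand-in, the sketch below writes per-molecule results to CSV text and computes a mean and sample standard deviation for each numeric column (roughly what pandas `DataFrame.describe()` reports). The column names and the `write_report` helper are hypothetical.

```python
import csv
import io
import statistics

def write_report(rows, fieldnames):
    """Return (csv_text, summary) for a list of per-molecule dicts.

    `summary` maps each numeric column to (mean, sample standard deviation).
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

    summary = {}
    for field in fieldnames:
        values = [r[field] for r in rows if isinstance(r[field], (int, float))]
        if len(values) >= 2:  # the sample standard deviation needs two values
            summary[field] = (statistics.mean(values), statistics.stdev(values))
    return buffer.getvalue(), summary


rows = [
    {"name": "mol_1", "mw": 300.0, "logp": 2.0},
    {"name": "mol_2", "mw": 400.0, "logp": 3.0},
]
text, summary = write_report(rows, ["name", "mw", "logp"])
print(text)
print(summary["mw"][0])  # 350.0
```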

The technical solution of the present invention is further described below with reference to the specific embodiments and the accompanying drawings, and it should be understood that the following is an explanation of the technical solution and should not be construed as a limitation of the present invention.

The following example uses the principle as shown in fig. 2, and the workflow is shown in fig. 3.

Example 1

Lead compound structure optimization for Bruton's tyrosine kinase (BTK):

BTK is a key kinase in the B cell receptor (BCR) signal transduction pathway; it is widely expressed in different types of hematologic malignancies and participates in the proliferation, differentiation and apoptosis of B cells. To obtain a BTK inhibitor structure with better properties, known inhibitor structures and a lead compound for this target are first downloaded. The lead compound is segmented using the prepared chemical reaction rules and decomposed into the molecular parent nucleus and substituent fragments that play a key role in drug efficacy, as shown in FIG. 4. Structures similar to the decomposition fragments are searched for in the prepared molecular synthesis building block library, and similar functional group structures are obtained by training the variational autoencoder. The molecular parent nucleus and the generated structural fragments are recombined according to the chemical reaction rules to obtain complete generated molecules, and their structural similarity to the initial input molecule is calculated, as shown in FIG. 5. The complete molecular structures obtained after recombination are shown in FIG. 6. In parallel, quantitative structure-activity relationship (QSAR) models are built for the collected inhibitors using statistical learning methods including random forest (RF), support vector machine (SVM) and multilayer perceptron (MLP), with the results shown in FIG. 7. The QSAR models are then used to guide drug design.
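
The description lists RF, SVM and MLP models but gives no details. To illustrate the general idea of a similarity-based QSAR model without external libraries, the sketch below swaps in a different, simpler technique: k-nearest-neighbour regression, which predicts activity as the Tanimoto-weighted mean of the k most similar training compounds. The fingerprints, activity values and names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def knn_predict(query_bits, training, k=3):
    """Predict activity as the similarity-weighted mean of the k nearest neighbours.

    `training` is a non-empty list of (fingerprint_bits, activity) pairs,
    e.g. hypothetical pIC50 values.
    """
    ranked = sorted(training, key=lambda item: tanimoto(query_bits, item[0]), reverse=True)
    neighbours = ranked[:k]
    weights = [tanimoto(query_bits, bits) for bits, _ in neighbours]
    total = sum(weights)
    if total == 0.0:
        # No structural overlap with any neighbour: fall back to their plain mean.
        return sum(act for _, act in neighbours) / len(neighbours)
    return sum(w * act for w, (_, act) in zip(weights, neighbours)) / total


# Hypothetical training set: three compounds with made-up activities.
train = [({1, 2, 3}, 7.0), ({1, 2, 4}, 6.0), ({8, 9}, 4.0)]
print(round(knn_predict({1, 2, 3}, train, k=2), 3))  # 6.667
```

Unlike the RF, SVM and MLP models named in the description, this approach has no training step; it is shown only because its logic fits in a few lines.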

Example 2

Inhibitor design and structural optimization for cyclooxygenase (COX):

Cyclooxygenase (COX), also known as prostaglandin-endoperoxide synthase (PTGS), is an important enzyme responsible for the formation of key bioregulators, including prostaglandins, prostacyclin and thromboxanes. Inhibition of COX by drugs can relieve the symptoms of inflammation and pain, and COX inhibitors are mainly non-steroidal anti-inflammatory drugs. To obtain COX inhibitors with excellent properties, the three-dimensional crystal structure of the target is downloaded, the existing drug molecule celecoxib is decomposed using the chemical reaction rules, and new molecular structures are generated with the VAE, as shown in FIG. 8. The resulting molecules are docked against the COX target to obtain their three-dimensional conformations, a ranking by docking score, and a frequency distribution histogram of docking scores, as shown in FIG. 9; molecules with docking scores below -9 are retained for the next screening step. The three-dimensional conformations of some of the retained molecules are shown in FIG. 10; they show good similarity to the given reference structure.
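
The frequency distribution in FIG. 9 can be reproduced in outline by binning the docking scores, as in the sketch below. The bin width of 1.0, the convention of labelling each bin by its lower edge and the example scores are assumptions; the score units are not stated in the description.

```python
import math
from collections import Counter

def score_histogram(scores, bin_width=1.0):
    """Count docking scores per bin; each key is the lower edge of its bin."""
    counts = Counter(math.floor(s / bin_width) * bin_width for s in scores)
    return dict(sorted(counts.items()))


# Hypothetical docking scores (more negative = stronger predicted binding).
scores = [-10.4, -9.6, -9.1, -8.7, -7.2, -9.0]
for edge, n in score_histogram(scores).items():
    print(f"[{edge:.1f}, {edge + 1.0:.1f}): {'#' * n}")
```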

In conclusion, the method combines chemical reaction rules with a deep learning model and takes the reaction rules into account already at the molecular design stage, so that the generated molecules have better synthesizability and can be prepared easily and quickly through similar chemical reaction operations. A molecular docking and pharmacodynamic parameter evaluation model is then integrated, so that the affinity and pharmacodynamic parameters of the generated molecules can be further calculated, which supports the modification and optimization of the initial structure. Finally, through the data analysis module, the user can check and modify the execution state of tasks in a Jupyter notebook at any time and can visualize existing calculation and analysis results through a graphical interface. The beneficial effects of the invention include: 1. a molecular synthesis building block library is built, yielding lists of synthetic building blocks with different reactive functional groups; 2. chemical reaction rules based on SMARTS are established, which ensures the synthesizability of the designed molecules and makes it possible to obtain physical samples quickly with automated equipment; 3. through molecular docking and the pharmacodynamic parameter model, structures that perform better than the reference molecule are obtained quickly, and their structure-activity relationships are easy to analyze, which provides insight for drug design and optimization and guidance for the next round of iteration.

In another aspect, an embodiment of the present invention provides a molecular design and structure optimization system, including: a first module for acquiring molecular synthesis building block data and establishing a molecular synthesis building block database; a second module for establishing chemical reaction rules based on prior knowledge; a third module for performing structural segmentation on an input molecular structure according to the chemical reaction rules to obtain decomposition fragments, the decomposition fragments comprising a molecular parent nucleus and substituent fragments; a fourth module for obtaining a plurality of target structures from the data of the molecular synthesis building block database based on the decomposition fragments; a fifth module for training a variational autoencoder and carrying out structural recombination based on the plurality of target structures and the chemical reaction rules to obtain a plurality of target molecular structures; and a sixth module for carrying out molecular docking and pharmacodynamic parameter evaluation on the plurality of target molecular structures and determining target druggable molecules.

The content of the embodiment of the method of the invention is all applicable to the embodiment of the system, the function of the embodiment of the system is the same as the embodiment of the method, and the beneficial effect achieved by the embodiment of the system is the same as the beneficial effect achieved by the method.

In another aspect, an embodiment of the present invention further provides a molecular design and structure optimization apparatus, including a processor and a memory;

the memory is used for storing programs;

the processor executes the program to implement the method as described above.

All content of the method embodiment of the present invention applies to the apparatus embodiment. The apparatus embodiment implements the same functions as the method embodiment and achieves the same beneficial effects.

Yet another aspect of the embodiments of the present invention provides a computer-readable storage medium storing a program that, when executed by a processor, implements the method described above.

All content of the method embodiment of the present invention applies to the computer-readable storage medium embodiment. The storage medium embodiment implements the same functions as the method embodiment and achieves the same beneficial effects.

In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more comprehensive understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed, and in which sub-operations described as part of larger operations are performed independently.

Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in a separate physical device or software module. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.

If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, or the part of it that contributes over the prior art, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.

The logic and/or steps represented in the flowcharts or otherwise described herein, for example an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device (e.g., a computer-based system, a processor-containing system, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them). For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

More specific examples (a non-exhaustive list) of the computer-readable medium include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). The computer-readable medium could even be paper or another suitable medium on which the program is printed, because the program can be captured electronically, for instance by optically scanning the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution device. For example, if implemented in hardware, as in another embodiment, any one or combination of the following technologies, which are well known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.

In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," "some examples," and the like mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, such schematic expressions do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

While embodiments of the invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions, and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A method for molecular design and structural optimization, comprising:

establishing a chemical reaction rule based on prior knowledge;

performing structural segmentation on the input molecular structure according to the chemical reaction rule to obtain a decomposition fragment; the decomposition fragment comprises a molecular parent nucleus and a substituent fragment;

performing training and structural recombination through a variational autoencoder based on a plurality of target structures and the chemical reaction rule to obtain a plurality of target molecular structures;

and performing molecular docking and pharmacodynamic parameter evaluation on the plurality of target molecular structures to determine target druggable molecules.

2. A method for molecular design and structural optimization according to claim 1, further comprising:

the data analysis result report comprises analysis results for the molecular structure, molecular descriptors, physicochemical properties, and structure-activity relationship.

3. The molecular design and structure optimization method according to claim 1, wherein acquiring molecular synthesis building block data and establishing a molecular synthesis building block database comprises:

collecting synthesis building block structure data from a public structure database, and converting the building block structure data into standardized SMILES strings using a chemical toolkit;

4. The method of claim 1, wherein establishing a chemical reaction rule based on prior knowledge comprises:

establishing chemical reaction rules for drug synthesis based on organic chemistry textbooks and literature;

wherein the chemical reaction rule is expressed by a SMARTS expression.

5. A method for molecular design and structural optimization according to claim 1, further comprising:

6. The molecular design and structure optimization method according to claim 5, further comprising:

training parameters of the variational autoencoder through a loss function;

7. The method of claim 1, wherein performing molecular docking and pharmacodynamic parameter evaluation on the plurality of target molecular structures to determine target druggable molecules comprises:

based on the evaluation parameters, a target druggable molecule is determined.

8. A molecular design and structure optimization system, comprising:

a fifth module for performing training and structural recombination through a variational autoencoder based on a plurality of target structures and the chemical reaction rule to obtain a plurality of target molecular structures;

9. A molecular design and structure optimization device, comprising a processor and a memory;

the memory is used for storing programs;

the processor executes the program to implement the method of any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that the storage medium stores a program, which is executed by a processor to implement the method according to any one of claims 1 to 7.