CN111863120A - Drug virtual screening system and method for crystal compound - Google Patents

Drug virtual screening system and method for crystal compound

Info

Publication number
CN111863120A
Authority
CN
China
Prior art keywords
compound
subsystem
model
compounds
evaluation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010597114.4A
Other languages
Chinese (zh)
Other versions
CN111863120B (en)
Inventor
杨立君
徐旻
张佩宇
马健
温书豪
赖力鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Jingtai Technology Co Ltd
Original Assignee
Shenzhen Jingtai Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Jingtai Technology Co Ltd
Priority to CN202010597114.4A
Publication of CN111863120A
Application granted
Publication of CN111863120B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/30 Drug targeting using structural data; Docking or binding prediction
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B45/00 ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/50 Molecular design, e.g. of drugs
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/80 Data visualisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Epidemiology (AREA)
  • Public Health (AREA)
  • Bioethics (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention provides a virtual drug screening system for crystal complexes, comprising a visualization subsystem, an evaluation toolbox system, an AI model management subsystem, a large-scale sampling subsystem, a virtual screening subsystem and a data log storage subsystem. Starting from a known crystal complex, the system passes through the visualization subsystem, the evaluation toolbox system, the AI model management subsystem, the large-scale sampling subsystem and the virtual screening subsystem in sequence and recommends a batch of candidate compounds that meet the requirements. With this system, generation of the compound library is tightly integrated with the subsequent virtual screening: once a user describes the mode of action of the drug on the protein and the requirements for the drug, a batch of compounds meeting expectations can be generated. The automated system reduces user intervention and improves research and development efficiency.

Description

Drug virtual screening system and method for crystal compound
Technical Field
The application belongs to the technical field of computer-aided drug design, and in particular relates to a virtual drug screening system and method for crystal complexes.
Background
In traditional drug development, after a crystal complex of a drug and a protein is obtained through early high-throughput screening, the mode of action is analyzed, and parts of the existing compound's structure are replaced according to the principle of bioisosterism and drug design experience to obtain a new compound. The traditional research and development techniques are bioisosteric replacement, molecular docking, scaffold hopping and virtual screening.
These techniques are already available in commercial drug design software such as MOE, Maestro and Discovery Studio, and meet the requirements of conventional drug development.
However, with the development of medicinal chemistry theory and organic synthesis methods, once a potential hit compound is found, a pharmaceutical research institution usually studies the possible substituents in depth, synthesizes derivatives and tests their activity, and finally obtains a thorough structure-activity relationship. This leaves subsequent researchers almost no room to obtain new drugs with the same scaffold.
Drug patents, taking traditional new-drug design strategies into account, protect the compound structures that could be obtained with those strategies, making it difficult for later researchers to obtain new drugs by simple replacement.
Traditional methods such as molecular docking and pharmacophore models rely heavily on the selected compound libraries. Current compound libraries contain on the order of hundreds of thousands of molecules, and libraries released years ago have been explored many times by earlier researchers, so they contain few unexplored compounds and rarely yield novel scaffolds. An AI model that generates compounds can produce hundreds of thousands of compounds at a time, allowing a wider search space.
Disclosure of Invention
In view of these technical problems, the invention aims to provide a virtual drug screening system for crystal complexes that can effectively solve the problem that existing new-drug design strategies rarely yield new scaffolds, and that breaks through the barriers of existing compound patents; in addition, the generated compound library is more target-specific than a traditional compound library.
To achieve this purpose, the technical solution of the invention is as follows:
A virtual drug screening system for crystal complexes, comprising: a visualization subsystem, an evaluation toolbox system, an AI model management subsystem, a large-scale sampling subsystem, a virtual screening subsystem and a data log storage subsystem; starting from a known crystal complex, the virtual drug screening system passes through the visualization subsystem, the evaluation toolbox system, the AI model management subsystem, the large-scale sampling subsystem and the virtual screening subsystem in sequence and recommends a batch of candidate compounds meeting the requirements.
The visualization subsystem is used for viewing the binding position of the ligand of the crystal complex within the protein, analyzing the binding mode of the ligand and the protein, and extracting features that enhance the affinity of the drug for the protein;
the evaluation toolbox system packages a plurality of compound evaluation modules and is used for designing an evaluation function by selecting several of these modules and assigning appropriate weights;
the AI model management subsystem is used for managing the AI model, training the AI model and updating the AI model parameters;
the large-scale sampling subsystem is used for sampling from the trained AI model and filtering the sampled compounds to obtain a compound library;
the virtual screening subsystem is used for further screening the compounds in the compound library;
the data log storage subsystem is used for creating and storing a log information document for each user; the log information document records the user's operations and the data generated.
With this technical solution, a user defines the key features of the drug by analyzing the binding mode of the ligand in the crystal complex and sets the physicochemical properties required of the candidate compounds. The AI model updates its parameters according to the user-defined requirements and generates a batch of compounds that satisfy the conditions. These compounds are filtered by condition to form a compound library. The compounds in the library are then virtually screened to obtain a batch of candidate compounds. The functional structure and flow of the system are shown in FIG. 1.
Preferably, the features that enhance the affinity of the drug for the protein are hydrogen-bond interactions and/or hydrophobic interactions.
Preferably, the evaluation function is a weighted arithmetic mean, a weighted geometric mean, or a user-defined function.
Preferably, the AI model management subsystem covers the AI model, AI model training, and updating of the AI model parameters.
Preferably, the AI model is a neural network system that generates compounds; the AI model parameters are the parameters of the neural network system; the AI model by itself can generate compounds randomly.
Preferably, the filtering conditions include the number of heavy atoms of the compound, the number of hydrogen bond donors, the number of hydrogen bond acceptors, the scaffold structure, false positives, and compounds already reported in the existing patent literature.
Preferably, the data log storage subsystem further includes a function for regulating user permissions.
Correspondingly, the invention provides a screening method using the virtual drug screening system, comprising the following steps:
Step A: the binding features of the ligand in the crystal complex are defined through analysis in the visualization subsystem: a user downloads the crystal complex structure of the target from a protein crystal structure database, visually inspects the binding position of the ligand in the protein, analyzes the binding mode of the ligand and the protein, and extracts the features that enhance the affinity of the drug for the protein;
Step B: a compound is input into the evaluation toolbox system; each compound evaluation module in the evaluation toolbox system outputs a score, and the scores are combined into a composite score by the evaluation function;
Step C: the visualization subsystem and the evaluation toolbox system together form a complete evaluation pipeline; the AI model is started through the AI model management subsystem and training begins;
Step D: the large-scale sampling subsystem receives a sampling quantity parameter input by the user and samples the trained AI model to generate the specified number of compounds; invalid and duplicate compounds are deleted, the user then inputs filtering conditions to eliminate compounds that do not meet the requirements, and the remaining compounds form a compound library;
Step E: the virtual screening subsystem further screens the compounds in the compound library;
Step F: the data log storage subsystem creates and stores the user's log information document while the user designs a drug with the system.
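The clean-up part of Step D (deleting invalid and duplicate compounds before the user's filtering conditions are applied) can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each sampled compound is represented by a string identifier that is already canonical, so that string equality implies the same compound, and it treats `None` or empty strings as "invalid". A real system would validate structures with a cheminformatics toolkit. The function name `clean_samples` is hypothetical.

```python
from typing import Iterable, List, Optional


def clean_samples(samples: Iterable[Optional[str]]) -> List[str]:
    """Drop invalid entries and duplicates, keeping first-seen order.

    Assumption: identifiers are already canonical, so equal strings mean
    equal compounds. "Invalid" here only means None or blank.
    """
    seen = set()
    kept: List[str] = []
    for sample in samples:
        if not sample:               # None or empty string: treat as invalid
            continue
        key = sample.strip()
        if not key or key in seen:   # blank after stripping, or a duplicate
            continue
        seen.add(key)
        kept.append(key)
    return kept


# Hypothetical raw output of a sampling run.
raw = ["CCO", "CCO", None, "c1ccccc1", "", " CCO ", "CCN"]
library = clean_samples(raw)
print(library)
```

Keeping first-seen order makes the result reproducible for a given sampling run, which fits the traceability goal of the log subsystem.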
Specifically, step A is as follows: a user downloads the crystal complex structure of the target from a protein crystal structure database, visually inspects the binding position of the ligand in the protein, analyzes the binding mode of the ligand and the protein, and extracts features such as hydrogen-bond interactions and hydrophobic interactions that can enhance the affinity of the drug for the protein. On the interface, the user assigns an appropriate weight to each feature according to its importance for drug activity, and the weighted features are finally integrated into a pharmacophore evaluation module. When a compound is input to the pharmacophore evaluation module, the module outputs a score by evaluating how well the compound matches the important features.
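One simple way to turn weighted features into a matching score is the weight of the matched features divided by the total weight. The sketch below illustrates that idea only; the text does not specify how the pharmacophore evaluation module computes its score. The feature names, the weights and the assumption that feature matching has already been decided elsewhere (for example by 3D alignment) are all hypothetical.

```python
from typing import Dict, Set


def pharmacophore_score(feature_weights: Dict[str, float], matched: Set[str]) -> float:
    """Return the weighted fraction of pharmacophore features a compound matches.

    feature_weights maps a feature name to its user-assigned weight;
    matched is the set of feature names the compound satisfies.
    """
    total = sum(feature_weights.values())
    if total <= 0:
        raise ValueError("feature weights must sum to a positive value")
    hit = sum(w for name, w in feature_weights.items() if name in matched)
    return hit / total


# Hypothetical weights for four features; not taken from the examples.
weights = {
    "hbond_donor": 3.0,
    "hbond_acceptor": 2.0,
    "hydrophobic_1": 1.0,
    "hydrophobic_2": 1.0,
}
score = pharmacophore_score(weights, {"hbond_donor", "hydrophobic_1"})
print(round(score, 3))  # matched weight 4 out of total weight 7
```

Normalizing by the total weight keeps the score in the range 0 to 1, which makes it easy to combine with the scores of other evaluation modules.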
The binding features of the ligand can be obtained by analysis in the visualization subsystem, from the binding features of crystal complexes already reported in the relevant literature, or by combining the visualization-subsystem analysis with ligand features reported in the literature.
The compound evaluation modules comprise: substructure alert, selectivity prediction, activity prediction, structural similarity, molecular weight, number of rotatable bonds, number of hydrogen bond donors, number of hydrogen bond acceptors, number of rings, molecular docking score, FEP predicted value, pharmacophore score, lipid-water partition coefficient, and compound toxicity prediction.
The compound evaluation modules in the evaluation toolbox system evaluate multiple aspects of a compound's properties, such as conformational characteristics, physical properties, chemical properties, pharmacokinetic properties and structural novelty.
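A toolbox of evaluation modules can be modelled as a set of named scoring functions that each take a compound's descriptors and return one score. The sketch below shows only that structure, under several assumptions: descriptors are precomputed and passed in as a dictionary, every module is a simple pass/fail threshold, and the keys, limits and the helper `upper_bound_module` are hypothetical. Modules such as docking or FEP prediction would need external tools and are not shown.

```python
from typing import Callable, Dict

Descriptors = Dict[str, float]
Module = Callable[[Descriptors], float]


def upper_bound_module(key: str, limit: float) -> Module:
    """Build a module that scores 1.0 when descriptors[key] <= limit, else 0.0."""
    def score(descriptors: Descriptors) -> float:
        return 1.0 if descriptors[key] <= limit else 0.0
    return score


# Hypothetical toolbox: module name -> scoring function (limits are assumptions).
modules: Dict[str, Module] = {
    "molecular_weight": upper_bound_module("mol_weight", 500.0),
    "hbond_donors": upper_bound_module("hbd", 5),
    "hbond_acceptors": upper_bound_module("hba", 10),
    "rotatable_bonds": upper_bound_module("rot_bonds", 10),
}


def run_modules(descriptors: Descriptors) -> Dict[str, float]:
    """Run every selected module and collect one score per module."""
    return {name: module(descriptors) for name, module in modules.items()}


# Hypothetical precomputed descriptors for one compound.
compound = {"mol_weight": 432.5, "hbd": 2, "hba": 11, "rot_bonds": 6}
scores = run_modules(compound)
print(scores)
```

Because every module has the same call signature, a user can select any subset and hand the resulting scores to the evaluation function.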
Preferably, in step C, the AI model interacts with the evaluation pipeline: it outputs the compounds it generates to the evaluation pipeline, collects the scores that the pipeline returns for those compounds, and automatically updates the AI model parameters. After this process is repeated many times, the compounds generated by the AI model obtain higher scores in the evaluation pipeline; when training is complete, the AI model parameters have been optimized to suitable values.
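The generate, score, update loop can be illustrated with a deliberately tiny stand-in. In the sketch below the "model" is just a probability table over three placeholder tokens rather than a neural network, and the update rule (moving the probabilities a small step toward the reward-weighted distribution of each sampled batch) is an assumption chosen for brevity, not the training method of the invention. All names and values are hypothetical.

```python
import random
from typing import Callable, Dict


def train(probs: Dict[str, float], score: Callable[[str], float],
          rounds: int = 200, batch: int = 32, lr: float = 0.1,
          seed: int = 0) -> Dict[str, float]:
    """Repeat: sample a batch, score it, nudge the probabilities toward high scorers."""
    rng = random.Random(seed)
    probs = dict(probs)              # work on a copy of the model parameters
    tokens = list(probs)
    for _ in range(rounds):
        # 1. The model generates a batch of "compounds".
        samples = rng.choices(tokens, weights=[probs[t] for t in tokens], k=batch)
        # 2. The evaluation pipeline returns one score per sample.
        rewards = [score(s) for s in samples]
        total = sum(rewards)
        if total == 0:
            continue                 # nothing to learn from this batch
        # 3. Build the reward-weighted distribution of the batch.
        target = {t: 0.0 for t in tokens}
        for s, r in zip(samples, rewards):
            target[s] += r / total
        # 4. Update the parameters by a small step toward that distribution.
        for t in tokens:
            probs[t] = (1 - lr) * probs[t] + lr * target[t]
    return probs


# Hypothetical pipeline scores for three placeholder "compounds".
pipeline_scores = {"A": 0.1, "B": 0.9, "C": 0.5}
initial = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
trained = train(initial, lambda t: pipeline_scores[t])
print(max(trained, key=trained.get))
```

The update is a convex combination of two probability distributions, so the parameters remain a valid distribution after every round while mass shifts toward the samples the pipeline scores highly.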
Preferably, step E comprises the following steps:
Step E1: download the PDB file of the protein in the complex from the PDB database and pretreat the protein, i.e. delete water molecules, add hydrogens, delete irrelevant ligands and define the site to be docked;
Step E2: optimize the compound conformations: after a 3D conformation is generated for each compound, a genetic algorithm is used to search for the lowest-energy conformation of the compound;
Step E3: perform molecular docking, sort the compounds in descending order of molecular docking score, and select the top 5-15% of compounds;
Step E4: perform molecular dynamics simulation on the compounds selected in step E3, and select the compounds that satisfy the conditions according to the simulation results.
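The selection in Step E3 amounts to ranking compounds by score and keeping a fixed fraction. The sketch below assumes a higher score is better, matching the descending sort described above; many docking programs report more negative values as better, in which case the scores would be negated first. The function name, the rounding rule (round up, keep at least one) and the sample data are assumptions.

```python
import math
from typing import Dict, List


def select_top_fraction(scores: Dict[str, float], fraction: float) -> List[str]:
    """Return compound ids ranked by descending score, truncated to the top fraction."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(scores, key=scores.get, reverse=True)
    keep = max(1, math.ceil(len(ranked) * fraction))  # assumption: round up, keep >= 1
    return ranked[:keep]


# Hypothetical docking scores for 20 compounds (higher is better here).
docking = {f"cpd{i}": float(i) for i in range(20)}
picked = select_top_fraction(docking, 0.10)
print(picked)
```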
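Preferably, in the evaluation function, a weight is set for each score, $w_1, w_2, w_3, \ldots, w_n$, to form the evaluation function. Writing $s_i$ for the score output by the $i$-th evaluation module, the evaluation function is the weighted arithmetic mean:
$$S = \frac{\sum_{i=1}^{n} w_i s_i}{\sum_{i=1}^{n} w_i}$$
or the weighted geometric mean:
$$S = \left( \prod_{i=1}^{n} s_i^{w_i} \right)^{1 / \sum_{i=1}^{n} w_i}$$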
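The weighted arithmetic and geometric means named above can be computed as in the following sketch. It uses the standard normalized definitions, with scores assumed to be non-negative for the geometric mean; the function names and the sample scores and weights are hypothetical.

```python
import math
from typing import Sequence


def _check(scores: Sequence[float], weights: Sequence[float]) -> None:
    if len(scores) != len(weights) or not scores:
        raise ValueError("scores and weights must be non-empty and of equal length")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")


def weighted_arithmetic_mean(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Sum of w_i * s_i divided by the sum of w_i."""
    _check(scores, weights)
    return sum(w * s for s, w in zip(scores, weights)) / sum(weights)


def weighted_geometric_mean(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Product of s_i ** w_i raised to 1 / sum of w_i, computed in log space."""
    _check(scores, weights)
    if any(s < 0 for s in scores):
        raise ValueError("geometric mean needs non-negative scores")
    if any(s == 0 and w > 0 for s, w in zip(scores, weights)):
        return 0.0                   # one zero score with positive weight zeroes the product
    log_sum = sum(w * math.log(s) for s, w in zip(scores, weights) if w > 0)
    return math.exp(log_sum / sum(weights))


# Hypothetical module scores and weights.
module_scores = [0.8, 0.5, 1.0]
module_weights = [3, 1, 1]
arith = weighted_arithmetic_mean(module_scores, module_weights)
geom = weighted_geometric_mean(module_scores, module_weights)
print(round(arith, 4), round(geom, 4))
```

The two means behave differently on weak spots: the arithmetic mean lets a strong module compensate for a weak one, while the geometric mean penalizes any low score and drops to zero when a weighted module scores zero.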
The data log storage subsystem can create and store a log information document for a user when the user designs a drug with the system; the log information document records the user's operations and the data generated.
The data log storage subsystem also includes a function for regulating user permissions: the system can group users according to different research and development pipelines, and each user has different permissions on the data and logs of each project.
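A log store with per-project permissions could look roughly like the sketch below. It is an in-memory illustration under stated assumptions: two permission levels (read and write) per project, timestamps taken from the system clock, and hypothetical class, method, project and user names. The text does not describe the actual storage format or permission model.

```python
from datetime import datetime, timezone
from typing import Dict, List, Set


class LogStore:
    """In-memory project logs with per-user read and write permissions."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[dict]] = {}
        self._readers: Dict[str, Set[str]] = {}
        self._writers: Dict[str, Set[str]] = {}

    def grant(self, project: str, user: str, write: bool = False) -> None:
        """Give a user read access to a project, and optionally write access."""
        self._readers.setdefault(project, set()).add(user)
        if write:
            self._writers.setdefault(project, set()).add(user)

    def record(self, project: str, user: str, action: str, data: dict) -> None:
        """Append one operation record; only writers of the project may do so."""
        if user not in self._writers.get(project, set()):
            raise PermissionError(f"{user} may not write to {project}")
        self._entries.setdefault(project, []).append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,
            "data": data,
        })

    def read(self, project: str, user: str) -> List[dict]:
        """Return a copy of the project's log; only readers may see it."""
        if user not in self._readers.get(project, set()):
            raise PermissionError(f"{user} may not read {project}")
        return list(self._entries.get(project, []))


store = LogStore()
store.grant("parp1", "alice", write=True)   # hypothetical project and users
store.grant("parp1", "bob")                 # read-only
store.record("parp1", "alice", "set_weight", {"module": "pharmacophore", "weight": 3})
print(len(store.read("parp1", "bob")))
```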
The invention has the beneficial effects that:
1. On the basis of the AI model generating a large number of compounds, the design of the evaluation pipeline steers the AI model toward generating compounds that meet specific requirements. The generated compound library is more target-specific than a traditional compound library.
2. With this system, generation of the compound library is tightly integrated with the subsequent virtual screening: once a user describes the mode of action of the drug on the protein and the requirements for the drug, a batch of compounds meeting expectations can be generated. The automated system reduces user intervention and improves research and development efficiency.
3. The user's operations in the system, the defined parameters and the molecules developed are all recorded in the system, which supports traceability of research and development. In addition, the system has strict permission management, which protects the security of the data.
Drawings
The technical solution of the present application is further explained below with reference to the drawings and the embodiments.
FIG. 1 is a functional structure and flow chart of the virtual drug screening system for crystal complexes;
FIG. 2 is a flow chart of the virtual drug screening system for crystal complexes, taking the PARP crystal complex as an example;
FIG. 3 is a schematic diagram of the evaluation pipeline, from compound input to the final score returned by the evaluation function.
Detailed Description
Example 1
The flow is shown in FIG. 2:
Poly(ADP-ribose) polymerase (PARP) participates in base repair by catalyzing ADP-ribosylation, plays an important role in the repair of single-strand DNA damage in cells, and is one of the targets of anticancer drugs. PARP1 is a subtype of PARP and is one of the targets for the treatment of triple-negative breast cancer. Starting from the crystal complex of PARP1, drug design is carried out according to the steps shown in FIG. 2.
(1) The crystal complex structure of PARP1 is downloaded from a protein crystal structure database. Through visual analysis of the PARP1 crystal complex, combined with the binding modes reported in the literature, 4 key pharmacophore features are determined (one hydrogen bond donor feature, one hydrogen bond acceptor feature and two hydrophobic features); the 4 features are each assigned a weight (the weights are 3, 2 and 1 in sequence) and integrated into a pharmacophore feature evaluation module.
(2) The key pharmacophore features are integrated into a pharmacophore scoring module, and six further modules are added: substructure alert, molecular weight, number of rotatable bonds, number of hydrogen bond donors, number of hydrogen bond acceptors and lipid-water partition coefficient. The evaluation function combines them by weighted arithmetic mean to form the evaluation pipeline. The pharmacophore scoring module has a weight of 3; all other modules have a weight of 1.
(3) The AI model management subsystem is started and the AI model is trained for 1000 rounds.
(4) A sampling quantity parameter of 700 thousand is input into the large-scale sampling subsystem, and the AI model is sampled on a large scale to produce more than 700 thousand compounds. Unreasonable and duplicate compounds are deleted, leaving more than 80 thousand compounds. Filtering conditions are then set: the compounds are filtered on physicochemical properties such as the numbers of hydrogen bond donors, hydrogen bond acceptors and heavy atoms, and compounds containing substructures such as macrocycles and bridged alkanes are deleted, finally leaving more than 9 thousand compounds.
(5) The patent literature is searched and the known scaffolds of PARP inhibitors are summarized. Compounds containing known scaffolds are deleted, leaving more than 2000 compounds, which make up the compound library.
(6) The compound library is virtually screened: the PARP protein is processed, the 3D conformations of the compounds are optimized, the compounds are docked, and the top-scoring 5% of compounds are selected for molecular dynamics simulation.
(7) The conformations of the compounds are manually inspected and selected, and the results of the molecular dynamics simulations are analyzed to obtain a batch of candidate compounds.
(8) The system automatically records the user's operations and the generated candidate compounds and stores them by category.
Example 2
Alzheimer's disease is a representative degenerative disorder of the central nervous system. Many studies of Alzheimer's disease have been reported in the literature and have identified a number of targets, of which acetylcholinesterase is one of the most important. The crystal complex of acetylcholinesterase with its inhibitor is used as the starting point to search for inhibitors with new scaffolds.
(1) According to the literature, one of the crystal complexes (PDB: 4EY7) is used as the starting point. Through visual analysis of the crystal complex (PDB: 4EY7), combined with literature reports, the ligand is located and 5 key pharmacophore features are determined: 2 hydrogen bond acceptors, 2 aromatic ring features and 1 hydrophobic feature. Each pharmacophore feature is given a weight of 1, and the features are integrated into a target feature evaluation module.
(2) The pharmacophore model defined in step (1) is built into a pharmacophore evaluation module, and two further modules are added: substructure alert and structural similarity. To discover new scaffolds, the known acetylcholinesterase inhibitor scaffolds are collected from the literature as substructures. These substructures are input to the substructure alert module, which determines whether a generated compound contains a known inhibitor scaffold. Meanwhile, the original ligand in the crystal complex is used as the template molecule, and the similarity between each generated molecule and the template molecule is calculated from molecular fingerprints. The evaluation function outputs a final score as a weighted arithmetic mean, with a weight of 5 for the pharmacophore scoring module, 10 for the substructure alert module and 3 for the structural similarity module.
(3) The AI model management subsystem is used to train the AI model intensively for 1000 rounds.
(4) A sampling quantity parameter of 1 million is input into the large-scale sampling subsystem, and 1 million compounds are generated. Invalid and duplicate compounds are deleted, leaving more than 80,000 compounds. The compounds are filtered with four rules (no more than 5 hydrogen bond donors, no more than 10 hydrogen bond acceptors, molecular mass below 500, and lipid-water partition coefficient no greater than 5), and inhibitors containing reported scaffolds are removed, leaving more than 3,000 compounds, which form the compound library.
(5) Molecular docking is performed on the more than 3,000 compounds in the compound library, and more than 60 molecules whose interactions match those reported in the literature are selected.
(6) The system records the candidate compounds obtained by screening.
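The four filtering rules used in step (4) of this example translate directly into a predicate over precomputed descriptors. In the sketch below the thresholds come from the text, while the descriptor keys, the dictionary layout and the sample compounds are hypothetical; computing the descriptors themselves would require a cheminformatics toolkit and is not shown.

```python
from typing import Callable, Dict, List

# Thresholds from step (4); descriptor keys are assumptions.
RULES: Dict[str, Callable[[float], bool]] = {
    "hbd": lambda v: v <= 5,          # hydrogen bond donors no more than 5
    "hba": lambda v: v <= 10,         # hydrogen bond acceptors no more than 10
    "mol_mass": lambda v: v < 500,    # molecular mass below 500
    "logp": lambda v: v <= 5,         # lipid-water partition coefficient no greater than 5
}


def passes_rules(descriptors: Dict[str, float]) -> bool:
    """Return True only when the compound satisfies all four rules."""
    return all(rule(descriptors[key]) for key, rule in RULES.items())


def filter_compounds(compounds: List[dict]) -> List[dict]:
    """Keep the compounds whose descriptors pass every rule, in input order."""
    return [c for c in compounds if passes_rules(c)]


# Hypothetical compounds with precomputed descriptors.
candidates = [
    {"id": "cpd1", "hbd": 2, "hba": 6, "mol_mass": 410.2, "logp": 3.1},
    {"id": "cpd2", "hbd": 6, "hba": 6, "mol_mass": 410.2, "logp": 3.1},   # too many donors
    {"id": "cpd3", "hbd": 1, "hba": 4, "mol_mass": 500.0, "logp": 2.0},   # mass not below 500
    {"id": "cpd4", "hbd": 5, "hba": 10, "mol_mass": 499.9, "logp": 5.0},  # on the boundaries
]
kept_ids = [c["id"] for c in filter_compounds(candidates)]
print(kept_ids)
```

Note the strict inequality for molecular mass: a compound at exactly 500 is rejected, whereas the other three limits are inclusive.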
Example 3
Heat shock protein 90 is an anti-tumor drug target discovered in recent years; its inhibitors can disrupt the structure of proteins in the body and exert an anti-tumor effect during the degradation process. Since the crystal structure of heat shock protein 90 was disclosed, computer-aided drug design has become the mainstream approach for developing novel heat shock protein 90 inhibitors. This example starts from a crystal complex of heat shock protein 90 and recommends a new family of heat shock protein 90 inhibitors.
(1) One heat shock protein 90 structure (PDB: 1YET) is used as the starting point. Through visual analysis of heat shock protein 90 (PDB: 1YET), combined with literature reports, the binding position of the inhibitor on heat shock protein 90 is defined; 2 hydrogen bond acceptors, 2 hydrophobic centers and 2 hydrogen bond donors form the pharmacophore model, each with a weight of 1, and the model is integrated into a target feature evaluation module.
(2) The pharmacophore model defined in step (1) is built into a pharmacophore evaluation module, and a molecular weight module is added that constrains the molecular weight to below 500. To evaluate compounds more reasonably, a molecular docking scoring module (using AutoDock) is connected; each compound is docked, and the negative of its docking score is used as the evaluation score. The evaluation function outputs a final score as a weighted arithmetic mean, with a weight of 3 for the pharmacophore scoring module, 5 for the molecular docking scoring module and 10 for the molecular weight module.
(3) The AI model management subsystem is used to train the AI model intensively for 1000 rounds.
(4) A sampling quantity parameter of 1 million is input into the large-scale sampling subsystem, and 1 million compounds are generated. Invalid and duplicate compounds are removed, leaving more than 200,000 compounds. The compounds are filtered with four rules (no more than 5 hydrogen bond donors, no more than 10 hydrogen bond acceptors, molecular weight below 500, and lipid-water partition coefficient no greater than 5), and inhibitors containing reported scaffolds are removed, leaving more than 8,000 compounds, which form the compound library.
(5) The Tanimoto coefficient is used to calculate the similarity of the compounds' molecular fingerprints (ECFP4); the more than 500 compounds most similar to the ligand in the heat shock protein 90 crystal complex are found in the compound library, and from these more than 30 candidate compounds are selected using molecular docking and molecular dynamics simulation.
(6) The system records the candidate compounds obtained by screening.
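The Tanimoto similarity used in step (5) is the number of fingerprint bits two molecules share divided by the number of bits set in either. The sketch below represents each fingerprint as a set of "on" bit indices; the bit sets are made up for illustration, since generating real ECFP4 fingerprints requires a cheminformatics toolkit. Returning 0.0 for two empty fingerprints is a convention chosen here, and the function names are hypothetical.

```python
from typing import Dict, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient: |a & b| / |a | b| for two sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0                   # convention chosen here for two empty fingerprints
    return len(a & b) / union


def most_similar(reference: Set[int], library: Dict[str, Set[int]], k: int) -> List[str]:
    """Return the ids of the k library compounds most similar to the reference."""
    ranked = sorted(library, key=lambda cid: tanimoto(reference, library[cid]), reverse=True)
    return ranked[:k]


# Hypothetical fingerprints as sets of on-bit indices.
ligand = {1, 2, 3, 4}
fingerprints = {
    "c1": {1, 2, 3, 5},   # shares 3 of 5 distinct bits
    "c2": {7, 8},         # shares nothing
    "c3": {1, 2, 3, 4},   # identical to the ligand
}
print(tanimoto(ligand, fingerprints["c1"]))
print(most_similar(ligand, fingerprints, k=2))
```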
In light of the foregoing description of the preferred embodiments according to the present application, it is to be understood that various changes and modifications may be made without departing from the spirit and scope of the invention. The technical scope of the present application is not limited to the contents of the specification, and must be determined according to the scope of the claims.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Claims (10)

1. A virtual drug screening system for crystal complexes, comprising: a visualization subsystem, an evaluation tool box system, an AI model management subsystem, a large-scale sampling subsystem, a virtual screening subsystem and a data log storage subsystem; starting from a known crystal complex, the virtual drug screening system screens out a batch of candidate compounds meeting the requirements by passing, in sequence, through the visualization subsystem, the evaluation tool box system, the AI model management subsystem, the large-scale sampling subsystem and the virtual screening subsystem;
the visualization subsystem is used for checking the binding position of the ligand of the crystal complex in the protein, analyzing the binding mode between the ligand and the protein, and extracting the features that enhance the affinity of the drug for the protein;
the evaluation tool box system packages a plurality of compound evaluation modules and is used for designing an evaluation function by selecting among the compound evaluation modules and assigning each selected module an appropriate weight;
the AI model management subsystem is used for managing the AI model, AI model training and AI model parameter updating; the AI model is a neural network system that generates compounds; the AI model parameters are the parameters of the neural network system; the AI model itself can generate compounds at random;
the large-scale sampling subsystem is used for sampling the trained AI model and filtering the sampled compounds to obtain a compound library consisting of the corresponding compounds;
the virtual screening subsystem is used for further screening the compounds in the compound library;
the data log storage subsystem is used for establishing and storing a log information document of a user; the log information document records the user's operations and the corresponding generated data.
2. The virtual drug screening system of claim 1, wherein the features that enhance the affinity of the drug for the protein are hydrogen bonding and/or hydrophobic interactions.
3. The virtual drug screening system of claim 1, wherein the evaluation function is a weighted arithmetic mean, a weighted geometric mean, or a user-defined function.
4. The virtual drug screening system of claim 1, wherein the AI model management subsystem covers the AI model, AI model training, and updating of the AI model parameters;
the AI model is a neural network system that generates compounds;
the AI model parameters are the parameters of the neural network system; the AI model itself can generate compounds at random.
5. The virtual drug screening system of claim 1, wherein the filtering conditions include the number of heavy atoms of the compound, the number of hydrogen bond donors, the number of hydrogen bond acceptors, the scaffold structure, false positives, and compounds already reported in the prior patent literature.
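As an illustration of the count-based conditions in claim 5, the following is a minimal sketch only. The `Compound` and `FilterConditions` types, the field names, and the default thresholds are all hypothetical; the claims do not give numeric limits, and the descriptor values are assumed to have been computed elsewhere by a cheminformatics toolkit. Scaffold, false-positive, and prior-patent checks are not modelled here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Compound:
    """Hypothetical record holding precomputed descriptors for one compound."""
    name: str
    heavy_atoms: int
    h_bond_donors: int
    h_bond_acceptors: int


@dataclass(frozen=True)
class FilterConditions:
    """User-supplied upper limits; the default values are illustrative assumptions."""
    max_heavy_atoms: int = 40
    max_h_bond_donors: int = 5
    max_h_bond_acceptors: int = 10

    def accepts(self, compound: Compound) -> bool:
        # A compound passes only if every count is within its limit.
        return (
            compound.heavy_atoms <= self.max_heavy_atoms
            and compound.h_bond_donors <= self.max_h_bond_donors
            and compound.h_bond_acceptors <= self.max_h_bond_acceptors
        )


def apply_filters(compounds: Iterable[Compound], conditions: FilterConditions) -> List[Compound]:
    """Keep the compounds that satisfy all count-based conditions, preserving order."""
    return [c for c in compounds if conditions.accepts(c)]
```

Keeping the limits in a separate object mirrors the claim's wording that the conditions are supplied by the user rather than fixed by the system.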
6. The virtual drug screening system of claim 1 wherein the data log storage subsystem further comprises functionality to specify user privileges.
7. A screening method using the virtual drug screening system according to claim 1, comprising the steps of:
Step A: determining the binding features of the ligand in the crystal complex through analysis in the visualization subsystem: the user downloads the crystal complex structure of the target from a protein crystal structure database, visually checks the binding position of the ligand in the protein, analyzes the binding mode between the ligand and the protein, and extracts the features that enhance the affinity of the drug for the protein;
Step B: inputting a compound into the evaluation tool box system, wherein each compound evaluation module in the evaluation tool box system outputs a score, and the scores are combined into a composite score through the evaluation function;
Step C: the visualization subsystem and the evaluation tool box system together form a complete evaluation pipeline; the AI model is started through the AI model management subsystem, and training begins;
Step D: the large-scale sampling subsystem receives a sampling quantity parameter input by the user, samples the trained AI model to generate the specified number of compounds, and deletes unreasonable and duplicate compounds; the user then inputs filtering conditions to eliminate compounds that do not meet the requirements, and the remaining compounds form a compound library;
Step E: the virtual screening subsystem further screens the compounds in the compound library;
Step F: the data log storage subsystem establishes and stores the user's log information document while the user uses the system for drug design.
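Step D of claim 7 (sample, discard unreasonable and duplicate compounds, then apply user filters) can be sketched as follows. This is an assumption-laden outline, not the patented implementation: `build_library`, `sample_one`, and `keep` are hypothetical names, compounds are represented as plain strings, and an unreasonable compound is assumed to be signalled by the sampler returning `None`.

```python
from typing import Callable, List, Optional, Set


def build_library(
    sample_one: Callable[[], Optional[str]],
    n_samples: int,
    keep: Callable[[str], bool],
) -> List[str]:
    """Draw n_samples compounds and return the filtered, de-duplicated library.

    sample_one: stand-in for sampling the trained AI model once; returns a
                compound string, or None when the sample is unreasonable.
    keep:       user-supplied filter condition.
    """
    seen: Set[str] = set()
    library: List[str] = []
    for _ in range(n_samples):
        candidate = sample_one()
        if candidate is None:      # unreasonable compound: discard
            continue
        if candidate in seen:      # repeated compound: discard
            continue
        seen.add(candidate)
        if keep(candidate):        # user-defined filtering conditions
            library.append(candidate)
    return library
```

Because duplicates and rejected samples are dropped, the library is usually smaller than the requested sampling quantity.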
8. The method according to claim 7, wherein in step C, the AI model interacts with the evaluation pipeline: it outputs the compounds it generates to the evaluation pipeline, collects the scores that the evaluation pipeline returns for those compounds, and automatically updates the AI model parameters; after this process is repeated many times, the compounds generated by the AI model obtain higher scores in the evaluation pipeline; when AI model training is complete, the AI model parameters have been optimized to suitable values.
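The generate, score, update loop of claim 8 can be illustrated with a toy model. The claims do not state the update rule or the network architecture, so everything below is an assumption: `ToyGenerator` replaces the neural network with a softmax distribution over a fixed candidate list, and the update simply raises the logit of any sample that scored above the batch average and lowers it otherwise.

```python
import math
import random
from typing import Callable, Dict, List


class ToyGenerator:
    """Stand-in for the AI model: a softmax distribution over candidate compounds."""

    def __init__(self, candidates: List[str], seed: int = 0) -> None:
        self.logits: Dict[str, float] = {c: 0.0 for c in candidates}
        self._rng = random.Random(seed)

    def probabilities(self) -> Dict[str, float]:
        peak = max(self.logits.values())  # subtract the max for numerical stability
        exps = {c: math.exp(v - peak) for c, v in self.logits.items()}
        total = sum(exps.values())
        return {c: e / total for c, e in exps.items()}

    def sample(self, k: int) -> List[str]:
        probs = self.probabilities()
        names = list(probs)
        return self._rng.choices(names, weights=[probs[n] for n in names], k=k)

    def update(self, batch: List[str], scores: List[float], lr: float) -> None:
        # Samples scoring above the batch average push their logit up, others down.
        baseline = sum(scores) / len(scores)
        for compound, score in zip(batch, scores):
            self.logits[compound] += lr * (score - baseline)


def train(
    model: ToyGenerator,
    evaluate: Callable[[str], float],
    rounds: int = 50,
    batch_size: int = 32,
    lr: float = 0.1,
) -> None:
    """Repeat: generate compounds, score them in the pipeline, update parameters."""
    for _ in range(rounds):
        batch = model.sample(batch_size)
        scores = [evaluate(c) for c in batch]
        model.update(batch, scores, lr)
```

Here `evaluate` plays the role of the evaluation pipeline; after training, the generator should place most of its probability on the candidates that the pipeline scores highest.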
9. The method of claim 7, wherein the step E comprises the steps of:
protein preprocessing: downloading the PDB file of the protein complex from the PDB database, preprocessing the protein by deleting water molecules, adding hydrogens and deleting irrelevant ligands, and defining the site to be docked;
compound conformation optimization: after a 3D conformation is generated for each compound, searching for the lowest-energy conformation of the compound using a genetic algorithm;
molecular docking: performing molecular docking, sorting the compounds in descending order of molecular docking score, and selecting the compounds ranked in the top 5-15%;
molecular dynamics simulation: performing molecular dynamics simulation on the selected compounds, and screening the compounds that meet the conditions from the compound library according to the simulation results.
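The ranking step of claim 9 (sort by docking score in descending order, keep the top 5-15%) can be sketched as below. `select_top_fraction` is a hypothetical helper, and it assumes, as the claim's wording implies, that a higher score is better; docking programs that report binding energies, where more negative is better, would need the sign flipped first.

```python
import math
from typing import Dict, List


def select_top_fraction(docking_scores: Dict[str, float], fraction: float = 0.10) -> List[str]:
    """Return the compound names ranked in the top `fraction` by docking score.

    fraction: share of compounds to keep, e.g. 0.05 to 0.15 for the 5-15% range.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not docking_scores:
        return []
    # Descending order: best-scoring compounds first.
    ranked = sorted(docking_scores, key=docking_scores.get, reverse=True)
    # Round up so that a small library still yields at least one compound.
    n_keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:n_keep]
```

Rounding up is a design choice of this sketch; the claim does not say how a fractional count should be handled.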
10. The method according to claim 7, wherein a weight is set for each score, w1, w2, w3, ..., wn, to form the evaluation function, the evaluation function being an arithmetic weighted average:
Figure FDA0002557729590000031
or a geometric weighted average:
Figure FDA0002557729590000032
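The two formula images referenced in claim 10 are not reproduced in this text, so their exact form is unknown. The sketch below uses the textbook normalized definitions, sum(w_i * s_i) / sum(w_i) and (prod s_i ** w_i) ** (1 / sum(w_i)), as an assumption; the function names and the symbol s_i for a module score are hypothetical, and the patent's own formulas may differ (for example, in normalization).

```python
import math
from typing import Sequence


def _check(scores: Sequence[float], weights: Sequence[float]) -> None:
    if not scores or len(scores) != len(weights):
        raise ValueError("scores and weights must be non-empty and of equal length")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")


def weighted_arithmetic_mean(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Composite score as sum(w_i * s_i) / sum(w_i)."""
    _check(scores, weights)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def weighted_geometric_mean(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Composite score as (prod s_i ** w_i) ** (1 / sum(w_i)); scores must be positive."""
    _check(scores, weights)
    if any(s <= 0 for s in scores):
        raise ValueError("the geometric mean needs strictly positive scores")
    # Work in log space to avoid overflow or underflow in the product.
    log_sum = sum(w * math.log(s) for w, s in zip(weights, scores))
    return math.exp(log_sum / sum(weights))
```

The geometric form penalizes a compound that scores poorly in any single module more strongly than the arithmetic form does, which is one common reason to offer both.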
CN202010597114.4A 2020-06-28 2020-06-28 Medicine virtual screening system and method for crystal compound Active CN111863120B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010597114.4A CN111863120B (en) 2020-06-28 2020-06-28 Medicine virtual screening system and method for crystal compound

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010597114.4A CN111863120B (en) 2020-06-28 2020-06-28 Medicine virtual screening system and method for crystal compound

Publications (2)

Publication Number Publication Date
CN111863120A true CN111863120A (en) 2020-10-30
CN111863120B CN111863120B (en) 2022-05-13

Family

ID=72988558

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010597114.4A Active CN111863120B (en) 2020-06-28 2020-06-28 Medicine virtual screening system and method for crystal compound

Country Status (1)

Country Link
CN (1) CN111863120B (en)

Citations (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1673626A1 (en) * 2003-10-14 2006-06-28 Verseon Method and apparatus for analysis of molecular combination based on computations of shape complementarity using basis expansions
CN101149327A (en) * 2007-11-06 2008-03-26 浙江大学 Antineoplastic drug evaluation and screening method based on cell microscopic image information
US20090012767A1 (en) * 2006-01-20 2009-01-08 Dmitry Gennadievich Tovbin Method for Selecting Potential Medicinal Compounds
CN102142064A (en) * 2011-04-21 2011-08-03 华东师范大学 Biomolecular network exhibition analysis system and analysis method thereof
CN103049675A (en) * 2013-01-26 2013-04-17 北京东方灵盾科技有限公司 Traditional drug toxicity evaluation method and system thereof
EP2929453A1 (en) * 2012-12-06 2015-10-14 Clarient Diagnostic Services, Inc. Selection and display of biomarker expressions
EP2977923A1 (en) * 2013-03-19 2016-01-27 Fujitsu Limited Program for designing compound, device for designing compound, and method for designing compound
CN107325964A (en) * 2017-06-20 2017-11-07 西安交通大学 A kind of three-dimensional medicaments sifting model of instant high flux and preparation method
US20180357377A1 (en) * 2017-06-13 2018-12-13 Alexander Bagaev Systems and methods for generating, visualizing and classifying molecular functional profiles
CN109411024A (en) * 2018-11-08 2019-03-01 辽宁石油化工大学 A kind of modeling method of dislocation ring atomic structure
WO2019084315A1 (en) * 2017-10-26 2019-05-02 Zymergen Inc. Device-agnostic system for planning and executing high-throughput genomic manufacturing operations
CN110121747A (en) * 2016-10-28 2019-08-13 伊鲁米那股份有限公司 For executing the bioinformatics system, apparatus and method of second level and/or tertiary treatment
US20190295685A1 (en) * 2016-07-07 2019-09-26 Cornell University Computational analysis for predicting binding targets of chemicals
US20190304568A1 (en) * 2018-03-30 2019-10-03 Board Of Trustees Of Michigan State University System and methods for machine learning for drug design and discovery
WO2019202292A1 (en) * 2018-04-20 2019-10-24 DrugAI Limited Interaction property prediction system and method
CN110459263A (en) * 2019-06-27 2019-11-15 青岛海洋科学与技术国家实验室发展中心 A kind of virtual drug screening method based on BFGS algorithm
US20200143903A1 (en) * 2017-04-18 2020-05-07 X-Chem, Inc. Methods for identifying compounds
CN111192638A (en) * 2019-12-31 2020-05-22 四川大学 High-dimensional low-sample gene data screening and protein network analysis method and system
CN111221562A (en) * 2019-12-31 2020-06-02 深圳晶泰科技有限公司 Medicine research and development software warehouse and software package management system thereof
CN111261224A (en) * 2020-03-10 2020-06-09 苏州科技大学 Virtual screening method of targeting IKK β medicine

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
黄琦 等: "基于配体、受体和复合物指纹的虚拟筛选方法比较", 《化学学报》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022120646A1 (en) * 2020-12-09 2022-06-16 深圳智药科技有限公司 Crystal space structure transformation method and system
CN112992289A (en) * 2021-03-24 2021-06-18 北京晶派科技有限公司 Construction method and system of small molecule kinase inhibitor screening molecule library
CN112992289B (en) * 2021-03-24 2023-06-23 北京晶泰科技有限公司 Method and system for constructing small molecule kinase inhibitor screening molecular library
WO2023123149A1 (en) * 2021-12-30 2023-07-06 深圳晶泰科技有限公司 Virtual molecule screening system and method, electronic device, and computer-readable storage medium
CN114678082A (en) * 2022-03-08 2022-06-28 南昌立德生物技术有限公司 Computer-aided virtual high-throughput screening algorithm
CN116864036A (en) * 2023-08-02 2023-10-10 山东政法学院 Compound library construction method based on artificial intelligence

Also Published As

Publication number Publication date
CN111863120B (en) 2022-05-13

Similar Documents

Publication Publication Date Title
CN111863120B (en) Medicine virtual screening system and method for crystal compound
WO2021103516A1 (en) System and method for virtual drug screening for crystalline complexes
Yu et al. Translation of genotype to phenotype by a hierarchy of cell subsystems
Li et al. Evaluation of the performance of four molecular docking programs on a diverse set of protein‐ligand complexes
US8296116B2 (en) Bioinformatics system
Green et al. BRADSHAW: a system for automated molecular design
KR20190077372A (en) Phenotype / disease-specific gene grading using prepared gene libraries and network-based data structures
Mohammadi et al. Automated design of synthetic cell classifier circuits using a two-step optimization strategy
US20160203256A1 (en) Inter-class molecular association connectivity mapping
CN114203269B (en) Anticancer traditional Chinese medicine screening method based on machine learning and molecular docking technology
CN115050428A (en) Drug property prediction method and system based on deep learning fusion molecular graph and fingerprint
CN114678082A (en) Computer-aided virtual high-throughput screening algorithm
CN115132270A (en) Drug screening method and system
Mooers et al. Phylogenetic noise leads to unbalanced cladistic tree reconstructions
Poli et al. Consensus docking in drug discovery
WO2007038414A2 (en) Mining protein interaction networks
Bacardit et al. Hard data analytics problems make for better data analysis algorithms: bioinformatics as an example
CN114842924A (en) Optimized de novo drug design method
Dagur et al. Virtual screening of phytochemicals for drug discovery
Ma et al. Deep Learning Model of Dock by Dock Process Significantly Accelerate the Process of Docking-based Virtual Screening
KR102219140B1 (en) Method, apparatus, and program of expansion of biochemical pathway
Mani‐Varnosfaderani et al. CS‐MINER: a tool for association mining in Binding‐Database
Wallace Structure generation and de novo design using reaction networks
Stiel et al. Identification of protein scaffolds for enzyme design using scaffold selection
Tayebi et al. Simulating Tumor Evolution from scDNA-Seq as an Accumulation of both SNVs and CNAs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant