WO2021103516A1 - Virtual drug screening system and method for crystal complexes

Publication number
WO2021103516A1
Authority
WO
WIPO (PCT)
Prior art keywords
compounds
model
subsystem
compound
evaluation
Prior art date
Application number
PCT/CN2020/098530
Other languages
English (en)
French (fr)
Inventor
杨立君
徐旻
张佩宇
马健
温书豪
赖力鹏
Original Assignee
深圳晶泰科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳晶泰科技有限公司
Priority to PCT/CN2020/098530 (WO2021103516A1, zh)
Priority to US17/427,103 (US20220130487A1, en)
Publication of WO2021103516A1 (zh)

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/30 Drug targeting using structural data; Docking or binding prediction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/10 Interfaces, programming languages or software development kits, e.g. for simulating neural networks
    • G06N3/105 Shells for specifying net layout
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/60 In silico combinatorial chemistry
    • G16C20/62 Design of libraries
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/60 In silico combinatorial chemistry
    • G16C20/64 Screening of libraries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/80 Data visualisation

Definitions

  • This application belongs to the technical field of computer-aided drug design, in particular to a virtual drug screening system and method involving crystal complexes.
  • Drug patents take into account traditional new drug design strategies and protect the structure of compounds that may be obtained by applying traditional drug design strategies, making it difficult for latecomers to obtain new drugs through simple substitutions.
  • The purpose of the present invention is to provide a virtual drug screening system for crystal complexes.
  • The approach effectively addresses the difficulty that traditional drug design strategies have in obtaining new scaffolds and breaks through the barriers of existing compound patents; in addition, the generated compound library is more target-specific than traditional compound libraries.
  • A virtual drug screening system for crystal complexes comprises a visualization subsystem, an evaluation toolbox subsystem, an AI model management subsystem, a large-scale sampling subsystem, a virtual screening subsystem, and a data log storage subsystem. Starting from a known crystal complex, the system passes in turn through the visualization subsystem, the evaluation toolbox subsystem, the AI model management subsystem, the large-scale sampling subsystem, and the virtual screening subsystem, and recommends a batch of candidate compounds that meet the requirements.
  • the visualization subsystem is used to view the binding position of the ligand in the protein in the crystal complex, analyze the binding mode of the ligand and the protein, and extract features that enhance the affinity of the drug to the protein.
  • The evaluation toolbox subsystem encapsulates multiple compound evaluation modules; an evaluation function is designed by selecting several of these modules and assigning them appropriate weights.
  • The AI model management subsystem handles the AI model, AI model training, and the updating of AI model parameters.
  • The large-scale sampling subsystem samples the trained AI model and filters the output to obtain a compound library made up of the resulting compounds.
  • The virtual screening subsystem further screens the compounds in the compound library.
  • the data log storage subsystem is used to establish and store the user's log information file; the log information file is used to record the user's operation record and generate corresponding data.
  • the present invention adopts the above technical solution, and its advantage is that the user can define the key characteristics of the drug by analyzing the binding mode of the ligand in the crystal complex, and set the physical and chemical properties that the candidate compound should have.
  • The AI model updates its parameters according to the user-defined requirements and generates a batch of compounds that meet the conditions. After conditional filtering, these compounds are organized into a compound library. The compounds in the library are then virtually screened to give a final batch of candidate compounds.
  • the functional structure and process of the system are shown in Figure 1.
  • the feature of enhancing the affinity of the drug to the protein is hydrogen bonding and/or hydrophobic interaction.
  • the evaluation function is a weighted arithmetic mean, a weighted geometric mean, or a user-defined function.
  • the AI model management subsystem includes an AI model, AI model training, and AI model parameter update.
  • the AI model is a neural network system for generating compounds; the AI model parameters are the parameters of the neural network system; the AI model itself can generate compounds randomly.
  • the filtering conditions include the number of heavy atoms of the compound, the number of hydrogen bond donors, the number of hydrogen bond acceptors, skeleton structure, false positives, and compounds that have been reported in existing patent documents.
  • the data log storage subsystem further includes a function of regulating user rights.
  • the present invention provides a screening method using the drug virtual screening system, which includes the following steps:
  • Step A: Define the binding features of the ligand in the crystal complex through analysis in the visualization subsystem. The user downloads the crystal complex structure of the target from a protein crystal structure database, views the binding position of the ligand in the protein, analyzes the binding mode of the ligand and the protein, and extracts the features that enhance the affinity of the drug for the protein.
  • Step B: Input a compound into the evaluation toolbox subsystem. Each compound evaluation module in the subsystem outputs a score, and the evaluation function integrates these scores into one overall score.
  • Step C: Combine the visualization subsystem and the evaluation toolbox subsystem into a complete evaluation pipeline, then launch the AI model through the AI model management subsystem and begin training.
  • Step D: The large-scale sampling subsystem accepts a sampling quantity parameter from the user, samples the trained AI model to generate the specified number of compounds, and deletes unreasonable and duplicate compounds. The user then enters filter conditions to eliminate compounds that do not meet the requirements, and the remaining compounds form a compound library.
  • Step E The virtual screening subsystem further screens the compounds in the compound library
  • Step F The data log storage subsystem creates and stores the user's log information file when the user uses the system to design drugs.
  • The specific steps of step A are as follows: the user downloads the crystal complex structure of the target from a protein crystal structure database, views the binding position of the ligand in the protein, analyzes the binding mode of the ligand and the protein, and extracts hydrogen bonding, hydrophobic interactions, and other features that may enhance the affinity of the drug for the protein. On the interface, the user can specify the features that are important for the drug's activity and assign an appropriate weight to each; these are finally integrated into a pharmacophore evaluation module. When a compound is input to the pharmacophore evaluation module, the module outputs a score by evaluating how well the compound matches the important features.
  • The binding features of the ligand can be obtained through analysis in the visualization subsystem, from crystal complex binding features already reported in the literature, or by combining visualization analysis with ligand features reported in the literature.
  • The compound evaluation modules include: substructure alert, selectivity prediction, activity prediction, structural similarity, molecular weight, number of rotatable bonds, number of hydrogen bond donors, number of hydrogen bond acceptors, number of rings, molecular docking score, FEP predicted value, pharmacophore score, lipid-water partition coefficient, and compound toxicity prediction.
  • The compound evaluation modules in the evaluation toolbox subsystem cover many kinds of compound properties, including conformational features, physical properties, chemical properties, pharmacokinetic properties, and structural novelty.
  • Through its interaction with the evaluation pipeline, the AI model sends the compounds it generates to the pipeline, collects the scores the pipeline returns, and automatically updates its parameters. After this process has been repeated many times, the compounds generated by the AI model obtain higher scores in the evaluation pipeline; once training is complete, the AI model parameters have been optimized to suitable values.
  • the step E includes the following steps:
  • Step E1: Download the PDB file of the protein from the PDB database and pretreat the protein: delete water molecules, add hydrogens, delete irrelevant ligands, and define the site to be docked.
  • Step E2: Optimize the compound conformations: after generating a 3D conformation for each compound, use a genetic algorithm to search for its lowest-energy conformation.
  • Step E3: Perform molecular docking, rank the compounds in descending order of docking score, and select the top 5%-15%.
  • Step E4: Run molecular dynamics simulations on the compounds selected in Step E3 and, based on the simulation results, screen out the compounds that meet the conditions.
  • A weight is set for each score, $w_1, w_2, w_3, \ldots, w_n$, to form the evaluation function. With $s_i$ denoting the score of the $i$-th module, the evaluation function is the weighted arithmetic mean $S = \frac{\sum_{i=1}^{n} w_i s_i}{\sum_{i=1}^{n} w_i}$ or the weighted geometric mean $S = \left(\prod_{i=1}^{n} s_i^{w_i}\right)^{1/\sum_{i=1}^{n} w_i}$.
  • When the user designs drugs with the system, the data log storage subsystem creates and stores the user's log information file, which records the user's operations and the data they produce.
  • the data log storage subsystem also includes the function of standardizing user permissions.
  • the system groups users according to different R&D pipelines, and each user has different permissions for data and logs of various projects.
  • the design of the evaluation pipeline is adopted to make the AI model generate compounds that meet specific needs.
  • the generated compound library has more target specificity.
  • Figure 1 is the functional structure and flow chart of the virtual drug screening system for crystal complexes
  • Figure 2 is a flow chart of the crystal complex drug virtual screening system taking the PARP crystal complex as an example.
  • Figure 3 is a schematic diagram of the evaluation pipeline, from a compound input, and finally a final score is returned by the evaluation function.
  • Poly(ADP-ribose) polymerase (PARP) participates in base repair by catalyzing ADP-ribosylation and plays an important role in repairing single-strand DNA damage in cells. It is one of the targets of anticancer drugs.
  • PARP1 is a subtype of PARP and one of the targets for the treatment of triple-negative breast cancer. Starting from the crystal complex of PARP1, follow the steps shown in the process (as shown in Figure 2) to design the drug.
  • the system automatically records the user's operation records and candidate compounds generated and sorts and stores them.
  • Alzheimer's disease is a representative degenerative disease of the central nervous system.
  • Acetylcholinesterase is one of the important targets. The crystal complex of acetylcholinesterase with its inhibitor is taken as the starting point in a search for inhibitors with entirely new scaffolds.
  • One of the crystal complexes (PDB: 4EY7) is used as the starting point.
  • The ligand was located and 5 key pharmacophore features were determined: 2 hydrogen bond acceptors, 2 aromatic rings, and 1 hydrophobic feature. Each pharmacophore feature was given a weight of 1, and the features were integrated into a target feature evaluation module.
  • (2) The pharmacophore model defined in step (1) is assembled into a pharmacophore evaluation module, supplemented by two further modules: substructure alert and structural similarity.
  • Known acetylcholinesterase inhibitor scaffolds were collected from the literature as substructures. These substructures are entered into the substructure alert, which determines whether a generated compound contains a known inhibitor scaffold.
  • The original ligand in the crystal complex is used as the template molecule, and the similarity between each generated molecule and the template is calculated from molecular fingerprints.
  • The evaluation function uses a weighted arithmetic mean to output a final score. The pharmacophore scoring module has a weight of 5, the substructure alert module a weight of 10, and the structural similarity module a weight of 3.
  • Heat shock protein 90 is a new target of anti-tumor drugs discovered in recent years. Inhibitors of heat shock protein 90 can destroy the structure of the protein in the body and the degradation process to play an anti-tumor effect. After the crystal structure of heat shock protein 90 was published, computer-aided drug design became the mainstream for the development of new heat shock protein 90 inhibitors. This example tried to start with the crystal complex of heat shock protein 90, and recommended a batch of new heat shock protein 90 inhibitors.
  • (2) The pharmacophore model defined in step (1) is assembled into a pharmacophore evaluation module, and a molecular weight module is added that constrains the molecular weight to below 500.
  • A molecular docking scoring module (docking with Autodock) is connected; each compound is docked, and the negative of its docking score is used as the evaluation score.
  • The evaluation function uses a weighted arithmetic mean to output a final score. The pharmacophore scoring module has a weight of 3, the molecular docking scoring module a weight of 5, and the molecular weight module a weight of 10.
  • This application can be provided as a method, a system, or a computer program product. It may therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. It may also take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device. The instruction device implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps is executed on the computer or other programmable equipment to produce computer-implemented processing. The instructions executed on the computer or other programmable equipment thus provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biophysics (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Medicinal Chemistry (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Medical Informatics (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Library & Information Science (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

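A virtual drug screening system for crystal complexes, comprising a visualization subsystem, an evaluation toolbox subsystem, an AI model management subsystem, a large-scale sampling subsystem, a virtual screening subsystem, and a data log storage subsystem. Starting from a known crystal complex, the system passes in turn through the visualization subsystem, the evaluation toolbox subsystem, the AI model management subsystem, the large-scale sampling subsystem, and the virtual screening subsystem, and recommends a batch of candidate compounds that meet the requirements. With this system, generation of the compound library is tightly integrated with the subsequent virtual screening: the user only needs to describe the mode of action of the drug on the protein and the requirements the drug must meet in order to generate a batch of compounds that match expectations. The automated system reduces user intervention and improves R&D efficiency.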

Description

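Virtual drug screening system and method for crystal complexes
Technical Field
This application belongs to the technical field of computer-aided drug design, and in particular relates to a virtual drug screening system and method for crystal complexes.
Background
In traditional drug development, once early high-throughput screening has yielded a crystal complex of a drug and a protein, the binding mode is analyzed and, following the principle of bioisosterism and drug design experience, the structures of existing compounds are modified by substitution to obtain new compounds. Traditional approaches include bioisosteric replacement, molecular docking, scaffold hopping, and virtual screening.
In general, these techniques are already available in common commercial drug design software such as MOE, Maestro, and Discovery Studio, and they meet the needs of routine drug development.
However, with the development of medicinal chemistry theory and synthetic organic chemistry, when a promising hit compound is found, drug research institutions usually study the possible substituents in depth, synthesize derivatives and test their activity, and finally arrive at a thoroughly worked-out structure-activity relationship. This leaves later researchers almost no room to obtain a new drug with the same scaffold.
Drug patents anticipate traditional drug design strategies and protect the compound structures that such strategies could yield, making it difficult for latecomers to obtain new drugs through simple substitutions.
Traditional methods such as molecular docking and pharmacophore models depend heavily on the chosen compound library. Current compound libraries typically contain on the order of several hundred thousand molecules; libraries published years ago have been explored many times, contain few compounds, and rarely offer novel scaffolds. AI generation can produce several hundred thousand compounds in a single run, which opens a much larger space to explore.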
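Summary of the Invention
In view of the above technical problems, the purpose of the present invention is to provide a virtual drug screening system for crystal complexes. The approach effectively addresses the difficulty that traditional drug design strategies have in obtaining new scaffolds and breaks through the barriers of existing compound patents; in addition, the generated compound library is more target-specific than traditional compound libraries.
To achieve the above purpose, the technical solution of the present invention is as follows:
A virtual drug screening system for crystal complexes, comprising: a visualization subsystem, an evaluation toolbox subsystem, an AI model management subsystem, a large-scale sampling subsystem, a virtual screening subsystem, and a data log storage subsystem. Starting from a known crystal complex, the system passes in turn through the visualization subsystem, the evaluation toolbox subsystem, the AI model management subsystem, the large-scale sampling subsystem, and the virtual screening subsystem, and recommends a batch of candidate compounds that meet the requirements.
The visualization subsystem is used to view the binding position of the ligand in the protein of the crystal complex, analyze the binding mode of the ligand and the protein, and extract the features that enhance the affinity of the drug for the protein.
The evaluation toolbox subsystem encapsulates multiple compound evaluation modules and is used to design an evaluation function by selecting several of these modules and assigning them appropriate weights.
The AI model management subsystem handles the AI model, AI model training, and the updating of AI model parameters.
The large-scale sampling subsystem samples the trained AI model and filters the output to obtain a compound library made up of the resulting compounds.
The virtual screening subsystem further screens the compounds in the compound library.
The data log storage subsystem creates and stores the user's log information file, which records the user's operations and the data they produce.
The advantage of the above technical solution is that the user, by analyzing the binding mode of the ligand in the crystal complex, defines the key features of the drug and sets the physicochemical properties that candidate compounds should have. The AI model updates its parameters according to the user-defined requirements and generates a batch of compounds that meet the conditions. After conditional filtering, these compounds are organized into a compound library. The compounds in the library are virtually screened to give a final batch of candidate compounds. The functional structure and workflow of the system are shown in Figure 1.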
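Preferably, the features that enhance the affinity of the drug for the protein are hydrogen bonding and/or hydrophobic interactions.
Preferably, the evaluation function is a weighted arithmetic mean, a weighted geometric mean, or a user-defined function.
Preferably, the AI model management subsystem comprises the AI model, AI model training, and the updating of AI model parameters.
Preferably, the AI model is a neural network system that generates compounds; the AI model parameters are the parameters of the neural network system; the AI model itself can generate compounds at random.
Preferably, the filter conditions include the number of heavy atoms of the compound, the number of hydrogen bond donors, the number of hydrogen bond acceptors, the scaffold structure, false positives, and compounds already reported in existing patent documents.
Preferably, the data log storage subsystem further includes a function for regulating user permissions.
Correspondingly, the present invention provides a screening method using the virtual drug screening system, comprising the following steps: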
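Step A: Define the binding features of the ligand in the crystal complex through analysis in the visualization subsystem. The user downloads the crystal complex structure of the target from a protein crystal structure database, views the binding position of the ligand in the protein, analyzes the binding mode of the ligand and the protein, and extracts the features that enhance the affinity of the drug for the protein.
Step B: Input a compound into the evaluation toolbox subsystem. Each compound evaluation module in the subsystem outputs a score, and the evaluation function integrates these scores into one overall score.
Step C: Combine the visualization subsystem and the evaluation toolbox subsystem into a complete evaluation pipeline, then launch the AI model through the AI model management subsystem and begin training.
Step D: The large-scale sampling subsystem accepts a sampling quantity parameter from the user, samples the trained AI model to generate the specified number of compounds, and deletes unreasonable and duplicate compounds. The user then enters filter conditions to eliminate compounds that do not meet the requirements, and the remaining compounds form a compound library.
Step E: The virtual screening subsystem further screens the compounds in the compound library.
Step F: When the user designs drugs with the system, the data log storage subsystem creates and stores the user's log information file.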
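The specific steps of step A are as follows: the user downloads the crystal complex structure of the target from a protein crystal structure database, views the binding position of the ligand in the protein, analyzes the binding mode of the ligand and the protein, and extracts hydrogen bonding, hydrophobic interactions, and other features that may enhance the affinity of the drug for the protein. On the interface, the user can specify the features that are important for the drug's activity and assign an appropriate weight to each; these are finally integrated into a pharmacophore evaluation module. When a compound is input to the pharmacophore evaluation module, the module outputs a score by evaluating how well the compound matches the important features.
The binding features of the ligand can be obtained through analysis in the visualization subsystem, from crystal complex binding features already reported in the literature, or by combining visualization analysis with ligand features reported in the literature.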
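The pharmacophore evaluation module described above can be pictured as a weighted feature-matching score. The following is a minimal sketch rather than the system's implementation: the `PharmacophoreFeature` type, the `pharmacophore_score` function, and the definition of the score as the weight fraction of matched features are all assumptions made for illustration, since the text only states that the module scores how well a compound matches the weighted key features. The weights 3, 3, 2, 1 are those given in Example 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PharmacophoreFeature:
    """A key binding feature with a user-assigned weight (hypothetical type)."""
    name: str
    weight: float


def pharmacophore_score(features, matched_names):
    """Return the weight fraction of key features that a compound matches.

    Assumption: the score is the sum of matched weights divided by the
    total weight, so it lies between 0 and 1.
    """
    total = sum(f.weight for f in features)
    if total <= 0:
        raise ValueError("feature weights must sum to a positive value")
    matched = sum(f.weight for f in features if f.name in matched_names)
    return matched / total


# Feature set mirroring Example 1: one donor, one acceptor, and two
# hydrophobic features with weights 3, 3, 2, 1 (feature names are made up).
PARP1_FEATURES = [
    PharmacophoreFeature("hbond_donor", 3),
    PharmacophoreFeature("hbond_acceptor", 3),
    PharmacophoreFeature("hydrophobic_1", 2),
    PharmacophoreFeature("hydrophobic_2", 1),
]
```

How a compound is judged to match a feature (for example by 3D alignment against the crystal ligand) is outside this sketch; `matched_names` simply carries the result of that step.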
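The compound evaluation modules include: substructure alert, selectivity prediction, activity prediction, structural similarity, molecular weight, number of rotatable bonds, number of hydrogen bond donors, number of hydrogen bond acceptors, number of rings, molecular docking score, FEP predicted value, pharmacophore score, lipid-water partition coefficient, and compound toxicity prediction.
The compound evaluation modules in the evaluation toolbox subsystem cover many kinds of compound properties, including conformational features, physical properties, chemical properties, pharmacokinetic properties, and structural novelty.
Preferably, in step C, through its interaction with the evaluation pipeline, the AI model sends the compounds it generates to the pipeline, collects the scores the pipeline returns, and automatically updates its parameters. After this process has been repeated many times, the compounds generated by the AI model obtain higher scores in the evaluation pipeline; once training is complete, the AI model parameters have been optimized to suitable values.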
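The generate, score, update loop of step C can be illustrated with a deliberately small toy. The text does not specify the neural network or the update rule, so everything here is an assumption: the "model" is one logit per candidate compound, the evaluation pipeline is reduced to a fixed list of scores, and each round applies the exact gradient of the expected score with respect to the logits. The sketch only shows why repeated rounds shift the generator toward high-scoring compounds.

```python
import math


def softmax(logits):
    """Convert logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_generator(scores, rounds=1000, learning_rate=0.5):
    """Toy stand-in for AI model training against an evaluation pipeline.

    `scores[i]` is the pipeline score of candidate i. The "model" is one
    logit per candidate (an assumption made for illustration). Each round
    applies gradient ascent on the expected score E[s], using
    d E[s] / d logit_i = p_i * (s_i - E[s]).
    """
    logits = [0.0] * len(scores)
    for _ in range(rounds):
        probs = softmax(logits)
        expected = sum(p * s for p, s in zip(probs, scores))
        logits = [
            logit + learning_rate * p * (s - expected)
            for logit, p, s in zip(logits, probs, scores)
        ]
    return softmax(logits)
```

Calling `train_generator([0.2, 0.5, 0.9])` returns sampling probabilities in which the candidate scoring 0.9 dominates. A real generative model would sample new structures rather than reweight a fixed list, and would estimate the gradient from samples.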
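Preferably, step E comprises the following steps:
Step E1: Download the PDB file of the protein from the PDB database and pretreat the protein: delete water molecules, add hydrogens, delete irrelevant ligands, and define the site to be docked.
Step E2: Optimize the compound conformations: after generating a 3D conformation for each compound, use a genetic algorithm to search for its lowest-energy conformation.
Step E3: Perform molecular docking, rank the compounds in descending order of docking score, and select the top 5%-15%.
Step E4: Run molecular dynamics simulations on the compounds selected in Step E3 and, based on the simulation results, screen out the compounds in the compound library that meet the conditions.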
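The ranking and cut-off in Step E3 can be sketched as follows. This is an illustration under stated assumptions: the function name `select_top_fraction` and the `(compound_id, score)` tuple layout are hypothetical, and the score is assumed to be oriented so that higher is better (Example 3 obtains such a score by negating the docking score).

```python
import math


def select_top_fraction(scored_compounds, fraction=0.10):
    """Keep the best-scoring fraction of (compound_id, score) pairs.

    Scores are sorted in descending order, so a higher score must mean a
    better pose. For docking programs that report binding energies
    (lower is better), negate the energy before calling this function.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(scored_compounds, key=lambda item: item[1], reverse=True)
    if not ranked:
        return []
    keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:keep]
```

Rounding up with `math.ceil` and keeping at least one compound is a design choice of this sketch, made so that a small library never yields an empty selection.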
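Preferably, in the evaluation function, a weight is set for each score, $w_1, w_2, w_3, \ldots, w_n$, to form the evaluation function. With $s_i$ denoting the score of the $i$-th module, the evaluation function is the weighted arithmetic mean:
$$S = \frac{\sum_{i=1}^{n} w_i s_i}{\sum_{i=1}^{n} w_i}$$
or the weighted geometric mean:
$$S = \left(\prod_{i=1}^{n} s_i^{w_i}\right)^{1/\sum_{i=1}^{n} w_i}$$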
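The two evaluation functions named above, the weighted arithmetic mean and the weighted geometric mean, can be sketched in a few lines. The function names and the input validation are assumptions of this sketch; the definitions themselves are the standard ones.

```python
import math


def _check(scores, weights):
    """Validate that scores and weights can form a weighted mean."""
    if len(scores) != len(weights) or not scores:
        raise ValueError("scores and weights must be non-empty and equal length")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")


def weighted_arithmetic_mean(scores, weights):
    """Sum of w_i * s_i divided by the sum of the weights."""
    _check(scores, weights)
    return sum(w * s for s, w in zip(scores, weights)) / sum(weights)


def weighted_geometric_mean(scores, weights):
    """Product of s_i ** w_i raised to 1 / sum(w_i); scores must be > 0."""
    _check(scores, weights)
    if any(s <= 0 for s in scores):
        raise ValueError("geometric mean requires positive scores")
    # Work in log space to avoid overflow or underflow in the product.
    log_sum = sum(w * math.log(s) for s, w in zip(scores, weights))
    return math.exp(log_sum / sum(weights))
```

For instance, with the module weights 5, 10, and 3 used in Example 2 and hypothetical module scores 0.8, 1.0, and 0.5, the arithmetic form gives (4 + 10 + 1.5) / 18. The geometric form penalizes a compound more strongly when any single module score is close to zero.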
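When the user designs drugs with the system, the data log storage subsystem creates and stores the user's log information file, which records the user's operations and the data they produce.
The data log storage subsystem also includes a function for regulating user permissions: the system groups users by R&D pipeline, and each user has different permissions for the data and logs of each project.
The beneficial effects of the present invention are:
1. Building on the large number of compounds produced by the AI model, the evaluation pipeline design steers the AI model to generate compounds that meet specific requirements. Compared with traditional compound libraries, the generated library is more target-specific.
2. With this system, generation of the compound library is tightly integrated with the subsequent virtual screening: the user only needs to describe the mode of action of the drug on the protein and the requirements the drug must meet in order to generate a batch of compounds that match expectations. The automated system reduces user intervention and improves R&D efficiency.
3. The user's operations in the system, the parameters defined, and the molecules generated during development are all recorded in the system, which makes the work traceable. In addition, the system has strict permission management, which ensures data security.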
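Brief Description of the Drawings
The technical solution of this application is further described below with reference to the drawings and embodiments.
Figure 1 shows the functional structure and flow chart of the virtual drug screening system for crystal complexes.
Figure 2 is a flow chart of the virtual drug screening system for crystal complexes, taking the PARP crystal complex as an example.
Figure 3 is a schematic diagram of the evaluation pipeline: a compound is input, and the evaluation function returns a final score.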
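Detailed Description of the Embodiments
Example 1
The workflow is shown in Figure 2:
Poly(ADP-ribose) polymerase (PARP) participates in base repair by catalyzing ADP-ribosylation, plays an important role in repairing single-strand DNA damage in cells, and is one of the targets of anticancer drugs. PARP1 is a subtype of PARP and one of the targets for the treatment of triple-negative breast cancer. Starting from the crystal complex of PARP1, drug design follows the steps of the workflow shown in Figure 2.
(1) The crystal complex structure of PARP1 is downloaded from a protein crystal structure database. Through visual analysis of the PARP1 crystal complex, combined with the binding modes reported in the literature, 4 key pharmacophore features are determined (one hydrogen bond donor, one hydrogen bond acceptor, and two hydrophobic features). The 4 features are assigned weights of 3, 3, 2, and 1 respectively and integrated into a pharmacophore feature evaluation module.
(2) The key pharmacophore features are integrated into a pharmacophore scoring module, and six further modules are added: substructure alert, molecular weight, number of rotatable bonds, number of hydrogen bond donors, number of hydrogen bond acceptors, and lipid-water partition coefficient. The evaluation function uses a weighted arithmetic mean to form the evaluation pipeline. The pharmacophore scoring module has a weight of 3; all other modules have a weight of 1.
(3) The AI model management subsystem is started and the AI model is trained for 1,000 rounds.
(4) A sampling quantity of 7 million is entered in the large-scale sampling subsystem, and the AI model is sampled at large scale to produce more than 7 million compounds. Unreasonable and duplicate compounds are deleted, leaving more than 800,000 compounds. Filter conditions are then set: the compounds are filtered on physicochemical properties such as the numbers of hydrogen bond donors, hydrogen bond acceptors, and heavy atoms, and compounds containing substructures such as macrocycles and bridged alkanes are deleted, leaving more than 90,000 compounds.
(5) Patents are searched to compile the known scaffolds of PARP inhibitors. Compounds containing known scaffolds are deleted, leaving more than 2,000 compounds, which form the compound library.
(6) The compound library is subjected to virtual screening: the PARP protein is prepared and the 3D conformations of the compounds are optimized, the compounds are docked, and the top 5% by score are selected for molecular dynamics simulation.
(7) The conformations of the compounds are inspected and selected manually, and the molecular dynamics results are analyzed, giving a batch of candidate compounds.
(8) The system automatically records the user's operations and the candidate compounds produced, and stores them by category.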
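Example 2
Alzheimer's disease is a representative degenerative disease of the central nervous system. The literature reports multiple studies of Alzheimer's disease that have identified multiple targets. Acetylcholinesterase is one of the important targets. The crystal complex of acetylcholinesterase with its inhibitor is taken as the starting point in a search for inhibitors with entirely new scaffolds.
(1) Following literature reports, one of the crystal complexes (PDB: 4EY7) is used as the starting point. Through visual analysis of the crystal complex (PDB: 4EY7), combined with literature reports, the ligand is located and 5 key pharmacophore features are determined: 2 hydrogen bond acceptors, 2 aromatic rings, and 1 hydrophobic feature. Each pharmacophore feature is given a weight of 1, and the features are integrated into a target feature evaluation module.
(2) The pharmacophore model defined in step (1) is assembled into a pharmacophore evaluation module, supplemented by two further modules: substructure alert and structural similarity. To find new scaffolds, known acetylcholinesterase inhibitor scaffolds are collected from the literature as substructures. These substructures are entered into the substructure alert, which determines whether a generated compound contains a known inhibitor scaffold. In addition, the original ligand in the crystal complex is used as the template molecule, and the similarity between each generated molecule and the template is calculated from molecular fingerprints. The evaluation function uses a weighted arithmetic mean to output a final score. The pharmacophore scoring module has a weight of 5, the substructure alert module a weight of 10, and the structural similarity module a weight of 3.
(3) Using the AI model management subsystem, the AI model undergoes 1,000 rounds of reinforcement training.
(4) A sampling quantity of 1 million is entered in the large-scale sampling subsystem to generate 1 million compounds. Invalid and duplicate compounds are deleted, leaving more than 80,000 compounds. Four rules are set to filter the compounds: no more than 5 hydrogen bond donors, no more than 10 hydrogen bond acceptors, molecular mass below 500, and lipid-water partition coefficient no greater than 5. Inhibitors containing reported scaffolds are removed, leaving more than 3,000 compounds, which form the compound library.
(5) The more than 3,000 compounds in the library are docked, and more than 60 molecules showing interactions consistent with those reported in the literature are selected.
(6) The system records the candidate compounds obtained from screening.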
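The deduplication and four-rule filter of step (4) in this example can be sketched as below. The `CompoundProperties` record and both function names are hypothetical, and the properties are assumed to have been computed beforehand by a cheminformatics toolkit. Deduplicating by SMILES string is also an assumption, and it only works if the strings are canonical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompoundProperties:
    """Precomputed properties of one generated compound (hypothetical record)."""
    smiles: str
    h_bond_donors: int
    h_bond_acceptors: int
    molecular_weight: float
    log_p: float


def passes_four_rules(c):
    """Apply the four rules of Example 2, step (4)."""
    return (
        c.h_bond_donors <= 5
        and c.h_bond_acceptors <= 10
        and c.molecular_weight < 500
        and c.log_p <= 5
    )


def build_library(compounds):
    """Drop duplicate SMILES strings, then keep compounds passing the rules."""
    seen = set()
    library = []
    for c in compounds:
        if c.smiles in seen:
            continue  # duplicate of a compound already processed
        seen.add(c.smiles)
        if passes_four_rules(c):
            library.append(c)
    return library
```

Removal of compounds with already reported scaffolds, which the example applies next, would need substructure matching and is not covered by this sketch.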
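Example 3
Heat shock protein 90 is a new antitumor drug target discovered in recent years. Inhibitors of heat shock protein 90 exert an antitumor effect by disrupting protein structure and degradation processes in the body. Since the crystal structure of heat shock protein 90 was published, computer-aided drug design has become the mainstream approach to developing new heat shock protein 90 inhibitors. This example starts from a crystal complex of heat shock protein 90 and recommends a batch of new heat shock protein 90 inhibitors.
(1) One heat shock protein 90 structure (PDB: 1YET) is used as the starting point. Through visual analysis of heat shock protein 90 (PDB: 1YET), combined with literature reports, the binding position of the inhibitor on heat shock protein 90 (PDB: 1YET) is defined, and a pharmacophore model is defined from 2 hydrogen bond acceptors, 2 hydrophobic centers, and 2 hydrogen bond donors. Each of these pharmacophore features has a weight of 1, and they are integrated into a target feature evaluation module.
(2) The pharmacophore model defined in step (1) is assembled into a pharmacophore evaluation module, and a molecular weight module is added that constrains the molecular weight to below 500. To evaluate compounds more reasonably, a molecular docking scoring module (docking with Autodock) is connected; each compound is docked, and the negative of its docking score is used as the evaluation score. The evaluation function uses a weighted arithmetic mean to output a final score. The pharmacophore scoring module has a weight of 3, the molecular docking scoring module a weight of 5, and the molecular weight module a weight of 10.
(3) Using the AI model management subsystem, the AI model undergoes 1,000 rounds of reinforcement training.
(4) A sampling quantity of 1 million is entered in the large-scale sampling subsystem to generate 1 million compounds. Invalid and duplicate compounds are removed, leaving more than 200,000 compounds. Four rules are set to filter the compounds: no more than 5 hydrogen bond donors, no more than 10 hydrogen bond acceptors, molecular mass below 500, and lipid-water partition coefficient no greater than 5. Inhibitors containing reported scaffolds are removed, leaving more than 8,000 compounds, which form the compound library.
(5) The Tanimoto algorithm is used to calculate the similarity of compound molecular fingerprints (ECFP4), and the more than 500 compounds most similar to the ligand in the heat shock protein 90 crystal complex are picked from the library. Molecular docking and molecular dynamics simulation are then used to select more than 30 candidate compounds from among them.
(6) The system records the candidate compounds obtained from screening.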
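The Tanimoto similarity used in step (5) is the size of the intersection of two fingerprints divided by the size of their union. A minimal sketch follows, assuming each fingerprint is available as a set of on-bit indices; the function names and the input layout are hypothetical, and producing ECFP4 bits from a structure is left to a cheminformatics toolkit.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def most_similar(reference_fp, library_fps, top_n):
    """Return the top_n (compound_id, similarity) pairs, most similar first."""
    ranked = sorted(
        ((cid, tanimoto(reference_fp, fp)) for cid, fp in library_fps.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:top_n]
```

In the example, `reference_fp` would be the fingerprint of the ligand in the crystal complex and `top_n` would be about 500.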
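Taking the above preferred embodiments of this application as a guide, and on the basis of the foregoing description, persons working in the relevant field can make various changes and modifications without departing from the technical idea of this application. The technical scope of this application is not limited to the content of the description and must be determined according to the scope of the claims.
Those skilled in the art should understand that the embodiments of this application can be provided as a method, a system, or a computer program product. This application may therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. It may also take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
This application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to its embodiments. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in them, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operation steps is executed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.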

Claims (10)

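  1. A virtual drug screening system for crystal complexes, characterized by comprising: a visualization subsystem, an evaluation toolbox subsystem, an AI model management subsystem, a large-scale sampling subsystem, a virtual screening subsystem, and a data log storage subsystem; starting from a known crystal complex, the virtual drug screening system passes in turn through the visualization subsystem, the evaluation toolbox subsystem, the AI model management subsystem, the large-scale sampling subsystem, and the virtual screening subsystem, and screens out a batch of candidate compounds that meet the requirements.
    The visualization subsystem is used to view the binding position of the ligand in the protein of the crystal complex, analyze the binding mode of the ligand and the protein, and extract the features that enhance the affinity of the drug for the protein.
    The evaluation toolbox subsystem encapsulates multiple compound evaluation modules and is used to design an evaluation function by selecting several of these modules and assigning them appropriate weights;
    the AI model management subsystem handles the AI model, AI model training, and the updating of AI model parameters; the AI model is a neural network system that generates compounds; the AI model parameters are the parameters of the neural network system; the AI model itself can generate compounds at random;
    the large-scale sampling subsystem samples the trained AI model and filters the output to obtain a compound library made up of the resulting compounds;
    the virtual screening subsystem further screens the compounds in the compound library;
    the data log storage subsystem creates and stores the user's log information file; the log information file records the user's operations and the data they produce.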
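  2. The virtual drug screening system according to claim 1, characterized in that the features that enhance the affinity of the drug for the protein are hydrogen bonding and/or hydrophobic interactions.
  3. The virtual drug screening system according to claim 1, characterized in that the evaluation function is a weighted arithmetic mean, a weighted geometric mean, or a user-defined function.
  4. The virtual drug screening system according to claim 1, characterized in that the AI model management subsystem comprises the AI model, AI model training, and the updating of AI model parameters;
    the AI model is a neural network system that generates compounds;
    the AI model parameters are the parameters of the neural network system; the AI model itself can generate compounds at random.
  5. The virtual drug screening system according to claim 1, characterized in that the filter conditions include the number of heavy atoms of the compound, the number of hydrogen bond donors, the number of hydrogen bond acceptors, the scaffold structure, false positives, and compounds already reported in existing patent documents.
  6. The virtual drug screening system according to claim 1, characterized in that the data log storage subsystem further includes a function for regulating user permissions.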
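  7. A screening method using the virtual drug screening system according to claim 1, characterized by comprising the following steps:
    Step A: define the binding features of the ligand in the crystal complex through analysis in the visualization subsystem; the user downloads the crystal complex structure of the target from a protein crystal structure database, views the binding position of the ligand in the protein, analyzes the binding mode of the ligand and the protein, and extracts the features that enhance the affinity of the drug for the protein;
    Step B: input a compound into the evaluation toolbox subsystem; each compound evaluation module in the subsystem outputs a score, and the evaluation function integrates these scores into one overall score;
    Step C: combine the visualization subsystem and the evaluation toolbox subsystem into a complete evaluation pipeline, then launch the AI model through the AI model management subsystem and begin training.
    Step D: the large-scale sampling subsystem accepts a sampling quantity parameter from the user, samples the trained AI model to generate the specified number of compounds, and deletes unreasonable and duplicate compounds; the user then enters filter conditions to eliminate compounds that do not meet the requirements, and the remaining compounds form a compound library;
    Step E: the virtual screening subsystem further screens the compounds in the compound library;
    Step F: when the user designs drugs with the system, the data log storage subsystem creates and stores the user's log information file.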
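  8. The method according to claim 7, characterized in that, in step C, through its interaction with the evaluation pipeline, the AI model sends the compounds it generates to the pipeline, collects the scores the pipeline returns, and automatically updates its parameters; after this process has been repeated many times, the compounds generated by the AI model obtain higher scores in the evaluation pipeline; once training is complete, the AI model parameters have been optimized to suitable values.
  9. The method according to claim 7, characterized in that step E comprises the following steps:
    protein pretreatment: download the PDB file of the protein from the PDB database and pretreat the protein by deleting water molecules, adding hydrogens, deleting irrelevant ligands, and defining the site to be docked;
    compound conformation optimization: after generating a 3D conformation for each compound, use a genetic algorithm to search for its lowest-energy conformation;
    molecular docking: perform molecular docking, rank the compounds in descending order of docking score, and select the top 5%-15%;
    molecular dynamics simulation: run molecular dynamics simulations on the selected compounds and, based on the simulation results, screen out the compounds in the compound library that meet the conditions.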
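  10. The method according to claim 7, characterized in that, in the evaluation function, a weight is set for each score, $w_1, w_2, w_3, \ldots, w_n$, to form the evaluation function; with $s_i$ denoting the score of the $i$-th module, the evaluation function is the weighted arithmetic mean:
    $$S = \frac{\sum_{i=1}^{n} w_i s_i}{\sum_{i=1}^{n} w_i}$$
    or the weighted geometric mean:
    $$S = \left(\prod_{i=1}^{n} s_i^{w_i}\right)^{1/\sum_{i=1}^{n} w_i}$$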
PCT/CN2020/098530 2020-06-28 2020-06-28 Virtual drug screening system and method for crystal complexes WO2021103516A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2020/098530 WO2021103516A1 (zh) 2020-06-28 2020-06-28 Virtual drug screening system and method for crystal complexes
US17/427,103 US20220130487A1 (en) 2020-06-28 2020-06-28 Drug virtual screening system for crystal complexes, and method of using the same

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/098530 WO2021103516A1 (zh) 2020-06-28 2020-06-28 Virtual drug screening system and method for crystal complexes

Publications (1)

Publication Number Publication Date
WO2021103516A1 (zh) 2021-06-03

Family

ID=76129144

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/098530 WO2021103516A1 (zh) 2020-06-28 2020-06-28 Virtual drug screening system and method for crystal complexes

Country Status (2)

Country Link
US (1) US20220130487A1 (zh)
WO (1) WO2021103516A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115228781B (zh) * 2022-07-20 2024-03-19 晨星基因(北京)智能科技有限公司 一种基于深度学习的药物筛选方法
CN115295091B (zh) * 2022-08-08 2023-09-01 苏州创腾软件有限公司 基于AutoDock可视化平台的分子对接方法和系统
CN117174164B (zh) * 2023-10-30 2024-02-13 晨伫(杭州)生物科技有限责任公司 基于预测蛋白质-小分子结合姿势筛选先导化合物的方法

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1725222A (zh) * 2004-07-23 2006-01-25 中国科学院上海药物研究所 组合化学集中库设计与优化方法
US20130226549A1 (en) * 2012-02-27 2013-08-29 Yufeng J. Tseng Structure-based fragment hopping for lead optimization and improvement in synthetic accessibility
US20190325984A1 (en) * 2013-03-15 2019-10-24 Arzeda Corp. Automated method of computational enzyme identification and design
CN110459263A (zh) * 2019-06-27 2019-11-15 青岛海洋科学与技术国家实验室发展中心 一种基于bfgs算法的计算机药物筛选方法
CN110851617A (zh) * 2019-10-10 2020-02-28 中国海洋大学 一种基于知识图谱的多源信息药物筛选方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1725222A (zh) * 2004-07-23 2006-01-25 中国科学院上海药物研究所 Method for designing and optimizing combinatorial chemistry focused libraries
US20130226549A1 (en) * 2012-02-27 2013-08-29 Yufeng J. Tseng Structure-based fragment hopping for lead optimization and improvement in synthetic accessibility
US20190325984A1 (en) * 2013-03-15 2019-10-24 Arzeda Corp. Automated method of computational enzyme identification and design
CN110459263A (zh) * 2019-06-27 2019-11-15 青岛海洋科学与技术国家实验室发展中心 Computer-aided drug screening method based on the BFGS algorithm
CN110851617A (zh) * 2019-10-10 2020-02-28 中国海洋大学 Knowledge graph-based multi-source information drug screening method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
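CN113436686A (zh) * 2021-06-23 2021-09-24 腾讯科技(深圳)有限公司 Artificial intelligence-based compound library construction method, apparatus, device, and storage medium
WO2022267752A1 (zh) * 2021-06-23 2022-12-29 腾讯科技(深圳)有限公司 Artificial intelligence-based compound processing method, apparatus, device, storage medium, and computer program product
CN113436686B (zh) * 2021-06-23 2024-02-27 腾讯科技(深圳)有限公司 Artificial intelligence-based compound library construction method, apparatus, device, and storage medium
CN113643826A (zh) * 2021-08-31 2021-11-12 重庆电子工程职业学院 Pathological drug action monitoring system and method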

Also Published As

Publication number Publication date
US20220130487A1 (en) 2022-04-28

Similar Documents

Publication Publication Date Title
WO2021103516A1 (zh) Drug virtual screening system and method for crystal complexes
CN111863120B (zh) Drug virtual screening system and method for crystal complexes
Han et al. Developing and validating predictive decision tree models from mining chemical structural fingerprints and high-throughput screening data in PubChem
Merlot et al. Chemical substructures in drug discovery
US10275711B2 (en) System and method for scientific information knowledge management
CN109584969B (zh) Quantum dynamics calculation method for lead compounds
Wassermann et al. SAR matrices: automated extraction of information-rich SAR tables from large compound data sets
Seidel et al. 3D Pharmacophore Modeling Techniques in Computer‐Aided Molecular Design Using LigandScout
Lapins et al. Evaluation of gene expression and phenotypic profiling data as quantitative descriptors for predicting drug targets and mechanisms of action
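CN114203269B (zh) Method for screening anticancer traditional Chinese medicines based on machine learning and molecular docking
CN114678082A (zh) Computer-aided virtual high-throughput screening algorithm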
Poptodorov et al. Pharmacophore model generation software tools
Lakizadeh et al. Detection of polypharmacy side effects by integrating multiple data sources and convolutional neural networks
CA2700558A1 (en) Software assisted methods for probing the biochemical basis of biological states
Jalali-Heravi et al. Classification of anti-HIV compounds using counterpropagation artificial neural networks and decision trees
Clark et al. PRO_LIGAND: an approach to de novo molecular design. 5. Tools for the analysis of generated structures
Dagur et al. Virtual screening of phytochemicals for drug discovery
Ma et al. Deep Learning Model of Dock by Dock Process Significantly Accelerate the Process of Docking-based Virtual Screening
Dhingra et al. Virtual screening
Fattore et al. Knowledge discovery and system biology in molecular medicine: an application on neurodegenerative diseases
Jha et al. Network based algorithms for module extraction from RNASeq data: A quantitative assessment
van Beek Channeling the data flood: handling large-scale biomolecular measurements in silico
Krokidis et al. Recent Dimensionality Reduction Techniques for Visualizing High-Dimensional Parkinson’s Disease Omics Data
Mohapatra et al. Triclustering of gene expression microarray data using coarse-grained parallel genetic algorithm
Ayati et al. Utilization of Landscape of Kinases and Phosphosites To Predict Kinase-Substrate Association

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20891644

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12/05/2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20891644

Country of ref document: EP

Kind code of ref document: A1