WO2023279436A1 - Drug molecule intelligent generation method based on reinforcement learning and docking - Google Patents

Drug molecule intelligent generation method based on reinforcement learning and docking

Info

Publication number: WO2023279436A1
Application number: PCT/CN2021/107490
Authority: WO (WIPO PCT)
Prior art keywords: fragments, fragment, molecules, molecule, reinforcement learning
Other languages: French (fr), Chinese (zh)
Inventors: 魏志强, 王茜, 刘昊, 李阳阳, 王卓亚
Original Assignee: 中国海洋大学 (Ocean University of China)
Priority date: (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by 中国海洋大学
Priority to JP2022543606A (published as JP7387962B2)
Publication of WO2023279436A1

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C: COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00: Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/50: Molecular design, e.g. of drugs
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C: COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00: Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70: Machine learning, data mining or chemometrics
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C: COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00: Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/90: Programming languages; Computing architectures; Database systems; Data warehousing
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Definitions

  • The invention relates to the fields of medicinal chemistry and computer technology, and in particular to an intelligent generation method for drug molecules based on reinforcement learning and docking.
  • The invention provides a method for intelligently generating drug molecules based on reinforcement learning and docking.
  • The method is based on an Actor-critic reinforcement learning model and docking simulation, and is used to generate new drug molecules with optimal properties.
  • The Actor network is modeled with a bidirectional Transformer Encoder mechanism and a DenseNet network.
  • A drug molecule intelligent generation method based on reinforcement learning and docking, which comprises the following steps:
  • Step 1 Construct a virtual fragment combination library for drug design
  • The virtual fragment combination library of drug molecules is constructed by fragmenting a set of molecules with an existing toolkit; when a molecule is split, the fragments are not classified, and all fragments are treated identically;
  • Step 2 Calculate fragment similarity for molecular fragment encoding
  • Step 3 Generate and optimize molecules based on Actor-critic reinforcement learning model
  • The Actor-critic reinforcement learning model is used to generate and optimize molecules. A molecule is modified by selecting a single fragment and one bit in that fragment's representation, and then flipping the value of that bit: if it is 0 it becomes 1, and vice versa. This makes the degree of change applied to the molecule trackable; the leading bits of the encoding remain unchanged, so the model is only allowed to change bits at the end, forcing it to search only among molecules near known compounds;
  • The Actor-critic reinforcement learning model starts from the fragmented molecular state, namely the current state. The Actor extracts and inspects all fragments, introduces the position information of the different fragments in the molecule, uses the Transformer Encoder mechanism to compute attention coefficients for the fragments of each molecule, and then, through the probabilities output by the DenseNet network, decides which fragment to replace and which fragment to replace it with. The new state is scored according to how well it satisfies all constraints; the Critic then passes to the Actor the difference, the TD-Error, between the reward plus the value of the new state and the value of the current state: if it is positive, the Actor's action is reinforced; if it is negative, the action is discouraged. The current state is then replaced by the new state, and the process repeats a given number of times;
  • The reward mechanism of the reinforcement learning model predicts the reward by building a perceptron model.
  • The perceptron model has two stages, training and prediction. During training, the data set has two sources: the positive samples are molecules with known activity reported in the literature, and the negative samples are an equal number of molecules randomly sampled from the ZINC library.
  • After shuffling the samples, the calculated activity information obtained by docking and the intrinsic molecular property information computed with an existing toolkit are used as input.
  • The model can thus learn the latent relationship between the calculated activity information, the property information, and whether a molecule is truly active;
  • During prediction, the model takes as input the calculated activity information of the generated molecules (obtained by virtually docking the generated molecules against existing PDB files of disease-related targets with fast drug docking software) and the intrinsic property information of the generated molecules (computed with a general-purpose software package), and predicts whether a generated molecule truly has real activity, thereby further optimizing the activity of the generated molecules;
  • The Actor in the reinforcement learning model is rewarded every time it generates a valid molecule, and receives a higher reward if it manages to produce molecules that meet the prediction model's expectations.
  • In step 1, when a molecule is split, all single bonds extending from a ring atom are broken, and a fragment chain list is created to record and store the original split points, which later serve as connection points for molecular design; the method allows exchanging fragments with different numbers of attachment points as long as the total number of attachment points remains constant; the open-source toolkit RDKit is used for the molecular cleavage in this process; fragments with more than 12 heavy atoms are discarded, as are fragments with 4 or more attachment points;
  • TMCS: maximum common substructure Tanimoto similarity (Tanimoto-MCS)
  • The TMCS similarity between two molecules M1 and M2 is defined as TMCS(M1, M2) = mcs(M1, M2) / (atoms(M1) + atoms(M2) - mcs(M1, M2))
  • The molecular fragment encodings in step 2 are created by constructing a balanced binary tree based on fragment similarity; the tree is then used to generate a binary string for each fragment and, by extension, a binary string representing each molecule. The order of attachment points is treated as an identifier for each fragment. When assembling the tree, the similarity between all fragments is computed, and fragment pairs are formed in a greedy bottom-up fashion: the two most similar fragments are paired first, and the process is repeated to join the two pairs with the most similar fragments into a new tree with four leaves. The similarity between two subtrees is measured as the maximum similarity between any two fragments of those trees. The joining process is repeated until all fragments are joined into a single tree;
  • Once every fragment is stored in the binary tree, the tree is used to generate encodings for all fragments. The path from the root to the leaf where a fragment is stored determines its encoding: for each branch in the tree, a one ("1") is appended to the encoding when going left, and a zero ("0") when going right. Thus, the rightmost character in the encoding corresponds to the branch closest to the fragment.
  • The invention is based on an Actor-critic reinforcement learning model and a docking simulation method for generating new molecules.
  • The model learns how to modify and improve molecules so that they have the desired properties.
  • The present invention differs from previous reinforcement learning methods in that it focuses on generating new compounds that are structurally close to existing compounds by transforming fragments of lead compounds, thereby narrowing the searched chemical space.
  • The present invention is based on the Actor-critic reinforcement learning model; the Actor network is modeled with a bidirectional Transformer Encoder mechanism and a DenseNet network, introduces the position information of the different fragments in a molecule, uses the Transformer Encoder mechanism to compute attention coefficients for the fragments of each molecule, preserves the relative or absolute position information of fragments within the molecule, and enables parallel training.
  • The reward mechanism of the reinforcement learning builds a single-layer perceptron model.
  • The input of this model contains two kinds of information: molecule-related property information and activity information.
  • The activity information is obtained by docking the generated molecules against disease-related targets with docking software, further optimizing the activity of the generated molecules.
  • The method of the present invention is estimated to generate more than 2 million candidate molecules for the targets corresponding to a specific disease.
  • By adding more than 1,000 ultra-high-dimensional parameters in the molecular docking part and fusing molecular activity with related property information, the method of the present invention can generate molecules of which more than 80% are optimized, high-quality AI molecules.
  • The method of the present invention relies on a large-scale supercomputing platform, and the molecule generation speed is significantly improved.
  • Figure 1 is the virtual molecular fragment library of Mpro-related compounds
  • Figure 2 is the subpart of the binary tree containing all fragments of Mpro-related compounds
  • Figure 3 is the framework diagram of the Actor-critic reinforcement learning model
  • Figure 4 shows the details of the Actor in the Actor-critic reinforcement learning model
  • Figure 5 shows the generated active compound molecules for the COVID-19 Mpro target.
  • The main goal of this example is the generation of active compounds against the COVID-19 Mpro target: starting from an initial set of lead compounds, these molecules are improved and optimized by replacing some of their fragments, producing new active compounds against the Mpro target with the desired properties.
  • This embodiment is based on an Actor-critic reinforcement learning model and a docking simulation method for generating new drug molecules with optimal properties. The technical solution of this embodiment is described in detail below.
  • A drug molecule intelligent generation method based on the Actor-critic reinforcement learning model and docking, which comprises the following steps:
  • Step 1 Construct a virtual fragment combination library for drug design.
  • The virtual fragment combination library of drug molecules is constructed by fragmenting a set of molecules.
  • The virtual fragment library in this example is built jointly from 10172 compounds related to the Mpro target in the medicinal chemistry database ChEMBL and 175 lead compounds for the Mpro target obtained in the laboratory by molecular docking screening, as shown in Figure 1.
  • A common approach to fragmenting molecules is to group them into categories such as ring structures, side chains, and linkers. When splitting molecules, essentially the same scheme is followed, but the fragments are not sorted into categories; all fragments are thus treated identically. To break a molecule, all single bonds extending from a ring atom are broken.
  • Step 2 Calculate fragment similarity for molecular fragment encoding.
  • Step 2.1 Calculate the similarity between fragments
  • mcs(M1,M2) is the number of atoms in the largest common substructure of molecules M1 and M2
  • atoms(M1) and atoms(M2) are the number of atoms in molecules M1 and M2, respectively.
  • An advantage of Tanimoto-MCS similarity is that it directly compares the structures of the fragments and thus does not depend on any other specific representation. This approach usually works well when comparing "drug-like" molecules.
  • The Levenshtein distance is defined as the minimum number of insertions, deletions, and substitutions required to make two strings identical. Considering the effect of transposition operations on the edit distance, this embodiment finally adopts the Damerau-Levenshtein distance, an improvement on the Levenshtein distance; the Damerau-Levenshtein distance between two strings is then defined as:
  • All fragments are encoded into binary strings. These strings are created by building a balanced binary tree based on segment similarity. This tree is then used to generate binary strings for each fragment, and thus in extensions, to generate binary strings representing molecules. The order of attachment points is treated as an identifier for each fragment.
  • the path from the root to the leaf where the fragments are stored determines the encoding of each fragment.
  • A one ("1") is appended to the encoding when going left, and a zero ("0") when going right, as shown in Figure 2; thus, the rightmost character in the encoding corresponds to the branch closest to the fragment.
  • Step 3 Generate and optimize molecules based on the Actor-critic reinforcement learning model.
  • The present invention uses an Actor-critic reinforcement learning model to generate and optimize molecules; a modification is made by selecting a single fragment of the molecule and one bit in that fragment's representation, and then flipping the value of that bit, i.e., if it is 0 it becomes 1, and vice versa.
  • This allows tracking of the degree of change applied to the molecule, since modification bits at the end of the code will represent changes for very similar fragments, while changes at the beginning will represent changes for very different types of fragments.
  • The leading bits of the encoding remain the same, so the model is only allowed to change bits at the end, forcing it to search only for molecules near known compounds, as shown in Figure 3.
  • The Actor-critic reinforcement learning model starts from the fragmented molecular state, namely the current state S.
  • The Actor extracts and inspects all fragments and uses the bidirectional Transformer Encoder mechanism and the DenseNet network to decide which fragment to replace and which fragment to replace it with; the action Ai taken by the Actor yields the new state Si.
  • The new state Si is given a score R according to how well it satisfies all the constraints.
  • The Actor network is modeled with a bidirectional Transformer Encoder mechanism and a DenseNet network; it introduces the position information of the different fragments in a molecule and uses the Transformer Encoder mechanism to compute attention coefficients for the fragments of each molecule.
  • This structure reads the encoded fragments representing one molecule at a time.
  • The forward and backward outputs are concatenated, and an estimate of the probability distribution over which fragment to change and what to change it to is computed by passing the concatenated representation through a DenseNet neural network.
  • each molecule is constructed as a sequence of fragments, which is passed to the Transformer encoder mechanism in one pass.
  • the importance of different fragments is obtained by calculating the attention coefficients of different fragments in each molecule.
  • The forward and backward Transformer Encoder outputs a vectorized representation of the molecule that captures the correlations between its fragments; finally, the concatenated result is classified by the DenseNet network, which computes an estimate of the probability distribution over which fragment to change and what to change it to, as shown in Figure 4.
  • a major challenge in drug discovery is designing molecules optimized for multiple properties that may not correlate well.
  • two different classes of properties were selected that characterize the viability of a molecule as a suitable drug.
  • The aim of the invention is to generate drug molecules that more closely match the properties of real active molecules, i.e., molecules in the target's "sweet spot".
  • The selected properties include the intrinsic property information of the molecule itself (e.g., MW, clogP, and PSA) and the calculated activity information of the molecule (i.e., the docking results of the molecule against the targets corresponding to a specific disease).
  • the model includes two stages of training and prediction.
  • The data set has two sources: the positive samples are molecules with known activity reported in the literature, and the negative samples are an equal number of molecules randomly sampled from the ZINC library. After the samples are shuffled, the calculated activity information obtained by docking and the intrinsic molecular property information computed with an existing toolkit are used as input. After multiple rounds of training, the model learns the latent relationship between the calculated activity information, the property information, and whether a molecule is truly active.
  • The model takes the calculated activity information of the generated molecules, obtained by virtual molecular docking of the generated molecules against disease-related targets using fast drug docking software.
  • The model uses drug docking software, such as LeDock, to dock the at most 512 molecules generated per epoch against the existing PDB files of 380 different conformations of COVID-19 Mpro-related targets.
  • The intrinsic property information of the generated molecules is computed with the general-purpose software package RDKit.
  • In total, 1143 ultra-high-dimensional parameters combining the calculated activity information and the intrinsic property information of the molecule itself are used as input to the single-layer perceptron to predict whether a generated molecule truly has real activity, thereby further optimizing the activity of the generated molecules.
  • An actor in this reinforcement learning framework is rewarded for each valid molecule it produces, with higher rewards if it manages to produce molecules that match the prediction model's expectations.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Medicinal Chemistry (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A drug molecule intelligent generation method based on reinforcement learning and docking, belonging to the technical field of drug chemistry and computers. The method comprises the following steps: 1) constructing a virtual fragment combination library for drug design; 2) calculating fragment similarity, and performing molecular fragment coding; and 3) generating and optimizing a molecule on the basis of an actor-critic model for reinforcement learning. In the described method, on the basis of a lead compound, the chemical space of a search is reduced. Transformer modeling is used by means of an actor-critic model for reinforcement learning, position information of a molecular fragment is introduced, and relative or absolute position information of the fragment in a molecule is stored, thereby achieving parallel training. Furthermore, by means of establishing a single-layer perceptron model, a reward mechanism further optimizes the activity of a generated molecule.

Description

A method for intelligent generation of drug molecules based on reinforcement learning and docking

Technical Field
The invention relates to the fields of medicinal chemistry and computer technology, and in particular to an intelligent generation method for drug molecules based on reinforcement learning and docking.
Background Art
Designing and manufacturing safe and effective compounds is central to medicinal chemistry. In terms of money and time, this is a long, complex, and difficult multi-parameter optimization process. Promising compounds carry a high risk (>90%) of failing in clinical trials, resulting in unnecessary waste of resources. The average cost of bringing a new drug to market now far exceeds $1 billion, and the average time from discovery to market is 13 years. For some substances, the average time from discovery to commercial production can be even longer, e.g., 25 years for high-energy molecules. A critical first step in molecular discovery is generating a pool of candidates for computational study or for synthesis and characterization. This is a daunting task because the chemical space of possible molecules is enormous: the number of potential drug-like compounds is estimated to be between 10^23 and 10^60, while the number of all compounds ever synthesized is on the order of 10^8. Heuristics, such as Lipinski's "rule of five" for pharmaceuticals, can help narrow the space of possibilities, but significant challenges remain.
With the revolution in computer technology, using AI for drug discovery is becoming a trend. Traditionally, combinations of various computational models have been used toward this goal, such as quantitative structure-activity relationships (QSAR), molecular substitution, molecular simulation, and molecular docking. But traditional approaches are combinatorial in nature and often make most generated molecules unstable or unsynthesizable. In recent years, many generative models based on deep learning have emerged for designing drug-like compounds, such as molecule generation methods based on variational autoencoders and on generative adversarial networks. However, current methods still need improvement in the generation speed, validity, and molecular activity of candidate compounds.
Summary of the Invention
The invention provides a method for intelligently generating drug molecules based on reinforcement learning and docking. The method is based on an Actor-critic reinforcement learning model and docking simulation, and is used to generate new drug molecules with optimal properties. The Actor network is modeled with a bidirectional Transformer Encoder mechanism and a DenseNet network.
To solve the above problems, the present invention is realized through the following technical solution:
A drug molecule intelligent generation method based on reinforcement learning and docking, comprising the following steps:
Step 1. Construct a virtual fragment combination library for drug design.

The virtual fragment combination library of drug molecules is constructed by fragmenting a set of molecules with an existing toolkit; when a molecule is split, the fragments are not classified, and all fragments are treated identically.
Step 2. Calculate fragment similarity for molecular fragment encoding

A combination of existing methods for computing chemical similarity is used to measure the similarity between different molecular fragments. By constructing a similarity-based balanced binary tree, all fragments are encoded into binary strings, so that similar fragments obtain similar encodings.
Step 3. Generate and optimize molecules based on the Actor-critic reinforcement learning model

(1) Framework of the Actor-critic reinforcement learning model

The Actor-critic reinforcement learning model is used to generate and optimize molecules. A molecule is modified by selecting a single fragment and one bit in that fragment's representation, and then flipping the value of that bit: if it is 0 it becomes 1, and vice versa. This makes the degree of change applied to the molecule trackable; the leading bits of the encoding remain unchanged, so the model is only allowed to change bits at the end, forcing it to search only among molecules near known compounds.

The Actor-critic reinforcement learning model starts from the fragmented molecular state, namely the current state. The Actor extracts and inspects all fragments, introduces the position information of the different fragments in the molecule, uses the Transformer Encoder mechanism to compute attention coefficients for the fragments of each molecule, and then, through the probabilities output by the DenseNet network, decides which fragment to replace and which fragment to replace it with. The new state is scored according to how well it satisfies all constraints. The Critic then passes to the Actor the difference, the TD-Error, between the reward plus the value of the new state and the value of the current state: if it is positive, the Actor's action is reinforced; if it is negative, the action is discouraged. The current state is then replaced by the new state, and the process repeats a given number of times.
(2) Optimization of the reward mechanism of the reinforcement learning model

Molecules are designed to be optimized for two kinds of properties: the intrinsic property information of the molecule itself and the calculated activity information of the molecule. The reward mechanism of the reinforcement learning model predicts the reward by building a perceptron model, which has two stages, training and prediction. During training, the data set has two sources: the positive samples are molecules with known activity reported in the literature, and the negative samples are an equal number of molecules randomly sampled from the ZINC library. After shuffling the positive and negative samples, the calculated activity information obtained by docking and the intrinsic molecular property information computed with an existing toolkit are used as input; after multiple rounds of training, the model learns the latent relationship between the calculated activity information, the property information, and whether a molecule is truly active. During prediction, the model takes as input the calculated activity information of the generated molecules (obtained by virtually docking the generated molecules against existing PDB files of disease-related targets with fast drug docking software) and the intrinsic property information of the generated molecules (computed with a general-purpose software package), and predicts whether a generated molecule truly has real activity, thereby further optimizing the activity of the generated molecules. The Actor in the reinforcement learning model is rewarded every time it generates a valid molecule, and receives a higher reward if it manages to produce molecules that meet the prediction model's expectations.
Further, in step 1, when a molecule is split, all single bonds extending from a ring atom are broken, and a fragment chain list is created to record and store the original split points, which later serve as connection points for molecular design. The method allows exchanging fragments with different numbers of attachment points as long as the total number of attachment points remains constant. The open-source toolkit RDKit is used for the molecular cleavage in this process. Fragments with more than 12 heavy atoms are discarded, as are fragments with 4 or more attachment points.
Further, when computing the inter-fragment similarity in step 2, the maximum common substructure Tanimoto-MCS (TMCS) similarity is used to compare "drug-like" molecules; for smaller fragments, the Damerau-Levenshtein distance, an improvement on the Levenshtein distance, is introduced. With base cases d_{a,b}(i, 0) = i and d_{a,b}(0, j) = j, the Damerau-Levenshtein distance between two strings is defined as:

$$
d_{a,b}(i,j)=\min\begin{cases}
d_{a,b}(i-1,\,j)+1 & \text{(deletion)}\\
d_{a,b}(i,\,j-1)+1 & \text{(insertion)}\\
d_{a,b}(i-1,\,j-1)+\mathbf{1}_{(a_i\neq b_j)} & \text{(substitution)}\\
d_{a,b}(i-2,\,j-2)+1 & \text{(transposition, if } i,j>1,\ a_i=b_{j-1},\ a_{i-1}=b_j\text{)}
\end{cases}
$$
The TMCS similarity between two molecules M1 and M2 is defined as:

$$
\mathrm{TMCS}(M_1,M_2)=\frac{\mathrm{mcs}(M_1,M_2)}{\mathrm{atoms}(M_1)+\mathrm{atoms}(M_2)-\mathrm{mcs}(M_1,M_2)}
$$
The similarity between two molecules M1 and M2, with corresponding SMILES representations S1 and S2, is then measured as Max(TMCS(M1, M2), DL(S1, S2)).
Further, for the molecular fragment encoding in step 2: the binary strings are created by constructing a balanced binary tree based on fragment similarity; the tree is then used to generate a binary string for each fragment and, by extension, a binary string representing each molecule. The order of attachment points is treated as an identifier for each fragment. When assembling the tree, the similarity between all fragments is computed, and fragment pairs are formed in a greedy bottom-up fashion: the two most similar fragments are paired first, and the process is repeated to join the two pairs with the most similar fragments into a new tree with four leaves. The similarity between two subtrees is measured as the maximum similarity between any two fragments of those trees. The joining process is repeated until all fragments are joined into a single tree.
Once every fragment is stored in the binary tree, the tree is used to generate encodings for all fragments. The path from the root to the leaf where a fragment is stored determines its encoding: for each branch in the tree, a one ("1") is appended to the encoding when going left, and a zero ("0") when going right. Thus, the rightmost character in the encoding corresponds to the branch closest to the fragment.
Advantageous effects of the present invention compared with the prior art:
The invention is based on an Actor-critic reinforcement learning model and a docking simulation method for generating new molecules. The model learns how to modify and improve molecules so that they have the desired properties.
(1) The present invention differs from previous reinforcement learning methods in that it focuses on generating new compounds that are structurally close to existing compounds by transforming fragments of lead compounds, thereby narrowing the searched chemical space.

(2) The present invention is based on the Actor-critic reinforcement learning model. The Actor network is modeled with a bidirectional Transformer Encoder mechanism and a DenseNet network; it introduces the position information of the different fragments in a molecule, uses the Transformer Encoder mechanism to compute attention coefficients for the fragments of each molecule, preserves the relative or absolute position information of fragments within the molecule, and enables parallel training.

(3) The reward mechanism of the reinforcement learning builds a single-layer perceptron model whose input contains two kinds of information: molecule-related property information and activity information. The activity information is obtained by docking the generated molecules against disease-related targets with docking software, further optimizing the activity of the generated molecules.

(4) In terms of candidate scale, the method of the present invention is estimated to generate more than 2 million candidate molecules for the targets corresponding to a specific disease.

(5) By adding more than 1,000 ultra-high-dimensional parameters in the molecular docking part and fusing molecular activity with related property information, the method of the present invention can generate molecules of which more than 80% are optimized, high-quality AI molecules.

(6) The method of the present invention relies on a large-scale supercomputing platform, and the molecule generation speed is significantly improved.
Description of the Drawings
Figure 1 is the virtual molecular fragment library of Mpro-related compounds;
Figure 2 is the subpart of the binary tree containing all fragments of Mpro-related compounds;
Figure 3 is the framework diagram of the Actor-critic reinforcement learning model;
Figure 4 shows the details of the Actor in the Actor-critic reinforcement learning model;
Figure 5 shows the generated active compound molecules for the COVID-19 Mpro target.
Detailed Description
The technical solution of the present invention is further explained below through an embodiment in conjunction with the accompanying drawings, but the protection scope of the present invention is not limited in any form by the embodiment.

Example 1

The main goal of this example is the generation of active compounds against the COVID-19 Mpro target: starting from an initial set of lead compounds, these molecules are improved and optimized by replacing some of their fragments, producing new active compounds against the Mpro target with the desired properties. This example is based on an Actor-critic reinforcement learning model and a docking simulation method for generating new drug molecules with optimal properties. The technical solution of this example is described in detail below.
A drug molecule intelligent generation method based on the Actor-critic reinforcement learning model and docking, comprising the following steps:

Step 1. Construct a virtual fragment combination library for drug design.
The virtual fragment combination library of drug molecules is constructed by fragmenting a set of molecules. The virtual fragment library in this example is built jointly from 10172 compounds related to the Mpro target in the medicinal chemistry database ChEMBL and 175 lead compounds for the Mpro target obtained in the laboratory by molecular docking screening, as shown in Figure 1. A common approach to fragmenting molecules is to group the fragments into categories such as ring structures, side chains, and linkers. When splitting molecules, essentially the same scheme is followed, but the fragments are not sorted into categories; all fragments are treated identically. To break a molecule, all single bonds extending from a ring atom are broken. When a molecule is split, a fragment chain list is created to record and store the original split points, which serve as connection points for subsequent molecular design. The method allows exchanging fragments with different numbers of attachment points as long as the total number of attachment points remains the same. The existing open-source cheminformatics toolkit RDKit is used for the molecular cleavage. In this process, fragments with more than 12 heavy atoms are discarded, as are fragments with 4 or more attachment points. These constraints are enforced to reduce complexity while still generating a large number of interesting candidates.
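As an illustration of this splitting scheme, the sketch below (Python with RDKit; the example SMILES is hypothetical and the attachment-point bookkeeping is simplified) breaks every acyclic single bond that touches a ring atom and applies the two fragment filters described above:

```python
from rdkit import Chem

MAX_HEAVY_ATOMS = 12     # fragments with more heavy atoms are discarded
MAX_ATTACH_POINTS = 3    # fragments with 4 or more attachment points are discarded

def fragment_molecule(mol):
    """Break all acyclic single bonds that extend from a ring atom;
    dummy atoms (*) left by RDKit mark the original split points."""
    cut_bonds = [
        b.GetIdx() for b in mol.GetBonds()
        if b.GetBondType() == Chem.BondType.SINGLE
        and not b.IsInRing()
        and (b.GetBeginAtom().IsInRing() or b.GetEndAtom().IsInRing())
    ]
    if not cut_bonds:
        return [mol]
    pieces = Chem.FragmentOnBonds(mol, cut_bonds)
    frags = Chem.GetMolFrags(pieces, asMols=True)
    kept = []
    for f in frags:
        attach = sum(1 for a in f.GetAtoms() if a.GetAtomicNum() == 0)
        heavy = sum(1 for a in f.GetAtoms() if a.GetAtomicNum() > 1)
        if heavy <= MAX_HEAVY_ATOMS and attach <= MAX_ATTACH_POINTS:
            kept.append(f)
    return kept

# Hypothetical example molecule, not one from the Mpro library:
mol = Chem.MolFromSmiles("c1ccccc1CCNC(=O)c1ccncc1")
print([Chem.MolToSmiles(f) for f in fragment_molecule(mol)])
```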
Step 2. Calculate fragment similarity for molecular fragment encoding.

Step 2.1 Calculate the similarity between fragments

In this example, all fragments are encoded as binary strings, and the goal of the encoding is that similar fragments should obtain similar encodings. The similarity between fragments must therefore be measured. There are many ways to compute chemical similarity. A molecular fingerprint is a direct binary encoding in which similar molecules should, in principle, receive similar codes; however, when comparing molecular fragments, with their inherently sparse representations, fingerprints turn out to be less useful for this purpose. A chemically intuitive way to measure the similarity between molecules is the maximum common substructure Tanimoto-MCS (TMCS) similarity:
$$
\mathrm{TMCS}(M_1,M_2)=\frac{\mathrm{mcs}(M_1,M_2)}{\mathrm{atoms}(M_1)+\mathrm{atoms}(M_2)-\mathrm{mcs}(M_1,M_2)}
$$
Here, mcs(M1, M2) is the number of atoms in the largest common substructure of molecules M1 and M2, and atoms(M1) and atoms(M2) are the numbers of atoms in M1 and M2, respectively.
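A minimal sketch of this similarity with RDKit's FMCS module (the timeout value is an assumption):

```python
from rdkit import Chem
from rdkit.Chem import rdFMCS

def tmcs_similarity(m1, m2):
    """Tanimoto coefficient computed over the maximum common substructure."""
    mcs = rdFMCS.FindMCS([m1, m2], timeout=10).numAtoms
    a1, a2 = m1.GetNumAtoms(), m2.GetNumAtoms()
    return mcs / (a1 + a2 - mcs)

m1 = Chem.MolFromSmiles("c1ccccc1O")   # hypothetical fragments
m2 = Chem.MolFromSmiles("c1ccccc1N")
print(round(tmcs_similarity(m1, m2), 3))
```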
An advantage of Tanimoto-MCS similarity is that it directly compares the structures of the fragments and thus does not depend on any other specific representation. This approach usually works well when comparing "drug-like" molecules. However, using Tanimoto-MCS similarity for smaller fragments has drawbacks. The present invention therefore introduces the Levenshtein distance, a common method for measuring the similarity between two text strings. The Levenshtein distance is defined as the minimum number of insertions, deletions, and substitutions required to make two strings identical. Considering the effect of transposition operations on the edit distance, this example finally adopts the Damerau-Levenshtein distance, which improves on the Levenshtein distance. With base cases d_{a,b}(i, 0) = i and d_{a,b}(0, j) = j, the Damerau-Levenshtein distance between two strings is defined as:
$$
d_{a,b}(i,j)=\min\begin{cases}
d_{a,b}(i-1,\,j)+1 & \text{(deletion)}\\
d_{a,b}(i,\,j-1)+1 & \text{(insertion)}\\
d_{a,b}(i-1,\,j-1)+\mathbf{1}_{(a_i\neq b_j)} & \text{(substitution)}\\
d_{a,b}(i-2,\,j-2)+1 & \text{(transposition, if } i,j>1,\ a_i=b_{j-1},\ a_{i-1}=b_j\text{)}
\end{cases}
$$
As a compromise, the similarity between two molecules M1 and M2, with corresponding SMILES representations S1 and S2, is measured as

Max(TMCS(M1, M2), DL(S1, S2))
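A plain-Python sketch of the (restricted) Damerau-Levenshtein distance follows; the text does not specify how the DL distance is normalized before being compared with the TMCS similarity in the Max(...) expression above, so the conversion to a similarity shown here is an assumption:

```python
def damerau_levenshtein(s1, s2):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    n, m = len(s1), len(s2)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            if i > 1 and j > 1 and s1[i - 1] == s2[j - 2] and s1[i - 2] == s2[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[n][m]

def dl_similarity(s1, s2):
    """Assumed normalization: map the distance into [0, 1] for comparison."""
    return 1.0 - damerau_levenshtein(s1, s2) / max(len(s1), len(s2), 1)

# Combined measure from the text, using the tmcs_similarity sketch above:
# similarity = max(tmcs_similarity(m1, m2), dl_similarity(smiles1, smiles2))
```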
Step 2.2 Molecular fragment encoding

All fragments are encoded into binary strings. These strings are created by building a balanced binary tree based on fragment similarity. The tree is then used to generate a binary string for each fragment and, by extension, a binary string representing each molecule. The order of attachment points is treated as an identifier for each fragment. When assembling the tree, the similarity between all fragments is computed. Fragment pairs are then formed in a greedy bottom-up fashion, where the two most similar fragments are paired first. The process is repeated, joining the two pairs with the most similar fragments into a new tree with four leaves. The similarity between two subtrees is measured as the maximum similarity between any two fragments of those trees. The joining process is repeated until all fragments are joined into a single tree.
Since every fragment is stored in the binary tree, the tree can be used to generate encodings for all fragments. The path from the root to the leaf where a fragment is stored determines its encoding. For each branch in the tree, a one ("1") is appended to the encoding when going left, and a zero ("0") when going right, as shown in Figure 2; thus, the rightmost character in the encoding corresponds to the branch closest to the fragment.
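A sketch of the tree construction and code assignment (fragments are represented by their SMILES strings, similarity stands for the combined measure from step 2.1, and the level-wise greedy pairing is one interpretation of the description above):

```python
class Node:
    def __init__(self, fragment=None, left=None, right=None):
        self.fragment, self.left, self.right = fragment, left, right

def pair_level(trees, similarity):
    """Greedily pair the most similar subtrees of the current level.
    Each entry is (node, leaf_fragments); subtree similarity is the
    maximum similarity between any two fragments of the subtrees."""
    out = []
    rest = list(trees)
    while len(rest) > 1:
        _, i, j = max(
            ((max(similarity(a, b) for a in rest[i][1] for b in rest[j][1]), i, j)
             for i in range(len(rest)) for j in range(i + 1, len(rest))),
            key=lambda t: t[0])
        out.append((Node(left=rest[i][0], right=rest[j][0]),
                    rest[i][1] + rest[j][1]))
        rest = [t for k, t in enumerate(rest) if k not in (i, j)]
    out.extend(rest)          # an odd subtree is carried up unpaired
    return out

def build_tree(fragments, similarity):
    trees = [(Node(fragment=f), [f]) for f in fragments]
    while len(trees) > 1:
        trees = pair_level(trees, similarity)
    return trees[0][0]

def assign_codes(node, prefix="", codes=None):
    """Left branch appends '1', right branch appends '0'."""
    codes = {} if codes is None else codes
    if node.fragment is not None:
        codes[node.fragment] = prefix
    else:
        assign_codes(node.left, prefix + "1", codes)
        assign_codes(node.right, prefix + "0", codes)
    return codes
```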
Step 3. Generate and optimize molecules based on the Actor-critic reinforcement learning model.

Step 3.1 Framework of the Actor-critic reinforcement learning model

The present invention uses an Actor-critic reinforcement learning model to generate and optimize molecules. A modification is made by selecting a single fragment of the molecule and one bit in that fragment's representation, and then flipping the value of that bit: if it is 0 it becomes 1, and vice versa. This makes the degree of change applied to the molecule trackable, since flipping a bit at the end of the encoding represents a change to a very similar fragment, while a change at the beginning represents a change to a very different type of fragment. The leading bits of the encoding remain unchanged, so the model is only allowed to change bits at the end, forcing it to search only among molecules near known compounds, as shown in Figure 3.
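The action itself is a constrained bit flip; a minimal sketch, where the number of mutable trailing bits n_mutable is an assumed hyperparameter:

```python
def flip_bit(encoding: str, position: int, n_mutable: int) -> str:
    """Flip one bit of a fragment encoding. Only the last n_mutable
    positions may change, so the leading bits (coarse similarity
    structure) stay fixed and the search remains near known compounds."""
    if position < len(encoding) - n_mutable:
        raise ValueError("only trailing bits may be modified")
    flipped = "1" if encoding[position] == "0" else "0"
    return encoding[:position] + flipped + encoding[position + 1:]

print(flip_bit("110100", position=5, n_mutable=2))  # -> 110101
```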
The Actor-critic reinforcement learning model starts from the fragmented molecular state, namely the current state S. The Actor extracts and inspects all fragments and uses the bidirectional Transformer Encoder mechanism and the DenseNet network to decide which fragment to replace and which fragment to replace it with; the action Ai taken by the Actor yields the new state Si. The new state Si is given a score R according to how well it satisfies all the constraints. The Critic then examines the difference, the TD-error, between the reward plus the value of Si and the value of S, and passes it to the Actor. If it is positive, the Actor's action Ai is reinforced; if it is negative, the action is discouraged. The current state is then replaced by the new state, and the process repeats a given number of times. The loss function is loss = -log(prob) * td_error.
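A minimal PyTorch sketch of this update loop, using the loss given above (network sizes, the discount factor, and the simple feed-forward actor standing in for the Transformer/DenseNet stack of step 3.2 are all assumptions):

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, state_dim, n_actions):
        super().__init__()
        # Placeholder for the bidirectional Transformer Encoder + DenseNet actor.
        self.net = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(),
                                 nn.Linear(256, n_actions))
    def forward(self, s):
        return torch.softmax(self.net(s), dim=-1)    # P(action | state)

class Critic(nn.Module):
    def __init__(self, state_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 1))
    def forward(self, s):
        return self.net(s).squeeze(-1)               # V(s)

def ac_update(actor, critic, opt_a, opt_c, s, a, r, s_next, gamma=0.99):
    # TD-error: reward plus discounted value of the new state, minus V(s).
    td_error = r + gamma * critic(s_next).detach() - critic(s)
    critic_loss = td_error.pow(2).mean()
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()
    # Actor loss exactly as in the text: loss = -log(prob) * td_error.
    prob = actor(s).gather(-1, a.unsqueeze(-1)).squeeze(-1)
    actor_loss = (-torch.log(prob) * td_error.detach()).mean()
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()
```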
Step 3.2 Network structure of the Actor in the reinforcement learning model

The Actor network is modeled with a bidirectional Transformer Encoder mechanism and a DenseNet network. It introduces the position information of the different fragments in a molecule and uses the Transformer Encoder mechanism to compute attention coefficients for the fragments of each molecule. The structure reads the encoded fragments representing one molecule at a time, concatenates the forward and backward outputs, and computes an estimate of the probability distribution over which fragment to change and what to change it to by passing the concatenated representation through a DenseNet neural network.
Because the probability of replacing a fragment depends on the preceding and trailing fragments of the molecule, each molecule is constructed as a sequence of fragments, which is passed to the Transformer Encoder mechanism in one pass. The importance of the different fragments is obtained by computing attention coefficients for the fragments of each molecule. The forward and backward Transformer Encoder then outputs a vectorized representation of the molecule that captures the correlations between its fragments. Finally, the concatenated result is classified by the DenseNet network, which computes an estimate of the probability distribution over which fragment to change and what to change it to, as shown in Figure 4.
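A sketch of this encoder stage in PyTorch (the vocabulary size, embedding width, head count, and layer count are assumptions, and the DenseNet classification head is omitted):

```python
import torch
import torch.nn as nn

VOCAB, D_MODEL, MAX_FRAGS = 4096, 128, 64   # assumed sizes

emb = nn.Embedding(VOCAB, D_MODEL)          # fragment embeddings
pos = nn.Embedding(MAX_FRAGS, D_MODEL)      # fragment position information
layer = nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=4)

frag_ids = torch.randint(0, VOCAB, (1, 10))  # one molecule as 10 fragment ids
positions = torch.arange(10).unsqueeze(0)
x = emb(frag_ids) + pos(positions)           # inject position information
h = encoder(x)   # self-attention weighs every fragment against all others
print(h.shape)   # torch.Size([1, 10, 128]): per-fragment contextual vectors
```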
Step 3.3 Optimization of the reward mechanism of the reinforcement learning model

A major challenge in drug discovery is designing molecules optimized for multiple properties that may not correlate well. To show that the proposed method can handle this situation, two different classes of properties were selected that characterize the viability of a molecule as a drug. The aim of the invention is to generate drug molecules that more closely match the properties of real active molecules, i.e., molecules in the target's "sweet spot". As described above, the selected properties include the intrinsic property information of the molecule itself (e.g., MW, clogP, and PSA) and the calculated activity information of the molecule (i.e., the docking results of the molecule against the targets corresponding to a specific disease). It is particularly worth emphasizing that the reward mechanism of the reinforcement learning model in the present invention predicts the reward by building a single-layer perceptron model. The model has two stages, training and prediction. During training, the data set has two sources: the positive samples are molecules with known activity reported in the literature, and the negative samples are an equal number of molecules randomly sampled from the ZINC library. After shuffling the positive and negative samples, the calculated activity information obtained by docking and the intrinsic molecular property information computed with an existing toolkit are used as input; after multiple rounds of training, the model learns the latent relationship between the calculated activity information, the property information, and whether a molecule is truly active. During prediction, the model takes the calculated activity information of the generated molecules, obtained by virtual molecular docking of the generated molecules against disease-related targets using fast drug docking software. The model uses docking software such as LeDock to dock the at most 512 molecules generated per epoch against the existing PDB files of 380 different conformations of COVID-19 Mpro-related targets. The intrinsic property information of the generated molecules is computed with the general-purpose software package RDKit. In total, 1143 ultra-high-dimensional parameters combining the calculated activity information and the intrinsic property information of the molecule itself are used as input to the single-layer perceptron to predict whether a generated molecule truly has real activity, thereby further optimizing the activity of the generated molecules.
An actor in this reinforcement learning framework is rewarded for each valid molecule it produces, with higher rewards if it manages to produce molecules that match the prediction model's expectations.
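A sketch of this reward predictor (the 1143-dimensional input matches the parameter count given above; the optimizer, learning rate, and feature layout are assumptions):

```python
import torch
import torch.nn as nn

# Single-layer perceptron: 1143 features -> probability of being truly active.
reward_model = nn.Sequential(nn.Linear(1143, 1), nn.Sigmoid())
loss_fn = nn.BCELoss()
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

def train_step(x, y):
    """x: [batch, 1143] feature rows (shuffled positives and ZINC negatives);
    y: [batch] labels, 1.0 = known active, 0.0 = sampled decoy."""
    pred = reward_model(x).squeeze(-1)
    loss = loss_fn(pred, y)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

# The predicted probability can then serve as the RL reward signal, e.g.:
# reward = reward_model(features_of(new_molecule)).item()
```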
The finally generated active compound molecules for the COVID-19 Mpro target are shown in Figure 5.
It should be noted that although the embodiment of the present invention described above is illustrative, it does not limit the present invention, and the present invention is therefore not restricted to the above specific implementation. Without departing from the principles of the present invention, any other implementation obtained by those skilled in the art under the teaching of the present invention is deemed to fall within the protection of the present invention.

Claims (4)

  1. A method for intelligently generating drug molecules based on reinforcement learning and docking, characterized in that the method comprises the following steps:
    Step 1. Construct a virtual fragment combination library for drug design.
    The virtual fragment combination library of drug molecules is built by fragmenting a set of molecules with an existing toolkit; when molecules are split, the fragments are not classified, and all fragments are treated identically.
    Step 2. Compute fragment similarity and encode molecular fragments.
    An existing combination of chemical similarity measures is used to quantify the similarity between different molecular fragments, and a similarity-based balanced binary tree is constructed so that every fragment is encoded as a binary string; similar fragments therefore receive similar encodings.
    Step 3. Generate and optimize molecules with an Actor-Critic reinforcement learning model.
    (1) Framework of the Actor-Critic reinforcement learning model.
    An Actor-Critic reinforcement learning model is used to generate and optimize molecules. A molecule is modified by selecting one of its fragments and one bit in that fragment's encoding, then flipping the value of that bit: if it is 0 it becomes 1, and vice versa. This makes the degree of change applied to the molecule traceable; the leading bits of an encoding remain unchanged, so the model is only allowed to flip bits at the end, forcing it to search only among molecules close to known compounds.
    The Actor-Critic model starts from a fragmented molecule, the current state. The Actor extracts and inspects all fragments, introduces the positional information of each fragment within the molecule, computes attention coefficients over the fragments of each molecule with a Transformer encoder, and outputs probabilities through a DenseNet network that decide which fragments to replace and what to replace them with. The new state is scored according to how well it satisfies all constraints; the Critic then computes the TD-Error, the difference between the value of the new state plus the reward received and the value of the current state, and passes it to the Actor: if it is positive, the Actor's action is reinforced; if it is negative, the action is discouraged. The current state is then replaced by the new state, and the process repeats a given number of times (a minimal sketch of this update loop follows this claim).
    (2) Optimization of the reward mechanism of the reinforcement learning model.
    Molecules are optimized for two kinds of properties: the intrinsic properties of the molecule itself and its computed activity. The reward mechanism of the reinforcement learning model predicts the reward with a perceptron model, which has a training stage and a prediction stage. During training, the data set is drawn from two sources: the positive samples are molecules reported as active in the existing literature, and the negative samples are an equal number of molecules randomly sampled from the ZINC library. After the positive and negative samples are shuffled, the computed activity information obtained by docking them in turn, together with the intrinsic molecular properties computed with existing toolkits, is used as input; after multiple rounds of training, the model learns the latent relationship between the computed activity and property information and whether a molecule is truly active. During prediction, the model takes as input the computed activity of a generated molecule, obtained by virtually docking it against existing PDB files of disease-related targets with advanced, fast docking software, and the intrinsic properties of the generated molecule, computed with a general-purpose software package, and predicts whether the generated molecule really is active, thereby further optimizing the activity of the generated molecules. The Actor in the reinforcement learning model is rewarded for every valid molecule it produces, and receives a higher reward if it manages to produce molecules that meet the prediction model's expectations (a sketch of such a reward predictor also follows this claim).
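To make the bit-flip action and the Critic's TD-Error concrete, here is a minimal, illustrative Python sketch. The function names, the fixed-leading-bit convention, the discount factor, and the toy values are assumptions for illustration only; the patented model's Transformer-encoder/DenseNet Actor and the molecular scoring are omitted.

```python
import random

def flip_tail_bit(bits, n_fixed):
    """One action: flip a single bit of a fragment encoding.
    The first n_fixed (leading) bits stay untouched, restricting the
    search to molecules close to known compounds."""
    pos = random.randrange(n_fixed, len(bits))  # only tail bits may change
    new_bits = list(bits)
    new_bits[pos] = 1 - new_bits[pos]           # 0 -> 1, 1 -> 0
    return new_bits

def td_error(reward, value_new, value_current, gamma=0.99):
    """Critic signal: r + gamma * V(s') - V(s). A positive value reinforces
    the Actor's action; a negative value discourages it."""
    return reward + gamma * value_new - value_current

# One iteration of the loop described in step 3(1); in the full model the
# action would come from the Actor network rather than random.randrange,
# and delta would scale the Actor's policy update.
state = [1, 0, 1, 1, 0, 0]                      # toy fragment encoding
new_state = flip_tail_bit(state, n_fixed=3)
delta = td_error(reward=1.0, value_new=0.7, value_current=0.5)
state = new_state                               # replace and repeat
```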
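And a minimal sketch of the perceptron-style reward predictor of step 3(2), assuming scikit-learn and pre-computed feature vectors (docking-derived activity features plus intrinsic descriptors). The feature extraction, network size, and reward scaling are illustrative assumptions, not the patented configuration.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_reward_predictor(X_active, X_zinc):
    """Train on known actives (positives) vs. random ZINC samples (negatives).
    Each row = docking score features + intrinsic property features."""
    X = np.vstack([X_active, X_zinc])
    y = np.concatenate([np.ones(len(X_active)), np.zeros(len(X_zinc))])
    idx = np.random.default_rng(0).permutation(len(X))  # shuffle pos/neg order
    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
    clf.fit(X[idx], y[idx])
    return clf

def reward(clf, features, base=1.0, bonus=4.0):
    """Every valid molecule earns a base reward; molecules the predictor
    deems active earn proportionally more."""
    return base + bonus * clf.predict_proba([features])[0, 1]
```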
  2. The method for intelligently generating drug molecules based on reinforcement learning and docking according to claim 1, characterized in that in step 1, when a molecule is split, all single bonds extending from a ring atom are broken; when splitting a molecule, a fragment chain list is created to record and store the original split points, so that they can later serve as attachment points for molecular design; fragments with different numbers of attachment points may be exchanged provided that the total number of attachment points remains unchanged; the open-source toolkit RDKit is used for the molecular fragmentation in this process; fragments with more than 12 heavy atoms are discarded, as are fragments with 4 or more attachment points.
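A minimal sketch of this splitting rule using RDKit, the toolkit named in the claim. The exact bond selection and split-point bookkeeping of the patented method may differ; treat this as an illustration of the stated filters (at most 12 heavy atoms, fewer than 4 attachment points), with dummy '*' atoms standing in for the recorded split points.

```python
from rdkit import Chem

def fragment_molecule(smiles):
    """Break single, non-ring bonds that extend from a ring atom, then
    filter fragments by heavy-atom count and number of attachment points."""
    mol = Chem.MolFromSmiles(smiles)
    cut_bonds = [
        b.GetIdx() for b in mol.GetBonds()
        if b.GetBondType() == Chem.BondType.SINGLE
        and not b.IsInRing()
        and (b.GetBeginAtom().IsInRing() or b.GetEndAtom().IsInRing())
    ]
    if not cut_bonds:
        return [mol]
    # Dummy '*' atoms mark the original split points / attachment points.
    fragmented = Chem.FragmentOnBonds(mol, cut_bonds, addDummies=True)
    kept = []
    for f in Chem.GetMolFrags(fragmented, asMols=True):
        n_attach = sum(1 for a in f.GetAtoms() if a.GetAtomicNum() == 0)
        n_heavy = sum(1 for a in f.GetAtoms() if a.GetAtomicNum() > 1)
        if n_heavy <= 12 and n_attach < 4:   # filters stated in claim 2
            kept.append(f)
    return kept
```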
  3. The method for intelligently generating drug molecules based on reinforcement learning and docking according to claim 1, characterized in that when computing the similarity between fragments in step 2, "drug-like" molecules are compared using the Tanimoto similarity of the maximum common substructure (Tanimoto-MCS), while for smaller fragments the Damerau-Levenshtein distance, a refinement of the Levenshtein distance, is introduced; the Damerau-Levenshtein distance between two strings a and b is given by the standard recursion:

$$
d_{a,b}(i,j)=\min\begin{cases}
0 & \text{if } i=j=0,\\
d_{a,b}(i-1,j)+1 & \text{if } i>0,\\
d_{a,b}(i,j-1)+1 & \text{if } j>0,\\
d_{a,b}(i-1,j-1)+\mathbf{1}_{(a_i\neq b_j)} & \text{if } i>0 \text{ and } j>0,\\
d_{a,b}(i-2,j-2)+1 & \text{if } i,j>1,\ a_i=b_{j-1},\ a_{i-1}=b_j.
\end{cases}
$$

    The TMCS distance between two molecules M1 and M2 is defined as:

$$
d_{\mathrm{TMCS}}(M_1,M_2)=1-\frac{|\mathrm{MCS}(M_1,M_2)|}{|M_1|+|M_2|-|\mathrm{MCS}(M_1,M_2)|}
$$

    where |MCS(M1, M2)| is the number of atoms in the maximum common substructure and |M1|, |M2| are the atom counts of the two molecules. The similarity between two molecules M1 and M2, with corresponding SMILES representations S1 and S2, is then measured by combining the two distances:

    [Formula PCTCN2021107490-appb-100003: combined similarity built from d_DL(S1, S2) and d_TMCS(M1, M2)]
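A minimal sketch of both distances, assuming RDKit for the MCS computation. The Damerau-Levenshtein code below is the common dynamic-programming (optimal string alignment) variant of the recursion above; the combining function of the claim is not reproduced since its exact form is not recoverable here.

```python
from rdkit import Chem
from rdkit.Chem import rdFMCS

def damerau_levenshtein(a, b):
    """DP table over the recursion: deletion, insertion, substitution,
    plus transposition of adjacent characters."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

def tmcs_distance(m1, m2):
    """1 - Tanimoto similarity of the maximum common substructure,
    measured in atom counts (hydrogens implicit)."""
    common = rdFMCS.FindMCS([m1, m2]).numAtoms
    return 1.0 - common / (m1.GetNumAtoms() + m2.GetNumAtoms() - common)

# Example: phenol vs. aniline differ by one SMILES character.
m1, m2 = Chem.MolFromSmiles("c1ccccc1O"), Chem.MolFromSmiles("c1ccccc1N")
print(damerau_levenshtein("c1ccccc1O", "c1ccccc1N"))  # 1
print(round(tmcs_distance(m1, m2), 3))
```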
  4. The method for intelligently generating drug molecules based on reinforcement learning and docking according to claim 1, characterized in that the molecular fragments in step 2 are encoded as follows: the binary strings are created by building a balanced binary tree based on fragment similarity; the tree is then used to generate a binary string for each fragment and, by extension, a binary string representing a molecule; the order of attachment points is treated as the identifier of each fragment; when assembling the tree, the similarity between all fragments is computed, and fragment pairs are then formed in a greedy, bottom-up manner: the two most similar fragments are paired first, and the process is repeated, joining the two pairs containing the most similar fragments into a new tree with four leaves; the similarity between two subtrees is measured as the maximum similarity between any two fragments of those trees; the joining process is repeated until all fragments are connected in a single tree.
    Once every fragment is stored in the binary tree, the tree is used to generate encodings for all fragments: the path from the root to the leaf storing a fragment determines that fragment's encoding; for each branch in the tree, a 1 is appended to the encoding if the path goes left, and a 0 if it goes right; the rightmost character of an encoding therefore corresponds to the branch closest to the fragment.
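A minimal sketch of the greedy tree assembly and path-based encoding, under simplifying assumptions: an abstract similarity(f1, f2) callable is assumed, fragments are hashable (e.g., SMILES strings), the sketch merges the two most similar subtrees at each step (an approximation of the pairing order described above), and perfect balance is not enforced.

```python
class Node:
    def __init__(self, fragment=None, left=None, right=None):
        self.fragment, self.left, self.right = fragment, left, right

def leaves(node):
    """All fragments stored under this subtree."""
    if node.fragment is not None:
        return [node.fragment]
    return leaves(node.left) + leaves(node.right)

def build_tree(fragments, similarity):
    """Greedy bottom-up merging: repeatedly join the two subtrees whose
    closest pair of fragments is most similar (subtree similarity = max
    similarity between any two of their fragments)."""
    nodes = [Node(fragment=f) for f in fragments]
    while len(nodes) > 1:
        i, j = max(
            ((i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))),
            key=lambda ij: max(similarity(x, y)
                               for x in leaves(nodes[ij[0]])
                               for y in leaves(nodes[ij[1]])),
        )
        merged = Node(left=nodes[i], right=nodes[j])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]

def encode(node, prefix=""):
    """Root-to-leaf path determines each fragment's code:
    append '1' going left, '0' going right."""
    if node.fragment is not None:
        return {node.fragment: prefix}
    codes = encode(node.left, prefix + "1")
    codes.update(encode(node.right, prefix + "0"))
    return codes
```

With this layout, two fragments that are paired early share a long common prefix, which is exactly the property the bit-flip action of claim 1 exploits: flipping a trailing bit moves to a nearby, chemically similar fragment.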
PCT/CN2021/107490 2021-07-09 2021-07-21 Drug molecule intelligent generation method based on reinforcement learning and docking WO2023279436A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2022543606A JP7387962B2 (en) 2021-07-09 2021-07-21 Intelligent generation method of drug molecules based on reinforcement learning and docking

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110780433.3A CN113488116B (en) 2021-07-09 2021-07-09 Drug molecule intelligent generation method based on reinforcement learning and docking
CN202110780433.3 2021-07-09

Publications (1)

Publication Number Publication Date
WO2023279436A1 true WO2023279436A1 (en) 2023-01-12

Family

ID=77938422

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/107490 WO2023279436A1 (en) 2021-07-09 2021-07-21 Drug molecule intelligent generation method based on reinforcement learning and docking

Country Status (3)

Country Link
JP (1) JP7387962B2 (en)
CN (1) CN113488116B (en)
WO (1) WO2023279436A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117831646A (en) * 2023-11-29 2024-04-05 重庆大学 Molecular orientation intelligent generation method based on molecular fragment chemical space deconstruction

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115762661A (en) * 2022-11-21 2023-03-07 苏州沃时数字科技有限公司 Molecular design and structure optimization method, system, device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110534164A (en) * 2019-09-26 2019-12-03 广州费米子科技有限责任公司 Drug molecule generation method based on deep learning
WO2020146356A1 (en) * 2019-01-07 2020-07-16 President And Fellows Of Harvard College Machine learning techniques for determining therapeutic agent dosages
CN112136181A (en) * 2018-03-29 2020-12-25 伯耐沃伦人工智能科技有限公司 Molecular design using reinforcement learning
WO2021026037A1 (en) * 2019-08-02 2021-02-11 Flagship Pioneering Innovations Vi, Llc Machine learning guided polypeptide design
CN112820361A (en) * 2019-11-15 2021-05-18 北京大学 Drug molecule generation method based on confrontation and imitation learning

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200168302A1 (en) * 2017-07-20 2020-05-28 The University Of North Carolina At Chapel Hill Methods, systems and non-transitory computer readable media for automated design of molecules with desired properties using artificial intelligence
US20210271968A1 (en) * 2018-02-09 2021-09-02 Deepmind Technologies Limited Generative neural network systems for generating instruction sequences to control an agent performing a task
US20210057050A1 (en) * 2019-08-23 2021-02-25 Insilico Medicine Ip Limited Workflow for generating compounds with biological activity against a specific biological target
CN110970099B (en) * 2019-12-10 2023-04-28 北京大学 Drug molecule generation method based on regularized variation automatic encoder
CN111508568B (en) * 2020-04-20 2023-08-29 腾讯科技(深圳)有限公司 Molecule generation method, molecule generation device, computer readable storage medium and terminal device
CN112116963A (en) * 2020-09-24 2020-12-22 深圳智药信息科技有限公司 Automated drug design method, system, computing device and computer-readable storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112136181A (en) * 2018-03-29 2020-12-25 伯耐沃伦人工智能科技有限公司 Molecular design using reinforcement learning
WO2020146356A1 (en) * 2019-01-07 2020-07-16 President And Fellows Of Harvard College Machine learning techniques for determining therapeutic agent dosages
WO2021026037A1 (en) * 2019-08-02 2021-02-11 Flagship Pioneering Innovations Vi, Llc Machine learning guided polypeptide design
CN110534164A (en) * 2019-09-26 2019-12-03 广州费米子科技有限责任公司 Drug molecule generation method based on deep learning
CN112820361A (en) * 2019-11-15 2021-05-18 北京大学 Drug molecule generation method based on confrontation and imitation learning

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117831646A (en) * 2023-11-29 2024-04-05 重庆大学 Molecular orientation intelligent generation method based on molecular fragment chemical space deconstruction
CN117831646B (en) * 2023-11-29 2024-09-03 重庆大学 Molecular orientation intelligent generation method based on molecular fragment chemical space deconstruction

Also Published As

Publication number Publication date
CN113488116B (en) 2023-03-10
CN113488116A (en) 2021-10-08
JP7387962B2 (en) 2023-11-29
JP2023531846A (en) 2023-07-26

Similar Documents

Publication Publication Date Title
Zhang et al. Motif-based graph self-supervised learning for molecular property prediction
Maziarz et al. Learning to extend molecular scaffolds with structural motifs
Yin et al. Learning to mine aligned code and natural language pairs from stack overflow
WO2023279436A1 (en) Drug molecule intelligent generation method based on reinforcement learning and docking
CN111090461B (en) Code annotation generation method based on machine translation model
WO2022108664A1 (en) Automated merge conflict resolution with transformers
Wang et al. Retrosynthesis prediction with an interpretable deep-learning framework based on molecular assembly tasks
Yu et al. Adversarial active learning for the identification of medical concepts and annotation inconsistency
Fu et al. MOLER: Incorporate molecule-level reward to enhance deep generative model for molecule optimization
Wei et al. Probabilistic generative transformer language models for generative design of molecules
Hu et al. Deep learning methods for small molecule drug discovery: a survey
Prokhorov et al. Generating knowledge graph paths from textual definitions using sequence-to-sequence models
Bouchard-Côté et al. Improved reconstruction of protolanguage word forms
He et al. Neural unsupervised reconstruction of protolanguage word forms
Shingjergji et al. Relation extraction from DailyMed structured product labels by optimally combining crowd, experts and machines
Liu et al. Inverse Molecular Design with Multi-Conditional Diffusion Guidance
Krishna et al. Neural approaches for data driven dependency parsing in Sanskrit
Damani et al. Black box recursive translations for molecular optimization
Ko et al. Syntactic approach to extracting key elements of work modification cause in change-order documents
Alberga et al. DeLA-DrugSelf: Empowering multi-objective de novo design through SELFIES molecular representation
EP3977310B1 (en) Method for consolidating dynamic knowledge organization systems
Engkvist et al. Molecular De Novo Design Through Deep Generative Models
CN114842924A (en) Optimized de novo drug design method
Wang et al. Deep reinforcement learning and docking simulations for autonomous molecule generation in de novo drug design
Gaines et al. A deep molecular generative model based on multi-resolution graph variational Autoencoders

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2022543606

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21948928

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21948928

Country of ref document: EP

Kind code of ref document: A1