CN115966266B - An anti-tumor molecule enhancement method based on graph neural network - Google Patents
An anti-tumor molecule enhancement method based on graph neural network
- Publication number
- CN115966266B (application CN202310015687.5A)
- Authority
- CN
- China
- Prior art keywords
- molecules
- molecule
- tumor
- graph
- optimization
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Image Processing (AREA)
Abstract
The present invention discloses an anti-tumor molecule reinforcement learning method based on a graph neural network. The method comprises the following steps. Step 1: divide molecules into anti-tumor (positive) and non-anti-tumor (negative) categories according to their labels in the database. Step 2: input the resulting graphs into the proposed anti-tumor molecule enhancement model, learn implicit graph representations from the differing properties of positive and negative molecules, and obtain local structural features of anti-tumor molecules for one-step generation and optimization of molecules. Step 3: apply constraints and perform targeted optimization to preserve the drug-likeness of the molecules during enhancement. Step 4: apply the obtained local molecular structures to the structural modification and optimization of anti-tumor molecules. Step 5: use existing tools for checking the plausibility of molecular properties to judge synthesizability, output plausible molecules, and further reverse-optimize implausible ones. Step 6: obtain plausible new anti-tumor molecules; the task ends.
Description
Technical field
The invention relates to a drug molecule enhancement technique based on graph neural networks and belongs to the technical fields of anti-tumor drug molecular chemistry and graph neural network reinforcement learning.
Background
This section describes the state of the art closest to the present invention and its existing problems.
The optimization of drug molecules and the development of new drugs are crucial to tumor treatment. Drug molecule optimization aims to strengthen desirable biological effects in a specific direction while keeping the molecule acceptable as a biopharmaceutical. The current problem is the high time and monetary cost of developing effective drugs. In drug research today, the traditional strategy is to screen existing compound libraries. However, the molecules in these libraries have limited structural diversity and have already been screened many times by research institutions. As a result, discovering new, highly effective drug molecules from them has become increasingly difficult.
As machine learning develops, it plays a growing role in drug discovery. Compared with traditional screening strategies, combining machine learning with molecular research makes these tasks more effective and scalable.
Researchers in this field have developed de novo molecular design, an umbrella term for atom-based, fragment-based and reaction-based design methods. These methods generate new, highly effective molecules from scratch through analysis and computation. The idea is to abandon crude direct screening, balance global and local considerations, explore key features, and build target molecules from clues. The approach has several drawbacks. It requires a high degree of manual generation and molecular architecture design from scratch, and it needs clear design goals and standardized design principles for its results to be interpretable. Moreover, because de novo design is fragment-based, the quality of the generated molecules depends heavily on the earlier operations. In short, the main challenges of de novo design are high computational cost, low interpretability, and strong dependence of the results on early choices.
Summary of the invention
Technical problem: to address the above problems, the present invention proposes a reinforcement learning method for anti-tumor drug molecules based on a graph neural network. A graph neural network model for enhancing molecular anti-tumor activity is constructed to extract features from drug molecules and classify their properties. Key features are extracted by modifying features and observing how the classification results change. Based on an analysis of these key features, the original molecular structure is chemically enhanced and modified, finally yielding new molecules with stronger target properties. This improves the efficiency and interpretability of molecule generation and reduces the difficulty, development cycle and cost of anti-tumor drug research and development.
Technical solution: to achieve the object of the present invention, the technical solution adopted is an anti-tumor molecule enhancement method based on a graph neural network, comprising the following steps:
Step 1: classify molecules into anti-tumor (positive) and non-anti-tumor (negative) categories according to their labels in the database. Describe each input molecule as an undirected graph (graph matrix) whose nodes and edges correspond to atoms and chemical bonds, respectively. Construct a graph neural network model, use the chemical stability and drug-likeness of the generated molecules as the loss function, and pre-train the model.
Step 2: input the resulting graphs into the proposed anti-tumor molecule enhancement model, learn implicit graph representations from the differing properties of positive and negative molecules, and obtain local structural features of anti-tumor molecules for one-step generation and optimization of molecules.
Step 3: apply constraints and perform targeted optimization to preserve the drug-likeness of the molecules during enhancement.
Step 4: apply the obtained local molecular structures to the structural modification and optimization of anti-tumor molecules.
Step 5: use existing tools for checking the plausibility of molecular properties to judge synthesizability and output plausible molecules; implausible molecules undergo further reverse optimization.
Step 6: obtain plausible new anti-tumor molecules; the task ends.
Further, in step (1), molecules are classified into anti-tumor (positive) and non-anti-tumor (negative) categories according to their labels in the database. Each input molecule is described as an undirected graph (graph matrix) whose nodes and edges correspond to atoms and chemical bonds, respectively. The graph neural network model is constructed and pre-trained as follows:
(101) Input each molecule as a graph G = (V, E, X). Here G denotes the input molecule, and V denotes the atoms of the molecule in a 1×n one-hot encoding. E denotes the chemical bonds as an n×n adjacency matrix, and X denotes the features of each atom as an n×n matrix. Each input molecule is thus described as an undirected graph (graph matrix), namely G.
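The following is a minimal sketch of the graph input in (101). It assumes a small hypothetical atom vocabulary. The code builds the one-hot atom rows V and the symmetric adjacency matrix E for a molecule given as an atom list and a bond list. It reads V as one one-hot row per atom, which is one interpretation of the 1×n encoding. The per-atom feature matrix X is left out. A real pipeline would more likely parse SMILES with a cheminformatics toolkit such as RDKit.

```python
from typing import List, Tuple

# Hypothetical atom vocabulary (not specified in the patent).
ATOM_TYPES = ["C", "N", "O", "S", "Cl"]


def one_hot(symbol: str) -> List[int]:
    """One-hot encode an atom symbol against ATOM_TYPES."""
    if symbol not in ATOM_TYPES:
        raise ValueError(f"unknown atom type: {symbol}")
    return [1 if symbol == t else 0 for t in ATOM_TYPES]


def molecule_to_graph(atoms: List[str], bonds: List[Tuple[int, int]]):
    """Return (V, E) for an undirected molecular graph.

    V: one one-hot row per atom.
    E: symmetric n x n adjacency matrix, 1 where two atoms are bonded.
    """
    n = len(atoms)
    V = [one_hot(a) for a in atoms]
    E = [[0] * n for _ in range(n)]
    for i, j in bonds:
        E[i][j] = 1
        E[j][i] = 1  # undirected graph: mirror every bond
    return V, E


# Heavy atoms of ethanol, C-C-O
V, E = molecule_to_graph(["C", "C", "O"], [(0, 1), (1, 2)])
```

For ethanol's heavy atoms, E comes out as the 3×3 adjacency matrix of a three-node path.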
(102) Classify molecules into anti-tumor (positive) and non-anti-tumor (negative) categories according to their labels in the database, denoted $G^+$ and $G^-$ respectively.
(103) Construct a graph neural network model whose input is the original undirected molecular graph G and whose output is a binary classification probability matrix P. Pre-train this GNN model using the chemical stability and drug-likeness of the generated molecules as the loss function.
Further, in step (2), the resulting graphs are input into the proposed anti-tumor molecule enhancement model. The model learns implicit graph representations from the differing properties of positive and negative molecules and obtains local structural features of anti-tumor molecules for one-step generation and optimization of molecules. The method is as follows:
(201) Input the molecule G into the feature extraction model $f: G \to F \in \mathbb{R}^{n\times h}$ to extract the implicit feature F of the molecule.
(202) The feature extraction model $f: G \to F \in \mathbb{R}^{n\times h}$ comprises:
(1) $$h_v^{(l)} = \mathrm{UPDATE}\Big(h_v^{(l-1)},\ \mathrm{AGG}\big(\{h_j^{(l-1)} : j \in N(v)\}\big)\Big), \qquad h_G = \mathrm{READOUT}\big(\{h_v^{(l)} : v \in V\}\big)$$
where $H^{(l-1)}$ denotes the features obtained at layer $l-1$ and $h_j^{(l-1)}$ is the j-th row of the matrix $H^{(l-1)}$. $N(v)$ is the set of neighbor nodes of v, UPDATE is the update function of each layer, AGG is the aggregation function, and READOUT is the readout function. After l iterations, the molecule-level feature $h_G$ is obtained.
(2) Taking the obtained $h_G$ as input, an MLP produces the implicit feature F of the molecule.
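The following is a minimal sketch of the message-passing scheme in (202). It assumes sum aggregation for AGG, and for UPDATE an element-wise ReLU of the node's own state plus the aggregated message. READOUT is a sum over nodes. These concrete choices are illustrative, not the patent's. Learnable weights and the MLP of sub-step (2) are omitted.

```python
from typing import List

Matrix = List[List[float]]


def relu(x: float) -> float:
    return x if x > 0 else 0.0


def mp_layer(H: Matrix, adj: List[List[int]]) -> Matrix:
    """One message-passing layer.

    AGG    = sum of neighbour rows h_j, j in N(v)
    UPDATE = element-wise ReLU(h_v + aggregated message)
    """
    n, d = len(H), len(H[0])
    out = []
    for v in range(n):
        msg = [0.0] * d
        for j in range(n):
            if adj[v][j]:
                msg = [m + h for m, h in zip(msg, H[j])]
        out.append([relu(h + m) for h, m in zip(H[v], msg)])
    return out


def readout(H: Matrix) -> List[float]:
    """READOUT = sum over nodes, giving the graph-level vector h_G."""
    return [sum(col) for col in zip(*H)]


def encode(H0: Matrix, adj: List[List[int]], layers: int) -> List[float]:
    """Apply `layers` rounds of message passing, then read out h_G."""
    H = H0
    for _ in range(layers):
        H = mp_layer(H, adj)
    return readout(H)
```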
(203) Input the implicit feature F into the classification network $c: F \to P_{out} \in \mathbb{R}^{n\times 2}$ to obtain the binary classification probability matrix $P_u$.
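The classification network c in (203) can be sketched as a linear layer followed by a softmax, so that each output row is a probability pair summing to 1. The weights W and bias b below are hypothetical placeholders, and only one row of $P_{out}$ is computed. The column order [negative, positive] is an assumption.

```python
import math
from typing import List


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def classify(F: List[float], W: List[List[float]], b: List[float]) -> List[float]:
    """Linear layer + softmax, returning [P(negative), P(positive)]."""
    logits = [sum(w * f for w, f in zip(row, F)) + bias for row, bias in zip(W, b)]
    return softmax(logits)
```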
(204) Partially transform the molecular structure and feed it into the feature extraction model to obtain the transformed implicit feature F′. Input F′ into the classification network to obtain the binary classification probability matrix $P_n$.
(205) Feed the probability results of steps (203) and (204) into the probability fluctuation function (PFF). Pass the result to the MEAS (measure) function and analyze the PFF values:
$$\mathrm{PFF}(P_u, P_n) = \lVert P_u - P_n \rVert$$
$$S_{out} = \mathrm{MEAS}\{\mathrm{PFF}(P_u, P_n)\}$$
The local structural features $S_{out}$ whose modification causes the largest probability fluctuation are extracted and taken as the output.
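One way to read (204) and (205) is as an occlusion test. You perturb one part of the molecule, re-run the classifier, and measure how far the probabilities move. The sketch below assumes that the partial transformation cuts all bonds of a single atom and that the norm in PFF is the L1 norm. `predict` stands in for the trained feature extractor plus classifier and is hypothetical.

```python
from typing import Callable, List, Tuple

Adj = List[List[int]]


def pff(p_u: List[float], p_n: List[float]) -> float:
    """Probability fluctuation, taken here as the L1 norm of P_u - P_n."""
    return sum(abs(a - b) for a, b in zip(p_u, p_n))


def mask_atom(adj: Adj, k: int) -> Adj:
    """Hypothetical partial transformation: remove every bond of atom k."""
    n = len(adj)
    return [[0 if k in (i, j) else adj[i][j] for j in range(n)] for i in range(n)]


def rank_atoms(adj: Adj, predict: Callable[[Adj], List[float]]) -> List[Tuple[int, float]]:
    """Return (atom index, PFF) pairs, largest probability fluctuation first."""
    p_u = predict(adj)  # probabilities for the unmodified molecule
    scores = [(k, pff(p_u, predict(mask_atom(adj, k)))) for k in range(len(adj))]
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

Atoms whose masking produces the largest PFF are the candidates for the local structure $S_{out}$.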
Further, in step (3), constraints are imposed and targeted optimization is performed to preserve drug-likeness during molecular enhancement, as follows:
(301) When training the anti-tumor molecule enhancement model, the QED value is used as a constraint in the MEAS function:
$$\mathrm{MEAS}: S_{out} = \mathrm{RF}\{\mathrm{PFF}(P_u, P_n) + \gamma\,\mathrm{QED}\}$$
Here QED is computed with RDKit. The RF function maps the QED-constrained probability fluctuation values onto the molecular structure to obtain local molecular structures. This ensures that the generated local features are drug-like.
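The following sketches the QED-constrained MEAS step. It assumes that the PFF score and QED value are already available per candidate fragment and that RF keeps the top-k fragments by $\mathrm{PFF} + \gamma\,\mathrm{QED}$. In practice, RDKit provides QED through `rdkit.Chem.QED.qed(mol)`. Here the values are passed in as plain numbers so that the example has no dependencies. The fragment ids and the `gamma` and `top_k` defaults are hypothetical.

```python
from typing import Dict, List


def meas(pff_scores: Dict[int, float], qed_scores: Dict[int, float],
         gamma: float = 0.5, top_k: int = 2) -> List[int]:
    """Score each fragment by PFF + gamma * QED and keep the top_k ids.

    The top-k selection stands in for the patent's RF mapping (an assumption).
    """
    combined = {f: pff_scores[f] + gamma * qed_scores[f] for f in pff_scores}
    return sorted(combined, key=combined.get, reverse=True)[:top_k]
```

Raising γ shifts the selection toward more drug-like fragments at the expense of raw probability fluctuation.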
Further, in step (4), the obtained local molecular structures are applied to modify and optimize the structure of anti-tumor molecules, as follows:
(401) Input an existing positive molecule as a graph G = (V, E, X), where G denotes the input molecule, V its atoms, E its chemical bonds, and X the features of each atom.
(402) Enhance the molecule using the local structural features obtained by the trained model. An automatic iterative optimization method repeatedly searches for the atom or chemical bond most likely to be connected and modifies the molecular structure accordingly. In this way the previously obtained local structural features are applied to the molecule, and molecules with better anti-tumor properties are built step by step.
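The automatic iterative optimization in (402) can be illustrated as greedy hill climbing. At each step the method tries every candidate bond, keeps the one that most improves a score, and stops at a local optimum. The scoring function, candidate bonds and step limit below are hypothetical. The sketch ignores the valence and ring chemistry that a real implementation would have to enforce.

```python
from typing import Callable, FrozenSet, Iterable, Set, Tuple

Bond = Tuple[int, int]


def greedy_optimize(bonds: Set[Bond], candidates: Iterable[Bond],
                    score: Callable[[FrozenSet[Bond]], float],
                    max_steps: int = 10) -> Set[Bond]:
    """Repeatedly add the single candidate bond that most improves `score`.

    Stops when no candidate improves the current score or max_steps is reached.
    """
    current = set(bonds)
    best = score(frozenset(current))
    pool = set(candidates)
    for _ in range(max_steps):
        trial = [(score(frozenset(current | {c})), c) for c in pool - current]
        if not trial:
            break
        new_score, choice = max(trial)
        if new_score <= best:
            break  # local optimum: no single added bond helps
        current.add(choice)
        best = new_score
    return current
```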
Further, in step (5), existing molecular-property plausibility tools are used to judge synthesizability and output plausible molecules. Implausible molecules are further reverse-optimized as follows:
(501) Establish a molecular-property plausibility tool $p: G = (V, E, X) \to V \in \mathbb{R}$ that analyzes the plausibility of molecular properties and evaluates chemical feasibility.
(502) Model p mainly consists of a state-action module A and a reward module Q. At any time step t, module A takes the state as input and outputs an action, which is a tensor defined in the feature-representation space of all initial reactants. Module Q computes the optimal value of the state with a Q-network. The actor uses this optimal value to iteratively update the parameters of the policy function, then selects an action and receives feedback and a new state. The environment uses the state, the best reaction template and the action to determine whether the episode has ended. Finally, a Softmax layer outputs the evaluation value V as the feasibility score.
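The actor/critic interplay in (502) can be illustrated with a very small actor-critic. It has a softmax policy over a few discrete actions, which stand in for reaction choices, and a critic that tracks the value of a single state. This is a generic one-step actor-critic in which a state-value critic replaces the patent's Q-network. The rewards, learning rates and step count are hypothetical, and the reaction-template environment is not modeled.

```python
import math
import random
from typing import List


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def actor_critic(rewards: List[float], steps: int = 2000, alpha: float = 0.1,
                 beta: float = 0.1, seed: int = 0) -> List[float]:
    """One-state, one-step actor-critic; returns the final policy.

    Actor: softmax policy over action preferences theta.
    Critic: scalar value estimate v of the single state.
    Each episode ends after one action, so the TD error is r - v.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)
    v = 0.0
    for _ in range(steps):
        pi = softmax(theta)
        a = rng.choices(range(len(rewards)), weights=pi)[0]
        delta = rewards[a] - v          # TD error (next state is terminal)
        v += beta * delta               # critic update
        for b in range(len(theta)):     # policy-gradient actor update
            grad = (1.0 if b == a else 0.0) - pi[b]
            theta[b] += alpha * delta * grad
    return softmax(theta)
```

With rewards `[0.0, 1.0, 0.2]`, the returned policy concentrates on the second action, the one with the highest reward.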
Further, in step (6), plausible new anti-tumor molecules are obtained and the task ends, as follows:
(601) As shown in claim 6, under optimization with a standard feedback penalty, the generated molecules clearly cannot become practically usable drugs. The anti-tumor molecule enhancement model can therefore generate molecules with strong anti-tumor features, but it cannot guarantee that those molecules can be synthesized or remain stable in practice. This highlights the need for multi-objective optimization of rewards and penalties when reinforcement learning is used to enhance anti-tumor molecules.
(602) Variable definitions, restated:
$X_v \in \mathbb{R}^{D_V}$: feature vector of node v
$h_v \in \mathbb{R}^{D_V}$: state vector of node v
$x_{v_1,v_2} \in \mathbb{R}^{D_E}$: feature vector of edge $(v_1, v_2)$
During node operations, a node state update function must be defined so that the node states stabilize through iteration. For each node v, this state transformation function maps the current state vector to its next value.
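The requirement in (602) that node states stabilize through iteration corresponds to the fixed-point view of graph neural networks. If the state update is a contraction, repeated application converges to a unique fixed point. The sketch below assumes a scalar state per node and the hypothetical update $h_v \leftarrow x_v + \lambda \cdot \mathrm{mean}_{u \in N(v)} h_u$ with $0 \le \lambda < 1$. This update is a contraction in the max-norm.

```python
from typing import List


def fixed_point_states(x: List[float], adj: List[List[int]], lam: float = 0.5,
                       tol: float = 1e-10, max_iter: int = 1000) -> List[float]:
    """Iterate h_v <- x_v + lam * mean_{u in N(v)} h_u until the states stop changing.

    With 0 <= lam < 1 the map is a contraction in the max-norm, so the
    iteration converges to a unique fixed point (Banach fixed-point theorem).
    """
    n = len(x)
    h = [0.0] * n
    for _ in range(max_iter):
        new = []
        for v in range(n):
            nbrs = [h[u] for u in range(n) if adj[v][u]]
            agg = sum(nbrs) / len(nbrs) if nbrs else 0.0
            new.append(x[v] + lam * agg)
        if max(abs(a - b) for a, b in zip(new, h)) < tol:
            return new  # converged
        h = new
    return h
```

Consider two bonded nodes with $x = [1, 1]$ and $\lambda = 0.5$. The fixed point solves $h = 1 + 0.5h$, so both states converge to 2.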
(603) The backpropagation process:
In this step, the parameter gradients are obtained by backpropagation, and the parameters are then optimized by gradient descent. The iterative process is:
……
$$\Delta w_{ij} = \eta\,\delta_j x_i$$
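The update rule $\Delta w_{ij} = \eta\,\delta_j x_i$ can be sketched for a single fully connected layer. The sketch assumes that $\delta_j$ is the backpropagated error signal of output unit j, signed so that adding $\Delta w_{ij}$ moves the weights downhill on the loss (for example $\delta_j = -\partial E / \partial \mathrm{net}_j$). The function names and the layout of W are illustrative. A second helper applies the same rule repeatedly to one linear unit with squared error.

```python
from typing import List


def delta_rule_step(W: List[List[float]], x: List[float], delta: List[float],
                    eta: float) -> List[List[float]]:
    """Return W + dW with dW[i][j] = eta * delta[j] * x[i].

    W[i][j] connects input unit i to output unit j.
    """
    return [[W[i][j] + eta * delta[j] * x[i] for j in range(len(delta))]
            for i in range(len(x))]


def train_linear_unit(x: List[float], target: float, w: List[float],
                      eta: float = 0.1, steps: int = 100) -> List[float]:
    """Gradient descent on E = 0.5 * (target - y)^2 for y = w . x."""
    for _ in range(steps):
        y = sum(wi * xi for wi, xi in zip(w, x))
        delta = target - y  # equals -dE/dy for the squared error above
        w = [wi + eta * delta * xi for wi, xi in zip(w, x)]
    return w
```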
Next, backpropagation is carried out with the automatic differentiation of the PyTorch framework to iteratively optimize the molecular structure. This continues until the generated enhanced molecules meet multiple requirements, including anti-tumor activity and molecular plausibility.
Beneficial effects: compared with the prior art, the technical solution of this method has the following beneficial technical effects:
1. The features of drug molecules are learned with a graph neural network, and given molecules are enhanced with graph neural networks and reinforcement learning. This shifts the work of molecule enhancement to machines and is far more efficient than traditional manual enhancement.
2. Enhancing existing molecules with graph neural networks and reinforcement learning is considerably more efficient than methods that generate molecules from scratch based on a model of given molecules. It also improves accuracy to some extent.
3. The molecules generated by this method are more interpretable and easier to visualize. This helps drug researchers understand the features of the enhanced drug molecules and the reasons for the enhancement.
4. Multiple constraints and property checks during generation ensure the chemical plausibility and pharmacological properties of the generated molecules.
Brief description of the drawings
Figure 1 is a flow chart of the implementation steps of the present invention;
Figure 2 is a framework diagram of the anti-tumor molecule enhancement model;
Figure 3 is a structure diagram of an example molecule;
Figure 4 shows the optimized structure obtained for an example molecule.
Detailed description
To aid understanding of the present invention, this embodiment is described in detail below with reference to the accompanying drawings.
Embodiment: taking molecules from the ChEMBL database as an example, the technical solution of the present invention is described in detail with reference to the accompanying drawings.
An anti-tumor molecule enhancement method based on a graph neural network, comprising the following steps:
Step (1): classify molecules into anti-tumor (positive) and non-anti-tumor (negative) categories according to their labels in the database. Describe each input molecule as an undirected graph (graph matrix) whose nodes and edges correspond to atoms and chemical bonds, respectively. Construct a graph neural network model, use the chemical stability and drug-likeness of the generated molecules as the loss function, and pre-train the model, as follows:
(101) Input each molecule as a graph G = (V, E, X). Here G denotes the input molecule, and V denotes the atoms of the molecule in a 1×n one-hot encoding. E denotes the chemical bonds as an n×n adjacency matrix, and X denotes the features of each atom as an n×n matrix. Each input molecule is thus described as an undirected graph (graph matrix), namely G.
(102) Classify molecules into anti-tumor (positive) and non-anti-tumor (negative) categories according to their labels in the database, denoted $G^+$ and $G^-$ respectively.
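The following is a minimal sketch of the labeling in (102). It assumes that each database record carries a SMILES string and a binary anti-tumor label. The field names `smiles` and `label` are hypothetical and do not reflect the ChEMBL schema.

```python
from typing import Dict, List, Tuple


def split_by_label(records: List[Dict]) -> Tuple[List[str], List[str]]:
    """Split records into G+ (anti-tumor, label 1) and G- (label 0) lists."""
    pos = [r["smiles"] for r in records if r["label"] == 1]
    neg = [r["smiles"] for r in records if r["label"] == 0]
    return pos, neg
```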
(103) Construct a graph neural network model that uses the chemical stability and drug-likeness of the generated molecules as the loss function; the model structure is shown in Figure 2. Its input is the original undirected molecular graph G, and its output is a binary classification probability matrix P. Pre-train this GNN model.
Step (2): input the resulting graphs into the proposed anti-tumor molecule enhancement model. The model learns implicit graph representations from the differing properties of positive and negative molecules and obtains local structural features of anti-tumor molecules for one-step generation and optimization of molecules. The method is as follows:
(201) Input the molecule G into the feature extraction model $f: G \to F \in \mathbb{R}^{n\times h}$ to extract the implicit feature F of the molecule.
(202) The feature extraction model $f: G \to F \in \mathbb{R}^{n\times h}$ comprises:
(1) $$h_v^{(l)} = \mathrm{UPDATE}\Big(h_v^{(l-1)},\ \mathrm{AGG}\big(\{h_j^{(l-1)} : j \in N(v)\}\big)\Big), \qquad h_G = \mathrm{READOUT}\big(\{h_v^{(l)} : v \in V\}\big)$$
where $H^{(l-1)}$ denotes the features obtained at layer $l-1$ and $h_j^{(l-1)}$ is the j-th row of the matrix $H^{(l-1)}$. $N(v)$ is the set of neighbor nodes of v, UPDATE is the update function of each layer, AGG is the aggregation function, and READOUT is the readout function. After l iterations, the molecule-level feature $h_G$ is obtained.
(2) Taking the obtained $h_G$ as input, an MLP produces the implicit feature F of the molecule.
(203) Input the implicit feature F into the classification network $c: F \to P_{out} \in \mathbb{R}^{n\times 2}$ to obtain the binary classification probability matrix $P_u$.
(204) Partially transform the molecular structure and feed it into the feature extraction model to obtain the transformed implicit feature F′. Input F′ into the classification network to obtain the binary classification probability matrix $P_n$.
(205) Feed the probability results of steps (203) and (204) into the probability fluctuation function (PFF). Pass the result to the MEAS (measure) function and analyze the PFF values:
$$\mathrm{PFF}(P_u, P_n) = \lVert P_u - P_n \rVert$$
$$S_{out} = \mathrm{MEAS}\{\mathrm{PFF}(P_u, P_n)\}$$
The local structural features $S_{out}$ whose modification causes the largest probability fluctuation are extracted and taken as the output.
Step (3): apply constraints and perform targeted optimization to preserve the drug-likeness of the molecules during enhancement, as follows:
(301) When training the anti-tumor molecule enhancement model, the QED value is used as a constraint in the MEAS function:
$$\mathrm{MEAS}: S_{out} = \mathrm{RF}\{\mathrm{PFF}(P_u, P_n) + \gamma\,\mathrm{QED}\}$$
Here QED is computed with RDKit. The RF function maps the QED-constrained probability fluctuation values onto the molecular structure to obtain local molecular structures. This ensures that the generated local features are drug-like.
Step (4): apply the obtained local molecular structures to the structural modification and optimization of anti-tumor molecules. The method is as follows:
(401) Input an existing positive molecule as a graph G = (V, E, X), where G denotes the input molecule, V its atoms, E its chemical bonds, and X the features of each atom.
(402) Enhance the molecule using the local structural features obtained by the trained model. An automatic iterative optimization method repeatedly searches for the atom or chemical bond most likely to be connected and modifies the molecular structure accordingly. In this way the previously obtained local structural features are applied to the molecule, and molecules with better anti-tumor properties are built step by step.
Step (5): use existing tools for checking the plausibility of molecular properties to judge synthesizability and output plausible molecules. Implausible molecules are further reverse-optimized as follows:
(501) Establish a molecular-property plausibility tool $p: G = (V, E, X) \to V \in \mathbb{R}$ that analyzes the plausibility of molecular properties and evaluates chemical feasibility.
(502) Model p mainly consists of a state-action module A and a reward module Q. At any time step t, module A takes the state as input and outputs an action, which is a tensor defined in the feature-representation space of all initial reactants. Module Q computes the optimal value of the state with a Q-network. The actor uses this optimal value to iteratively update the parameters of the policy function, then selects an action and receives feedback and a new state. The environment uses the state, the best reaction template and the action to determine whether the episode has ended. Finally, a Softmax layer outputs the evaluation value V as the feasibility score.
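The output-or-reoptimize loop of step (5) can be sketched as follows. The sketch assumes that `feasibility` returns a score in [0, 1], such as the evaluation value V, and that `reoptimize` performs one round of reverse optimization on a molecule. Both callables, the threshold of 0.5 and the round limit are hypothetical.

```python
from typing import Callable, List, Tuple


def refine(molecules: List[str],
           feasibility: Callable[[str], float],
           reoptimize: Callable[[str], str],
           threshold: float = 0.5,
           max_rounds: int = 3) -> Tuple[List[str], List[str]]:
    """Output molecules whose feasibility reaches the threshold and send the
    rest back through reverse optimization, checking at most max_rounds times.

    Returns (accepted, still_rejected).
    """
    accepted: List[str] = []
    pending = list(molecules)
    for round_idx in range(max_rounds):
        passed = [m for m in pending if feasibility(m) >= threshold]
        failed = [m for m in pending if feasibility(m) < threshold]
        accepted.extend(passed)
        if not failed or round_idx == max_rounds - 1:
            return accepted, failed
        pending = [reoptimize(m) for m in failed]  # reverse-optimize failures
    return accepted, pending
```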
Step (6): Obtain reasonable novel anti-tumor molecules, which completes the task, as follows:
(601) As shown in claim 6, when optimization relies only on the standard feedback penalty, the generated molecules clearly cannot become practically usable drugs. This shows that the anti-tumor molecule enhancement model can generate molecules with strong anti-tumor features, but cannot guarantee that they can be manufactured or remain stable in real life. It therefore highlights the need for multi-objective optimization of rewards and penalties when reinforcement learning is used for anti-tumor molecule enhancement.
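One common way to express the multi-objective reward that (601) calls for is a weighted sum with a penalty floor, optionally alongside a Pareto filter over the objectives. The weights, the `min_feasibility` floor, the penalty size and the function names below are assumptions for illustration; the patent does not give its reward formula.

```python
def multi_objective_reward(activity, feasibility, w_activity=1.0, w_feasibility=1.0,
                           min_feasibility=0.5, penalty=1.0):
    """Hypothetical scalarized reward for two objectives assumed to lie in [0, 1]:
    a weighted sum, minus a fixed penalty when feasibility falls below a floor."""
    reward = w_activity * activity + w_feasibility * feasibility
    if feasibility < min_feasibility:
        reward -= penalty
    return reward

def dominates(a, b):
    """Pareto dominance for maximization: a is >= b on every objective and > on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Keep the (activity, feasibility) points not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

With the defaults, a highly active but infeasible candidate (activity 0.9, feasibility 0.2) scores about 0.1, below a moderately active, feasible one (0.6, 0.8) at about 1.4, which is the ranking (601) argues the reward should produce.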
(602) The variable definitions are restated:
$X_v \in \mathbb{R}^{D_V}$: feature vector of node $v$
$h_v \in \mathbb{R}^{D_V}$: state vector of node $v$
$x_{v_1,v_2} \in \mathbb{R}^{D_E}$: feature vector of edge $(v_1, v_2)$
During node operations, a node state update function must be defined so that the node states stabilize under iteration. For the node state transition function, the transformation of the state vector of a node $v$ can be expressed as:
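The state-transition formula itself is not reproduced in this text. In the original GNN formulation of Scarselli et al., written here with the variables of (602), the local transition function has the form $h_v = f(x_v, x_{co[v]}, h_{ne[v]}, x_{ne[v]})$ and must be a contraction so that iterating it converges to a unique fixed point. The sketch below assumes scalar features ($D_V = D_E = 1$) and a hypothetical $f$ built from `tanh` with fixed weights that uses $x_v$, the incident edge features and the neighbor states (it omits $x_{ne[v]}$ for brevity). Keeping `w_nb` times the maximum node degree below 1 makes the map a contraction, since `tanh` is 1-Lipschitz.

```python
import math

def iterate_node_states(x, edges, edge_feat, w_self=0.5, w_nb=0.2, w_edge=0.1,
                        tol=1e-8, max_iter=200):
    """Hypothetical scalar transition: h_v = tanh(w_self*x_v + sum_u (w_nb*h_u + w_edge*x_{v,u})).
    Iterates from h = 0 until the largest change between sweeps is below tol."""
    neighbors = {v: [] for v in range(len(x))}
    for v1, v2 in edges:
        neighbors[v1].append(v2)
        neighbors[v2].append(v1)
    h = [0.0] * len(x)
    for _ in range(max_iter):
        new_h = [
            math.tanh(w_self * x[v] + sum(w_nb * h[u] + w_edge * edge_feat[frozenset((v, u))]
                                          for u in neighbors[v]))
            for v in range(len(x))
        ]
        change = max(abs(new - old) for new, old in zip(new_h, h))
        h = new_h
        if change < tol:
            break  # states have stabilized (approximate fixed point)
    return h
```

On a three-node path graph the maximum degree is 2, so the contraction factor is at most 0.4 and the states settle within a few dozen sweeps.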
(603) The backpropagation process is described in detail here:
In this step, the gradients of the parameters are obtained by backpropagation, and the parameters are then optimized by gradient descent. The iterative process is:
……
$\Delta w_{ij} = \eta\, \delta_j X_i$
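Because the derivation leading to this update is elided in the source, here is a worked single-unit instance of $\Delta w_{ij} = \eta \delta_j X_i$. It assumes a sigmoid output unit $j$ and squared error $E = \tfrac{1}{2}(t_j - y_j)^2$, for which $\delta_j = (t_j - y_j)\, y_j (1 - y_j)$ and the descent step is $w_{ij} \leftarrow w_{ij} + \Delta w_{ij}$; the function names are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def delta_rule_step(w, x, t, eta=0.5):
    """One gradient-descent step for a single sigmoid output unit j.

    With E = 0.5 * (t - y)^2, dE/dw_i = -delta * x_i where delta = (t - y) * y * (1 - y),
    so each weight moves by Delta w_ij = eta * delta_j * X_i.
    """
    y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    delta = (t - y) * y * (1.0 - y)
    return [wi + eta * delta * xi for wi, xi in zip(w, x)]

def squared_error(w, x, t):
    y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return 0.5 * (t - y) ** 2
```

For example, starting from `w = [0.0, 0.0]` with input `x = [1.0, 2.0]` and target `t = 1.0`, the output is 0.5, $\delta_j = 0.125$, and one step with $\eta = 0.5$ gives `[0.0625, 0.125]`, which lowers the squared error.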
Then, backpropagation using the automatic differentiation of the PyTorch framework is applied to optimize the molecular structure iteratively, so that the generated enhanced molecules satisfy multiple requirements, including anti-tumor activity and molecular rationality. Figure 4 shows a molecule from the ChEMBL database before and after optimization.
It should be noted that the above embodiments are not intended to limit the scope of protection of the present invention; equivalent transformations or substitutions made on the basis of the above technical solutions all fall within the scope of protection of the claims of the present invention.
Claims (5)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310015687.5A CN115966266B (en) | 2023-01-06 | 2023-01-06 | An anti-tumor molecule enhancement method based on graph neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310015687.5A CN115966266B (en) | 2023-01-06 | 2023-01-06 | An anti-tumor molecule enhancement method based on graph neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115966266A CN115966266A (en) | 2023-04-14 |
CN115966266B true CN115966266B (en) | 2023-11-17 |
Family ID: 87357842
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310015687.5A Active CN115966266B (en) | 2023-01-06 | 2023-01-06 | An anti-tumor molecule enhancement method based on graph neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115966266B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118280482B (en) * | 2024-06-04 | 2024-08-23 | 浙江大学 | Method and system for predicting antioxidant molecules based on deep learning |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111898730A (en) * | 2020-06-17 | 2020-11-06 | 西安交通大学 | A structure optimization design method using graph convolutional neural network structure acceleration |
CN112820361A (en) * | 2019-11-15 | 2021-05-18 | 北京大学 | A drug molecule generation method based on adversarial imitation learning |
CN113140267A (en) * | 2021-03-25 | 2021-07-20 | 北京化工大学 | Directional molecule generation method based on graph neural network |
CN113327651A (en) * | 2021-05-31 | 2021-08-31 | 东南大学 | Molecular diagram generation method based on variational self-encoder and message transmission neural network |
CN114822718A (en) * | 2022-03-25 | 2022-07-29 | 云南大学 | Human oral bioavailability prediction method based on graph neural network |
CN115274007A (en) * | 2022-08-02 | 2022-11-01 | 殷越铭 | Generalizable and interpretable depth map learning method for discovering and optimizing drug lead compound |
CN115526246A (en) * | 2022-09-21 | 2022-12-27 | 吉林大学 | A Self-Supervised Molecular Classification Method Based on Deep Learning Model |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102619861B1 (en) * | 2019-02-08 | 2024-01-04 | 오스모 랩스, 피비씨 | Systems and methods for predicting olfactory properties of molecules using machine learning |
2023-01-06: CN202310015687.5A (CN115966266B), CN, Active
Non-Patent Citations (1)
Title |
---|
"A Review of Graph Neural Networks and Their Applications in Power Systems"; Wenlong Liao et al.; JOURNAL OF MODERN POWER SYSTEMS AND CLEAN ENERGY; full text * |
Also Published As
Publication number | Publication date |
---|---|
CN115966266A (en) | 2023-04-14 |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |