JP2023531846A

JP2023531846A - Intelligent Generation Method of Drug Molecules Based on Reinforcement Learning and Docking

Info

Publication number: JP2023531846A
Application number: JP2022543606A
Authority: JP
Inventors: 魏志強; 王茜; 劉昊; 李陽陽; 王卓亜
Original assignee: Ocean University of China
Current assignee: Ocean University of China
Priority date: 2021-07-09
Filing date: 2021-07-21
Publication date: 2023-07-26
Anticipated expiration: 2041-07-21
Also published as: WO2023279436A1; CN113488116B; CN113488116A; JP7387962B2

Abstract

The present invention relates to a method for the intelligent generation of drug molecules based on reinforcement learning and docking, in the technical fields of medicinal chemistry and computing. The method comprises step 1) constructing a virtual fragment combination library for drug design; step 2) calculating fragment similarity and encoding the molecular fragments; and step 3) generating and optimizing molecules based on an actor-critic reinforcement learning model. Starting from lead compounds, the method narrows the chemical space to be searched. Transformer modeling is used in the actor-critic reinforcement learning model, which introduces positional information for the molecular fragments, preserves the relative or absolute position of each fragment in the molecule, and enables parallel training. In addition, the reward mechanism builds a single-layer perceptron model, which further optimizes the activity of the generated molecules.

Description

The present invention relates to the technical fields of medicinal chemistry and computing, and specifically to a method for the intelligent generation of drug molecules based on reinforcement learning and docking.

In medicinal chemistry, designing and producing safe and effective compounds is key. It is a time-consuming, expensive, complex, and difficult process that requires optimizing multiple parameters. Even promising compounds carry a high risk (>90%) of failing in clinical trials, which wastes resources. At present, the average cost of bringing one new drug to market is well over 1 billion US dollars, and it takes an average of 13 years from discovery to market. For pharmaceuticals, the time from discovery to commercial production is even longer; for example, high-energy molecules require 25 years. A key step in discovering molecules is generating candidates for computational study or for synthesis and characterization. This is a very difficult task because the chemical space of possible molecules is enormous: the number of potential drug-like compounds is estimated at 10^23 to 10^60, whereas the total number of compounds synthesized so far is on the order of 10^8. Heuristics such as Lipinski's "rule of five" narrow the possible space, but major challenges remain.

Driven by the revolution in computer technology, AI-based drug discovery is becoming a trend. Conventionally, combinations of computational models such as quantitative structure-activity relationships (QSAR), molecular replacement, molecular simulation, and molecular docking have been used for this purpose. However, conventional methods are combinatorial in nature and often yield molecules that are unstable or cannot be synthesized. In recent years, many generative models based on deep learning have appeared for designing drug-like compounds, for example molecule generation methods based on variational autoencoders and on generative adversarial networks. However, current methods still leave room for improvement in the generation speed, validity, and molecular activity of candidate compounds.

The present invention provides a method for the intelligent generation of drug molecules based on reinforcement learning and docking, which generates new drug molecules with optimal properties using an actor-critic reinforcement learning model and docking simulation. The Actor network is modeled with a bidirectional transformer encoder mechanism and a DenseNet network.

To solve the above problems, the present invention is achieved by the following technical solution.
The method for the intelligent generation of drug molecules based on reinforcement learning and docking specifically comprises the following steps.
Step 1: constructing a virtual fragment combination library for drug design.
The virtual fragment combination library of drug molecules is obtained by fragmenting a set of molecules with a conventional toolkit. When the molecules are split, the fragments are not classified and are all treated alike.
Step 2: calculating fragment similarity and encoding the molecular fragments.
The similarity between different molecular fragments is measured by combining conventional methods of calculating chemical similarity. A balanced binary tree is built on the basis of this similarity, all fragments are encoded as binary strings, and similar fragments are given similar codes.
Step 3: generating and optimizing molecules based on the actor-critic reinforcement learning model.
(1) Description of the framework based on the actor-critic reinforcement learning model
Molecules are generated and optimized based on the actor-critic reinforcement learning model. A single fragment of the molecule and one bit in the code of that fragment are selected for modification, and the value of that bit is flipped: a 0 is changed to 1, and vice versa. This makes it possible to track the degree of change applied to the molecule. The leading bits of the code are kept fixed, so the model is only allowed to change bits at the end of the code and can only search for molecules in the vicinity of known compounds.
The process starts from a fragmented molecular state, that is, the current state. The Actor extracts and examines all fragments, introduces the positional information of the different fragments in the molecule, and uses the transformer encoder mechanism to calculate the attention coefficient of each fragment of each molecule. The output probabilities of the DenseNet network then determine the fragment to be replaced and the fragment that replaces it. The new state is scored according to how well it satisfies all constraints. The critic then examines the TD-error, the difference between the reward added to the value of the new state and the value of the current state, and whether it is supplied to the actor: if YES, the actor's action is reinforced; if NO, the action is blocked. The current state is then replaced with the new state, and the process is repeated a predetermined number of times.
(2) Optimization of the reward mechanism of the reinforcement learning model
Molecules are designed that are optimized for two kinds of properties: the intrinsic attribute information of the molecule itself and the computed molecular activity information. The reward mechanism of the reinforcement learning model predicts the reward by building a perceptron model, which involves two phases, training and prediction.
In the training phase, the dataset comes from two sources: positive samples, which are molecules reported in the literature as active, and negative samples, which are an equal number of molecules sampled at random from the ZINC library. The positive and negative samples are shuffled and docked one by one. The resulting computed activity information, together with the intrinsic molecular attribute information calculated by a conventional toolkit, is used as input, and over multiple rounds of training the model learns the latent correlation between this information and whether a molecule is truly active.
In the prediction phase, the model takes as input the computed activity information of each generated molecule, obtained by virtual molecular docking of the generated molecule against existing PDB files of disease-related targets using advanced and efficient drug docking software, and the intrinsic attribute information of the generated molecule, calculated with a general-purpose software package. The model predicts whether the generated molecule is actually active, which further optimizes the activity of the generated molecules. The Actor of the reinforcement learning model receives a reward each time it generates a valid molecule, and a higher reward when it manages to obtain a molecule that meets the expectations of the prediction model.

Further, in step 1, when a molecule is split, all single bonds extending from a ring atom are broken. While the molecule is split, a fragment chain list is created that records and stores the original split points, which serve as connection points in later molecular design. Fragments with different numbers of ligation points may be exchanged as long as the total number of ligation points stays constant. In this process, molecules are cleaved with the open-source toolkit RDKit; fragments with more than 12 heavy atoms are discarded, as are fragments with 4 or more ligation points.
Further, in step 2, when the similarity between fragments is calculated, "drug-like" molecules are compared using the maximum common substructure Tanimoto similarity, Tanimoto-MCS (TMCS). For small fragments, the Damerau-Levenshtein distance, an improved form of the Levenshtein distance, is introduced. The Damerau-Levenshtein distance between two strings is defined as follows:
[formula not reproduced in this text]
The TMCS distance between two molecules M1 and M2 is defined as follows:
[formula not reproduced in this text]
The similarity between two molecules M1 and M2, with corresponding SMILES representations S1 and S2, is then measured as:
[formula not reproduced in this text]

Further, in step 2, when the molecular fragments are encoded, the strings are created by building a balanced binary tree based on fragment similarity. The tree is then used to generate a binary string for each fragment and, by extension, a binary string representing the molecule. The order of the ligation points serves as an identifier for each fragment. To assemble the tree, the similarity between all fragments is calculated, and fragment pairs are then formed by a bottom-up greedy method: the two most similar fragments are paired first, and the process is then repeated, joining the two most similar pairs into a new tree with four leaves. The similarity between two subtrees is measured as the maximum similarity between any two fragments of those trees.
The joining process is repeated until all fragments are joined into a single tree.
Once all fragments are stored in the binary tree, the tree is used to generate codes for all fragments.
The code of each fragment is determined from the path from the root to the leaf that stores the fragment: at each branch of the tree, a one ("1") is appended to the code when going left and a zero ("0") when going right, so that the rightmost character of the code corresponds to the branch closest to the fragment.

Compared with the prior art, the beneficial effects of the present invention are as follows.
The present invention generates novel molecules based on an actor-critic reinforcement learning model and a docking simulation method. The model learns how to modify and improve molecules so that they acquire the desired properties.
(1) Unlike conventional reinforcement learning methods, the present invention focuses on generating novel compounds that are structurally close to existing compounds by transforming fragments of lead compounds, thereby narrowing the chemical space to be searched.
(2) Based on the actor-critic reinforcement learning model, the Actor network is modeled with a bidirectional transformer encoder mechanism and a DenseNet network. The positional information of the various fragments in the molecule is introduced, the transformer encoder mechanism is used to calculate the attention coefficient of each fragment of each molecule, and the relative or absolute position of each fragment in the molecule is preserved, which enables parallel training.
(3) The reward mechanism of the reinforcement learning model builds a single-layer perceptron model. The input of this model consists of two parts, molecule-related attribute information and activity information. The activity information is obtained by docking the generated molecules against disease-related targets with docking software, and it further optimizes the activity of the generated molecules.
(4) In terms of scale, the method is expected to generate 2 million or more candidate molecules for a target corresponding to a specific disease.
(5) In the method, the molecular docking part adds more than 1000 ultra-high-dimensional parameters, molecular activity and related attribute information are fused, and optimized, high-quality AI-generated molecules can be produced at a rate of 80% or more.
(6) The method relies on a large-scale supercomputing platform, which markedly increases the speed of molecule generation.

FIG. 1 shows a virtual molecular fragment library of Mpro-related compounds.
FIG. 2 shows a sub-part of the binary tree containing all fragments of the Mpro-related compounds.
FIG. 3 is a framework diagram of the actor-critic reinforcement learning model.
FIG. 4 shows details of the actor in the actor-critic reinforcement learning model.
FIG. 5 shows the generation of active compound molecules against the novel coronavirus Mpro target.

The technical solution of the present invention is further described below through examples and with reference to the drawings, but the scope of the invention is not limited in any way by the examples.

Example 1
This example is mainly aimed at generating active compounds against the Mpro target of the novel coronavirus. Starting from a set of lead compounds, it improves and optimizes these molecules by replacing some of their fragments, thereby generating novel active compounds that target Mpro and have the desired properties. In this example, novel drug molecules with optimal properties are generated based on an actor-critic reinforcement learning model and a docking simulation method. The technical solution of this example is described in detail below.

The method for the intelligent generation of drug molecules based on an actor-critic reinforcement learning model and docking specifically comprises the following steps 1 to 3.

Step 1. Construct a virtual fragment combination library for drug design.

The virtual fragment combination library of drug molecules is obtained by fragmenting a set of molecules. As shown in FIG. 1, the virtual fragment library of this example is built from 10172 compounds related to the Mpro target taken from ChEMBL, a medicinal chemistry database, and 175 lead compounds targeting Mpro that were screened by molecular docking in the laboratory. The usual way of fragmenting a molecule is to divide it into categories such as ring structures, side chains, and linkers. In the present invention, molecules are split in much the same way, except that the fragments are not classified, so all fragments are treated alike. To split a molecule, all single bonds extending from a ring atom are broken. While the molecule is split, a fragment chain list is created that records and stores the original split points, which serve as connection points in later molecular design. Fragments with different numbers of ligation points may be exchanged as long as the total number of ligation points stays constant. Molecules are cleaved with RDKit, an existing open-source cheminformatics toolkit. Fragments with more than 12 heavy atoms are discarded, as are fragments with 4 or more ligation points. These constraints reduce complexity while still allowing many interesting candidates to be generated.
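The discard rules and the exchange rule described above can be sketched as simple predicates. This is a minimal illustration rather than the patented implementation: the `Fragment` record and the function names are hypothetical, the heavy-atom and ligation-point counts are assumed to have been computed beforehand (for example with RDKit), and the exchange rule is read here as "the summed ligation-point count must not change".

```python
from dataclasses import dataclass

MAX_HEAVY_ATOMS = 12      # fragments with more than 12 heavy atoms are discarded
MAX_LIGATION_POINTS = 3   # fragments with 4 or more ligation points are discarded


@dataclass(frozen=True)
class Fragment:
    """Hypothetical record for one fragment; counts are assumed precomputed."""
    smiles: str
    heavy_atoms: int
    ligation_points: int


def keep_fragment(frag: Fragment) -> bool:
    """Return True if the fragment passes both size constraints."""
    return (frag.heavy_atoms <= MAX_HEAVY_ATOMS
            and frag.ligation_points <= MAX_LIGATION_POINTS)


def can_exchange(old: list[Fragment], new: list[Fragment]) -> bool:
    """Allow a swap only if the total ligation-point count is unchanged."""
    return (sum(f.ligation_points for f in old)
            == sum(f.ligation_points for f in new))
```

Note the boundary cases: a fragment with exactly 12 heavy atoms or exactly 3 ligation points is kept.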

Step 2. Calculate fragment similarity and encode the molecular fragments.

Step 2.1 Calculating the similarity between fragments
In this example, all fragments are encoded as binary strings, and the encoding is designed so that similar fragments receive similar codes. This requires a measure of the similarity between fragments. There are many ways to calculate chemical similarity. Molecular fingerprints are a direct binary encoding in which similar molecules should, in principle, receive similar codes. However, when comparing molecular fragments, their inherently sparse representation makes fingerprints less useful for this purpose. Chemically, an intuitive way to measure the similarity between molecules is the maximum common substructure Tanimoto similarity, Tanimoto-MCS (TMCS).
[formula not reproduced in this text]

Here, mcs(M1, M2) is the number of atoms in the maximum common substructure of molecules M1 and M2, and atoms(M1) and atoms(M2) are the numbers of atoms in molecules M1 and M2, respectively.
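The TMCS formula itself is not reproduced in this text. Assuming it takes the usual Tanimoto form (the shared count divided by the size of the union), which is consistent with the three quantities defined above, it could be written as:

```latex
\mathrm{TMCS}(M_1, M_2) =
  \frac{\mathrm{mcs}(M_1, M_2)}
       {\mathrm{atoms}(M_1) + \mathrm{atoms}(M_2) - \mathrm{mcs}(M_1, M_2)}
```

Under this assumed form, two identical molecules give a value of 1, and two molecules with no common substructure give a value of 0.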

One advantage of the Tanimoto-MCS similarity is that it compares the structures of the fragments directly and therefore does not depend on any particular representation. This approach usually works well when comparing "drug-like" molecules. For small fragments, however, the Tanimoto-MCS similarity has drawbacks. The present invention therefore introduces the Levenshtein distance, a common measure of the similarity between two text strings, defined as the minimum number of insertions, deletions, and substitutions needed to make the two strings identical. To account for the effect of transpositions on the edit distance, this example uses the Damerau-Levenshtein distance, an improved form of the Levenshtein distance. The Damerau-Levenshtein distance between two strings is defined as follows.
[formula not reproduced in this text]
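As an illustration of this kind of edit distance, the sketch below implements the optimal string alignment variant of the Damerau-Levenshtein distance, in which no substring is edited more than once. The patent's own formula is not reproduced in this text, so the choice of this variant and the function name are assumptions.

```python
def damerau_levenshtein(a: str, b: str) -> int:
    """Optimal-string-alignment distance: insertions, deletions,
    substitutions and transpositions of adjacent characters cost 1 each."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = distance between the first i chars of a and the first j chars of b
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]
```

For the two-character SMILES-like strings "CO" and "OC" this returns 1 (one transposition), whereas the plain Levenshtein distance would be 2.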

As a compromise, the similarity between two molecules M1 and M2, with corresponding SMILES representations S1 and S2, is measured as follows.
[formula not reproduced in this text]
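The combining formula is not reproduced in this text. One plausible reading, assumed here purely for illustration, is to turn the edit distance into a similarity by normalizing it by the length of the longer SMILES string, and then to take the larger of the two similarities. The function name and the normalization are assumptions; the TMCS value and the edit distance are taken as precomputed inputs.

```python
def combined_similarity(tmcs: float, edit_distance: int,
                        len_s1: int, len_s2: int) -> float:
    """Assumed compromise: the larger of TMCS and a normalized edit similarity."""
    longest = max(len_s1, len_s2)
    # Two empty strings are identical, so their edit similarity is 1.
    edit_similarity = 1.0 if longest == 0 else 1.0 - edit_distance / longest
    return max(tmcs, edit_similarity)
```

With this choice, small fragments whose SMILES strings differ by a single edit still score as similar even when their common substructure is small.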

Step 2.2 Encoding the molecular fragments
All fragments are encoded as binary strings. The strings are created by building a balanced binary tree based on fragment similarity. The tree is then used to generate a binary string for each fragment and, by extension, a binary string representing the molecule. The order of the ligation points serves as an identifier for each fragment. To assemble the tree, the similarity between all fragments is calculated. Fragment pairs are then formed by a bottom-up greedy method, in which the two most similar fragments are paired first. The process is then repeated, joining the two most similar pairs into a new tree with four leaves. The similarity between two subtrees is measured as the maximum similarity between any two fragments of those trees. The joining process is repeated until all fragments are joined into a single tree.
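The greedy bottom-up construction can be sketched as follows. This is one possible reading, not the patented implementation: it pairs subtrees level by level so that the tree stays balanced, represents internal nodes as `(left, right)` tuples and leaves as plain values (so fragments themselves must not be tuples), and expects a non-empty fragment list. The similarity function `sim` is supplied by the caller.

```python
from itertools import combinations


def leaves(tree):
    """Return all fragments stored in a (possibly nested) tuple tree."""
    if isinstance(tree, tuple):
        return leaves(tree[0]) + leaves(tree[1])
    return [tree]


def subtree_similarity(t1, t2, sim):
    """Similarity of two subtrees = maximum similarity over fragment pairs."""
    return max(sim(a, b) for a in leaves(t1) for b in leaves(t2))


def build_tree(fragments, sim):
    """Greedy bottom-up pairing; each pass roughly halves the number of subtrees."""
    level = list(fragments)
    while len(level) > 1:
        # Rank all candidate pairs at this level, most similar first.
        pairs = sorted(
            combinations(range(len(level)), 2),
            key=lambda p: subtree_similarity(level[p[0]], level[p[1]], sim),
            reverse=True,
        )
        used, merged = set(), []
        for i, j in pairs:
            if i not in used and j not in used:
                merged.append((level[i], level[j]))  # (left, right)
                used.update((i, j))
        # With an odd count, the unpaired subtree moves up unchanged.
        merged.extend(level[k] for k in range(len(level)) if k not in used)
        level = merged
    return level[0]
```

For example, with numbers standing in for fragments and `sim = lambda a, b: -abs(a - b)`, `build_tree([1, 2, 10, 11], sim)` pairs 1 with 2 and 10 with 11 before joining the two pairs.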

Once all fragments are stored in the binary tree, the tree is used to generate codes for all fragments. The code of each fragment is determined from the path from the root to the leaf that stores the fragment. As shown in FIG. 2, at each branch of the tree a one ("1") is appended to the code when going left and a zero ("0") when going right, so that the rightmost character of the code corresponds to the branch closest to the fragment.
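The path-to-code rule can be sketched with a short recursive walk. The tuple-based tree representation (`(left, right)` for internal nodes, any non-tuple value for a leaf) and the function name are assumptions made for illustration.

```python
def fragment_codes(tree, prefix=""):
    """Map each leaf to its root-to-leaf path: left adds '1', right adds '0'."""
    if not isinstance(tree, tuple):
        return {tree: prefix}
    left, right = tree
    codes = fragment_codes(left, prefix + "1")
    codes.update(fragment_codes(right, prefix + "0"))
    return codes
```

For the tree `(("A", "B"), ("C", "D"))` this yields A = "11", B = "10", C = "01", and D = "00". Sibling fragments such as A and B share their leading bit and differ only in the rightmost bit, which is the branch closest to the leaf.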

Step 3. Generate and optimize molecules based on the actor-critic reinforcement learning model.

Step 3.1 Description of the framework based on the actor-critic reinforcement learning model
In the present invention, molecules are generated and optimized based on the actor-critic reinforcement learning model. Optimization consists of selecting a single fragment of the molecule and one bit in the code of that fragment and flipping the value of that bit: a 0 is changed to 1, and vice versa. This makes it possible to track the degree of change applied to the molecule, because changing a bit at the end of the code corresponds to switching to a very similar fragment, whereas changing a bit at the start corresponds to switching to a very different type of fragment. As shown in FIG. 3, the leading bits of the code are kept fixed, so the model is only allowed to change bits at the end of the code and can only search for molecules in the vicinity of known compounds.
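The bit-flip action with fixed leading bits can be sketched as below. The function name and the `n_fixed` parameter (how many leading bits are frozen) are hypothetical; the text does not state how many bits are kept fixed.

```python
def flip_bit(code: str, index: int, n_fixed: int) -> str:
    """Flip one bit of a fragment code, leaving the first n_fixed bits untouched."""
    if not n_fixed <= index < len(code):
        raise ValueError("index must address a bit after the fixed leading bits")
    flipped = "1" if code[index] == "0" else "0"
    return code[:index] + flipped + code[index + 1:]
```

For example, `flip_bit("11010", 4, 3)` returns `"11011"`, a code that shares the first four bits with the original and therefore identifies a closely related fragment in the tree.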

The process starts from a fragmented molecular state, that is, the current state S. The Actor extracts and examines all fragments and uses the bidirectional transformer encoder mechanism and the DenseNet network to decide which fragment to replace and which fragment to replace it with; that is, the action Ai taken by the Actor yields a new state Si. The new state Si is given a score R according to how well it satisfies all constraints. The critic then examines the TD-error, the difference between the reward added to the value of Si and the value of S, and whether it is supplied to the actor. If YES, the actor's action Ai is reinforced; if NO, the action is blocked. The current state is then replaced with the new state, and the process is repeated a predetermined number of times. The loss function is loss = -log(prob) * td_error.
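The two quantities in this update can be written out in a few lines. This is a sketch under stated assumptions: the discount factor `gamma` does not appear in the text and is included only because it is conventional (its default of 1.0 reproduces the description above), and the function names are hypothetical.

```python
import math


def td_error(reward: float, value_new: float, value_current: float,
             gamma: float = 1.0) -> float:
    """Reward added to the (discounted) value of the new state, minus the current value."""
    return reward + gamma * value_new - value_current


def actor_loss(prob: float, td: float) -> float:
    """loss = -log(prob) * td_error, as stated in the text."""
    return -math.log(prob) * td
```

When the TD-error is positive, lowering this loss raises the probability of the chosen action; when it is negative, lowering the loss reduces that probability.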

Step 3.2 Network structure of the Actor in the reinforcement learning model
The Actor network is modeled with a bidirectional transformer encoder mechanism and a DenseNet network. It introduces the positional information of the different fragments in the molecule and uses the transformer encoder mechanism to calculate the attention coefficients of the different fragments of each molecule. One pass through this structure reads the encoded fragments of one molecule; the forward and backward outputs are concatenated, and the concatenated representation is passed through the DenseNet neural network to compute which fragment to change and to estimate the probability distribution after the change.

The probability of replacing a fragment depends on the fragments that precede and follow it in the molecule. Each molecule is therefore arranged as a sequence of fragments, and the whole sequence is passed to the transformer encoder mechanism at once. Calculating the attention coefficients of the different fragments of each molecule gives the importance of each fragment. As shown in FIG. 4, the forward and backward transformer encoders then supply vectorized representations that capture the correlations among the different fragments of a molecule. Finally, the concatenated result is classified by the DenseNet network, which computes which fragment to change and estimates the probability distribution after the change.
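To make the idea of position-aware attention coefficients concrete, the sketch below adds a position encoding to each fragment vector and computes single-head scaled dot-product attention weights. It is a heavily simplified stand-in, not the Actor network described here: the sinusoidal encoding is an assumed choice, there are no learned query, key, or value projections, and the fragment vectors (for example, the bits of each fragment code as floats) are hypothetical inputs.

```python
import math


def positional_encoding(position: int, dim: int) -> list[float]:
    """Sinusoidal absolute position encoding (an assumed choice)."""
    return [
        math.sin(position / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(position / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]


def softmax(scores: list[float]) -> list[float]:
    """Normalize scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(fragments: list[list[float]]) -> list[list[float]]:
    """Row i holds the attention of fragment i over every fragment in the molecule."""
    dim = len(fragments[0])
    # Add position information to each fragment vector.
    x = [[v + p for v, p in zip(vec, positional_encoding(pos, dim))]
         for pos, vec in enumerate(fragments)]
    scale = math.sqrt(dim)
    return [softmax([sum(q * k for q, k in zip(xi, xj)) / scale for xj in x])
            for xi in x]
```

Because every row is computed from the whole fragment sequence at once, all positions can be processed in parallel, which is the property the text attributes to the transformer encoder.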

Step 3.3 Optimization of the reward mechanism of the reinforcement learning model

In drug discovery, the most important challenge is designing molecules that are optimized for several properties at once, and these properties may not be well correlated with one another. To confirm that the proposed method can handle this situation, two different kinds of properties that can represent the feasibility of a molecule as a drug were selected. The aim of the present invention is to generate drug molecules whose properties are closer to those of actual active molecules, that is, to generate molecules in the desired "optimal position". As mentioned above, the selected properties are the intrinsic attribute information of the molecule itself (for example, MW, clogP, and PSA) and the computed activity information of the molecule (that is, the results of docking the molecule with the target corresponding to a particular disease).

In the present invention, the reward mechanism of the reinforcement learning model predicts the reward by building a single-layer perceptron model. This model has two stages, training and prediction. In the training stage, the dataset has two sources: positive samples, which are molecules reported in the literature to be active, and an equal number of negative samples drawn at random from the ZINC library. The positive and negative samples are shuffled and docked one after another. The resulting computed activity information, together with the intrinsic attribute information calculated by a conventional toolkit, is used as input, and over multiple rounds of training the model learns the latent correlation between this information and whether a molecule is truly active.

In the prediction stage, the computed activity information of a generated molecule is obtained by virtual molecular docking of the generated molecule against disease-related targets using advanced and efficient drug docking software. Using docking software such as Ledock, the model docks the up to 512 molecules generated in each epoch against existing PDB files of 380 targets, which are different conformations related to Mpro of the novel coronavirus. The intrinsic attribute information of the generated molecules is calculated with the general-purpose software package RDKit. The computed activity information and the intrinsic attribute information, a total of 1143 ultra-high-dimensional parameters, are input to the single-layer perceptron to predict whether a generated molecule is actually active, which further optimizes the activity of the generated molecules. The actor of the reinforcement learning framework receives a reward each time it generates a valid molecule, and a higher reward when it manages to obtain a molecule that meets the expectations of the prediction model.
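
As a rough illustration of the reward predictor described above, the following sketch trains a single sigmoid unit on a balanced, shuffled set of positive and negative samples and turns its prediction into a tiered reward. Everything specific here is an assumption made for illustration: the data are synthetic, two features stand in for the 1143 parameters, and the reward values 0, 1, and 2 and the 0.5 threshold are hypothetical.

```python
import math
import random

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_perceptron(samples, labels, epochs=200, lr=0.1):
    """Fit one sigmoid unit by stochastic gradient descent on log loss."""
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict_active(w, b, x):
    """Predicted probability that a molecule with features x is active."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def reward(is_valid, p_active, threshold=0.5):
    """Hypothetical tiers: 0 if invalid, 1 if valid, 2 if also predicted active."""
    if not is_valid:
        return 0.0
    return 2.0 if p_active >= threshold else 1.0

# Synthetic stand-in data: feature 0 mimics a scaled docking score
# (lower is better), feature 1 mimics a scaled molecular property.
random.seed(0)
positives = [[random.uniform(-1.5, -0.5), random.uniform(-1, 1)] for _ in range(20)]
negatives = [[random.uniform(0.5, 1.5), random.uniform(-1, 1)] for _ in range(20)]
data = [(x, 1) for x in positives] + [(x, 0) for x in negatives]
random.shuffle(data)  # mix positive and negative samples before training
w, b = train_perceptron([x for x, _ in data], [y for _, y in data])
```

In an actual pipeline the feature vector would be built from docking scores and toolkit descriptors rather than random numbers, and the inputs would normally be scaled to comparable ranges before training.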

The active compound molecules finally generated against the Mpro target of the novel coronavirus are shown in FIG. 5.

Note that the above embodiments of the present invention are merely illustrative and do not limit the present invention; the present invention is therefore not limited to the specific forms described above. All other forms that a person skilled in the art obtains on the basis of the present invention without departing from its principles fall within the patent scope of the present invention.

Claims

A method for intelligent generation of drug molecules based on reinforcement learning and docking, comprising:
step 1 of building a virtual fragment combination library for drug design, wherein
the virtual fragment combination library of drug molecules is obtained by fragmenting a set of molecules with a conventional toolkit, and, when the molecules are split, the fragments are not classified and are all treated alike;
step 2 of calculating fragment similarity and coding the molecular fragments, wherein
the similarity between different molecular fragments is measured by a conventional combined method of calculating chemical similarity, and all fragments are coded as binary strings by building a balanced binary tree based on the similarity, so that similar fragments receive similar codes; and
step 3 of generating and optimizing molecules based on an actor-critic reinforcement learning model, comprising:
(1) Description of the framework based on the actor-critic reinforcement learning model: molecules are generated and optimized based on the actor-critic reinforcement learning model by selecting a single fragment of a molecule and one bit in the code of that fragment and flipping the value of that bit, that is, changing 0 to 1 and 1 to 0, which makes it possible to track the degree of change applied to the molecule; the leading bits of the code are kept unchanged, so that the model is allowed to change only the trailing bits and can therefore only search for molecules near known compounds;
the actor-critic reinforcement learning model starts from the fragmented molecule state, that is, the current state; the actor extracts and examines all fragments, introduces the position information of the different fragments in the molecule, calculates the attention coefficient of each fragment of each molecule using the transformer encoder mechanism, and then, from the probabilities output by the DenseNet network, determines the fragment to be replaced and the fragment that replaces it; the new state is scored according to how well it satisfies all constraints; the critic then examines the TD-error, which is the difference between the reward added to the value of the new state and the value of the current state, and supplies it to the actor; if the TD-error is positive, the action of the actor is reinforced, and if it is negative, the action is suppressed; the current state is then replaced by the new state, and this process is repeated a predetermined number of times;
(2) Optimization of the reward mechanism of the reinforcement learning model: molecules are designed that are optimized for two kinds of properties, the intrinsic attribute information of the molecule itself and the computed activity information of the molecule; the reward mechanism of the reinforcement learning model predicts the reward by building a perceptron model that has two stages, training and prediction; in the training stage, the computed activity information obtained by sequentially docking the shuffled positive and negative samples and the intrinsic attribute information calculated by a conventional toolkit are used as input, and through multiple rounds of training the model learns the latent correlation between this information and whether a molecule is truly active; in the prediction stage, the computed activity information obtained by virtual molecular docking of the generated molecules against existing PDB files of disease-related targets using advanced and efficient drug docking software, and the intrinsic attribute information of the generated molecules calculated using a general-purpose software package, are used as input to predict whether a generated molecule is actually active, which further optimizes the activity of the generated molecules; and the actor of the reinforcement learning model receives a reward each time it generates a valid molecule, and a higher reward when it manages to obtain a molecule that meets the expectations of the prediction model.
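
Two mechanics of step 3 (1) lend themselves to a short sketch: flipping a single bit while the leading bits of a fragment code stay fixed, and computing the TD-error that the critic passes to the actor. This is a minimal sketch under stated assumptions: the names `flip_bit`, `td_error`, and `actor_signal`, the `frozen_prefix` parameter, and the discount factor `gamma` are hypothetical, and with `gamma=1.0` the TD-error reduces to the reward added to the value of the new state minus the value of the current state.

```python
def flip_bit(code, index, frozen_prefix):
    """Flip one bit of a fragment code, leaving the leading bits untouched."""
    if index < frozen_prefix or index >= len(code):
        raise ValueError("only trailing bits may be changed")
    flipped = "1" if code[index] == "0" else "0"
    return code[:index] + flipped + code[index + 1:]

def td_error(reward, value_new, value_current, gamma=1.0):
    """Reward plus (discounted) value of the new state, minus the current value."""
    return reward + gamma * value_new - value_current

def actor_signal(delta):
    """Reinforce the action on a positive TD-error, suppress it otherwise."""
    return "reinforce" if delta > 0 else "suppress"

# Hypothetical 5-bit fragment code whose first 3 bits are kept fixed.
new_code = flip_bit("10110", 4, frozen_prefix=3)
delta = td_error(reward=1.0, value_new=0.5, value_current=0.2)
```

Because similar fragments receive similar codes, restricting changes to the trailing bits keeps each replacement close to the original fragment.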

The method for intelligent generation of drug molecules based on reinforcement learning and docking according to claim 1, characterized in that, in step 1, all single bonds extending from a ring atom are broken when a molecule is split, a fragment chain list is created when the molecule is split, and the original split points are recorded and stored so that they serve as connection points in later molecular design,
provided that the total number of connection points remains constant, fragments having different numbers of connection points can be exchanged,
in this process, the molecules are split using the open-source toolkit RDKit, and
fragments with more than 12 heavy atoms are discarded, and fragments with 4 or more connection points are also discarded.
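
The size limits in this claim amount to a simple filter over the fragment library. The sketch below assumes that the heavy-atom count and the number of connection points have already been computed for each fragment, for example with a cheminformatics toolkit. The `Fragment` record, its field names, and the sample entries are hypothetical.

```python
from dataclasses import dataclass

MAX_HEAVY_ATOMS = 12        # fragments with more than 12 heavy atoms are discarded
MAX_CONNECTION_POINTS = 3   # fragments with 4 or more connection points are discarded

@dataclass(frozen=True)
class Fragment:
    name: str
    heavy_atoms: int
    connection_points: int

def keep_fragment(frag):
    """Return True if the fragment passes both size limits."""
    return (frag.heavy_atoms <= MAX_HEAVY_ATOMS
            and frag.connection_points <= MAX_CONNECTION_POINTS)

def filter_library(fragments):
    """Drop fragments that are too large or have too many connection points."""
    return [f for f in fragments if keep_fragment(f)]

# Hypothetical fragments with hand-entered counts.
library = [
    Fragment("phenyl", heavy_atoms=6, connection_points=1),
    Fragment("amide linker", heavy_atoms=3, connection_points=2),
    Fragment("large ring system", heavy_atoms=15, connection_points=2),
    Fragment("four-way core", heavy_atoms=6, connection_points=4),
]
kept = filter_library(library)
```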

The method for intelligent generation of drug molecules based on reinforcement learning and docking according to claim 1, characterized in that, in step 2, in the calculation of similarity between fragments, the similarity of "drug-like" molecules is compared using the maximum common substructure Tanimoto similarity (Tanimoto-MCS), and for small fragments the Damerau-Levenshtein distance, an improved Levenshtein distance, is introduced,
the Damerau-Levenshtein distance between two strings is defined by [formula not reproduced in this text],
the TMCS distance between two molecules M1 and M2 is defined by [formula not reproduced in this text], and
the similarity between the two molecules M1 and M2, with corresponding SMILES representations S1 and S2, is measured by [formula not reproduced in this text].
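
The formulas this claim refers to are not reproduced in the text, so the sketch below falls back on textbook definitions and should not be read as the claimed formulas. It implements the optimal string alignment variant of the Damerau-Levenshtein distance (insertion, deletion, substitution, and transposition of adjacent characters) and the usual Tanimoto coefficient over the maximum common substructure. The length normalization in `string_similarity` and the use of `max` to combine the two measures in `fragment_similarity` are assumptions, as are all function names.

```python
def osa_distance(s1, s2):
    """Optimal string alignment (restricted Damerau-Levenshtein) distance."""
    rows, cols = len(s1) + 1, len(s2) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and s1[i - 1] == s2[j - 2] and s1[i - 2] == s2[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]

def tanimoto_mcs(mcs_atoms, atoms1, atoms2):
    """Tanimoto coefficient from atom counts: mcs / (a1 + a2 - mcs)."""
    return mcs_atoms / (atoms1 + atoms2 - mcs_atoms)

def string_similarity(s1, s2):
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(s1), len(s2))
    return 1.0 if longest == 0 else 1.0 - osa_distance(s1, s2) / longest

def fragment_similarity(mcs_atoms, atoms1, atoms2, s1, s2):
    """Assumed combination: the larger of the two similarity measures."""
    return max(tanimoto_mcs(mcs_atoms, atoms1, atoms2), string_similarity(s1, s2))
```

The size of the maximum common substructure is passed in as a number here; in practice it would come from a substructure search in a cheminformatics toolkit.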

The method for intelligent generation of drug molecules based on reinforcement learning and docking according to claim 1, characterized in that, in step 2, in the coding of molecular fragments, the codes are created by building a balanced binary tree based on fragment similarity, the tree is then used to generate a binary string for each fragment and, by extension, a binary string representing the molecule, with the order of the connection points serving as the identifier of each fragment,
in assembling the tree, the similarity between all fragments is calculated, and fragment pairs are then formed by a bottom-up greedy method in which the two most similar fragments are paired first; the process is then repeated to join the two pairs containing the most similar fragments into a new tree with four leaves, the similarity between two subtrees being taken as the maximum similarity between any two fragments of those trees,
the joining process is repeated until all fragments are joined into a single tree, and
once all fragments are stored in the binary tree, the code for each fragment is determined from the path from the root to the leaf that stores the fragment: for each branch of the tree, 1 is appended to the code when going left and 0 when going right, so that the rightmost character of the code corresponds to the branch closest to the fragment.
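
The tree construction and coding in this claim can be sketched as follows. This is a minimal sketch under stated assumptions: fragments are unique hashable labels, the similarity function is supplied by the caller, an unpaired subtree at an odd-sized level simply moves up one level (so codes can then differ in length), and the names `build_tree`, `assign_codes`, and `toy_similarity` are hypothetical.

```python
from itertools import combinations

def leaves(tree):
    """List the fragments stored under a subtree (tuples are internal nodes)."""
    if isinstance(tree, tuple):
        return leaves(tree[0]) + leaves(tree[1])
    return [tree]

def subtree_similarity(a, b, sim):
    """Maximum similarity between any fragment of a and any fragment of b."""
    return max(sim(x, y) for x in leaves(a) for y in leaves(b))

def build_tree(fragments, sim):
    """Greedy bottom-up pairing: join the most similar subtrees level by level."""
    level = list(fragments)
    while len(level) > 1:
        remaining, next_level = list(level), []
        while len(remaining) > 1:
            a, b = max(combinations(remaining, 2),
                       key=lambda p: subtree_similarity(p[0], p[1], sim))
            remaining.remove(a)
            remaining.remove(b)
            next_level.append((a, b))
        next_level.extend(remaining)  # an odd subtree moves up unpaired
        level = next_level
    return level[0]

def assign_codes(tree, prefix=""):
    """Root-to-leaf path as a bit string: append 1 going left, 0 going right."""
    if not isinstance(tree, tuple):
        return {tree: prefix}
    codes = assign_codes(tree[0], prefix + "1")
    codes.update(assign_codes(tree[1], prefix + "0"))
    return codes

# Hypothetical similarity table for four fragment labels.
table = {frozenset("AB"): 0.9, frozenset("CD"): 0.8}

def toy_similarity(x, y):
    return table.get(frozenset((x, y)), 0.1)

codes = assign_codes(build_tree(["A", "B", "C", "D"], toy_similarity))
```

In this toy run the similar fragments A and B end up with codes that differ only in the last bit, as do C and D, which is the property that makes trailing-bit changes correspond to small structural changes.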