CN115328117A - Protein dynamic ligand channel optimal path analysis method based on reinforcement learning - Google Patents
Protein dynamic ligand channel optimal path analysis method based on reinforcement learning Download PDFInfo
- Publication number
- CN115328117A CN115328117A CN202210836151.5A CN202210836151A CN115328117A CN 115328117 A CN115328117 A CN 115328117A CN 202210836151 A CN202210836151 A CN 202210836151A CN 115328117 A CN115328117 A CN 115328117A
- Authority
- CN
- China
- Prior art keywords
- ligand
- channel
- reinforcement learning
- protein
- optimal path
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 239000003446 ligand Substances 0.000 title claims abstract description 93
- 108090000623 proteins and genes Proteins 0.000 title claims abstract description 57
- 102000004169 proteins and genes Human genes 0.000 title claims abstract description 57
- 230000002787 reinforcement Effects 0.000 title claims abstract description 40
- 238000004458 analytical method Methods 0.000 title claims abstract description 24
- 238000000034 method Methods 0.000 claims abstract description 45
- 238000012549 training Methods 0.000 claims abstract description 32
- 230000009471 action Effects 0.000 claims abstract description 23
- 238000001514 detection method Methods 0.000 claims abstract description 15
- 230000003068 static effect Effects 0.000 claims abstract description 15
- 238000004088 simulation Methods 0.000 claims abstract description 9
- 239000003795 chemical substances by application Substances 0.000 claims abstract description 6
- 230000007246 mechanism Effects 0.000 claims abstract description 6
- 238000012216 screening Methods 0.000 claims abstract description 3
- 230000033001 locomotion Effects 0.000 claims description 23
- 238000004422 calculation algorithm Methods 0.000 claims description 14
- 230000006870 function Effects 0.000 claims description 9
- 238000000329 molecular dynamics simulation Methods 0.000 claims description 5
- 230000003993 interaction Effects 0.000 claims description 4
- 238000000547 structure data Methods 0.000 claims description 2
- 230000003197 catalytic effect Effects 0.000 abstract description 4
- 108091006146 Channels Proteins 0.000 description 77
- 238000004364 calculation method Methods 0.000 description 17
- 238000010586 diagram Methods 0.000 description 12
- 238000006073 displacement reaction Methods 0.000 description 7
- 230000008569 process Effects 0.000 description 6
- 230000008859 change Effects 0.000 description 5
- 238000012545 processing Methods 0.000 description 4
- 150000003384 small molecules Chemical class 0.000 description 4
- 238000013459 approach Methods 0.000 description 3
- 230000008901 benefit Effects 0.000 description 3
- 230000002441 reversible effect Effects 0.000 description 3
- 238000012360 testing method Methods 0.000 description 3
- 238000002474 experimental method Methods 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 230000007704 transition Effects 0.000 description 2
- 102000002004 Cytochrome P-450 Enzyme System Human genes 0.000 description 1
- 108010015742 Cytochrome P-450 Enzyme System Proteins 0.000 description 1
- 102000004257 Potassium Channel Human genes 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000005460 biophysical method Methods 0.000 description 1
- 239000013078 crystal Substances 0.000 description 1
- 238000009510 drug design Methods 0.000 description 1
- 230000007613 environmental effect Effects 0.000 description 1
- 238000001914 filtration Methods 0.000 description 1
- PCHJSUWPFVWCPO-UHFFFAOYSA-N gold Chemical compound [Au] PCHJSUWPFVWCPO-UHFFFAOYSA-N 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 230000001788 irregular Effects 0.000 description 1
- 230000004807 localization Effects 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004001 molecular interaction Effects 0.000 description 1
- 230000000737 periodic effect Effects 0.000 description 1
- 230000002093 peripheral effect Effects 0.000 description 1
- 239000011148 porous material Substances 0.000 description 1
- 230000036544 posture Effects 0.000 description 1
- 108020001213 potassium channel Proteins 0.000 description 1
- 239000000843 powder Substances 0.000 description 1
- 238000011897 real-time detection Methods 0.000 description 1
- 238000010845 search algorithm Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0212—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
- G05D1/0223—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving speed control of the vehicle
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0212—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
- G05D1/0221—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0276—Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Engineering & Computer Science (AREA)
- Aviation & Aerospace Engineering (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a protein dynamic ligand channel optimal path analysis method based on reinforcement learning, which comprises the following steps: acquiring static ligand channel information and full-atom dynamic information of the protein, screening the dynamic properties of partial atoms forming a channel, introducing a grid method and an atom collision detection mechanism in a three-dimensional space, and constructing a simulation environment; constructing a reinforcement learning model, wherein the reinforcement learning model comprises states, agents, actions and rewards; when the ligand molecules leave the boundary of the three-dimensional space grid map, giving a negative feedback value to the current state and returning to the previous state for continuous training until the channel end point is reached or the maximum exploration step number of a single round is reached, ending the training of the round and outputting the training result of the round; and continuing the next round of training until all the training rounds are finished, and outputting the optimal path of the protein dynamic ligand channel. The invention can assist ligand molecules to find the optimal path reaching the catalytic activity center position, and has certain engineering application reference value.
Description
Technical Field
The invention relates to the technical field of reinforcement learning, in particular to a protein dynamic ligand channel optimal path analysis method based on reinforcement learning.
Background
The main research methods of the protein ligand channel include Stretching Molecular Dynamics (SMD), random Accelerated Molecular Dynamics (RAMD), crystal thermodynamic factor method, calculation method and the like. Due to the problem of biological experiment cost, the current calculation method plays a key role in the field of ligand channel identification. Channel calculation methods generally fall into two categories: 1) A method of incorporating ligand molecules into a channel recognition process; 2) A method for calculating channels based on protein structure only, as follows:
in the method for incorporating ligand molecules into the channel recognition process, specific ligand molecules are explicitly considered, and possible channels in the protein structure are detected by the techniques such as molecular dynamics simulation and the like. This approach requires a significant amount of resources for the simulations, and these simulations are not always used to find successful channel exits and may result in unrealistic channels.
In the method for calculating the channel only based on the protein structure, ligand molecules are not considered in advance, the protein structure is converted into a problem which can be processed geometrically, and the task of channel identification is simplified. The advantage of this method is that the computation time and computation cost is usually small and detailed information of the channel size and surrounding atoms can be generated. The most representative of them isM et al, in 2006. CAVER works by exploring the grid nodes using Dijkstra's algorithm, which is used to select the shortest and cheapest paths (i.e., those with the lowest cost). The CAVER is mainly used for analyzing a protein structure channel and providing quick, accurate and full-automatic calculation for a static channel, the occurrence of the CAVER plays an important role in the research of drug design and molecular enzymology, but the CAVER has the problems of high requirements on processor time and memory and system continuous errors when a large channel is explored.M and its team proposed in 2007 a mol algorithm based on Voronoi grids. MOLE overcomes many of the disadvantages and limitations of CAVER. Although the core of the MOLE algorithm is also the Dijkstra path search algorithm, the authors apply it to Voronoi diagrams (networks), and for the testing of various biomolecular systems,including potassium channels, cytochrome P450 ligand channels, have all proven this algorithm to work well. 2013. In the year, david et al performed an algorithmic upgrade to MOLE, presenting MOLE2.0 version, which new version could identify more relevant channels, and benchmark tests with other tools showed that MOLE2.0 is faster, more robust, and more versatile. 2017. Berka K, in turn, introduced the MOLE version 2.5, which not only calculates channels but also detects pores. This is also the latest version of the current mol. With the continual updating of iterations in recent years, MOLE has gradually become a gold standard for rapid detection of channels in biomacromolecular structures.
However, the channels calculated by the CAVER software and the MOLE software are static, and due to the existence of protein dynamics, atoms actually composing the channels are in continuous motion, so that the actual positions of the channels are correspondingly changed. The dynamic information of the protein is not considered in the current channel calculation method.
Disclosure of Invention
In order to solve the technical problem, the invention provides a protein dynamic ligand channel optimal path analysis method based on reinforcement learning. The problem of the dynamic property of a protein ligand channel is considered on the basis of a static channel, so that the problem of the ligand channel is more in line with the actual situation. The dynamic property of protein means that the spatial conformation of protein can be changed in a certain microenvironment, and various conformational isomers can regulate the folding and the function of protein.
In order to achieve the purpose, the technical scheme of the invention is as follows:
the protein dynamic ligand channel optimal path analysis method based on reinforcement learning comprises the following steps:
acquiring static ligand channel information and full-atom dynamic information of the protein, screening the dynamics of partial atoms forming a channel, introducing a grid method and an atom collision detection mechanism in a three-dimensional space, and constructing a simulation environment;
constructing a reinforcement learning model, wherein the reinforcement learning model comprises states, agents, actions and rewards, and the state data comprises position information of ligand molecules and part of atoms forming a channel; inputting previous state data into a reinforcement learning model, wherein the reinforcement learning model determines action data corresponding to the previous state data, the intelligent agent interacts with a simulation environment according to the action data to obtain an interaction result, and the interaction result is next state data corresponding to the action data and reward data corresponding to ligand molecules under the action data;
when the ligand molecules leave the boundary of the three-dimensional space grid map, a negative feedback value is given to the current state, the previous state is returned for continuous training until the ligand molecules reach the channel end point or the maximum exploration step number of a single round is reached, the training of the round is finished, and the path step number of the ligand molecules from the channel starting point to the channel end point in the training of the round and the total obtained return value are output; and continuing the next round of training until all the training rounds are finished, and outputting the path with the least path steps and the highest return value in all the training rounds, namely the optimal path of the protein dynamic ligand channel.
Preferably, the method further comprises the following steps: the three-dimensional structure data of the protein and ligand molecules, which have been obtained experimentally, are downloaded from a local or online protein structure database.
Preferably, the static ligand channel information of the protein is obtained using the MOLE or carver software.
Preferably, full atomic dynamics information of the protein is obtained using normal mode analysis or molecular dynamics simulation methods.
Preferably, the introducing the grid method under the three-dimensional space for processing specifically comprises the following steps:
constructing a three-dimensional space grid map, wherein the size of grid cells in the three-dimensional space grid map depends on the size of ligand molecules, and the number of the grid cells in the three-dimensional space grid map depends on the size of space occupied by atoms around the obtained static ligand channel;
and placing the ligand molecules at the starting point of the channel, wherein each grid unit in the three-dimensional space grid map is the minimum moving unit of the ligand molecules.
Preferably, before the atomic collision detection mechanism, the method further comprises the following steps:
judging whether the three-dimensional structure shape of the ligand molecule is a regular sphere or not, if so, wrapping the ligand molecule through the sphere by adopting a sphere surrounding box, wherein the grid unit is a cube externally connected with the sphere; if not, adopting a cylinder to wrap ligand molecules, adopting a three-dimensional AABB bounding box, and taking a grid unit as a bounding box cube;
abstracting partial atoms forming the channel into a sphere, and detecting the collision of ligand molecules and each atom in the three-dimensional space grid map in real time.
Preferably, the motion data is up, down, left, right, front, back, and back in a three-dimensional space grid map, and the state data is a state value corresponding to a three-dimensional space coordinate value.
Preferably, the reinforcement learning model adopts a time sequence difference off-line control algorithm Q-learning.
Preferably, the method further comprises the following steps: and introducing an exponential decay factor, so that the exploration rate in the reinforcement learning model is gradually reduced along with the continuous increase of time.
Preferably, the reward data corresponding to the ligand molecule is obtained according to a reward and punishment function, which is described as follows:
based on the technical scheme, the invention has the beneficial effects that: according to the method, based on the outstanding static channel calculation capacity of the MOLE, the dynamic information of the protein is calculated by using normal mode analysis, the dynamic channel information of the protein is obtained by combining, then based on the reinforcement Learning environment Gym, the grid method is improved by introducing the collision detection algorithm based on the bounding box, the time sequence difference offline control algorithm Q-Learning is adopted for Learning, an exponential decay factor is introduced aiming at the exploration rate, a reward and penalty function is designed to avoid dynamic atom collision, the catalytic activity center position can be reached within the shortest step, and the method has a certain engineering application reference value.
Drawings
FIG. 1 is a schematic diagram of the structure of a protein dynamic ligand channel optimal path analysis method based on reinforcement learning in one embodiment;
FIG. 2 is a diagram of all channels of 1bu7 in one embodiment;
FIG. 3 is a diagram of atomic figures around all channels of 1bu7 in one embodiment;
FIG. 4 is a diagram of atomic diagram around each channel of 1bu7 in FIG. 3;
FIG. 5 is a flow diagram of ANM calculation in one embodiment;
FIG. 6 is a dynamic display of the 1bu7 protein of one example;
FIG. 7 is a dynamic representation of atoms surrounding a 1bu7 channel in one embodiment;
FIG. 8 is a schematic diagram of a three-dimensional environment grid method in one embodiment;
FIG. 9 is a flow diagram of constructing a grid cell in one embodiment;
FIG. 10 is a schematic structural diagram of a reinforcement learning-based protein dynamic ligand channel optimal path analysis method in one embodiment;
FIG. 11 is a graph of convergence for different quest rates in one embodiment;
FIG. 12 is a comparison graph of reward value convergence in one embodiment;
FIG. 13 is an optimal path display diagram corresponding to different channels of 1bu7 in FIG. 4.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
As shown in fig. 1, the present embodiment provides a method for analyzing an optimal path of a protein dynamic ligand channel based on reinforcement learning, which includes the following steps:
step 1, environment creation. And combining the static ligand channel information calculated by the MOLE software and the full atom dynamic information calculated by the NMA to generate the dynamic information of atoms forming the channel, namely the dynamic information of the ligand channel. And rasterization processing of the target environment and introduction of an atomic collision detection mechanism are carried out on the basis of the python script.
In this example, the protein static ligand channels were calculated by the MOLE software. The core of the localization deployment of the MOLE is to configure an XML file for calculation, set a proper diameter for channel filtering, and set a channel endpoint coordinate for path calculation of a channel. The MOLE supports different results of output multi-formats, and PDBProfile, PDBSstructure and PyMOL formats are mainly used in the system. The PDBPprofile output format is a PDB file, and data such as a channel center coordinate, a channel radius and the like are stored. The PDBSstructure output format is a PDB file, and the peripheral atom information of the channel is stored. The PyMOL output format is a Python file, which is mainly used for quickly visualizing a channel in a PyMOL tool. Taking 1BU7 as an example, setting the minimum radius of a channel to be 1.2, setting the starting point of the channel to be a Fe atom of a catalytic center, configuring an input _1bu7.Xml file, and enabling Linux and MacOS users to install Mono to use the file, wherein the command is as follows: mono move 2. Exe./Test _ data/1BU7/input _1bu7.Xml, and the static channel calculation results are visualized as shown in fig. 2, fig. 3 and fig. 4.
In this example, the kinetic information of all atoms of the protein was calculated using Normal Model Analysis (NMA). NMA can be roughly divided into two categories: force field based methods and network based methods. Force field-based methods, which utilize detailed molecular interactions, generally produce more accurate patterns than network-based methods. However, conventional force field based approaches require initial capability minimization, which distorts the structure. Network-based methods, typically Elastic Networks (ENMs), bypass this step.
ENM is an effective means for processing and analyzing protein by a biophysical method, and is essentially a phase change expansion of Hooke elasticity law, and the intrinsic kinetic properties of the protein are characterized from a high-dimensional level of a network. The system needs to consider the motion amplitude information of each atom, also needs to consider the motion directions of the atoms, and selects an Anisotropic Network Model (ANM) for calculation.
The calculation is performed using ANM, and the input file is a protein three-dimensional structure PDB file mainly including information such as position coordinates of atoms and residues constituting the protein. The ANM computation core generation flow is shown in fig. 5.
The new Coordinates calculated by NMA are called Normal Coordinates (NMC), one NMC representing one vibration Mode. Assuming that the coordinates of the starting position of the atom a are (40,50,60) and the NMC calculated by NMA is (5,5,6), the range of motion of the atom a is:
x∈[35,45],y∈[45,55],z∈[54,66]
generalizing the problem, let the atomic A coordinate be (x) 0 ,y 0 ,z 0 ) NMC calculated by NMA is (m) x ,m y ,m z ) Then the range of motion of atom a is:
x∈[x 0 -m x ,x 0 +m x ],y∈[y 0 -m y ,y 0 +m y ],z∈[z 0 -m z ,z 0 +m z ]
then the atomic a coordinate at time t is:
x t =x 0 +α t m x ,y t =y 0 +α t m y ,z t =z 0 +α t m z
wherein alpha is t ∈[-1,1]And is a one-digit decimal.
Setting a random number sequence for generating alpha i Represents different times t i Each atom is in a random position within its respective range. Then t 1 Time and t 2 The time atom positions are as follows:
x 1 =x 0 +α 1 m x ;y 1 =y 0 +α 1 m y ;z 1 =z 0 +α 1 m z
x 2 =x 0 +α 2 m x ;y 2 =y 0 +α 2 m y ;z 2 =z 0 +α 2 m z
calculating t 1 To t 2 Amount of change in position at time:
then at t 1 To t 2 At that time, the distance that the atom travels on the x-axis is (α) 2 -α 1 )m x Taking the boundary value alpha in extreme cases 2 =1,α 1 = 1 then the distance travelled by the atom along the x-axis is 2m x The final position is x 1 +2m x 。
Substituting into a formula to calculate specifically: x is the number of 1 =x 0 -m x And x 2 =x 1 +2m x =x 0 +m x 。
Since each time is based on the initial position x 0 The calculation is carried out, and the processing has the advantages that the judgment of the boundary value of the reciprocating motion is not needed, and the atom position at each moment is ensured to be in the motion range.
Specifically, for example, the random number sequence generated by the current system is [0.10.8-0.4 … ], and the atom and NMA coordinate information is shown in table 1.
TABLE 1 atomic motion information analysis Table
t 1 Time t 2 Time of day, t 2 Time t 3 The amount of change in displacement in the x direction at time is:
the period T of the simple harmonic motion refers to the time required for an atom to start at the starting position, go through a complete round trip, and return to that point. The positions of atoms in their periodic divisions are shown in table 2.
TABLE 2 motion period information Table
The variation of the position of the atom from its initial position at different periods is as follows:
wherein, taking the position change information in the x-axis direction as an example, t 0 (initial time) to t 1 At time, the atomic displacement is 0.1m x ,t 1 To t 2 At time, the atomic displacement is 0.7m x Equivalent to at t 0 Time t 2 A total displacement of 0.8m x . And at t 2 To t 3 Calculating the atom displacement variation of-1.2 m at the time x Can be decomposed to 0.2m x +(-1.4m x ). 0.8m in the front x On the basis of the total weight of the powder, firstly passes through 0.2m x Reaches m x I.e. byHas reached the forward motion boundary, the displacement needs to be reversed. The remaining-1.4 m x The calculation can be performed directly. Can also be decomposed to-1.0 m x +(-0.4m x ) The front representing a reverse movement m x Distance, i.e. reachPosition, also the initial position, and then continue the reverse movement by 0.4m x Displacement of (2). It can be seen that although t is calculated 3 The time being only according to alpha 3 = -0.4, reverse movement 0.4m x But includes information on the change from the position at the previous time to the position at that time. The calculation results are visualized as shown in fig. 6 below.
In this embodiment, dynamic information of part of atoms constituting a channel is screened by a python script. Based on the data stored in the file: pdb has obtained the atomic information that makes up each channel, and is stored in the file: motion information of all atoms of the protein is calculated in 1bu7 \u _ anm \. The dynamics of atoms around the channel 1 of 1bu7 are shown in FIG. 7. The direction of the arrow represents the direction of motion and the length represents the magnitude of motion.
In this embodiment, in order to facilitate the detection of collision, the ligand molecules and the whole atoms are abstracted into a sphere, as shown in fig. 8. The target problem is subjected to environment modeling by adopting a grid method in a three-dimensional space, as shown in fig. 9. The conventional grid method determines whether a corresponding grid can move by judging whether obstacles exist in the grids around a target point, and because the size of the obstacles (atoms) in a target environment is far smaller than that of a grid unit, and each small atom is in a continuous motion state at any moment, the conventional idea is inappropriate to judge whether the grids around the target point can move, so that each grid in the target environment only serves as a minimum moving unit for ligand molecules to move, and whether the obstacles exist on the moving path is not considered. That is, the ligand molecule does not know the position of the next step before moving every step, and the environment can fully exert the advantage that the intelligent agent can know the unknown environment through reinforcement learning. And if the position at the current moment is allowed to move, the current position is proved to be not movable by adopting a collision detection mode, otherwise, the current position can be normally passed.
Since most ligand molecules present irregular three-dimensional structures, they need to be wrapped with a cylinder and then surrounded with a tri-dimensional AABB bounding box. The size of this bounding box is the size of a grid cell, and the cell flow is shown in fig. 10.
And (3) introducing a collision detection algorithm into the target environment, and abstracting each small sphere into a collision body. The method realizes real-time detection of atomic collision at each moment in a target environment by means of the current mainstream open source detection library FCL. At the moment, only the large sphere and the small sphere are left in the system, and whether the large sphere and the small sphere collide or not only needs to be detected.
And 2, initializing the environment. And initializing a state space, an action space and a reward and punishment function of reinforcement learning.
In this embodiment, a grid method in a three-dimensional space is adopted to perform environment modeling on a target problem, a state space S and an action space a of the target problem are both finite sets, and after rasterization, only six action spaces are available, and the state space is not large, so that a Q-learning algorithm is selected as a main body to construct a reinforcement learning model. The concrete description is as follows:
state space: because the target environment is constructed by adopting a grid method, the state space can be determined by encoding the grid, the grid numbers correspond to the state numbers one by one, and the state S t The grid positions where the ligand molecules are located at time t are numbered, and the grids are numbered from left to right and from top to bottom. The number of grid elements depends on the size of the ligand molecule itself and the size of the space occupied by the atoms surrounding the channels. The more the number of the grid units is, the higher the path planning precision is, but the larger the problem scale is caused, and the slower the convergence speed of the algorithm is.
An action space: because a collision detection algorithm based on an AABB bounding box is adopted in modeling, and the size of the bounding box is approximate to the size of a grid unit, the problems of specific postures and motion angles of ligand molecules do not need to be considered at present, and each three-dimensional grid is a minimum moving unit. The length x of each step of ligand molecule movement is the diameter of the inclusion, which is also the width of the cubic grid. The moving direction of the moving body is greatly simplified by adopting the rasterized environment, only six kinds of moving bodies, namely front, back, upper, lower, left and right need to be considered, the coordinate value is based on a right-hand space rectangular coordinate system, and the motion space information is shown in the table 3.
Table 3 action space information table
-a reward and punishment function: the first goal is to have the ligand molecule successfully reach the channel end point, so a relatively large reward needs to be placed on the end point. Since the boundary limit is equivalently increased when the map is constructed by rasterization, ligand molecules have a high probability of walking out of the boundary when the ligand molecules are just explored, and then the subsequent exploration has no meaning, so that a large penalty is given to the out-of-range state. If the collision of the ligand molecule and the atom shows that the path has obstacles to block the path, the path is failed to be explored, and a new round of exploration is directly started from the beginning, which also gives a large penalty and directly ends the round of attempt.
The reward and punishment function is described as follows:
and (3) state transition: the grid method ensures that the state space number and the three-dimensional coordinate value are uniquely corresponding, so that the position coordinate of the next moment can be calculated through the coordinate of the current moment and the adopted action, the corresponding state number can be found according to the position coordinate, and whether the boundary exists because the coordinate value outside the boundary has no corresponding state space number can be judged according to the corresponding relation of the new position and the state number.
And 3, setting the hyper-parameters. Learning rate, reward attenuation factor, exploration rate, etc.
In this embodiment, generally, a larger exploration rate belongs to a faster updating cost function in the early stage of the timing difference offline control algorithm Q-learning, and the exploration rate needs to be reduced to select an optimal path by experience in the later stage to ensure the convergence of the result. And the exploration rate in the epsilon-greedy method is always kept unchanged, so that the later convergence of the model is greatly influenced. Aiming at the problem, an exponential decay factor is introduced into the system aiming at the exploration rate, so that a higher exploration rate can be set at the beginning of training with confidence, the exploration rate is decayed continuously along with the training, the exploration rate is reduced to be particularly low at the later stage, and the complete convergence of the model is ensured, as shown in fig. 11.
If the small molecule and other atoms find collision, it indicates that there is an obstacle in this path, the training fails in this round, and the reward value convergence diagram is shown in fig. 12 (a). During the training process, the following are found: although collision has already occurred in the process of the search, small molecules can also continue to search, and although atoms have already collided in a certain state, the subsequent search is still valuable, and the number of training rounds required for convergence can be greatly reduced by adopting the method.
The small molecules are arranged to collide with atoms and then continue exploring without stopping, and experimental results show that: this approach, although initially slow in training, allows a stable reachable path to be found faster, and indeed allows the model to converge faster, as shown in fig. 12 (b). After the small molecule is set to collide with the atom and the exploration is continued, although the previous single-round training becomes slow, the total training time is reduced compared with the previous training.
The final hyper-parameter settings of the model are shown in table 4.
TABLE 4 model hyper-parameter information sheet
And 4, initializing the state. The ligand molecules and each atom position and state-action Q table are initialized and training begins.
And 5, selecting actions. At the current state s according to a given policy t Selecting a specific action a of the ligand molecule, acting the action a on the environment to obtain corresponding feedback r, and simultaneously memorizing a new state s t+1 (no state transition has occurred at this point).
Step 6:Q the table is updated. Updating formula according to Q-learning to state s t Using the Q value of the position of action aAnd (6) updating.
And 7, detecting the boundary. If ligand molecules s t+1 The position corresponding to the state is out of the boundary of the environment, the action is considered invalid, and the operation returns to s t Status, i.e. order s t+1 =s t . At the same time the environment feeds back a penalty to the ligand molecule, then the choice of this action in this state is avoided later.
And 8, updating the state. Environment by state s t Enter a new state s t+1 。
And 9, collision detection. For new state s t+1 And updating the positions of the ligand molecules and the positions of the atoms, and performing collision detection on the system. If the ligand molecule collides with Fe, the end point is successfully reached, the training of the round is finished, and the process directly jumps to the step 4 to restart a new round of training. Otherwise, the step 5 is continued to continue to be executed for the exploration.
And step 10, outputting the optimal path steps, the obtained total return value and the path selection information until all the training rounds are finished.
Through the operation, the simulation environment of the protein dynamic ligand channel is constructed at present, and a reasonable reinforcement learning algorithm flow is set for training and learning the target problem. The above processes are all realized by programming through Python, and a large number of experiments show that for any protein with a ligand channel, the method can complete the calculation of a static channel, the calculation of atomic dynamics, the environmental simulation of a dynamic ligand channel and the training of a reinforcement learning model in a short time, and finally can obtain an optimal path of the dynamic ligand channel reaching a catalytic activity center, and has a certain engineering reference value.
The results of the analysis of three channels of 1BU7 are shown in table 5 and fig. 13.
TABLE 5 1BU7 channel analysis and comparison table
The above description is only a preferred embodiment of the method for analyzing the optimal path of the protein dynamic ligand channel based on reinforcement learning disclosed in the present invention, and is not intended to limit the scope of the embodiments of the present disclosure. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the embodiments of the present disclosure should be included in the protection scope of the embodiments of the present disclosure.
Claims (10)
1. The protein dynamic ligand channel optimal path analysis method based on reinforcement learning is characterized by comprising the following steps:
acquiring static ligand channel information and full-atom dynamic information of the protein, screening the dynamics of partial atoms forming a channel, introducing a grid method and an atom collision detection mechanism in a three-dimensional space, and constructing a simulation environment;
constructing a reinforcement learning model, wherein the reinforcement learning model comprises states, agents, actions and rewards, and the state data comprises position information of ligand molecules and part of atoms forming a channel; inputting previous state data into a reinforcement learning model, wherein the reinforcement learning model determines action data corresponding to the previous state data, the intelligent agent interacts with a simulation environment according to the action data to obtain an interaction result, and the interaction result is next state data corresponding to the action data and reward data corresponding to ligand molecules under the action data;
when the ligand molecules leave the boundary of the three-dimensional space grid map, a negative feedback value is given to the current state, the previous state is returned for continuous training until the ligand molecules reach the channel end point or the maximum exploration step number of a single round is reached, the training of the round is finished, and the path step number of the ligand molecules from the channel starting point to the channel end point in the training of the round and the total obtained return value are output; and continuing the next round of training until all the training rounds are finished, and outputting the path with the least path steps and the highest return value in all the training rounds, namely the optimal path of the protein dynamic ligand channel.
2. The reinforcement learning-based protein dynamic ligand channel optimal path analysis method of claim 1, further comprising the steps of: the three-dimensional structure data of the protein and ligand molecules, which have been obtained experimentally, are downloaded from a local or online protein structure database.
3. The reinforcement learning-based protein dynamic ligand channel optimal path analysis method as claimed in claim 2, wherein the static ligand channel information of the protein is obtained using MOLE or CAver software.
4. The reinforcement learning-based protein dynamic ligand channel optimal path analysis method as claimed in claim 2, wherein full atomic dynamic information of the protein is obtained by using normal model analysis or molecular dynamics simulation method.
5. The reinforcement learning-based protein dynamic ligand channel optimal path analysis method as claimed in claim 1, wherein the introducing of the three-dimensional space grid-down method comprises the following steps:
constructing a three-dimensional space grid map, wherein the size of grid cells in the three-dimensional space grid map depends on the size of ligand molecules, and the number of the grid cells in the three-dimensional space grid map depends on the size of space occupied by atoms around the obtained static ligand channel;
and placing the ligand molecules at the starting point of the channel, wherein each grid unit in the three-dimensional space grid map is the minimum moving unit of the ligand molecules.
6. The reinforcement learning-based protein dynamic ligand channel optimal path analysis method of claim 5, wherein the atomic collision detection mechanism is preceded by the following steps:
judging whether the three-dimensional structure shape of the ligand molecule is a regular sphere or not, if so, wrapping the ligand molecule through the sphere by adopting a sphere surrounding box, wherein the grid unit is a cube externally connected with the sphere; if not, adopting a cylinder to wrap ligand molecules, adopting a three-dimensional AABB bounding box, and taking a grid unit as a bounding box cube;
abstracting partial atoms forming the channel into a sphere, and detecting the collision of ligand molecules and each atom in the three-dimensional space grid map in real time.
7. The method as claimed in claim 5, wherein the motion data is a three-dimensional grid map, and the state data is a state value corresponding to a three-dimensional coordinate value.
8. The reinforcement learning-based protein dynamic ligand channel optimal path analysis method of claim 1, wherein the reinforcement learning model employs a time-series difference offline control algorithm Q-learning.
9. The reinforcement learning-based protein dynamic ligand channel optimal path analysis method of claim 7, further comprising the steps of: and introducing an exponential decay factor, so that the exploration rate in the reinforcement learning model is gradually reduced along with the continuous increase of time.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210836151.5A CN115328117B (en) | 2022-07-15 | 2022-07-15 | Protein dynamic ligand channel optimal path analysis method based on reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210836151.5A CN115328117B (en) | 2022-07-15 | 2022-07-15 | Protein dynamic ligand channel optimal path analysis method based on reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115328117A true CN115328117A (en) | 2022-11-11 |
CN115328117B CN115328117B (en) | 2023-07-14 |
Family
ID=83917252
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210836151.5A Active CN115328117B (en) | 2022-07-15 | 2022-07-15 | Protein dynamic ligand channel optimal path analysis method based on reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115328117B (en) |
Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1628316A (en) * | 2002-03-26 | 2005-06-15 | 科学与工业研究委员会 | Method and system to build optimal models of 3-D molecular structures |
CN102712956A (en) * | 2009-10-30 | 2012-10-03 | 普罗米修斯实验室股份有限公司 | Methods for diagnosing irritable bowel syndrome |
CN104657626A (en) * | 2015-02-25 | 2015-05-27 | 苏州大学 | Method for constructing protein interaction network by using text data |
CN108363908A (en) * | 2017-02-16 | 2018-08-03 | 北京毅新博创生物科技有限公司 | Intelligence spectra system for detecting biomolecule |
CN109964278A (en) * | 2017-03-30 | 2019-07-02 | 艾腾怀斯股份有限公司 | Pass through the system and method for the error in evaluated in parallel classifier the first classifier of output calibration |
CN110073301A (en) * | 2017-08-02 | 2019-07-30 | 强力物联网投资组合2016有限公司 | The detection method and system under data collection environment in industrial Internet of Things with large data sets |
CN112088070A (en) * | 2017-07-25 | 2020-12-15 | M·奥利尼克 | System and method for operating a robotic system and performing robotic interactions |
CN112669434A (en) * | 2020-12-21 | 2021-04-16 | 山东华数智能科技有限公司 | Collision detection method based on grid and bounding box |
CN113110509A (en) * | 2021-05-17 | 2021-07-13 | 哈尔滨工业大学(深圳) | Warehousing system multi-robot path planning method based on deep reinforcement learning |
CN113167691A (en) * | 2018-09-10 | 2021-07-23 | 富鲁达加拿大公司 | Autofocus sample imaging apparatus and method |
CN113358686A (en) * | 2021-06-07 | 2021-09-07 | 广西大学 | Method for analyzing pyrolysis reaction path of palm oil or vegetable insulating oil through molecular simulation and application of method to DGA thermal fault diagnosis of transformer oil |
US20220004191A1 (en) * | 2020-07-01 | 2022-01-06 | Wuhan University Of Technology | Usv formation path-following method based on deep reinforcement learning |
CN114153216A (en) * | 2021-12-14 | 2022-03-08 | 浙江大学湖州研究院 | Lunar surface path planning system and method based on deep reinforcement learning and block planning |
CN114187249A (en) * | 2021-12-03 | 2022-03-15 | 大理大学 | Breast cancer image identification and classification method based on deep neural network |
CN114333986A (en) * | 2021-09-06 | 2022-04-12 | 腾讯科技(深圳)有限公司 | Method and device for model training, drug screening and affinity prediction |
CN114464270A (en) * | 2022-01-17 | 2022-05-10 | 北京工业大学 | Universal method for designing medicines aiming at different target proteins |
-
2022
- 2022-07-15 CN CN202210836151.5A patent/CN115328117B/en active Active
Patent Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1628316A (en) * | 2002-03-26 | 2005-06-15 | 科学与工业研究委员会 | Method and system to build optimal models of 3-D molecular structures |
CN102712956A (en) * | 2009-10-30 | 2012-10-03 | 普罗米修斯实验室股份有限公司 | Methods for diagnosing irritable bowel syndrome |
CN104657626A (en) * | 2015-02-25 | 2015-05-27 | 苏州大学 | Method for constructing protein interaction network by using text data |
CN108363908A (en) * | 2017-02-16 | 2018-08-03 | 北京毅新博创生物科技有限公司 | Intelligence spectra system for detecting biomolecule |
CN109964278A (en) * | 2017-03-30 | 2019-07-02 | 艾腾怀斯股份有限公司 | Pass through the system and method for the error in evaluated in parallel classifier the first classifier of output calibration |
CN112088070A (en) * | 2017-07-25 | 2020-12-15 | M·奥利尼克 | System and method for operating a robotic system and performing robotic interactions |
CN110073301A (en) * | 2017-08-02 | 2019-07-30 | 强力物联网投资组合2016有限公司 | The detection method and system under data collection environment in industrial Internet of Things with large data sets |
CN113167691A (en) * | 2018-09-10 | 2021-07-23 | 富鲁达加拿大公司 | Autofocus sample imaging apparatus and method |
US20220004191A1 (en) * | 2020-07-01 | 2022-01-06 | Wuhan University Of Technology | Usv formation path-following method based on deep reinforcement learning |
CN112669434A (en) * | 2020-12-21 | 2021-04-16 | 山东华数智能科技有限公司 | Collision detection method based on grid and bounding box |
CN113110509A (en) * | 2021-05-17 | 2021-07-13 | 哈尔滨工业大学(深圳) | Warehousing system multi-robot path planning method based on deep reinforcement learning |
CN113358686A (en) * | 2021-06-07 | 2021-09-07 | 广西大学 | Method for analyzing pyrolysis reaction path of palm oil or vegetable insulating oil through molecular simulation and application of method to DGA thermal fault diagnosis of transformer oil |
CN114333986A (en) * | 2021-09-06 | 2022-04-12 | 腾讯科技(深圳)有限公司 | Method and device for model training, drug screening and affinity prediction |
CN114187249A (en) * | 2021-12-03 | 2022-03-15 | 大理大学 | Breast cancer image identification and classification method based on deep neural network |
CN114153216A (en) * | 2021-12-14 | 2022-03-08 | 浙江大学湖州研究院 | Lunar surface path planning system and method based on deep reinforcement learning and block planning |
CN114464270A (en) * | 2022-01-17 | 2022-05-10 | 北京工业大学 | Universal method for designing medicines aiming at different target proteins |
Non-Patent Citations (7)
Title |
---|
HAICHAO YANG等: "Emission reduction benefits and efficiency of e-waste recycling in China", 《WASTE MANAGEMENT》, no. 102, pages 541 - 549, XP085969219, DOI: 10.1016/j.wasman.2019.11.016 * |
JUSTIN JOSE等: "Reinforcement Learning Based Approach for Ligand Pose Prediction" * |
JUSTIN JOSE等: "Reinforcement Learning Based Approach for Ligand Pose Prediction", 《BIORXIV》, pages 1 - 7 * |
MARTA M. STEPNIEWSKA-DZIUBINSKA等: "Development and evaluation of a deep learning model for protein–ligand binding affinity prediction", 《BIOINFORMATICS》, vol. 34, no. 21, pages 3666 - 3674 * |
ZHENGDAN ZHU等: "Simulation and Machine Learning Methods for Ion-Channel Structure Determination, Mechanistic Studies and Drug Design", 《FRONTIERS IN PHARMACOLOGY》, no. 13, pages 1 - 21 * |
王传栋;徐娇;张永;: "实体关系抽取综述", no. 12, pages 31 - 42 * |
黄东晋;蒋晨凤;韩凯丽;: "基于深度强化学习的三维路径规划算法", no. 15, pages 36 - 42 * |
Also Published As
Publication number | Publication date |
---|---|
CN115328117B (en) | 2023-07-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Lauri et al. | Planning for robotic exploration based on forward simulation | |
CN111638646B (en) | Training method and device for walking controller of quadruped robot, terminal and storage medium | |
CN102867078A (en) | Three-dimensional computer-aided design (CAD) platform-based quick planning method for mechanical product disassembly process | |
Toma et al. | Pathbench: A benchmarking platform for classical and learned path planning algorithms | |
Xia et al. | Virtual comissioning of manufacturing system intelligent control | |
CN112131754A (en) | Extended POMDP planning method and system based on robot accompanying behavior model | |
CN112966433A (en) | Instant compiling-based neurodynamics simulation method and device | |
Varghese et al. | A hybrid multi-task learning approach for optimizing deep reinforcement learning agents | |
Zhao et al. | TaskNet: A neural task planner for autonomous excavator | |
Pretorius et al. | Evolutionary robotics applied to hexapod locomotion: A comparative study of simulation techniques | |
CN115328117A (en) | Protein dynamic ligand channel optimal path analysis method based on reinforcement learning | |
Basye et al. | Learning dynamics: System identification for perceptually challenged agents | |
CN113848893A (en) | Robot navigation method, device, equipment and storage medium | |
Neumann | The reinforcement learning toolbox, reinforcement learning for optimal control tasks | |
Neves et al. | An environment to support structural testing of autonomous vehicles | |
Orthey et al. | Towards reactive whole-body motion planning in cluttered environments by precomputing feasible motion spaces | |
Costa et al. | Data Mining applied to the navigation task in autonomous robots | |
CN114595612A (en) | Reinforcement calculation method and system for stressed member based on entity unit integral path | |
Tanner | Multi-agent car parking using reinforcement learning | |
Gross et al. | Sensory-based Robot Navigation using Self-organizing Networks and Q-learning | |
Yang et al. | A hybrid planning approach for accompanying information-gathering in plan execution monitoring | |
He et al. | MCTS-GEB: Monte Carlo Tree Search is a Good E-graph Builder | |
Parker et al. | Using cyclic genetic algorithms to evolve multi-loop control programs | |
Shiltagh et al. | A comparative study: Modified particle swarm optimization and modified genetic algorithm for global mobile robot navigation | |
Talbi et al. | Parallel Cooperating Genetic Algorithms: An application to robot motion planning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |