CN115328117B - Protein dynamic ligand channel optimal path analysis method based on reinforcement learning - Google Patents

Protein dynamic ligand channel optimal path analysis method based on reinforcement learning

Info

Publication number
CN115328117B
CN115328117B
Authority
CN
China
Prior art keywords
ligand
channel
reinforcement learning
protein
dynamic
Prior art date
Legal status
Active
Application number
CN202210836151.5A
Other languages
Chinese (zh)
Other versions
CN115328117A (en
Inventor
羊海潮
景怡乐
李毅
Current Assignee
Dali University
Original Assignee
Dali University
Priority date
Filing date
Publication date
Application filed by Dali University filed Critical Dali University
Priority to CN202210836151.5A priority Critical patent/CN115328117B/en
Publication of CN115328117A publication Critical patent/CN115328117A/en
Application granted granted Critical
Publication of CN115328117B publication Critical patent/CN115328117B/en

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05D: SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00: Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/02: Control of position or course in two dimensions
    • G05D1/021: Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212: Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221: Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • G05D1/0223: Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving speed control of the vehicle
    • G05D1/0276: Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention discloses a reinforcement-learning-based method for analyzing the optimal path of a protein dynamic ligand channel, comprising the following steps: acquire the static ligand channel information and the all-atom dynamic information of the protein, screen the dynamic properties of the atoms forming the channel, introduce a three-dimensional grid method and an atomic collision detection mechanism, and construct a simulation environment; construct a reinforcement learning model comprising states, an agent, actions, and rewards; when the ligand molecule leaves the boundary of the three-dimensional grid map, give the current state a negative feedback value and return to the previous state to continue training; when the channel end point is reached or the maximum number of exploration steps of a single round is reached, end that round of training and output its result; continue with the next round until all training rounds are finished, and output the optimal path of the protein dynamic ligand channel. The invention helps ligand molecules find the optimal path to the catalytic active center and has reference value for engineering applications.

Description

Protein dynamic ligand channel optimal path analysis method based on reinforcement learning
Technical Field
The invention relates to the technical field of reinforcement learning, in particular to a protein dynamic ligand channel optimal path analysis method based on reinforcement learning.
Background
The main research methods for protein ligand channels include steered molecular dynamics (SMD), random acceleration molecular dynamics (RAMD), crystallographic thermal-factor analysis, computational methods, and the like. Because of the cost of biological experiments, computational methods currently play a key role in the field of ligand channel identification. Channel computation methods generally fall into two categories: 1) methods that incorporate the ligand molecule into the channel recognition process; 2) methods that compute channels from the protein structure alone. Specifically:
in the method of incorporating ligand molecules into the channel recognition process, specific ligand molecules are explicitly considered, and channels possibly existing in the protein structure are detected through techniques such as molecular dynamics simulation. This approach requires a lot of resources for simulation and these simulations do not always find successful channel exits and may create unrealistic channels, so it is not common.
In methods that compute channels from the protein structure alone, no ligand molecule is considered in advance; the protein structure is converted into a geometrically tractable problem, which simplifies the channel identification task. The advantages of this approach are that computation time and cost are typically small, and that detailed information about channel size and surrounding atoms can be generated. The most representative tool is CAVER, proposed by Petřek M et al. in 2006. CAVER works by exploring mesh nodes with Dijkstra's algorithm, which selects the shortest and cheapest paths (i.e., those with the least cost). CAVER is mainly used to analyze channels in protein structures; it provides fast, accurate, and fully automatic computation for static channels and plays a very important role in drug design and molecular enzymology research, but it places very high demands on processor time and memory, and the system repeatedly errors out when exploring large channels.
Petřek M and his team proposed MOLE, a Voronoi-mesh-based algorithm, in 2007. MOLE overcomes many of the disadvantages and limitations of CAVER. Although the core of MOLE is also Dijkstra's path-search algorithm, the authors applied it to Voronoi diagrams (networks), and the algorithm has proven to work well on a variety of biomolecular systems, including potassium channels and cytochrome P450 ligand channels. In 2013, Sehnal D et al. upgraded the algorithm and released MOLE 2.0; the new version identifies more relevant channels, and benchmark tests against other tools showed MOLE 2.0 to be faster, more robust, and more general. In 2017, Berka K introduced MOLE 2.5, which not only computes channels but also detects pores; this is the latest MOLE version to date. Through continuous update iterations in recent years, MOLE has gradually become a gold standard for rapidly detecting channels in biomacromolecular structures.
However, the channels calculated by the CAVER or MOLE software are static, and the atoms actually forming the channels are in continuous motion due to the existence of protein dynamics, so that the actual positions of the channels are correspondingly changed. None of the current channel calculation methods takes into account the dynamic information of the protein.
Disclosure of Invention
In order to solve the technical problems, the invention provides an optimal path analysis method of a protein dynamic ligand channel based on reinforcement learning. The dynamic problem of the protein ligand channel is considered on the basis of the static channel, so that the ligand channel problem is more in line with the actual situation. Protein dynamics refers to the fact that the spatial conformation of a protein can change in a certain microenvironment, and various conformational isomers can regulate the folding and functions of the protein.
In order to achieve the above purpose, the technical scheme of the invention is as follows:
the optimal path analysis method of the protein dynamic ligand channel based on reinforcement learning comprises the following steps:
acquiring static ligand channel information and full-atom dynamic information of the protein, screening the dynamic properties of partial atoms forming the channel, introducing a grid method and an atom collision detection mechanism in a three-dimensional space, and constructing a simulation environment;
constructing a reinforcement learning model, wherein the reinforcement learning model comprises states, an agent, actions, and rewards, and the state data comprise the position information of the ligand molecule and of the atoms forming the channel; inputting the previous state data into the reinforcement learning model, which determines the action data corresponding to that state; the agent interacts with the simulation environment according to the action data to obtain an interaction result, namely the next state data corresponding to the action data and the reward data for the ligand molecule under that action;
when the ligand molecule leaves the boundary of the three-dimensional grid map, giving the current state a negative feedback value and returning to the previous state to continue training, until the ligand molecule reaches the channel end point or the maximum number of exploration steps of a single round is reached; the round of training then ends, and the number of path steps of the ligand molecule from the channel start point to the channel end point and the total return obtained in that round are output; continuing with the next round of training until all training rounds are finished, and outputting the path with the fewest steps and the highest return among all rounds, i.e., the optimal path of the protein dynamic ligand channel.
Preferably, the method further comprises the following steps: the three-dimensional structure data of the protein and ligand molecules, which have been obtained experimentally, are downloaded from a local or online protein structure database.
Preferably, the static ligand channel information of the protein is obtained using MOLE or CAVER software.
Preferably, the all-atom dynamic information of the protein is obtained using normal mode analysis or molecular dynamics simulation.
Preferably, introducing the grid method in three-dimensional space specifically comprises the following steps:
constructing a three-dimensional space grid map, wherein the size of grid units in the three-dimensional space grid map depends on the size of ligand molecules, and the number of grid units in the three-dimensional space grid map depends on the size of the space occupied by atoms around an obtained static ligand channel;
and placing the ligand molecules at the starting point of the channel, wherein each grid unit in the three-dimensional space grid map is the minimum movement unit of the ligand molecules.
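The two grid-construction steps above can be sketched as follows; the function name and the use of NumPy are illustrative, not from the patent:

```python
import numpy as np

def build_grid(channel_atoms, ligand_radius):
    """Sketch of the grid construction described above.

    Cell edge = ligand bounding-sphere diameter (one cell is the minimum
    movement unit); the number of cells per axis comes from the axis-aligned
    extent of the atoms around the static channel.
    """
    coords = np.asarray(channel_atoms, dtype=float)
    cell = 2.0 * ligand_radius                       # edge length of one cell
    lo, hi = coords.min(axis=0), coords.max(axis=0)  # extent of channel atoms
    n_cells = np.ceil((hi - lo) / cell).astype(int)  # cells along x, y, z
    return lo, cell, n_cells
```

The origin `lo`, the cell size, and the per-axis cell counts are then enough to map grid indices to spatial coordinates.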
Preferably, before the atomic collision detection mechanism, the method further comprises the following steps:
judging whether the three-dimensional structure of the ligand molecule is a regular sphere; if so, wrapping the ligand molecule in a sphere and using a spherical bounding box, with the grid cell being the cube circumscribing the sphere; if not, wrapping the ligand molecule in a cylinder and using a three-dimensional AABB bounding box, with the grid cell being the bounding-box cube;
and abstracting part of atoms forming the channel into spheres, and detecting collision between ligand molecules and each atom in the three-dimensional space grid map in real time.
Preferably, the action data are the up, down, left, right, forward, and backward movements in the three-dimensional grid map, and the state data are state values corresponding to three-dimensional coordinate values.
Preferably, the reinforcement learning model employs the temporal-difference off-policy control algorithm Q-learning.
Preferably, the method further comprises: introducing an exponential decay factor so that the exploration rate in the reinforcement learning model gradually decreases over time.
Preferably, the reward data corresponding to the ligand molecule is obtained according to a reward and punishment function, wherein the reward and punishment function is described as follows:
[Equation rendered as an image in the original. Per the description, the function assigns a large positive reward when the ligand molecule reaches the channel end point, and large penalties when it leaves the boundary or collides with an atom.]
based on the technical scheme, the invention has the beneficial effects that: according to the invention, based on the excellent static channel computing capability of MOLE, dynamic information of proteins is calculated by using a normal model analysis, the dynamic channel information of the proteins is obtained by combining, then based on a reinforcement Learning environment Gym, a grid method is improved by introducing a collision detection algorithm based on bounding boxes, learning is performed by adopting a time sequence difference offline control algorithm Q-Learning, an exponential decay factor is introduced for the exploration rate, a punishment function is designed to avoid dynamic atomic collision, the position of a catalytic active center is reached in the shortest step number, and a certain engineering application reference value is provided.
Drawings
FIG. 1 is a schematic diagram of a method for analyzing optimal paths of protein dynamic ligand channels based on reinforcement learning in one embodiment;
FIG. 2 is a diagram of all channels of 1bu7 in one embodiment;
FIG. 3 is a schematic diagram of all channel surrounding atoms of 1bu7 in one embodiment;
FIG. 4 is a schematic diagram of the surrounding atoms of each channel of 1bu7 in FIG. 3;
FIG. 5 is a flowchart of ANM computation in one embodiment;
FIG. 6 is a dynamic display of 1bu7 protein in one embodiment;
FIG. 7 is a representation of atom dynamics around a 1bu7 lane in one embodiment;
FIG. 8 is a schematic diagram of a three-dimensional environmental raster method in one embodiment;
FIG. 9 is a flow diagram of constructing a grid cell in one embodiment;
FIG. 10 is a schematic diagram of a method of optimal path analysis of a protein dynamic ligand channel based on reinforcement learning in one embodiment;
FIG. 11 is a convergence graph of different exploration rates in one embodiment;
FIG. 12 is a graph of prize convergence versus one embodiment;
FIG. 13 is a diagram showing the optimal paths for the different channels of 1bu7 in FIG. 4.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
As shown in fig. 1, the embodiment provides a method for analyzing an optimal path of a protein dynamic ligand channel based on reinforcement learning, which includes the following steps:
and 1, creating an environment. And combining the static ligand channel information calculated by the MOLE software and the full-atom dynamic information calculated by the NMA to generate dynamic information of atoms forming the channel, namely the dynamic information of the ligand channel. And simultaneously, rasterizing processing of the target environment and introduction of an atomic collision detection mechanism are performed based on the python script.
In this embodiment, the static protein ligand channel is calculated with the MOLE software. The core of a local MOLE deployment is configuring an XML file for the calculation: setting a suitable diameter for channel filtering, and setting the channel end-point coordinates for path calculation. MOLE supports outputting results in multiple formats; this system mainly uses the PDBProfile, PDBStructure, and PyMOL formats. The PDBProfile output is a PDB file storing data such as channel center coordinates and channel radius. The PDBStructure output is a PDB file storing the information of the atoms surrounding each channel. The PyMOL output is a Python file, mainly used to quickly visualize the channel in the PyMOL tool. Taking 1BU7 as an example, the minimum channel radius is set to 1.2, the channel start point is taken as the Fe atom of the catalytic center, and the input_1BU7.xml file is configured; Linux and MacOS users can install Mono and run: mono mole2.exe ./test_data/1BU7/input_1BU7.xml. The static channel calculation results are visualized in FIGS. 2, 3 and 4.
In this embodiment, the dynamic information of all protein atoms is calculated with normal mode analysis (NMA). NMA methods can be broadly divided into two categories: force-field-based methods and network-based methods. Force-field-based methods use detailed molecular interactions and often yield more accurate modes than network-based methods; however, traditional force-field-based methods require an initial energy minimization, which can distort the structure. Network-based methods, typically elastic network models (ENM), bypass this step.
The ENM is an effective means of processing and analyzing proteins by biophysical methods; in essence it is a variational extension of Hooke's law of elasticity, and it characterizes the intrinsic dynamic properties of a protein from the high-dimensional perspective of a network. Since the system must consider not only the motion amplitude of each atom but also its direction of motion, the anisotropic network model (ANM) is selected for the calculation.
The calculation is performed with ANM; the input is the protein's three-dimensional structure PDB file, which mainly contains information such as the position coordinates of the atoms and residues constituting the protein. The core ANM calculation flow is shown in fig. 5.
The new coordinates calculated by NMA are called normal mode coordinates (NMC); one NMC represents one vibration mode. Suppose atom A has starting position (40, 50, 60) and its NMC calculated by NMA is (5, 5, 6). Then the range of motion of atom A is:
x ∈ [35, 45], y ∈ [45, 55], z ∈ [54, 66]
generalizing the problem, let the atomic A coordinate be (x 0 ,y 0 ,z 0 ) NMC calculated by NMA was (m x ,m y ,m z ) Then the range of motion of atom a is:
x∈[x 0 -m x ,x 0 +m x ],y∈[y 0 -m y ,y 0 +m y ],z∈[z 0 -m z ,z 0 +m z ]
The coordinates of atom A at time t are then:
x_t = x_0 + α_t·m_x, y_t = y_0 + α_t·m_y, z_t = z_0 + α_t·m_z
where α_t ∈ [−1, 1] is a decimal with one digit after the point.
A random number sequence is generated, with α_i representing the value at instant t_i; each atom is at a random position within its own range. The atomic positions at times t_1 and t_2 are then:
x_1 = x_0 + α_1·m_x; y_1 = y_0 + α_1·m_y; z_1 = z_0 + α_1·m_z
x_2 = x_0 + α_2·m_x; y_2 = y_0 + α_2·m_y; z_2 = z_0 + α_2·m_z
calculating t 1 To t 2 Position change amount at time:
Figure SMS_4
Figure SMS_5
Figure SMS_6
From t_1 to t_2 the atom therefore travels (α_2 − α_1)·m_x along the x-axis. In the extreme case, taking the boundary values α_2 = 1 and α_1 = −1, the distance the atom travels along the x-axis is 2m_x, with final position x_1 + 2m_x. Substituting into the formula: x_1 = x_0 − m_x and x_2 = x_1 + 2m_x = x_0 + m_x.
Since each position is computed from the initial position x_0, no boundary-value bookkeeping for the reciprocating motion is needed, and the atomic position at every moment is guaranteed to lie within the range of motion.
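The per-axis position formula above can be sketched as follows (the function name is illustrative):

```python
def atom_position(x0, m, alpha):
    """Position along one axis at a given time step, per x_t = x_0 + alpha_t * m_x.

    Computing from the initial position x0 each step keeps every position
    inside [x0 - m, x0 + m] with no boundary bookkeeping, since alpha is
    restricted to [-1, 1].
    """
    assert -1.0 <= alpha <= 1.0
    return x0 + alpha * m
```

With x0 = 40 and m = 5, the boundary values alpha = −1 and alpha = 1 give exactly the interval endpoints 35 and 45 from the example above.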
As a specific example, assume the random number sequence currently generated by the system is [0.1, 0.8, −0.4, …]; the atomic and NMA coordinate information is shown in Table 1.
TABLE 1 atomic motion information analysis table
[Table 1 rendered as an image in the original.]
The x-direction displacement changes from time t_1 to t_2 and from time t_2 to t_3 are:
Δx(t_1→t_2) = (α_2 − α_1)·m_x = (0.8 − 0.1)·m_x = 0.7·m_x
Δx(t_2→t_3) = (α_3 − α_2)·m_x = (−0.4 − 0.8)·m_x = −1.2·m_x
the period T of the simple harmonic motion refers to the time it takes for an atom to start at the starting position, go through a complete round trip, and return to that point. The positions of the atoms in their periodic divisions are shown in table 2.
Table 2 motion cycle information table
[Table 2 rendered as an image in the original.]
In different periods, the change of the atom's position relative to the initial position is as follows:
[Equation rendered as an image in the original.]
taking the position change information in the x-axis direction as an example, t 0 (initial time) to t 1 At the moment, the atomic displacement is 0.1m x ,t 1 To t 2 At the moment, the atomic displacement is 0.7m x Corresponding to t 0 From time to t 2 A total displacement of 0.8m x . And at t 2 To t 3 At the moment, the atomic displacement variation is calculated to be-1.2 m x Can be decomposed into 0.2m x +(-1.4m x ). 0.8m in front x On the basis of (1), firstly go through 0.2m x Reach m x I.e.
Figure SMS_11
At this point, the forward motion boundary has been reached, and a reverse displacement is required. The remaining-1.4 m x The calculation may be performed directly. Can also be decomposed into-1.0 m x +(-0.4m x ) The front represents the reverse movement m x Distance, i.e. reach
Figure SMS_12
The position, also the initial position, is then continued to move in the reverse direction for 0.4m x Is a displacement of (a). It can be seen that although t is computationally 3 The moment is only according to alpha 3 -0.4, reverse motion 0.4m x But contains information about the change from the previous time position to the time position. The calculation results are visualized as shown in fig. 6 below.
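A minimal numeric check of the worked example above (taking x_0 = 0 and m_x = 1 for readability; names are illustrative):

```python
def positions_from_alphas(x0, mx, alphas):
    # Each position is computed from the initial coordinate, so the
    # reciprocating-motion boundary never has to be tracked explicitly.
    return [x0 + a * mx for a in alphas]

alphas = [0.1, 0.8, -0.4]          # random number sequence from the example
pos = positions_from_alphas(0.0, 1.0, alphas)
deltas = [b - a for a, b in zip(pos, pos[1:])]
```

The successive differences reproduce the displacement changes 0.7·m_x and −1.2·m_x derived in the text.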
In this embodiment, the dynamic information of the atoms forming each channel is screened with a python script. The atomic information constituting each channel has already been obtained and stored in the file tunnel1_structure.pdb, and the motion information of all protein atoms has been calculated and stored in the file 1bu7_anm_modes.nmd. The atom numbers of the channel-forming atoms are extracted from tunnel1_structure.pdb, the coordinates of all atoms and their motion-mode coordinates are extracted from 1bu7_anm_modes.nmd, and the coordinates and motion information corresponding to the channel-forming atoms are then filtered out. The atomic dynamics around channel 1 of 1bu7 are shown in fig. 7: the arrow direction represents the direction of movement, and its length the magnitude of movement.
In this embodiment, for convenience of collision detection, the ligand molecule and all atoms are abstracted into spheres, as shown in fig. 8, and the target problem is modeled in the environment using a grid method in three-dimensional space, as shown in fig. 9. The conventional grid method decides whether movement is possible by checking whether obstacles exist in the grid cells around the target point. Because the obstacles (atoms) in the target environment are far smaller than the grid cells, and each small atom is in continuous motion at all times, judging the movability of surrounding cells in the conventional way is unsuitable. Each grid cell in the target environment is therefore used only as the minimum movement unit of the ligand molecule, without considering whether obstacles lie on the movement path. That is, before each step the ligand molecule does not know whether the next position can be reached without encountering an obstacle; such an environment fully exploits the advantage of reinforcement learning in letting an agent perceive an unknown environment. Whether a move from the current position is allowed is decided by collision detection: if a collision is detected, the move is impossible; otherwise the ligand passes normally.
Since most ligand molecules have an irregular three-dimensional structure, they are wrapped in a cylinder and then enclosed in a three-dimensional AABB bounding box. The size of this bounding box is the size of one grid cell; the cell-construction flow is shown in fig. 10.
A collision detection algorithm is introduced into the target environment, and each small sphere is abstracted into a collision body. Real-time detection of atomic collisions at each moment in the target environment is implemented with the currently mainstream open-source collision-detection library FCL. At this point only the large sphere (the ligand) and the small spheres (the atoms) remain in the system, and only collisions between them need to be detected.
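Since the patent does not describe FCL's API, the sphere-sphere test it performs can be sketched with plain geometry as a simplified stand-in (function names are illustrative):

```python
import math

def spheres_collide(c1, r1, c2, r2):
    """Two spheres collide iff the distance between their centers
    is at most the sum of their radii."""
    return math.dist(c1, c2) <= r1 + r2

def ligand_collides(ligand_center, ligand_radius, atoms):
    # atoms: iterable of (center, radius) pairs for the channel-forming atoms
    return any(spheres_collide(ligand_center, ligand_radius, c, r)
               for c, r in atoms)
```

A dedicated library such as FCL additionally handles broad-phase culling, which matters once the number of atoms grows large.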
Step 2: environment initialization. Initialize the reinforcement-learning state space, action space, and reward-and-punishment function.
In this embodiment, the target problem is modeled with the grid method in three-dimensional space, so the state space S and the action space A are both finite sets; after rasterization the action space has six actions and the state space is not large, so the Q-learning algorithm is selected as the basis for constructing the reinforcement learning model. Specifically:
state space: because the target environment is constructed by adopting a grid method, the state space can be determined by coding grids, the grid numbers correspond to the state numbers one by one, and the state S t The grid position number of the ligand molecule at the time t is shown, and the grid is numbered from left to right and from top to bottom. The number of grid units depends on the size of the ligand molecule itself and the size of the space occupied by atoms around the channel. The larger the number of grid units, the higher the path planning accuracy, but at the same time, the larger the problem scale, the slower the algorithm convergence speed.
Action space: since modeling uses an AABB-bounding-box-based collision detection algorithm and the bounding box is approximately the size of one grid cell, the specific pose and movement angle of the ligand molecule need not be considered for now, and each three-dimensional grid cell is a minimum movement unit. The length of each movement step of the ligand molecule is the diameter of the bounding volume, which is also the edge length of the cubic cell. The gridded environment greatly simplifies the movement directions: only front, back, up, down, left, and right need to be considered, with coordinate values based on a right-handed spatial rectangular coordinate system. The action space information is shown in Table 3.
Table 3 action space information table
[Table 3 rendered as an image in the original; it lists the six actions and their coordinate changes.]
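Under the right-handed convention described above, one possible encoding of the six actions is the following sketch (the exact axis assignment of Table 3 is an image in the original and may differ):

```python
# One consistent right-handed assignment of the six movement directions
# to unit cell offsets; the patent's own Table 3 may order axes differently.
ACTIONS = {
    "front": ( 1, 0, 0), "back":  (-1, 0, 0),
    "left":  ( 0, 1, 0), "right": ( 0, -1, 0),
    "up":    ( 0, 0, 1), "down":  ( 0, 0, -1),
}

def apply_action(cell, action):
    """Move one grid cell (the minimum movement unit) in the given direction."""
    dx, dy, dz = ACTIONS[action]
    return (cell[0] + dx, cell[1] + dy, cell[2] + dz)
```

Each action changes exactly one coordinate by one cell, matching the minimum-movement-unit rule above.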
Reward-and-punishment function: the primary goal is for the ligand molecule to successfully reach the channel end point, so a relatively large reward is placed on the end point. Because boundary limits are added when the map is rasterized, the ligand molecule has a high probability of leaving the boundary early in exploration, after which further exploration would be meaningless, so the out-of-bounds state is given a large penalty. If the ligand molecule collides with an atom, the path is blocked by an obstacle and the exploration of this path has failed; this likewise receives a large penalty and immediately terminates the round, and a new round of exploration starts from the beginning.
The reward and punishment function is described as follows:
[Equation rendered as an image in the original; it encodes the rewards and penalties described above.]
state transition: the grid method ensures that the state space number and the three-dimensional coordinate value are uniquely corresponding, so that the position coordinate of the next moment can be calculated through the coordinate of the current moment and the adopted action, then the corresponding state number is found according to the position coordinate, and whether the boundary is found or not because the coordinate value outside the boundary does not have the corresponding state space number can be judged according to the corresponding relation between the new position and the state number.
Step 3: hyperparameter setting. Hyperparameters such as the learning rate, reward decay factor, and exploration rate are set.
In this embodiment, the temporal-difference off-policy control algorithm Q-learning generally needs a larger exploration rate ε early on to update the value function faster, while in the later stage the exploration rate should be reduced so that the optimal path is selected from experience and convergence of the result is ensured. In the standard ε-greedy method the exploration rate stays constant, which greatly affects convergence in the later stage. To solve this, the system introduces an exponential decay factor for the exploration rate: a relatively high exploration rate can be set at the start of training, and as training proceeds the rate decays continuously, becoming very low in the later stage and ensuring full convergence of the model, as shown in fig. 11.
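One common way to realize such an exponentially decayed exploration rate is sketched below (the constants are illustrative, not the patent's):

```python
import math

def epsilon(t, eps0=0.9, eps_min=0.01, decay=0.001):
    """Exploration rate at step t: starts near eps0, decays exponentially
    toward eps_min, so exploration dominates early and exploitation late.
    The three constants are illustrative assumptions."""
    return eps_min + (eps0 - eps_min) * math.exp(-decay * t)
```

Plotting this against a constant ε-greedy schedule reproduces the qualitative behavior the text attributes to fig. 11.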
If the small molecule collides with another atom, the path is blocked and the current training round fails; the convergence curve of the reward value is shown in fig. 12 (a). During training it was found that even when the small molecule collides with an atom in a certain state, letting it continue searching is still valuable, and doing so greatly reduces the number of training rounds required for convergence.
The small molecule is therefore allowed to continue exploring after a collision instead of stopping. The experimental results show that although this approach is slow at the beginning of training, it finds a stably reachable path faster and indeed brings the model to convergence sooner, as shown in fig. 12 (b). Once collisions no longer terminate a round, each individual round becomes slower than before, but the total training time is reduced.
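Treating both the ligand and each channel atom as spheres, the per-step collision test reduces to comparing the distance between centers with the sum of radii. A minimal sketch (the coordinates and radii are illustrative, not taken from the patent):

```python
import math

def spheres_collide(center_a, radius_a, center_b, radius_b):
    """True if two spheres overlap: center distance below the sum of radii."""
    return math.dist(center_a, center_b) < radius_a + radius_b

def any_collision(ligand_center, ligand_radius, atoms):
    """Check the ligand sphere against every (center, radius) atom sphere."""
    return any(spheres_collide(ligand_center, ligand_radius, c, r)
               for c, r in atoms)

atoms = [((2.0, 0.0, 0.0), 1.5), ((10.0, 0.0, 0.0), 1.5)]  # illustrative atom spheres
print(any_collision((0.0, 0.0, 0.0), 1.0, atoms))  # True: the first atom overlaps
```

In the continue-after-collision scheme described above, a `True` result would trigger a penalty rather than ending the round, unless the collision is with the Fe end point.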
The final hyper-parameter settings of the model are shown in table 4.
Table 4 model hyper-parameter information table
Step 4: state initialization. Initialize the position of the ligand molecule and of each atom as well as the state-action Q table, and start training.
Step 5: action selection. In the current state s_t, select a specific action a for the ligand molecule according to the given policy, apply action a to the environment to obtain the corresponding feedback r, and record the new state s_{t+1} (no state transition has occurred at this point).
Step 6: Q-table update. Update the Q value of the entry corresponding to state s_t and action a according to the Q-learning update formula.
Step 7: boundary detection. If the position corresponding to state s_{t+1} lies beyond the environment boundary, the action is deemed invalid and the ligand molecule is returned to state s_t, i.e. s_{t+1} = s_t. At the same time, the environment feeds back a penalty to the ligand molecule, so that this action is avoided in this state later on.
Step 8: state update. The environment moves from state s_t into the new state s_{t+1}.
Step 9: collision detection. For the new state s_{t+1}, the positions of the ligand molecule and of the atoms are updated and collision detection is performed. If the ligand molecule collides with Fe, the end point has been reached successfully, the current training round ends, and the procedure jumps directly back to step 4 to start a new round. Otherwise, return to step 5 to continue the search.
Step 10: once all training rounds are finished, output the number of steps of the optimal path, the total return value obtained, and the path selection information.
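Steps 4-10 above can be sketched as a single Q-learning loop. The environment, reward values, grid size, and hyperparameters below are illustrative stand-ins assumed for the example, not the patent's implementation; a real environment would apply the 3-D grid mapping, boundary detection, and atom collision detection described earlier:

```python
import random

# Illustrative 1-D-indexed grid world standing in for the channel environment.
N_STATES, N_ACTIONS = 100, 6
GOAL = 99                # state whose cell contains the Fe catalytic center
ALPHA, GAMMA = 0.1, 0.9  # assumed learning rate and reward decay factor

def env_step(s, a):
    """Stand-in transition returning (next_state, reward, done).
    Odd actions move forward, even actions move backward; moves past the
    boundary are invalid (step 7): the agent stays in place and is penalised."""
    s_next = min(N_STATES - 1, max(0, s + (1 if a % 2 else -1)))
    if s_next == s:
        return s, -1.0, False      # boundary: action invalid, stay and penalise
    if s_next == GOAL:
        return s_next, 10.0, True  # collision with Fe: end point reached (step 9)
    return s_next, -0.01, False    # small step cost

q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

for episode in range(200):                      # step 4: (re)initialise each round
    s, done, steps = 0, False, 0
    while not done and steps < 1000:
        eps = max(0.01, 0.9 * 0.99 ** episode)  # decayed exploration rate
        a = (random.randrange(N_ACTIONS) if random.random() < eps
             else max(range(N_ACTIONS), key=q[s].__getitem__))  # step 5
        s_next, r, done = env_step(s, a)        # steps 7-9 inside env_step
        # step 6: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        q[s][a] += ALPHA * (r + GAMMA * max(q[s_next]) - q[s][a])
        s, steps = s_next, steps + 1            # step 8: state update

print(steps)  # step 10: path length of the final training round
```

The update on the marked line is the standard Q-learning rule; the patent's version differs only in the environment behind `env_step` and in the reward and punishment function it applies.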
Through the above operations, a simulation environment of the protein dynamic ligand channel has been constructed, and a reasonable reinforcement learning workflow has been set up for training on the target problem. The entire process is implemented in Python. A large number of experiments show that, for any protein with ligand channels, the method can complete the computation of static channels, the computation of atomic dynamic properties, the environment simulation of dynamic ligand channels, and the training of the reinforcement learning model very quickly, and finally obtain an optimal dynamic ligand channel path reaching the catalytic active center; it therefore has certain engineering reference value.
Three channels of 1BU7 were analyzed and the results are shown in table 5 and fig. 13.
Table 5 1BU7 channel analysis comparison table
The above description is merely of the preferred embodiments of the method for analyzing the optimal path of the protein dynamic ligand channel based on reinforcement learning disclosed in the present invention, and is not intended to limit the scope of the examples of the present specification. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the embodiments of the present specification should be included in the protection scope of the embodiments of the present specification.

Claims (10)

1. The method for analyzing the optimal path of the protein dynamic ligand channel based on reinforcement learning is characterized by comprising the following steps:
acquiring static ligand channel information and full-atom dynamic information of the protein, screening the dynamic properties of partial atoms forming the channel, introducing a grid method and an atom collision detection mechanism in a three-dimensional space, and constructing a simulation environment;
constructing a reinforcement learning model, wherein the reinforcement learning model comprises states, an agent, actions and rewards, and the state data comprises position information of the ligand molecule and of the part of the atoms forming the channel; inputting the previous state data into the reinforcement learning model, which determines the action data corresponding to the previous state data; the agent interacts with the simulation environment according to the action data to obtain an interaction result, wherein the interaction result is the next state data corresponding to the action data and the reward data obtained by the ligand molecule under the action data;
when the ligand molecule leaves the boundary of the three-dimensional space grid map, giving a negative feedback value to the current state and returning to the previous state to continue training until the ligand molecule reaches the channel end point or the maximum exploration step number of a single round is reached, ending the training round, and outputting the path step number of the ligand molecule from the channel start point to the channel end point and the obtained total return value in the training round; and continuing the next training until all training rounds are finished, and outputting the path with the minimum path steps and the highest return value in all training rounds, namely the optimal path of the protein dynamic ligand channel.
2. The reinforcement learning-based protein dynamic ligand channel optimal path analysis method according to claim 1, further comprising the steps of: the three-dimensional structure data of the protein and ligand molecules, which have been obtained experimentally, are downloaded from a local or online protein structure database.
3. The reinforcement learning-based protein dynamic ligand channel optimal path analysis method according to claim 2, wherein the static ligand channel information of the protein is acquired using the MOLE or CAVER software.
4. The reinforcement learning-based protein dynamic ligand channel optimal path analysis method according to claim 2, wherein the full-atom dynamic information of the protein is obtained using normal mode analysis or a molecular dynamics simulation method.
5. The method for analyzing the optimal path of the protein dynamic ligand channel based on reinforcement learning according to claim 1, wherein introducing the grid method in the three-dimensional space comprises the following steps:
constructing a three-dimensional space grid map, wherein the size of grid units in the three-dimensional space grid map depends on the size of ligand molecules, and the number of grid units in the three-dimensional space grid map depends on the size of the space occupied by atoms around an obtained static ligand channel;
and placing the ligand molecules at the starting point of the channel, wherein each grid unit in the three-dimensional space grid map is the minimum movement unit of the ligand molecules.
6. The method for optimal path analysis of a protein dynamic ligand channel based on reinforcement learning according to claim 5, wherein, before the atomic collision detection mechanism is applied, the method further comprises the following steps:
judging whether the three-dimensional structure of the ligand molecule is approximately spherical; if so, wrapping the ligand molecule with a sphere and adopting a spherical bounding box, the grid unit being the cube circumscribing the sphere; if not, wrapping the ligand molecule with a cylinder and adopting a three-dimensional AABB bounding box, the grid unit being the bounding-box cube;
and abstracting part of atoms forming the channel into spheres, and detecting collision between ligand molecules and each atom in the three-dimensional space grid map in real time.
7. The method for analyzing the optimal path of the protein dynamic ligand channel based on reinforcement learning according to claim 5, wherein the state data is the state value corresponding to a three-dimensional space coordinate value, and the action data is a movement forward, backward, leftward, rightward, upward or downward in the three-dimensional space grid map.
8. The method for analyzing the optimal path of the protein dynamic ligand channel based on reinforcement learning according to claim 1, wherein the reinforcement learning model adopts the temporal-difference off-policy control algorithm Q-learning.
9. The reinforcement learning-based protein dynamic ligand channel optimal path analysis method according to claim 7, further comprising the step of: introducing an exponential decay factor so that the exploration rate in the reinforcement learning model gradually decreases as time increases.
10. The method for analyzing optimal paths of protein dynamic ligand channels based on reinforcement learning according to claim 1, wherein the reward data corresponding to the ligand molecules is obtained according to a reward and punishment function, and the reward and punishment function is described as follows:
[reward and punishment function formula provided as an image in the original document]
CN202210836151.5A 2022-07-15 2022-07-15 Protein dynamic ligand channel optimal path analysis method based on reinforcement learning Active CN115328117B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210836151.5A CN115328117B (en) 2022-07-15 2022-07-15 Protein dynamic ligand channel optimal path analysis method based on reinforcement learning


Publications (2)

Publication Number Publication Date
CN115328117A CN115328117A (en) 2022-11-11
CN115328117B true CN115328117B (en) 2023-07-14

Family

ID=83917252

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210836151.5A Active CN115328117B (en) 2022-07-15 2022-07-15 Protein dynamic ligand channel optimal path analysis method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN115328117B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114187249A (en) * 2021-12-03 2022-03-15 大理大学 Breast cancer image identification and classification method based on deep neural network

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2002249545B2 (en) * 2002-03-26 2007-10-18 Council Of Scientific And Industrial Research Method for building optimal models of 3-dimensional molecular structures
MX2012005015A (en) * 2009-10-30 2012-06-12 Prometheus Lab Inc Methods for diagnosing irritable bowel syndrome.
CN104657626A (en) * 2015-02-25 2015-05-27 苏州大学 Method for establishing protein-protein interaction network by utilizing text data
WO2019028269A2 (en) * 2017-08-02 2019-02-07 Strong Force Iot Portfolio 2016, Llc Methods and systems for detection in an industrial internet of things data collection environment with large data sets
CN108363908B (en) * 2017-02-16 2022-04-01 北京毅新博创生物科技有限公司 Intelligent spectroscopy system for detecting biomolecules
US10546237B2 (en) * 2017-03-30 2020-01-28 Atomwise Inc. Systems and methods for correcting error in a first classifier by evaluating classifier output in parallel
US11345040B2 (en) * 2017-07-25 2022-05-31 Mbl Limited Systems and methods for operating a robotic system and executing robotic interactions
WO2020055810A1 (en) * 2018-09-10 2020-03-19 Fluidigm Canada Inc. Autofocus sample imaging apparatus and method
CN111694365B (en) * 2020-07-01 2021-04-20 武汉理工大学 Unmanned ship formation path tracking method based on deep reinforcement learning
CN112669434B (en) * 2020-12-21 2022-05-03 山东华数智能科技有限公司 Collision detection method based on grid and bounding box
CN113110509B (en) * 2021-05-17 2023-02-28 哈尔滨工业大学(深圳) Warehousing system multi-robot path planning method based on deep reinforcement learning
CN113358686A (en) * 2021-06-07 2021-09-07 广西大学 Method for analyzing pyrolysis reaction path of palm oil or vegetable insulating oil through molecular simulation and application of method to DGA thermal fault diagnosis of transformer oil
CN114333986A (en) * 2021-09-06 2022-04-12 腾讯科技(深圳)有限公司 Method and device for model training, drug screening and affinity prediction
CN114153216B (en) * 2021-12-14 2023-10-03 浙江大学湖州研究院 Lunar surface path planning system and method based on deep reinforcement learning and block planning
CN114464270A (en) * 2022-01-17 2022-05-10 北京工业大学 Universal method for designing medicines aiming at different target proteins


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Development and evaluation of a deep learning model for protein–ligand binding affinity prediction;Marta M. Stepniewska-Dziubinska等;《Bioinformatics》;第34卷(第21期);第3666-3674页 *
Emission reduction benefits and efficiency of e-waste recycling in China;Haichao Yang等;《Waste Management》(第102期);第541-549页 *
Reinforcement Learning Based Approach for Ligand Pose Prediction;Justin Jose等;《bioRxiv》;第1-7页 *
Simulation and Machine Learning Methods for Ion-Channel Structure Determination, Mechanistic Studies and Drug Design;Zhengdan Zhu等;《Frontiers in Pharmacology》(第13期);第1-21页 *

Also Published As

Publication number Publication date
CN115328117A (en) 2022-11-11

Similar Documents

Publication Publication Date Title
Borrelly et al. The ORCCAD architecture
US20020040291A1 (en) Method of emulating machine tool behavior for programmable logic controller logical verification system
Arrieta et al. Search-based test case generation for cyber-physical systems
CN110929422B (en) Robot cluster simulation method and device
Toma et al. Pathbench: A benchmarking platform for classical and learned path planning algorithms
EP3884345A1 (en) Method and system for predicting motion-outcome data of a robot moving between a given pair of robotic locations
CN115328117B (en) Protein dynamic ligand channel optimal path analysis method based on reinforcement learning
Xia et al. Virtual comissioning of manufacturing system intelligent control
US11886174B2 (en) Virtualized cable modeling for manufacturing resource simulation
Basye et al. Learning dynamics: System identification for perceptually challenged agents
Ryu et al. Application of moving objects and spatiotemporal reasoning
Neumann The reinforcement learning toolbox, reinforcement learning for optimal control tasks
CN112131754A (en) Extended POMDP planning method and system based on robot accompanying behavior model
Winkler et al. Identifiying nonlinear model structures using genetic programming techniques
Likhachev et al. Planning for Markov decision processes with sparse stochasticity
CN113918133B (en) Optimal control problem solver construction method and system
JP2002149717A (en) Structure optimizing method and recording medium recorded with structure optimizing program
Tanner Multi-agent car parking using reinforcement learning
CN106156437B (en) The automatic structured data for calculating product is realized by interconnection constraint multiaxis simulation figure
Shiltagh et al. A comparative study: Modified particle swarm optimization and modified genetic algorithm for global mobile robot navigation
JP2000339012A (en) Method for planning route for global operation route of robot and its controller
CN109446566A (en) Reinforcing bar intelligent barrier avoiding arrangement method at a kind of node based on intensified learning
Talbi et al. Parallel Cooperating Genetic Algorithms: An application to robot motion planning
Muise et al. Executing Contingent Plans: Addressing Challenges in Deploying Artificial Agents
JP2001067125A (en) Method and device for constructing real world information database and method for learning autonomous mobile traveling body

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant