CN115328117A

CN115328117A - Protein dynamic ligand channel optimal path analysis method based on reinforcement learning

Info

Publication number: CN115328117A
Application number: CN202210836151.5A
Authority: CN
Inventors: 羊海潮; 景怡乐; 李毅
Original assignee: Dali University
Current assignee: Dali University
Priority date: 2022-07-15
Filing date: 2022-07-15
Publication date: 2022-11-11
Anticipated expiration: 2042-07-15
Also published as: CN115328117B

Abstract

The invention discloses a protein dynamic ligand channel optimal path analysis method based on reinforcement learning, which comprises the following steps: acquiring static ligand channel information and full-atom dynamic information of the protein, screening the dynamic properties of partial atoms forming a channel, introducing a grid method and an atom collision detection mechanism in a three-dimensional space, and constructing a simulation environment; constructing a reinforcement learning model, wherein the reinforcement learning model comprises states, agents, actions and rewards; when the ligand molecules leave the boundary of the three-dimensional space grid map, giving a negative feedback value to the current state and returning to the previous state for continuous training until the channel end point is reached or the maximum exploration step number of a single round is reached, ending the training of the round and outputting the training result of the round; and continuing the next round of training until all the training rounds are finished, and outputting the optimal path of the protein dynamic ligand channel. The invention can assist ligand molecules to find the optimal path reaching the catalytic activity center position, and has certain engineering application reference value.

Description

Protein dynamic ligand channel optimal path analysis method based on reinforcement learning

Technical Field

The invention relates to the technical field of reinforcement learning, in particular to a protein dynamic ligand channel optimal path analysis method based on reinforcement learning.

Background

The main research methods of the protein ligand channel include Stretching Molecular Dynamics (SMD), random Accelerated Molecular Dynamics (RAMD), crystal thermodynamic factor method, calculation method and the like. Due to the problem of biological experiment cost, the current calculation method plays a key role in the field of ligand channel identification. Channel calculation methods generally fall into two categories: 1) A method of incorporating ligand molecules into a channel recognition process; 2) A method for calculating channels based on protein structure only, as follows:

in the method for incorporating ligand molecules into the channel recognition process, specific ligand molecules are explicitly considered, and possible channels in the protein structure are detected by the techniques such as molecular dynamics simulation and the like. This approach requires a significant amount of resources for the simulations, and these simulations are not always used to find successful channel exits and may result in unrealistic channels.

In the method for calculating the channel only based on the protein structure, ligand molecules are not considered in advance, the protein structure is converted into a problem which can be processed geometrically, and the task of channel identification is simplified. The advantage of this method is that the computation time and computation cost is usually small and detailed information of the channel size and surrounding atoms can be generated. The most representative of them is

M et al, in 2006. CAVER works by exploring the grid nodes using Dijkstra's algorithm, which is used to select the shortest and cheapest paths (i.e., those with the lowest cost). The CAVER is mainly used for analyzing a protein structure channel and providing quick, accurate and full-automatic calculation for a static channel, the occurrence of the CAVER plays an important role in the research of drug design and molecular enzymology, but the CAVER has the problems of high requirements on processor time and memory and system continuous errors when a large channel is explored.

M and its team proposed in 2007 a mol algorithm based on Voronoi grids. MOLE overcomes many of the disadvantages and limitations of CAVER. Although the core of the MOLE algorithm is also the Dijkstra path search algorithm, the authors apply it to Voronoi diagrams (networks), and for the testing of various biomolecular systems,including potassium channels, cytochrome P450 ligand channels, have all proven this algorithm to work well. 2013. In the year, david et al performed an algorithmic upgrade to MOLE, presenting MOLE2.0 version, which new version could identify more relevant channels, and benchmark tests with other tools showed that MOLE2.0 is faster, more robust, and more versatile. 2017. Berka K, in turn, introduced the MOLE version 2.5, which not only calculates channels but also detects pores. This is also the latest version of the current mol. With the continual updating of iterations in recent years, MOLE has gradually become a gold standard for rapid detection of channels in biomacromolecular structures.

However, the channels calculated by the CAVER software and the MOLE software are static, and due to the existence of protein dynamics, atoms actually composing the channels are in continuous motion, so that the actual positions of the channels are correspondingly changed. The dynamic information of the protein is not considered in the current channel calculation method.

Disclosure of Invention

In order to solve the technical problem, the invention provides a protein dynamic ligand channel optimal path analysis method based on reinforcement learning. The problem of the dynamic property of a protein ligand channel is considered on the basis of a static channel, so that the problem of the ligand channel is more in line with the actual situation. The dynamic property of protein means that the spatial conformation of protein can be changed in a certain microenvironment, and various conformational isomers can regulate the folding and the function of protein.

In order to achieve the purpose, the technical scheme of the invention is as follows:

the protein dynamic ligand channel optimal path analysis method based on reinforcement learning comprises the following steps:

acquiring static ligand channel information and full-atom dynamic information of the protein, screening the dynamics of partial atoms forming a channel, introducing a grid method and an atom collision detection mechanism in a three-dimensional space, and constructing a simulation environment;

constructing a reinforcement learning model, wherein the reinforcement learning model comprises states, agents, actions and rewards, and the state data comprises position information of ligand molecules and part of atoms forming a channel; inputting previous state data into a reinforcement learning model, wherein the reinforcement learning model determines action data corresponding to the previous state data, the intelligent agent interacts with a simulation environment according to the action data to obtain an interaction result, and the interaction result is next state data corresponding to the action data and reward data corresponding to ligand molecules under the action data;

when the ligand molecules leave the boundary of the three-dimensional space grid map, a negative feedback value is given to the current state, the previous state is returned for continuous training until the ligand molecules reach the channel end point or the maximum exploration step number of a single round is reached, the training of the round is finished, and the path step number of the ligand molecules from the channel starting point to the channel end point in the training of the round and the total obtained return value are output; and continuing the next round of training until all the training rounds are finished, and outputting the path with the least path steps and the highest return value in all the training rounds, namely the optimal path of the protein dynamic ligand channel.

Preferably, the method further comprises the following steps: the three-dimensional structure data of the protein and ligand molecules, which have been obtained experimentally, are downloaded from a local or online protein structure database.

Preferably, the static ligand channel information of the protein is obtained using the MOLE or carver software.

Preferably, full atomic dynamics information of the protein is obtained using normal mode analysis or molecular dynamics simulation methods.

Preferably, the introducing the grid method under the three-dimensional space for processing specifically comprises the following steps:

constructing a three-dimensional space grid map, wherein the size of grid cells in the three-dimensional space grid map depends on the size of ligand molecules, and the number of the grid cells in the three-dimensional space grid map depends on the size of space occupied by atoms around the obtained static ligand channel;

and placing the ligand molecules at the starting point of the channel, wherein each grid unit in the three-dimensional space grid map is the minimum moving unit of the ligand molecules.

Preferably, before the atomic collision detection mechanism, the method further comprises the following steps:

judging whether the three-dimensional structure shape of the ligand molecule is a regular sphere or not, if so, wrapping the ligand molecule through the sphere by adopting a sphere surrounding box, wherein the grid unit is a cube externally connected with the sphere; if not, adopting a cylinder to wrap ligand molecules, adopting a three-dimensional AABB bounding box, and taking a grid unit as a bounding box cube;

abstracting partial atoms forming the channel into a sphere, and detecting the collision of ligand molecules and each atom in the three-dimensional space grid map in real time.

Preferably, the motion data is up, down, left, right, front, back, and back in a three-dimensional space grid map, and the state data is a state value corresponding to a three-dimensional space coordinate value.

Preferably, the reinforcement learning model adopts a time sequence difference off-line control algorithm Q-learning.

Preferably, the method further comprises the following steps: and introducing an exponential decay factor, so that the exploration rate in the reinforcement learning model is gradually reduced along with the continuous increase of time.

Preferably, the reward data corresponding to the ligand molecule is obtained according to a reward and punishment function, which is described as follows:

based on the technical scheme, the invention has the beneficial effects that: according to the method, based on the outstanding static channel calculation capacity of the MOLE, the dynamic information of the protein is calculated by using normal mode analysis, the dynamic channel information of the protein is obtained by combining, then based on the reinforcement Learning environment Gym, the grid method is improved by introducing the collision detection algorithm based on the bounding box, the time sequence difference offline control algorithm Q-Learning is adopted for Learning, an exponential decay factor is introduced aiming at the exploration rate, a reward and penalty function is designed to avoid dynamic atom collision, the catalytic activity center position can be reached within the shortest step, and the method has a certain engineering application reference value.

Drawings

FIG. 1 is a schematic diagram of the structure of a protein dynamic ligand channel optimal path analysis method based on reinforcement learning in one embodiment;

FIG. 2 is a diagram of all channels of 1bu7 in one embodiment;

FIG. 3 is a diagram of atomic figures around all channels of 1bu7 in one embodiment;

FIG. 4 is a diagram of atomic diagram around each channel of 1bu7 in FIG. 3;

FIG. 5 is a flow diagram of ANM calculation in one embodiment;

FIG. 6 is a dynamic display of the 1bu7 protein of one example;

FIG. 7 is a dynamic representation of atoms surrounding a 1bu7 channel in one embodiment;

FIG. 8 is a schematic diagram of a three-dimensional environment grid method in one embodiment;

FIG. 9 is a flow diagram of constructing a grid cell in one embodiment;

FIG. 10 is a schematic structural diagram of a reinforcement learning-based protein dynamic ligand channel optimal path analysis method in one embodiment;

FIG. 11 is a graph of convergence for different quest rates in one embodiment;

FIG. 12 is a comparison graph of reward value convergence in one embodiment;

FIG. 13 is an optimal path display diagram corresponding to different channels of 1bu7 in FIG. 4.

Detailed Description

The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.

As shown in fig. 1, the present embodiment provides a method for analyzing an optimal path of a protein dynamic ligand channel based on reinforcement learning, which includes the following steps:

step 1, environment creation. And combining the static ligand channel information calculated by the MOLE software and the full atom dynamic information calculated by the NMA to generate the dynamic information of atoms forming the channel, namely the dynamic information of the ligand channel. And rasterization processing of the target environment and introduction of an atomic collision detection mechanism are carried out on the basis of the python script.

In this example, the protein static ligand channels were calculated by the MOLE software. The core of the localization deployment of the MOLE is to configure an XML file for calculation, set a proper diameter for channel filtering, and set a channel endpoint coordinate for path calculation of a channel. The MOLE supports different results of output multi-formats, and PDBProfile, PDBSstructure and PyMOL formats are mainly used in the system. The PDBPprofile output format is a PDB file, and data such as a channel center coordinate, a channel radius and the like are stored. The PDBSstructure output format is a PDB file, and the peripheral atom information of the channel is stored. The PyMOL output format is a Python file, which is mainly used for quickly visualizing a channel in a PyMOL tool. Taking 1BU7 as an example, setting the minimum radius of a channel to be 1.2, setting the starting point of the channel to be a Fe atom of a catalytic center, configuring an input _1bu7.Xml file, and enabling Linux and MacOS users to install Mono to use the file, wherein the command is as follows: mono move 2. Exe./Test _ data/1BU7/input _1bu7.Xml, and the static channel calculation results are visualized as shown in fig. 2, fig. 3 and fig. 4.

In this example, the kinetic information of all atoms of the protein was calculated using Normal Model Analysis (NMA). NMA can be roughly divided into two categories: force field based methods and network based methods. Force field-based methods, which utilize detailed molecular interactions, generally produce more accurate patterns than network-based methods. However, conventional force field based approaches require initial capability minimization, which distorts the structure. Network-based methods, typically Elastic Networks (ENMs), bypass this step.

ENM is an effective means for processing and analyzing protein by a biophysical method, and is essentially a phase change expansion of Hooke elasticity law, and the intrinsic kinetic properties of the protein are characterized from a high-dimensional level of a network. The system needs to consider the motion amplitude information of each atom, also needs to consider the motion directions of the atoms, and selects an Anisotropic Network Model (ANM) for calculation.

The calculation is performed using ANM, and the input file is a protein three-dimensional structure PDB file mainly including information such as position coordinates of atoms and residues constituting the protein. The ANM computation core generation flow is shown in fig. 5.

The new Coordinates calculated by NMA are called Normal Coordinates (NMC), one NMC representing one vibration Mode. Assuming that the coordinates of the starting position of the atom a are (40,50,60) and the NMC calculated by NMA is (5,5,6), the range of motion of the atom a is:

x∈[35,45],y∈[45,55],z∈[54,66]

generalizing the problem, let the atomic A coordinate be (x) ₀ ,y ₀ ,z ₀ ) NMC calculated by NMA is (m) _x ,m _y ,m _z ) Then the range of motion of atom a is:

x∈[x ₀ -m _x ,x ₀ +m _x ],y∈[y ₀ -m _y ,y ₀ +m _y ],z∈[z ₀ -m _z ,z ₀ +m _z ]

then the atomic a coordinate at time t is:

x _t ＝x ₀ +α _t m _x ,y _t ＝y ₀ +α _t m _y ,z _t ＝z ₀ +α _t m _z

wherein alpha is _t ∈[-1,1]And is a one-digit decimal.

Setting a random number sequence for generating alpha _i Represents different times t _i Each atom is in a random position within its respective range. Then t ₁ Time and t ₂ The time atom positions are as follows:

x ₁ ＝x ₀ +α ₁ m _x ；y ₁ ＝y ₀ +α ₁ m _y ；z ₁ ＝z ₀ +α ₁ m _z

x ₂ ＝x ₀ +α ₂ m _x ；y ₂ ＝y ₀ +α ₂ m _y ；z ₂ ＝z ₀ +α ₂ m _z

calculating t ₁ To t ₂ Amount of change in position at time:

then at t ₁ To t ₂ At that time, the distance that the atom travels on the x-axis is (α) ₂ -α ₁ )m _x Taking the boundary value alpha in extreme cases ₂ ＝1，α ₁ = 1 then the distance travelled by the atom along the x-axis is 2m _x The final position is x ₁ +2m _x 。

Substituting into a formula to calculate specifically: x is the number of ₁ ＝x ₀ -m _x And x ₂ ＝x ₁ +2m _x ＝x ₀ +m _x 。

Since each time is based on the initial position x ₀ The calculation is carried out, and the processing has the advantages that the judgment of the boundary value of the reciprocating motion is not needed, and the atom position at each moment is ensured to be in the motion range.

Specifically, for example, the random number sequence generated by the current system is [0.10.8-0.4 … ], and the atom and NMA coordinate information is shown in table 1.

TABLE 1 atomic motion information analysis Table

t ₁ Time t ₂ Time of day, t ₂ Time t ₃ The amount of change in displacement in the x direction at time is:

the period T of the simple harmonic motion refers to the time required for an atom to start at the starting position, go through a complete round trip, and return to that point. The positions of atoms in their periodic divisions are shown in table 2.

TABLE 2 motion period information Table

The variation of the position of the atom from its initial position at different periods is as follows:

wherein, taking the position change information in the x-axis direction as an example, t ₀ (initial time) to t ₁ At time, the atomic displacement is 0.1m _x ，t ₁ To t ₂ At time, the atomic displacement is 0.7m _x Equivalent to at t ₀ Time t ₂ A total displacement of 0.8m _x . And at t ₂ To t ₃ Calculating the atom displacement variation of-1.2 m at the time _x Can be decomposed to 0.2m _x +(-1.4m _x ). 0.8m in the front _x On the basis of the total weight of the powder, firstly passes through 0.2m _x Reaches m _x I.e. by

Has reached the forward motion boundary, the displacement needs to be reversed. The remaining-1.4 m _x The calculation can be performed directly. Can also be decomposed to-1.0 m _x +(-0.4m _x ) The front representing a reverse movement m _x Distance, i.e. reach

Position, also the initial position, and then continue the reverse movement by 0.4m _x Displacement of (2). It can be seen that although t is calculated ₃ The time being only according to alpha ₃ = -0.4, reverse movement 0.4m _x But includes information on the change from the position at the previous time to the position at that time. The calculation results are visualized as shown in fig. 6 below.

In this embodiment, dynamic information of part of atoms constituting a channel is screened by a python script. Based on the data stored in the file: pdb has obtained the atomic information that makes up each channel, and is stored in the file: motion information of all atoms of the protein is calculated in 1bu7 \u _ anm \. The dynamics of atoms around the channel 1 of 1bu7 are shown in FIG. 7. The direction of the arrow represents the direction of motion and the length represents the magnitude of motion.

In this embodiment, in order to facilitate the detection of collision, the ligand molecules and the whole atoms are abstracted into a sphere, as shown in fig. 8. The target problem is subjected to environment modeling by adopting a grid method in a three-dimensional space, as shown in fig. 9. The conventional grid method determines whether a corresponding grid can move by judging whether obstacles exist in the grids around a target point, and because the size of the obstacles (atoms) in a target environment is far smaller than that of a grid unit, and each small atom is in a continuous motion state at any moment, the conventional idea is inappropriate to judge whether the grids around the target point can move, so that each grid in the target environment only serves as a minimum moving unit for ligand molecules to move, and whether the obstacles exist on the moving path is not considered. That is, the ligand molecule does not know the position of the next step before moving every step, and the environment can fully exert the advantage that the intelligent agent can know the unknown environment through reinforcement learning. And if the position at the current moment is allowed to move, the current position is proved to be not movable by adopting a collision detection mode, otherwise, the current position can be normally passed.

Since most ligand molecules present irregular three-dimensional structures, they need to be wrapped with a cylinder and then surrounded with a tri-dimensional AABB bounding box. The size of this bounding box is the size of a grid cell, and the cell flow is shown in fig. 10.

And (3) introducing a collision detection algorithm into the target environment, and abstracting each small sphere into a collision body. The method realizes real-time detection of atomic collision at each moment in a target environment by means of the current mainstream open source detection library FCL. At the moment, only the large sphere and the small sphere are left in the system, and whether the large sphere and the small sphere collide or not only needs to be detected.

And 2, initializing the environment. And initializing a state space, an action space and a reward and punishment function of reinforcement learning.

In this embodiment, a grid method in a three-dimensional space is adopted to perform environment modeling on a target problem, a state space S and an action space a of the target problem are both finite sets, and after rasterization, only six action spaces are available, and the state space is not large, so that a Q-learning algorithm is selected as a main body to construct a reinforcement learning model. The concrete description is as follows:

state space: because the target environment is constructed by adopting a grid method, the state space can be determined by encoding the grid, the grid numbers correspond to the state numbers one by one, and the state S _t The grid positions where the ligand molecules are located at time t are numbered, and the grids are numbered from left to right and from top to bottom. The number of grid elements depends on the size of the ligand molecule itself and the size of the space occupied by the atoms surrounding the channels. The more the number of the grid units is, the higher the path planning precision is, but the larger the problem scale is caused, and the slower the convergence speed of the algorithm is.

An action space: because a collision detection algorithm based on an AABB bounding box is adopted in modeling, and the size of the bounding box is approximate to the size of a grid unit, the problems of specific postures and motion angles of ligand molecules do not need to be considered at present, and each three-dimensional grid is a minimum moving unit. The length x of each step of ligand molecule movement is the diameter of the inclusion, which is also the width of the cubic grid. The moving direction of the moving body is greatly simplified by adopting the rasterized environment, only six kinds of moving bodies, namely front, back, upper, lower, left and right need to be considered, the coordinate value is based on a right-hand space rectangular coordinate system, and the motion space information is shown in the table 3.

Table 3 action space information table

-a reward and punishment function: the first goal is to have the ligand molecule successfully reach the channel end point, so a relatively large reward needs to be placed on the end point. Since the boundary limit is equivalently increased when the map is constructed by rasterization, ligand molecules have a high probability of walking out of the boundary when the ligand molecules are just explored, and then the subsequent exploration has no meaning, so that a large penalty is given to the out-of-range state. If the collision of the ligand molecule and the atom shows that the path has obstacles to block the path, the path is failed to be explored, and a new round of exploration is directly started from the beginning, which also gives a large penalty and directly ends the round of attempt.

The reward and punishment function is described as follows:

and (3) state transition: the grid method ensures that the state space number and the three-dimensional coordinate value are uniquely corresponding, so that the position coordinate of the next moment can be calculated through the coordinate of the current moment and the adopted action, the corresponding state number can be found according to the position coordinate, and whether the boundary exists because the coordinate value outside the boundary has no corresponding state space number can be judged according to the corresponding relation of the new position and the state number.

And 3, setting the hyper-parameters. Learning rate, reward attenuation factor, exploration rate, etc.

In this embodiment, generally, a larger exploration rate belongs to a faster updating cost function in the early stage of the timing difference offline control algorithm Q-learning, and the exploration rate needs to be reduced to select an optimal path by experience in the later stage to ensure the convergence of the result. And the exploration rate in the epsilon-greedy method is always kept unchanged, so that the later convergence of the model is greatly influenced. Aiming at the problem, an exponential decay factor is introduced into the system aiming at the exploration rate, so that a higher exploration rate can be set at the beginning of training with confidence, the exploration rate is decayed continuously along with the training, the exploration rate is reduced to be particularly low at the later stage, and the complete convergence of the model is ensured, as shown in fig. 11.

If the small molecule and other atoms find collision, it indicates that there is an obstacle in this path, the training fails in this round, and the reward value convergence diagram is shown in fig. 12 (a). During the training process, the following are found: although collision has already occurred in the process of the search, small molecules can also continue to search, and although atoms have already collided in a certain state, the subsequent search is still valuable, and the number of training rounds required for convergence can be greatly reduced by adopting the method.

The small molecules are arranged to collide with atoms and then continue exploring without stopping, and experimental results show that: this approach, although initially slow in training, allows a stable reachable path to be found faster, and indeed allows the model to converge faster, as shown in fig. 12 (b). After the small molecule is set to collide with the atom and the exploration is continued, although the previous single-round training becomes slow, the total training time is reduced compared with the previous training.

The final hyper-parameter settings of the model are shown in table 4.

TABLE 4 model hyper-parameter information sheet

And 4, initializing the state. The ligand molecules and each atom position and state-action Q table are initialized and training begins.

And 5, selecting actions. At the current state s according to a given policy _t Selecting a specific action a of the ligand molecule, acting the action a on the environment to obtain corresponding feedback r, and simultaneously memorizing a new state s _t+1 (no state transition has occurred at this point).

Step 6:Q the table is updated. Updating formula according to Q-learning to state s _t Using the Q value of the position of action aAnd (6) updating.

And 7, detecting the boundary. If ligand molecules s _t+1 The position corresponding to the state is out of the boundary of the environment, the action is considered invalid, and the operation returns to s _t Status, i.e. order s _t+1 ＝s _t . At the same time the environment feeds back a penalty to the ligand molecule, then the choice of this action in this state is avoided later.

And 8, updating the state. Environment by state s _t Enter a new state s _t+1 。

And 9, collision detection. For new state s _t+1 And updating the positions of the ligand molecules and the positions of the atoms, and performing collision detection on the system. If the ligand molecule collides with Fe, the end point is successfully reached, the training of the round is finished, and the process directly jumps to the step 4 to restart a new round of training. Otherwise, the step 5 is continued to continue to be executed for the exploration.

And step 10, outputting the optimal path steps, the obtained total return value and the path selection information until all the training rounds are finished.

Through the operation, the simulation environment of the protein dynamic ligand channel is constructed at present, and a reasonable reinforcement learning algorithm flow is set for training and learning the target problem. The above processes are all realized by programming through Python, and a large number of experiments show that for any protein with a ligand channel, the method can complete the calculation of a static channel, the calculation of atomic dynamics, the environmental simulation of a dynamic ligand channel and the training of a reinforcement learning model in a short time, and finally can obtain an optimal path of the dynamic ligand channel reaching a catalytic activity center, and has a certain engineering reference value.

The results of the analysis of three channels of 1BU7 are shown in table 5 and fig. 13.

TABLE 5 1BU7 channel analysis and comparison table

The above description is only a preferred embodiment of the method for analyzing the optimal path of the protein dynamic ligand channel based on reinforcement learning disclosed in the present invention, and is not intended to limit the scope of the embodiments of the present disclosure. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the embodiments of the present disclosure should be included in the protection scope of the embodiments of the present disclosure.

Claims

1. The protein dynamic ligand channel optimal path analysis method based on reinforcement learning is characterized by comprising the following steps:

2. The reinforcement learning-based protein dynamic ligand channel optimal path analysis method of claim 1, further comprising the steps of: the three-dimensional structure data of the protein and ligand molecules, which have been obtained experimentally, are downloaded from a local or online protein structure database.

3. The reinforcement learning-based protein dynamic ligand channel optimal path analysis method as claimed in claim 2, wherein the static ligand channel information of the protein is obtained using MOLE or CAver software.

4. The reinforcement learning-based protein dynamic ligand channel optimal path analysis method as claimed in claim 2, wherein full atomic dynamic information of the protein is obtained by using normal model analysis or molecular dynamics simulation method.

5. The reinforcement learning-based protein dynamic ligand channel optimal path analysis method as claimed in claim 1, wherein the introducing of the three-dimensional space grid-down method comprises the following steps:

6. The reinforcement learning-based protein dynamic ligand channel optimal path analysis method of claim 5, wherein the atomic collision detection mechanism is preceded by the following steps:

7. The method as claimed in claim 5, wherein the motion data is a three-dimensional grid map, and the state data is a state value corresponding to a three-dimensional coordinate value.

8. The reinforcement learning-based protein dynamic ligand channel optimal path analysis method of claim 1, wherein the reinforcement learning model employs a time-series difference offline control algorithm Q-learning.

9. The reinforcement learning-based protein dynamic ligand channel optimal path analysis method of claim 7, further comprising the steps of: and introducing an exponential decay factor, so that the exploration rate in the reinforcement learning model is gradually reduced along with the continuous increase of time.

10. The reinforcement learning-based protein dynamic ligand channel optimal path analysis method according to claim 1, wherein the reward data corresponding to the ligand molecules is obtained according to a reward-penalty function, and the reward-penalty function is described as follows: