CN115328117B - Protein dynamic ligand channel optimal path analysis method based on reinforcement learning - Google Patents

Protein dynamic ligand channel optimal path analysis method based on reinforcement learning

Info

Publication number
CN115328117B
CN115328117B
Authority
CN
China
Prior art keywords
ligand
channel
reinforcement learning
protein
dynamic
Prior art date
Legal status
Active
Application number
CN202210836151.5A
Other languages
Chinese (zh)
Other versions
CN115328117A (en
Inventor
羊海潮
景怡乐
李毅
Current Assignee
Dali University
Original Assignee
Dali University
Priority date
Filing date
Publication date
Application filed by Dali University filed Critical Dali University
Priority to CN202210836151.5A priority Critical patent/CN115328117B/en
Publication of CN115328117A publication Critical patent/CN115328117A/en
Application granted granted Critical
Publication of CN115328117B publication Critical patent/CN115328117B/en

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05D: SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00: Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/02: Control of position or course in two dimensions
    • G05D1/021: Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212: Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221: Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • G05D1/0223: Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving speed control of the vehicle
    • G05D1/0276: Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention discloses a reinforcement-learning-based method for analyzing the optimal path of a protein dynamic ligand channel, comprising the following steps: acquire the static ligand channel information and the all-atom dynamic information of the protein, screen the dynamic properties of the atoms forming the channel, introduce a three-dimensional grid method and an atomic collision detection mechanism, and construct a simulation environment; construct a reinforcement learning model comprising states, an agent, actions, and rewards; when the ligand molecule leaves the boundary of the three-dimensional grid map, give the current state a negative feedback value and return to the previous state to continue training; when the channel end point is reached or the maximum number of exploration steps of a single round is reached, end that round of training and output its result; continue with the next round until all training rounds are finished, and output the optimal path of the protein dynamic ligand channel. The invention helps ligand molecules find the optimal path to the catalytic active center and has reference value for engineering applications.

Description

Protein dynamic ligand channel optimal path analysis method based on reinforcement learning
Technical Field
The invention relates to the technical field of reinforcement learning, in particular to a protein dynamic ligand channel optimal path analysis method based on reinforcement learning.
Background
The main research methods for protein ligand channels include steered molecular dynamics (SMD), random acceleration molecular dynamics (RAMD), crystallographic thermal-factor analysis, computational methods, and the like. Because of the cost of biological experiments, computational methods currently play a key role in the field of ligand channel identification. Channel computation methods generally fall into two categories: 1) methods that incorporate the ligand molecule into the channel recognition process; 2) methods that compute channels from the protein structure alone. Specifically:
in the method of incorporating ligand molecules into the channel recognition process, specific ligand molecules are explicitly considered, and channels possibly existing in the protein structure are detected through techniques such as molecular dynamics simulation. This approach requires a lot of resources for simulation and these simulations do not always find successful channel exits and may create unrealistic channels, so it is not common.
In methods that compute channels from the protein structure alone, no ligand molecule is considered in advance; the protein structure is converted into a geometrically tractable problem, which simplifies the channel identification task. The advantages of this approach are that computation time and cost are typically small, and that detailed information about channel size and surrounding atoms can be generated. The most representative tool is CAVER, proposed by Petřek M et al. in 2006. CAVER works by exploring mesh nodes with Dijkstra's algorithm, which selects the shortest and cheapest paths (i.e., those with the least cost). CAVER is mainly used to analyze channels in protein structures; it provides fast, accurate, and fully automatic computation for static channels and plays a very important role in drug design and molecular enzymology research, but it places very high demands on processor time and memory, and the system repeatedly errors out when exploring large channels.
Petřek M and his team proposed MOLE, a Voronoi-mesh-based algorithm, in 2007. MOLE overcomes many of the disadvantages and limitations of CAVER. Although the core of MOLE is also Dijkstra's path-search algorithm, the authors applied it to Voronoi diagrams (networks), and the algorithm has proven to work well on a variety of biomolecular systems, including potassium channels and cytochrome P450 ligand channels. In 2013, Sehnal D et al. upgraded the algorithm and released MOLE 2.0; the new version identifies more relevant channels, and benchmark tests against other tools showed MOLE 2.0 to be faster, more robust, and more general. In 2017, Berka K introduced MOLE 2.5, which not only computes channels but also detects pores; this is the latest MOLE version to date. Through continuous update iterations in recent years, MOLE has gradually become a gold standard for rapidly detecting channels in biomacromolecular structures.
However, the channels calculated by the CAVER or MOLE software are static, and the atoms actually forming the channels are in continuous motion due to the existence of protein dynamics, so that the actual positions of the channels are correspondingly changed. None of the current channel calculation methods takes into account the dynamic information of the protein.
Disclosure of Invention
In order to solve the technical problems, the invention provides an optimal path analysis method of a protein dynamic ligand channel based on reinforcement learning. The dynamic problem of the protein ligand channel is considered on the basis of the static channel, so that the ligand channel problem is more in line with the actual situation. Protein dynamics refers to the fact that the spatial conformation of a protein can change in a certain microenvironment, and various conformational isomers can regulate the folding and functions of the protein.
In order to achieve the above purpose, the technical scheme of the invention is as follows:
the optimal path analysis method of the protein dynamic ligand channel based on reinforcement learning comprises the following steps:
acquiring static ligand channel information and full-atom dynamic information of the protein, screening the dynamic properties of partial atoms forming the channel, introducing a grid method and an atom collision detection mechanism in a three-dimensional space, and constructing a simulation environment;
constructing a reinforcement learning model, wherein the reinforcement learning model comprises states, an agent, actions, and rewards, and the state data comprise the position information of the ligand molecule and of the atoms forming the channel; inputting the previous state data into the reinforcement learning model, which determines the action data corresponding to that state; the agent interacts with the simulation environment according to the action data to obtain an interaction result, namely the next state data corresponding to the action data and the reward data for the ligand molecule under that action;
when the ligand molecule leaves the boundary of the three-dimensional grid map, giving the current state a negative feedback value and returning to the previous state to continue training, until the ligand molecule reaches the channel end point or the maximum number of exploration steps of a single round is reached; the round of training then ends, and the number of path steps of the ligand molecule from the channel start point to the channel end point and the total return obtained in that round are output; continuing with the next round of training until all training rounds are finished, and outputting the path with the fewest steps and the highest return among all rounds, i.e., the optimal path of the protein dynamic ligand channel.
Preferably, the method further comprises the following steps: the three-dimensional structure data of the protein and ligand molecules, which have been obtained experimentally, are downloaded from a local or online protein structure database.
Preferably, the static ligand channel information of the protein is obtained using MOLE or CAVER software.
Preferably, the all-atom dynamic information of the protein is obtained using normal mode analysis or molecular dynamics simulation.
Preferably, introducing the grid method in three-dimensional space specifically comprises the following steps:
constructing a three-dimensional space grid map, wherein the size of grid units in the three-dimensional space grid map depends on the size of ligand molecules, and the number of grid units in the three-dimensional space grid map depends on the size of the space occupied by atoms around an obtained static ligand channel;
and placing the ligand molecules at the starting point of the channel, wherein each grid unit in the three-dimensional space grid map is the minimum movement unit of the ligand molecules.
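The two grid-construction steps above can be sketched as follows; the function name and the use of NumPy are illustrative, not from the patent:

```python
import numpy as np

def build_grid(channel_atoms, ligand_radius):
    """Sketch of the grid construction described above.

    Cell edge = ligand bounding-sphere diameter (one cell is the minimum
    movement unit); the number of cells per axis comes from the axis-aligned
    extent of the atoms around the static channel.
    """
    coords = np.asarray(channel_atoms, dtype=float)
    cell = 2.0 * ligand_radius                       # edge length of one cell
    lo, hi = coords.min(axis=0), coords.max(axis=0)  # extent of channel atoms
    n_cells = np.ceil((hi - lo) / cell).astype(int)  # cells along x, y, z
    return lo, cell, n_cells
```

The origin `lo`, the cell size, and the per-axis cell counts are then enough to map grid indices to spatial coordinates.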
Preferably, before the atomic collision detection mechanism, the method further comprises the following steps:
judging whether the three-dimensional structure of the ligand molecule is a regular sphere; if so, wrapping the ligand molecule in a sphere and using a spherical bounding box, with the grid cell being the cube circumscribing the sphere; if not, wrapping the ligand molecule in a cylinder and using a three-dimensional AABB bounding box, with the grid cell being the bounding-box cube;
and abstracting part of atoms forming the channel into spheres, and detecting collision between ligand molecules and each atom in the three-dimensional space grid map in real time.
Preferably, the action data are the up, down, left, right, forward, and backward movements in the three-dimensional grid map, and the state data are state values corresponding to three-dimensional coordinate values.
Preferably, the reinforcement learning model employs the temporal-difference off-policy control algorithm Q-learning.
Preferably, the method further comprises: introducing an exponential decay factor so that the exploration rate in the reinforcement learning model gradually decreases over time.
Preferably, the reward data corresponding to the ligand molecule is obtained according to a reward and punishment function, wherein the reward and punishment function is described as follows:
[Equation rendered as an image in the original. Per the description, the function assigns a large positive reward when the ligand molecule reaches the channel end point, and large penalties when it leaves the boundary or collides with an atom.]
based on the technical scheme, the invention has the beneficial effects that: according to the invention, based on the excellent static channel computing capability of MOLE, dynamic information of proteins is calculated by using a normal model analysis, the dynamic channel information of the proteins is obtained by combining, then based on a reinforcement Learning environment Gym, a grid method is improved by introducing a collision detection algorithm based on bounding boxes, learning is performed by adopting a time sequence difference offline control algorithm Q-Learning, an exponential decay factor is introduced for the exploration rate, a punishment function is designed to avoid dynamic atomic collision, the position of a catalytic active center is reached in the shortest step number, and a certain engineering application reference value is provided.
Drawings
FIG. 1 is a schematic diagram of a method for analyzing optimal paths of protein dynamic ligand channels based on reinforcement learning in one embodiment;
FIG. 2 is a diagram of all channels of 1bu7 in one embodiment;
FIG. 3 is a schematic diagram of all channel surrounding atoms of 1bu7 in one embodiment;
FIG. 4 is a schematic diagram of the surrounding atoms of each channel of 1bu7 in FIG. 3;
FIG. 5 is a flowchart of ANM computation in one embodiment;
FIG. 6 is a dynamic display of 1bu7 protein in one embodiment;
FIG. 7 is a representation of atom dynamics around a 1bu7 lane in one embodiment;
FIG. 8 is a schematic diagram of a three-dimensional environmental raster method in one embodiment;
FIG. 9 is a flow diagram of constructing a grid cell in one embodiment;
FIG. 10 is a schematic diagram of a method of optimal path analysis of a protein dynamic ligand channel based on reinforcement learning in one embodiment;
FIG. 11 is a convergence graph of different exploration rates in one embodiment;
FIG. 12 is a graph of prize convergence versus one embodiment;
FIG. 13 is a diagram showing the optimal paths for the different channels of 1bu7 in FIG. 4.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
As shown in fig. 1, the embodiment provides a method for analyzing an optimal path of a protein dynamic ligand channel based on reinforcement learning, which includes the following steps:
and 1, creating an environment. And combining the static ligand channel information calculated by the MOLE software and the full-atom dynamic information calculated by the NMA to generate dynamic information of atoms forming the channel, namely the dynamic information of the ligand channel. And simultaneously, rasterizing processing of the target environment and introduction of an atomic collision detection mechanism are performed based on the python script.
In this embodiment, the static protein ligand channel is calculated with the MOLE software. The core of a local MOLE deployment is configuring an XML file for the calculation: setting a suitable diameter for channel filtering, and setting the channel end-point coordinates for path calculation. MOLE supports outputting results in multiple formats; this system mainly uses the PDBProfile, PDBStructure, and PyMOL formats. The PDBProfile output is a PDB file storing data such as channel center coordinates and channel radius. The PDBStructure output is a PDB file storing the information of the atoms surrounding each channel. The PyMOL output is a Python file, mainly used to quickly visualize the channel in the PyMOL tool. Taking 1BU7 as an example, the minimum channel radius is set to 1.2, the channel start point is taken as the Fe atom of the catalytic center, and the input_1BU7.xml file is configured; Linux and MacOS users can install Mono and run: mono mole2.exe ./test_data/1BU7/input_1BU7.xml. The static channel calculation results are visualized in FIGS. 2, 3 and 4.
In this embodiment, the dynamic information of all protein atoms is calculated with normal mode analysis (NMA). NMA methods can be broadly divided into two categories: force-field-based methods and network-based methods. Force-field-based methods use detailed molecular interactions and often yield more accurate modes than network-based methods; however, traditional force-field-based methods require an initial energy minimization, which can distort the structure. Network-based methods, typically elastic network models (ENM), bypass this step.
The ENM is an effective means of processing and analyzing proteins by biophysical methods; in essence it is a variational extension of Hooke's law of elasticity, and it characterizes the intrinsic dynamic properties of a protein from the high-dimensional perspective of a network. Since the system must consider not only the motion amplitude of each atom but also its direction of motion, the anisotropic network model (ANM) is selected for the calculation.
The calculation is performed with ANM; the input is the protein's three-dimensional structure PDB file, which mainly contains information such as the position coordinates of the atoms and residues constituting the protein. The core ANM calculation flow is shown in fig. 5.
The new coordinates calculated by NMA are called normal mode coordinates (NMC); one NMC represents one vibration mode. Suppose atom A has starting position (40, 50, 60) and its NMC calculated by NMA is (5, 5, 6). Then the range of motion of atom A is:
x ∈ [35, 45], y ∈ [45, 55], z ∈ [54, 66]
generalizing the problem, let the atomic A coordinate be (x 0 ,y 0 ,z 0 ) NMC calculated by NMA was (m x ,m y ,m z ) Then the range of motion of atom a is:
x∈[x 0 -m x ,x 0 +m x ],y∈[y 0 -m y ,y 0 +m y ],z∈[z 0 -m z ,z 0 +m z ]
The coordinates of atom A at time t are then:
x_t = x_0 + α_t·m_x, y_t = y_0 + α_t·m_y, z_t = z_0 + α_t·m_z
where α_t ∈ [−1, 1] is a decimal with one digit after the point.
A random number sequence is generated, with α_i representing the value at instant t_i; each atom is at a random position within its own range. The atomic positions at times t_1 and t_2 are then:
x_1 = x_0 + α_1·m_x; y_1 = y_0 + α_1·m_y; z_1 = z_0 + α_1·m_z
x_2 = x_0 + α_2·m_x; y_2 = y_0 + α_2·m_y; z_2 = z_0 + α_2·m_z
calculating t 1 To t 2 Position change amount at time:
Figure SMS_4
Figure SMS_5
Figure SMS_6
From t_1 to t_2 the atom therefore travels (α_2 − α_1)·m_x along the x-axis. In the extreme case, taking the boundary values α_2 = 1 and α_1 = −1, the distance the atom travels along the x-axis is 2m_x, with final position x_1 + 2m_x. Substituting into the formula: x_1 = x_0 − m_x and x_2 = x_1 + 2m_x = x_0 + m_x.
Since each position is computed from the initial position x_0, no boundary-value bookkeeping for the reciprocating motion is needed, and the atomic position at every moment is guaranteed to lie within the range of motion.
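The per-axis position formula above can be sketched as follows (the function name is illustrative):

```python
def atom_position(x0, m, alpha):
    """Position along one axis at a given time step, per x_t = x_0 + alpha_t * m_x.

    Computing from the initial position x0 each step keeps every position
    inside [x0 - m, x0 + m] with no boundary bookkeeping, since alpha is
    restricted to [-1, 1].
    """
    assert -1.0 <= alpha <= 1.0
    return x0 + alpha * m
```

With x0 = 40 and m = 5, the boundary values alpha = −1 and alpha = 1 give exactly the interval endpoints 35 and 45 from the example above.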
As a specific example, assume the random number sequence currently generated by the system is [0.1, 0.8, −0.4, …]; the atomic and NMA coordinate information is shown in Table 1.
TABLE 1 atomic motion information analysis table
[Table 1 rendered as an image in the original.]
The x-direction displacement changes from time t_1 to t_2 and from time t_2 to t_3 are:
Δx(t_1→t_2) = (α_2 − α_1)·m_x = (0.8 − 0.1)·m_x = 0.7·m_x
Δx(t_2→t_3) = (α_3 − α_2)·m_x = (−0.4 − 0.8)·m_x = −1.2·m_x
the period T of the simple harmonic motion refers to the time it takes for an atom to start at the starting position, go through a complete round trip, and return to that point. The positions of the atoms in their periodic divisions are shown in table 2.
Table 2 motion cycle information table
[Table 2 rendered as an image in the original.]
In different periods, the change of the atom's position relative to the initial position is as follows:
[Equation rendered as an image in the original.]
taking the position change information in the x-axis direction as an example, t 0 (initial time) to t 1 At the moment, the atomic displacement is 0.1m x ,t 1 To t 2 At the moment, the atomic displacement is 0.7m x Corresponding to t 0 From time to t 2 A total displacement of 0.8m x . And at t 2 To t 3 At the moment, the atomic displacement variation is calculated to be-1.2 m x Can be decomposed into 0.2m x +(-1.4m x ). 0.8m in front x On the basis of (1), firstly go through 0.2m x Reach m x I.e.
Figure SMS_11
At this point, the forward motion boundary has been reached, and a reverse displacement is required. The remaining-1.4 m x The calculation may be performed directly. Can also be decomposed into-1.0 m x +(-0.4m x ) The front represents the reverse movement m x Distance, i.e. reach
Figure SMS_12
The position, also the initial position, is then continued to move in the reverse direction for 0.4m x Is a displacement of (a). It can be seen that although t is computationally 3 The moment is only according to alpha 3 -0.4, reverse motion 0.4m x But contains information about the change from the previous time position to the time position. The calculation results are visualized as shown in fig. 6 below.
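A minimal numeric check of the worked example above (taking x_0 = 0 and m_x = 1 for readability; names are illustrative):

```python
def positions_from_alphas(x0, mx, alphas):
    # Each position is computed from the initial coordinate, so the
    # reciprocating-motion boundary never has to be tracked explicitly.
    return [x0 + a * mx for a in alphas]

alphas = [0.1, 0.8, -0.4]          # random number sequence from the example
pos = positions_from_alphas(0.0, 1.0, alphas)
deltas = [b - a for a, b in zip(pos, pos[1:])]
```

The successive differences reproduce the displacement changes 0.7·m_x and −1.2·m_x derived in the text.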
In this embodiment, the dynamic information of the atoms forming each channel is screened with a python script. The atomic information constituting each channel has already been obtained and stored in the file tunnel1_structure.pdb, and the motion information of all protein atoms has been calculated and stored in the file 1bu7_anm_modes.nmd. The atom numbers of the channel-forming atoms are extracted from tunnel1_structure.pdb, the coordinates of all atoms and their motion-mode coordinates are extracted from 1bu7_anm_modes.nmd, and the coordinates and motion information corresponding to the channel-forming atoms are then filtered out. The atomic dynamics around channel 1 of 1bu7 are shown in fig. 7: the arrow direction represents the direction of movement, and its length the magnitude of movement.
In this embodiment, for convenience of collision detection, the ligand molecule and all atoms are abstracted into spheres, as shown in fig. 8, and the target problem is modeled in the environment using a grid method in three-dimensional space, as shown in fig. 9. The conventional grid method decides whether movement is possible by checking whether obstacles exist in the grid cells around the target point. Because the obstacles (atoms) in the target environment are far smaller than the grid cells, and each small atom is in continuous motion at all times, judging the movability of surrounding cells in the conventional way is unsuitable. Each grid cell in the target environment is therefore used only as the minimum movement unit of the ligand molecule, without considering whether obstacles lie on the movement path. That is, before each step the ligand molecule does not know whether the next position can be reached without encountering an obstacle; such an environment fully exploits the advantage of reinforcement learning in letting an agent perceive an unknown environment. Whether a move from the current position is allowed is decided by collision detection: if a collision is detected, the move is impossible; otherwise the ligand passes normally.
Since most ligand molecules have an irregular three-dimensional structure, they are wrapped in a cylinder and then enclosed in a three-dimensional AABB bounding box. The size of this bounding box is the size of one grid cell; the cell-construction flow is shown in fig. 10.
A collision detection algorithm is introduced into the target environment, and each small sphere is abstracted into a collision body. Real-time detection of atomic collisions at each moment in the target environment is implemented with the currently mainstream open-source collision-detection library FCL. At this point only the large sphere (the ligand) and the small spheres (the atoms) remain in the system, and only collisions between them need to be detected.
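Since the patent does not describe FCL's API, the sphere-sphere test it performs can be sketched with plain geometry as a simplified stand-in (function names are illustrative):

```python
import math

def spheres_collide(c1, r1, c2, r2):
    """Two spheres collide iff the distance between their centers
    is at most the sum of their radii."""
    return math.dist(c1, c2) <= r1 + r2

def ligand_collides(ligand_center, ligand_radius, atoms):
    # atoms: iterable of (center, radius) pairs for the channel-forming atoms
    return any(spheres_collide(ligand_center, ligand_radius, c, r)
               for c, r in atoms)
```

A dedicated library such as FCL additionally handles broad-phase culling, which matters once the number of atoms grows large.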
Step 2: environment initialization. Initialize the reinforcement-learning state space, action space, and reward-and-punishment function.
In this embodiment, the target problem is modeled with the grid method in three-dimensional space, so the state space S and the action space A are both finite sets; after rasterization the action space has six actions and the state space is not large, so the Q-learning algorithm is selected as the basis for constructing the reinforcement learning model. Specifically:
state space: because the target environment is constructed by adopting a grid method, the state space can be determined by coding grids, the grid numbers correspond to the state numbers one by one, and the state S t The grid position number of the ligand molecule at the time t is shown, and the grid is numbered from left to right and from top to bottom. The number of grid units depends on the size of the ligand molecule itself and the size of the space occupied by atoms around the channel. The larger the number of grid units, the higher the path planning accuracy, but at the same time, the larger the problem scale, the slower the algorithm convergence speed.
Action space: since modeling uses an AABB-bounding-box-based collision detection algorithm and the bounding box is approximately the size of one grid cell, the specific pose and movement angle of the ligand molecule need not be considered for now, and each three-dimensional grid cell is a minimum movement unit. The length of each movement step of the ligand molecule is the diameter of the bounding volume, which is also the edge length of the cubic cell. The gridded environment greatly simplifies the movement directions: only front, back, up, down, left, and right need to be considered, with coordinate values based on a right-handed spatial rectangular coordinate system. The action space information is shown in Table 3.
Table 3 action space information table
[Table 3 rendered as an image in the original; it lists the six actions and their coordinate changes.]
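Under the right-handed convention described above, one possible encoding of the six actions is the following sketch (the exact axis assignment of Table 3 is an image in the original and may differ):

```python
# One consistent right-handed assignment of the six movement directions
# to unit cell offsets; the patent's own Table 3 may order axes differently.
ACTIONS = {
    "front": ( 1, 0, 0), "back":  (-1, 0, 0),
    "left":  ( 0, 1, 0), "right": ( 0, -1, 0),
    "up":    ( 0, 0, 1), "down":  ( 0, 0, -1),
}

def apply_action(cell, action):
    """Move one grid cell (the minimum movement unit) in the given direction."""
    dx, dy, dz = ACTIONS[action]
    return (cell[0] + dx, cell[1] + dy, cell[2] + dz)
```

Each action changes exactly one coordinate by one cell, matching the minimum-movement-unit rule above.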
Reward-and-punishment function: the primary goal is for the ligand molecule to successfully reach the channel end point, so a relatively large reward is placed on the end point. Because boundary limits are added when the map is rasterized, the ligand molecule has a high probability of leaving the boundary early in exploration, after which further exploration would be meaningless, so the out-of-bounds state is given a large penalty. If the ligand molecule collides with an atom, the path is blocked by an obstacle and the exploration of this path has failed; this likewise receives a large penalty and immediately terminates the round, and a new round of exploration starts from the beginning.
The reward and punishment function is described as follows:
[Equation rendered as an image in the original; it encodes the rewards and penalties described above.]
state transition: the grid method ensures that the state space number and the three-dimensional coordinate value are uniquely corresponding, so that the position coordinate of the next moment can be calculated through the coordinate of the current moment and the adopted action, then the corresponding state number is found according to the position coordinate, and whether the boundary is found or not because the coordinate value outside the boundary does not have the corresponding state space number can be judged according to the corresponding relation between the new position and the state number.
Step 3: hyperparameter setting. Hyperparameters such as the learning rate, reward decay factor, and exploration rate are set.
In this embodiment, the temporal-difference off-policy control algorithm Q-learning generally needs a larger exploration rate ε early on to update the value function faster, while in the later stage the exploration rate should be reduced so that the optimal path is selected from experience and convergence of the result is ensured. In the standard ε-greedy method the exploration rate stays constant, which greatly affects convergence in the later stage. To solve this, the system introduces an exponential decay factor for the exploration rate: a relatively high exploration rate can be set at the start of training, and as training proceeds the rate decays continuously, becoming very low in the later stage and ensuring full convergence of the model, as shown in fig. 11.
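One common way to realize such an exponentially decayed exploration rate is sketched below (the constants are illustrative, not the patent's):

```python
import math

def epsilon(t, eps0=0.9, eps_min=0.01, decay=0.001):
    """Exploration rate at step t: starts near eps0, decays exponentially
    toward eps_min, so exploration dominates early and exploitation late.
    The three constants are illustrative assumptions."""
    return eps_min + (eps0 - eps_min) * math.exp(-decay * t)
```

Plotting this against a constant ε-greedy schedule reproduces the qualitative behavior the text attributes to fig. 11.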
If the small molecule collides with another atom, the path is blocked and the current training round fails; the convergence curve of the reward value is shown in fig. 12 (a). During training it was found that even when the small molecule collides with an atom in a certain state, letting it continue searching is still valuable, and doing so greatly reduces the number of training rounds required for convergence.
The small molecule is therefore allowed to continue exploring after a collision instead of stopping. The experimental results show that although this approach is slow at the beginning of training, it finds a stably reachable path faster and indeed brings the model to convergence sooner, as shown in fig. 12 (b). Once collisions no longer terminate a round, each individual round becomes slower than before, but the total training time is reduced.
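Treating both the ligand and each channel atom as spheres, the per-step collision test reduces to comparing the distance between centers with the sum of radii. A minimal sketch (the coordinates and radii are illustrative, not taken from the patent):

```python
import math

def spheres_collide(center_a, radius_a, center_b, radius_b):
    """True if two spheres overlap: center distance below the sum of radii."""
    return math.dist(center_a, center_b) < radius_a + radius_b

def any_collision(ligand_center, ligand_radius, atoms):
    """Check the ligand sphere against every (center, radius) atom sphere."""
    return any(spheres_collide(ligand_center, ligand_radius, c, r)
               for c, r in atoms)

atoms = [((2.0, 0.0, 0.0), 1.5), ((10.0, 0.0, 0.0), 1.5)]  # illustrative atom spheres
print(any_collision((0.0, 0.0, 0.0), 1.0, atoms))  # True: the first atom overlaps
```

In the continue-after-collision scheme described above, a `True` result would trigger a penalty rather than ending the round, unless the collision is with the Fe end point.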
The final hyper-parameter settings of the model are shown in table 4.
Table 4 model hyper-parameter information table
Step 4: state initialization. Initialize the position of the ligand molecule and of each atom as well as the state-action Q table, and start training.
Step 5: action selection. In the current state s_t, select a specific action a for the ligand molecule according to the given policy, apply action a to the environment to obtain the corresponding feedback r, and record the new state s_{t+1} (no state transition has occurred at this point).
Step 6: Q-table update. Update the Q value of the entry corresponding to state s_t and action a according to the Q-learning update formula.
Step 7: boundary detection. If the position corresponding to state s_{t+1} lies beyond the environment boundary, the action is deemed invalid and the ligand molecule is returned to state s_t, i.e. s_{t+1} = s_t. At the same time, the environment feeds back a penalty to the ligand molecule, so that this action is avoided in this state later on.
Step 8: state update. The environment moves from state s_t into the new state s_{t+1}.
Step 9: collision detection. For the new state s_{t+1}, the positions of the ligand molecule and of the atoms are updated and collision detection is performed. If the ligand molecule collides with Fe, the end point has been reached successfully, the current training round ends, and the procedure jumps directly back to step 4 to start a new round. Otherwise, return to step 5 to continue the search.
Step 10: once all training rounds are finished, output the number of steps of the optimal path, the total return value obtained, and the path selection information.
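Steps 4-10 above can be sketched as a single Q-learning loop. The environment, reward values, grid size, and hyperparameters below are illustrative stand-ins assumed for the example, not the patent's implementation; a real environment would apply the 3-D grid mapping, boundary detection, and atom collision detection described earlier:

```python
import random

# Illustrative 1-D-indexed grid world standing in for the channel environment.
N_STATES, N_ACTIONS = 100, 6
GOAL = 99                # state whose cell contains the Fe catalytic center
ALPHA, GAMMA = 0.1, 0.9  # assumed learning rate and reward decay factor

def env_step(s, a):
    """Stand-in transition returning (next_state, reward, done).
    Odd actions move forward, even actions move backward; moves past the
    boundary are invalid (step 7): the agent stays in place and is penalised."""
    s_next = min(N_STATES - 1, max(0, s + (1 if a % 2 else -1)))
    if s_next == s:
        return s, -1.0, False      # boundary: action invalid, stay and penalise
    if s_next == GOAL:
        return s_next, 10.0, True  # collision with Fe: end point reached (step 9)
    return s_next, -0.01, False    # small step cost

q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

for episode in range(200):                      # step 4: (re)initialise each round
    s, done, steps = 0, False, 0
    while not done and steps < 1000:
        eps = max(0.01, 0.9 * 0.99 ** episode)  # decayed exploration rate
        a = (random.randrange(N_ACTIONS) if random.random() < eps
             else max(range(N_ACTIONS), key=q[s].__getitem__))  # step 5
        s_next, r, done = env_step(s, a)        # steps 7-9 inside env_step
        # step 6: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        q[s][a] += ALPHA * (r + GAMMA * max(q[s_next]) - q[s][a])
        s, steps = s_next, steps + 1            # step 8: state update

print(steps)  # step 10: path length of the final training round
```

The update on the marked line is the standard Q-learning rule; the patent's version differs only in the environment behind `env_step` and in the reward and punishment function it applies.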
Through the above operations, a simulation environment of the protein dynamic ligand channel has been constructed, and a reasonable reinforcement learning workflow has been set up for training on the target problem. The entire process is implemented in Python. A large number of experiments show that, for any protein with ligand channels, the method can complete the computation of static channels, the computation of atomic dynamic properties, the environment simulation of dynamic ligand channels, and the training of the reinforcement learning model very quickly, and finally obtain an optimal dynamic ligand channel path reaching the catalytic active center; it therefore has certain engineering reference value.
Three channels of 1BU7 were analyzed and the results are shown in table 5 and fig. 13.
Table 5 1BU7 channel analysis comparison table
The above description is merely of the preferred embodiments of the method for analyzing the optimal path of the protein dynamic ligand channel based on reinforcement learning disclosed in the present invention, and is not intended to limit the scope of the examples of the present specification. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the embodiments of the present specification should be included in the protection scope of the embodiments of the present specification.

Claims (10)

1. The method for analyzing the optimal path of the protein dynamic ligand channel based on reinforcement learning is characterized by comprising the following steps:
acquiring static ligand channel information and full-atom dynamic information of the protein, screening the dynamic properties of partial atoms forming the channel, introducing a grid method and an atom collision detection mechanism in a three-dimensional space, and constructing a simulation environment;
constructing a reinforcement learning model, wherein the reinforcement learning model comprises states, an agent, actions and rewards, and the state data comprises position information of the ligand molecule and of the part of the atoms forming the channel; inputting the previous state data into the reinforcement learning model, which determines the action data corresponding to the previous state data; the agent interacts with the simulation environment according to the action data to obtain an interaction result, wherein the interaction result is the next state data corresponding to the action data and the reward data obtained by the ligand molecule under the action data;
when the ligand molecule leaves the boundary of the three-dimensional space grid map, giving a negative feedback value to the current state and returning to the previous state to continue training until the ligand molecule reaches the channel end point or the maximum exploration step number of a single round is reached, ending the training round, and outputting the path step number of the ligand molecule from the channel start point to the channel end point and the obtained total return value in the training round; and continuing the next training until all training rounds are finished, and outputting the path with the minimum path steps and the highest return value in all training rounds, namely the optimal path of the protein dynamic ligand channel.
2. The reinforcement learning-based protein dynamic ligand channel optimal path analysis method according to claim 1, further comprising the steps of: the three-dimensional structure data of the protein and ligand molecules, which have been obtained experimentally, are downloaded from a local or online protein structure database.
3. The reinforcement learning-based protein dynamic ligand channel optimal path analysis method according to claim 2, wherein the static ligand channel information of the protein is acquired using the MOLE or CAVER software.
4. The reinforcement learning-based protein dynamic ligand channel optimal path analysis method according to claim 2, wherein the full-atom dynamic information of the protein is obtained using normal mode analysis or a molecular dynamics simulation method.
5. The method for analyzing the optimal path of the protein dynamic ligand channel based on reinforcement learning according to claim 1, wherein introducing the grid method in the three-dimensional space comprises the following steps:
constructing a three-dimensional space grid map, wherein the size of grid units in the three-dimensional space grid map depends on the size of ligand molecules, and the number of grid units in the three-dimensional space grid map depends on the size of the space occupied by atoms around an obtained static ligand channel;
and placing the ligand molecules at the starting point of the channel, wherein each grid unit in the three-dimensional space grid map is the minimum movement unit of the ligand molecules.
6. The method for optimal path analysis of a protein dynamic ligand channel based on reinforcement learning according to claim 5, wherein, before the atomic collision detection mechanism is applied, the method further comprises the following steps:
judging whether the three-dimensional structure of the ligand molecule is approximately spherical; if so, wrapping the ligand molecule with a sphere and adopting a spherical bounding box, the grid unit being the cube circumscribing the sphere; if not, wrapping the ligand molecule with a cylinder and adopting a three-dimensional AABB bounding box, the grid unit being the bounding-box cube;
and abstracting part of atoms forming the channel into spheres, and detecting collision between ligand molecules and each atom in the three-dimensional space grid map in real time.
7. The method for analyzing the optimal path of the protein dynamic ligand channel based on reinforcement learning according to claim 5, wherein the state data is the state value corresponding to a three-dimensional space coordinate value, and the action data is a movement forward, backward, leftward, rightward, upward or downward in the three-dimensional space grid map.
8. The method for analyzing the optimal path of the protein dynamic ligand channel based on reinforcement learning according to claim 1, wherein the reinforcement learning model adopts the temporal-difference off-policy control algorithm Q-learning.
9. The reinforcement learning-based protein dynamic ligand channel optimal path analysis method according to claim 7, further comprising the step of: introducing an exponential decay factor so that the exploration rate in the reinforcement learning model gradually decreases as time increases.
10. The method for analyzing optimal paths of protein dynamic ligand channels based on reinforcement learning according to claim 1, wherein the reward data corresponding to the ligand molecules is obtained according to a reward and punishment function, and the reward and punishment function is described as follows:
[reward and punishment function formula provided as an image in the original document]
CN202210836151.5A 2022-07-15 2022-07-15 Protein dynamic ligand channel optimal path analysis method based on reinforcement learning Active CN115328117B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210836151.5A CN115328117B (en) 2022-07-15 2022-07-15 Protein dynamic ligand channel optimal path analysis method based on reinforcement learning


Publications (2)

Publication Number Publication Date
CN115328117A CN115328117A (en) 2022-11-11
CN115328117B true CN115328117B (en) 2023-07-14

Family

ID=83917252

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210836151.5A Active CN115328117B (en) 2022-07-15 2022-07-15 Protein dynamic ligand channel optimal path analysis method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN115328117B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114187249A (en) * 2021-12-03 2022-03-15 大理大学 Breast cancer image identification and classification method based on deep neural network

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2002249545B2 (en) * 2002-03-26 2007-10-18 Council Of Scientific And Industrial Research Method for building optimal models of 3-dimensional molecular structures
MX2012005015A (en) * 2009-10-30 2012-06-12 Prometheus Lab Inc Methods for diagnosing irritable bowel syndrome.
CN104657626A (en) * 2015-02-25 2015-05-27 苏州大学 Method for establishing protein-protein interaction network by utilizing text data
WO2019028269A2 (en) * 2017-08-02 2019-02-07 Strong Force Iot Portfolio 2016, Llc Methods and systems for detection in an industrial internet of things data collection environment with large data sets
CN108363908B (en) * 2017-02-16 2022-04-01 北京毅新博创生物科技有限公司 Intelligent spectroscopy system for detecting biomolecules
US10546237B2 (en) * 2017-03-30 2020-01-28 Atomwise Inc. Systems and methods for correcting error in a first classifier by evaluating classifier output in parallel
US11345040B2 (en) * 2017-07-25 2022-05-31 Mbl Limited Systems and methods for operating a robotic system and executing robotic interactions
WO2020055810A1 (en) * 2018-09-10 2020-03-19 Fluidigm Canada Inc. Autofocus sample imaging apparatus and method
CN111694365B (en) * 2020-07-01 2021-04-20 武汉理工大学 Unmanned ship formation path tracking method based on deep reinforcement learning
CN112669434B (en) * 2020-12-21 2022-05-03 山东华数智能科技有限公司 Collision detection method based on grid and bounding box
CN113110509B (en) * 2021-05-17 2023-02-28 哈尔滨工业大学(深圳) Warehousing system multi-robot path planning method based on deep reinforcement learning
CN113358686A (en) * 2021-06-07 2021-09-07 广西大学 Method for analyzing pyrolysis reaction path of palm oil or vegetable insulating oil through molecular simulation and application of method to DGA thermal fault diagnosis of transformer oil
CN114333986A (en) * 2021-09-06 2022-04-12 腾讯科技(深圳)有限公司 Method and device for model training, drug screening and affinity prediction
CN114153216B (en) * 2021-12-14 2023-10-03 浙江大学湖州研究院 Lunar surface path planning system and method based on deep reinforcement learning and block planning
CN114464270A (en) * 2022-01-17 2022-05-10 北京工业大学 Universal method for designing medicines aiming at different target proteins


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Development and evaluation of a deep learning model for protein–ligand binding affinity prediction;Marta M. Stepniewska-Dziubinska等;《Bioinformatics》;第34卷(第21期);第3666-3674页 *
Emission reduction benefits and efficiency of e-waste recycling in China;Haichao Yang等;《Waste Management》(第102期);第541-549页 *
Reinforcement Learning Based Approach for Ligand Pose Prediction;Justin Jose等;《bioRxiv》;第1-7页 *
Simulation and Machine Learning Methods for Ion-Channel Structure Determination, Mechanistic Studies and Drug Design;Zhengdan Zhu等;《Frontiers in Pharmacology》(第13期);第1-21页 *

Also Published As

Publication number Publication date
CN115328117A (en) 2022-11-11

Similar Documents

Publication Publication Date Title
Borrelly et al. The ORCCAD architecture
US20020040291A1 (en) Method of emulating machine tool behavior for programmable logic controller logical verification system
Arrieta et al. Search-based test case generation for cyber-physical systems
CN110929422B (en) Robot cluster simulation method and device
Toma et al. Pathbench: A benchmarking platform for classical and learned path planning algorithms
EP3884345A1 (en) Method and system for predicting motion-outcome data of a robot moving between a given pair of robotic locations
CN115328117B (en) Protein dynamic ligand channel optimal path analysis method based on reinforcement learning
Xia et al. Virtual comissioning of manufacturing system intelligent control
US11886174B2 (en) Virtualized cable modeling for manufacturing resource simulation
Basye et al. Learning dynamics: System identification for perceptually challenged agents
Ryu et al. Application of moving objects and spatiotemporal reasoning
Neumann The reinforcement learning toolbox, reinforcement learning for optimal control tasks
CN112131754A (en) Extended POMDP planning method and system based on robot accompanying behavior model
Winkler et al. Identifiying nonlinear model structures using genetic programming techniques
Likhachev et al. Planning for Markov decision processes with sparse stochasticity
CN113918133B (en) Optimal control problem solver construction method and system
JP2002149717A (en) Structure optimizing method and recording medium recorded with structure optimizing program
Tanner Multi-agent car parking using reinforcement learning
CN106156437B (en) The automatic structured data for calculating product is realized by interconnection constraint multiaxis simulation figure
Shiltagh et al. A comparative study: Modified particle swarm optimization and modified genetic algorithm for global mobile robot navigation
JP2000339012A (en) Method for planning route for global operation route of robot and its controller
CN109446566A (en) Reinforcing bar intelligent barrier avoiding arrangement method at a kind of node based on intensified learning
Talbi et al. Parallel Cooperating Genetic Algorithms: An application to robot motion planning
Muise et al. Executing Contingent Plans: Addressing Challenges in Deploying Artificial Agents
JP2001067125A (en) Method and device for constructing real world information database and method for learning autonomous mobile traveling body

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant