CN116825180A - Method, system and computer readable carrier for analyzing large dynamic protein-small molecule interactions - Google Patents
- Publication number
- CN116825180A (application number CN202210282538.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- interaction
- format
- file
- track
- Legal status
- Pending
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present invention relates to a method, system and computer readable carrier for analyzing large dynamic protein-small molecule interactions. The method comprises the following steps: S1) setting preset parameter values for program execution and commands; S2) checking the preset parameter values; S3) acquiring file information and storing it locally; S4) processing and analyzing the data obtained in step S3); S5) performing 2D visualization and outputting and storing the results based on the data obtained in step S4). The system is suitable for analyzing large data files: users do not need to worry about whether a data file is too large, trajectories with more than 500 frames can be analyzed quickly, and user-defined parameters allow the analysis to be tailored. It can be applied in research fields such as bioinformatics, protein design, drug screening and small molecule chemical synthesis.
Description
Technical Field
The invention belongs to the technical field of bioinformatics, and in particular relates to a system for analyzing large dynamic protein-small molecule interactions.
Background
The Protein Data Bank (PDB) contains approximately 100,000 deposited protein structures, more than 75% of which are complexed with small molecule ligands. Binding of a ligand to its host protein requires a specific arrangement of attractive, usually non-covalent, contacts between the two molecules. With this wealth of existing data, it is possible to gain insight into how ligands interact with their protein targets. Detailed characterization of these interaction modes is critical to understanding molecular recognition and protein function, and to developing and optimizing lead compounds. In principle, comparative high-throughput analysis of interaction patterns can greatly improve protein-ligand docking and virtual screening, strengthening computational methods in drug discovery. However, little prior work addresses in-depth analysis of the existing data: most studies are carried out independently by biomedical researchers focusing on a particular drug or protein, and there is no platform that provides protein-small molecule interaction analysis. Although individual platforms offer visual analysis of protein-small molecule interactions, they support only static input and output (for example, only pdb files can be input) and cannot handle dynamic input and output such as motion trajectory files.
As is well known, dynamic trajectories are more accurate than static data and have greater research value, both for protein-small molecule interactions and for studies of protein stability. Although the applicant previously developed a web-based analysis service, its web-client nature makes it unsuitable for large data files: uploading the data takes a great deal of time, increases server load, and greatly degrades the user experience. In addition, websites tend to offer uniform, conventional analysis services and have difficulty meeting the customized analysis needs of professional users.
Disclosure of Invention
Based on the above problems, the present invention provides a software service for analyzing large dynamic protein-small molecule interactions. On the one hand, it overcomes the difficulty that conventional analysis websites have with large data files; on the other hand, it provides a more flexible analysis service that users can customize through parameter settings.
In one aspect, the invention provides a method of analyzing large dynamic protein-small molecule interactions, the method comprising the steps of:
S1) setting preset parameter values for program execution and commands;
S2) checking the preset parameter values;
S3) acquiring file information and storing it locally;
S4) processing and analyzing the data obtained in step S3):
S41) processing the data stored in step S3);
S42) building a model from the pdb file obtained in step S41) and running an MD simulation with GROMACS to obtain root mean square deviation (RMSD) and root mean square fluctuation (RMSF) data for the original data, writing the results to .xvg files, plotting the corresponding RMSD and RMSF figures with a Python script, and saving the figures;
S43) cleaning the data in the file obtained in step S41), then computing and analyzing interactions from physical and geometric properties to obtain an XML interaction-type result file, and saving the corresponding data;
S5) performing 2D visualization and outputting and storing the results based on the data obtained in step S4):
S51) drawing and displaying 9 protein-small molecule interaction radar maps in total: hydrophobic interaction, hydrogen bond, water bridge, salt bridge, pi-pi stacking (pi_stack), cation-pi interaction (pi cation interaction), halogen bond and metal complex interaction radar maps, plus a total interaction radar map;
S52) drawing and displaying a heat map of the interaction distribution over the trajectory data, whose abscissa is the frame (Frame) of the trajectory, whose ordinate lists the protein amino acid residues that interact with the small molecule, and in which each cell represents the number and types of interactions between that residue and the small molecule in that frame;
S53) displaying the RMSD/RMSF figures obtained in step S42);
S54) automatically combining all visualization results into a PDF with Python, generating a result document, and writing it to the working directory.
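Step S42) plots RMSD and RMSF from the .xvg files written by GROMACS using a Python script that the patent does not reproduce. A minimal reader for that step might look like the sketch below; it assumes the usual .xvg layout, where lines starting with `#` are comments, lines starting with `@` are Grace plotting directives, and the remaining lines hold whitespace-separated numeric columns. The function name `read_xvg` and the sample values are hypothetical.

```python
def read_xvg(text):
    """Return (x, y) lists from the text of a GROMACS .xvg file.

    Comment lines ('#') and Grace directives ('@') are skipped; only the
    first two numeric columns (e.g. time and RMSD) are kept.
    """
    xs, ys = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#@":
            continue
        fields = line.split()
        xs.append(float(fields[0]))
        ys.append(float(fields[1]))
    return xs, ys


# Toy input in .xvg layout (values are illustrative only).
SAMPLE = """# produced by a GROMACS analysis tool
@    title "RMSD"
@    xaxis  label "Time"
0.000  0.0005
0.010  0.0812
"""

x, y = read_xvg(SAMPLE)
```

The returned lists can be handed straight to a plotting library to draw the RMSD or RMSF figure, e.g. `read_xvg(open(path).read())` for a file on disk.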
In some embodiments of the present invention, in step S1) the preset parameter values are user-defined and are selected from a coordinate file path parameter, a trajectory file path parameter and an analysis interval parameter.
In some embodiments of the present invention, the coordinate file path parameter in step S1) supplies the original coordinate file to be analyzed; preferably, the original coordinate file is in pdb or gro format.
In some embodiments of the present invention, the trajectory file path parameter in step S1) supplies the original trajectory file to be analyzed; preferably, the original trajectory file is in at least one of the xtc, dcd and gro formats.
In some embodiments of the present invention, the analysis interval parameter in step S1) sets the interval between the frames to be analyzed.
In some embodiments of the present invention, the analysis interval parameter in step S1) is applied to original trajectory files with more than 500 frames, and its value is an integer no greater than the total number of frames in the original trajectory file.
In some specific technical solutions of the invention, in step S2) the parameter values set in step S1) are compared against preset thresholds; if a parameter is abnormal, the run ends and an error message is shown, and if all parameters are normal, the method proceeds to step S3).
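The patent does not state the exact thresholds used in step S2). The sketch below assumes the checks implied by the parameter rules above: allowed coordinate and trajectory formats, and a positive integer interval no larger than the frame count. The function name `check_params` and the error texts are hypothetical; an empty result means the run may continue to step S3).

```python
from pathlib import Path

# Formats named in the text for coordinate and trajectory files.
COORD_EXTS = {".pdb", ".gro"}
TRAJ_EXTS = {".xtc", ".dcd", ".gro"}


def check_params(coord, traj=None, interval=1, n_frames=None):
    """Return a list of problems found in the preset parameters.

    An empty list means the parameters are "normal"; otherwise the program
    would stop and print these messages.
    """
    errors = []
    if not coord:
        errors.append("a coordinate file is required")
    elif Path(coord).suffix.lower() not in COORD_EXTS:
        errors.append(f"unsupported coordinate format: {coord}")
    if traj is not None and Path(traj).suffix.lower() not in TRAJ_EXTS:
        errors.append(f"unsupported trajectory format: {traj}")
    if not isinstance(interval, int) or interval < 1:
        errors.append("the analysis interval must be a positive integer")
    elif n_frames is not None and interval > n_frames:
        errors.append(f"interval {interval} exceeds the {n_frames} available frames")
    return errors
```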
In some specific technical solutions of the present invention, in step S4) structural coordinate data and trajectory data are obtained and stored in a local processing and analysis directory, and a result storage directory is set; accepted structural coordinate file formats are pdb and gro, and accepted trajectory file formats are at least one of xtc, dcd and gro.
In some specific technical solutions of the present invention, step S41) first determines whether the stored input is a single file containing only coordinate data or a pair of files containing coordinate data and trajectory data. A single coordinate file is converted to pdb format; for a file pair, the trajectory data are aligned to the coordinate data and then trimmed according to the analysis interval parameter set in step S1).
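The branching in step S41) can be sketched as a small planning function. This is only an illustration under stated assumptions: the function name `plan_inputs` and the keys of the returned dictionary are hypothetical, and the actual alignment and sampling would be done by a trajectory library.

```python
from pathlib import Path


def plan_inputs(coord, traj=None, interval=1):
    """Describe how step S41) would treat the stored input files.

    Single coordinate file: convert to pdb unless it already is pdb.
    Coordinate + trajectory pair: align the trajectory to the coordinates,
    then keep every `interval`-th frame.
    """
    if traj is None:
        return {
            "mode": "single",
            "convert_to_pdb": Path(coord).suffix.lower() != ".pdb",
        }
    return {
        "mode": "pair",
        "align_to": coord,
        "trajectory": traj,
        "interval": interval,
    }
```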
In some embodiments of the invention, the type of interaction is selected from the group consisting of hydrophobic interactions (hydrophobic interaction), hydrogen bonds (hydrogen bond), water bridges (water bridge), salt bridges (salt bridge), pi-pi stacks (pi_stack), cation-pi interactions (pi cation interaction), halogen bonds (halogen bond), metal complex interactions (metal complex).
In some embodiments of the invention, each axis of a radar chart represents a protein amino acid residue that interacts with the small molecule, and the scale along the axis is the frequency with which the corresponding interaction occurs.
In some embodiments of the invention, in the total radar chart, each axis represents a protein amino acid residue that interacts with the small molecule, and the scale is the sum of the frequencies of all interactions at that residue.
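The radar-chart scales above can be computed from per-frame interaction lists. The sketch assumes that "frequency" means the fraction of analyzed frames in which a residue shows a given interaction type (each pair counted at most once per frame), and that the total is the sum over types; the optional `weights` argument covers the weighted-sum variant described elsewhere in the text. The function name and the toy input are hypothetical.

```python
from collections import defaultdict


def radar_frequencies(frames, weights=None):
    """Per-type and total interaction frequencies for each residue.

    frames: one collection of (interaction_type, residue) pairs per frame.
    Returns (per_type, total): per_type[type][residue] is the fraction of
    frames showing that interaction; total[residue] is the (optionally
    weighted) sum of that residue's per-type frequencies.
    """
    n = len(frames)
    counts = defaultdict(lambda: defaultdict(int))
    for contacts in frames:
        for itype, residue in set(contacts):  # count each pair once per frame
            counts[itype][residue] += 1
    per_type = {t: {r: c / n for r, c in res.items()} for t, res in counts.items()}
    total = defaultdict(float)
    for itype, res in per_type.items():
        w = 1.0 if weights is None else weights.get(itype, 1.0)
        for residue, freq in res.items():
            total[residue] += w * freq
    return per_type, dict(total)


# Toy trajectory of four frames (illustrative data, not from the patent).
FRAMES = [
    {("hydrophobic", "I78"), ("hydrogen_bond", "Y88")},
    {("hydrophobic", "I78")},
    {("hydrophobic", "I78"), ("hydrophobic", "A99")},
    {("hydrogen_bond", "I78")},
]
per_type, total = radar_frequencies(FRAMES)
```

Each `per_type[...]` dictionary gives the axis values for one interaction-type radar chart, and `total` gives the axis values for the total chart.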
In some specific technical solutions of the invention, each cell of the heat map shows the number and types of interactions between the corresponding residue and the small molecule in the corresponding frame: different colors represent different interaction types, and when a residue has several interaction types in that frame, the cell is divided equally among the corresponding colors.
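The equal division of a heat-map cell can be sketched as follows. The color table is hypothetical (the patent does not say which color represents which interaction type), and the sketch splits the cell horizontally into one segment per distinct type.

```python
# Hypothetical color assignment; any distinct colors would do.
COLORS = {
    "hydrophobic_interaction": "orange",
    "hydrogen_bond": "blue",
    "water_bridge": "cyan",
    "salt_bridge": "red",
    "pi_stack": "purple",
    "pi_cation_interaction": "green",
    "halogen_bond": "magenta",
    "metal_complex": "brown",
}


def split_cell(x, width, types):
    """Divide one heat-map cell equally among the interaction types in it.

    Returns (x_start, segment_width, color) tuples, one per distinct type,
    so a cell with k types shows k equal-width colored stripes.
    """
    kinds = sorted(set(types))
    if not kinds:
        return []
    part = width / len(kinds)
    return [(x + i * part, part, COLORS.get(k, "grey")) for i, k in enumerate(kinds)]
```

Each tuple maps directly onto a rectangle patch, for example matplotlib's `Rectangle((x0, y), w, 1, color=c)` for a row of height 1.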
In another aspect, the invention provides a computer readable carrier for analyzing large dynamic protein-small molecule interactions; the computer readable storage medium stores program code for performing the method of analyzing protein-small molecule interactions described above.
In a further aspect, the invention provides a system for analyzing large dynamic protein-small molecule interactions, comprising a command parameter presetting module, a data preprocessing module, a data analysis module and a result generation module;
the command parameter presetting module allows the user to set preset parameter values for program execution and commands, and checks the values that have been set;
the data preprocessing module preprocesses and stores the data obtained by the data uploading module;
the data analysis module pre-checks the data and processes and analyzes the stored data. Specifically, it: 1) processes the pre-checked, stored data; 2) builds a model and runs an MD simulation with GROMACS to obtain root mean square deviation (RMSD) and root mean square fluctuation (RMSF) data for the original data, writes the results to .xvg files, plots the corresponding RMSD and RMSF figures with a Python script, and saves the figures; and 3) cleans the data in the file obtained in 1), then computes and analyzes interactions from physical and geometric properties to obtain an XML interaction result file, and saves the corresponding data;
the result generation module performs 2D visualization of the data from the data analysis module and outputs and stores the results:
the 2D visualization consists of: 1) drawing and displaying 9 protein-small molecule interaction radar maps in total: hydrophobic interaction, hydrogen bond, water bridge, salt bridge, pi-pi stacking (pi_stack), cation-pi interaction (pi cation interaction), halogen bond and metal complex interaction radar maps, plus a total interaction radar map; 2) drawing and displaying a heat map of the interaction distribution over the trajectory data, whose abscissa is the frame (Frame) of the trajectory, whose ordinate lists the protein amino acid residues that interact with the small molecule, and in which each cell represents the number and types of interactions between that residue and the small molecule in that frame; 3) displaying the RMSD and RMSF results obtained by the data analysis module.
The result output automatically combines all visualization results into a PDF with Python, generating a result document that is written to the working directory.
In some specific technical solutions of the present invention, the coordinate files accepted by the data preprocessing module include pdb and gro files, and the accepted trajectory files include xtc, dcd and gro files.
In some specific technical solutions of the invention, the data preprocessing module obtains a pdb file from trajectory data as follows: first, the number of frames in the trajectory is counted; then the frames are sampled at equal intervals according to the interval value parsed by the command parameter presetting module (when the interval parameter -s is 1, no sampling is performed and all original frames are used); finally, the sampled frames are combined into a pdb file. Files in gro, xtc and dcd format are converted into pdb format with the Python analysis library MDAnalysis.
In some embodiments of the invention, the type of interaction is selected from the group consisting of hydrophobic interactions (hydrophobic interaction), hydrogen bonds (hydrogen bond), water bridges (water bridge), salt bridges (salt bridge), pi-pi stacks (pi_stack), cation-pi interactions (pi cation interaction), halogen bonds (halogen bond), metal complex interactions (metal complex).
In some embodiments of the invention, each axis of a radar chart represents a protein amino acid residue that interacts with the small molecule, and the scale along the axis is the frequency with which the corresponding interaction occurs.
In some embodiments of the invention, in the total radar chart, each axis represents a protein amino acid residue that interacts with the small molecule, and the scale is the sum of the frequencies of all interactions at that residue.
In some embodiments of the present invention, different colors display different interaction types in the heat map; when several interactions are present, their colors divide the cell equally, so that a cell contains as many colors as there are interaction types.
The invention discloses a system for dynamically analyzing protein-small molecule interactions, divided into a command parameter presetting module, a data preprocessing module, a data analysis module and a result generation module. The user runs the program script locally and supplies the data files to be analyzed through the command parameters; the program aligns and preprocesses the data according to the input parameters, calculates and analyzes the protein-small molecule interactions from geometric properties, analyzes protein stability using formulas, and finally writes the visualized results to a local folder.
Advantageous effects
1. The system is suitable for analyzing large data files: users do not need to worry about whether a data file is too large, trajectories with more than 500 frames can be analyzed quickly, and user-defined parameters allow the analysis to be tailored. The system as a whole is stable and fast, and users quickly obtain visualized dynamic interaction results after running the program, which greatly benefits fields such as bioinformatics research, protein design, drug screening and small molecule chemical synthesis.
2. The distinctive 2D visualizations of the invention, namely radar maps and a 2D dynamic trajectory interaction profile, show the entire course of all interactions more clearly.
Drawings
FIG. 1 shows the software system architecture for dynamically analyzing protein-small molecule interactions. The system is divided into a command parameter presetting module, a data preprocessing module, a data analysis module and a result generation module. The user runs the program script locally and supplies the data files to be analyzed through the command parameters; the program parses the input parameters, aligns and preprocesses the data, calculates and analyzes protein-small molecule interactions from geometric properties, analyzes protein stability using formulas, and finally writes the visualized results to a local folder.
FIG. 2 shows the 2D interaction visualization results, which comprise the interaction radar maps, the 2D dynamic trajectory interaction distribution, and the RMSD and RMSF plots.
FIG. 3 shows the 2D radar analysis results, consisting of eight radar charts, one per interaction type, and one combined radar chart. Taking the second radar chart (hydrophobic interactions) as an example, there are 13 protein-small molecule hydrophobic interactions in total: I78, F153, L133, L121, L118, F114, V111, V103, A99, L91, Y88, V87 and L84. In I78, I denotes the amino acid and 78 its position in the data file; the other labels follow the same convention. The interactions with the highest frequencies, I78, F153, L118 and A99, are clearly visible and are more likely to remain stable during the motion.
FIG. 4 shows the 2D dynamic trajectory interaction profile, with the trajectory frame sequence on the abscissa and the protein positions (from the raw data) that interact with the small molecule on the ordinate. It shows clearly which interactions persist throughout the trajectory and which are relatively unstable.
FIG. 5 is an RMSD plot showing the root mean square deviation (RMSD) of the protein over the whole trajectory, which describes the degree of change in protein structure; the protein is relatively stable between 2 and 700.
FIG. 6 is an RMSF plot showing the root mean square fluctuation (RMSF) of the protein over the whole trajectory, which reflects how freely individual atoms in the molecule move; the structure is relatively flexible throughout the interval from 0 to 160.
FIG. 7 is an RMSD plot showing the root mean square deviation (RMSD) of the protein-small molecule complex over the whole trajectory, which is relatively stable throughout.
Detailed Description
The present invention is described in detail below to make its objects, features and advantages clearer; the description should not be construed as limiting the scope of the invention.
The method of the present invention will be described in detail with reference to fig. 1. The invention provides a method for analyzing large dynamic protein-small molecule interactions, comprising the steps of:
Step S1): setting preset parameter values for program execution and commands;
In step S1), the preset parameter values are user-defined and are selected from a coordinate file path parameter, a trajectory file path parameter and an analysis interval parameter.
The coordinate file path parameter in step S1) supplies the original coordinate file to be analyzed; preferably, the original coordinate file is in pdb or gro format. The coordinate file path parameter is given with -c, e.g. -c test.pdb. Assuming that only a single coordinate file is analyzed and the interval is left at its default, the complete command is: python moladi.
The trajectory file path parameter in step S1) supplies the original trajectory file to be analyzed; preferably, the original trajectory file is in at least one of the xtc, dcd and gro formats. The trajectory file path parameter is given with -t, e.g. -t test.xtc. Assuming that the input is the file pair test.pdb and test.xtc and the interval is left at its default, the complete command is: python moladi.py -c test.pdb -t test.xtc. If the trajectory file has 1000 frames in total and the interval is set to 2, so that every second frame is analyzed (500 frames in total), the complete command is python moladi.
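The three command-line parameters can be parsed with Python's standard `argparse` module. This is a sketch, not the program's actual parser: the short flags -c, -t and -s come from the text, while the destination names, help strings and the default interval of 1 (meaning "use all frames") are assumptions.

```python
import argparse


def build_parser():
    """Command-line interface sketch for the analysis script."""
    p = argparse.ArgumentParser(
        prog="moladi.py",
        description="Protein-small molecule interaction analysis (sketch)",
    )
    p.add_argument("-c", dest="coord", required=True,
                   help="coordinate file (.pdb or .gro)")
    p.add_argument("-t", dest="traj", default=None,
                   help="trajectory file (.xtc, .dcd or .gro)")
    # Assumed default: 1 means every frame is analyzed (no sampling).
    p.add_argument("-s", dest="interval", type=int, default=1,
                   help="analyze every s-th frame")
    return p


# Equivalent of: python moladi.py -c test.pdb -t test.xtc -s 2
args = build_parser().parse_args(["-c", "test.pdb", "-t", "test.xtc", "-s", "2"])
```

When a flag is omitted, `argparse` fills in the default, which matches the described behavior of falling back to default values before passing them on to later steps.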
The analysis interval parameter in step S1) is used to set the frame interval to be analyzed.
The analysis interval parameter in step S1) is applied to original trajectory files with more than 500 frames, and its value is an integer no greater than the total number of frames in the original trajectory file, for example an integer from 2 to 20. If a trajectory has 500 frames and the analysis interval is 2, equidistant sampling of frames [1, 3, 5, ...] is performed when analyzing interactions, and 500/2 = 250 frames are ultimately analyzed. In the implementation, the program checks the command-line string for each parameter; if a parameter is absent, its default value is used, and the value is then passed to the subsequent processing steps. With these three parameters, a user can analyze interactions for a single file, a file pair, and any number of frames, which makes the analysis more flexible.
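The equidistant sampling in the example above keeps frames 1, 3, 5, ... when the interval is 2. A minimal sketch using 1-based frame numbers, as in the text (the function name is hypothetical):

```python
def sampled_frames(n_frames, interval=1):
    """1-based frame numbers kept when every `interval`-th frame is analyzed."""
    if interval < 1:
        raise ValueError("interval must be >= 1")
    return list(range(1, n_frames + 1, interval))


frames = sampled_frames(500, 2)  # [1, 3, 5, ..., 499], i.e. 250 frames
```

In MDAnalysis the same selection corresponds to iterating over `u.trajectory[::interval]`, which uses 0-based frame indices.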
Step S2): checking the preset parameter values that have been set;
Step S3): acquiring file information and storing it locally;
The file information acquired in step S3) may comprise a structural coordinate file alone, or a structural coordinate file together with a trajectory file. The structural coordinate file is in pdb or gro format; the trajectory file is in at least one of the xtc, dcd and gro formats.
Step S4): processing and analyzing the data obtained in step S3). Specifically, structural coordinate data and trajectory data are obtained and stored in a local processing and analysis directory, and a result storage directory is set; accepted structural coordinate formats are pdb and gro, and accepted trajectory formats are at least one of xtc, dcd and gro.
S41) processing the data stored in step S3);
S42) building a model from the pdb file obtained in step S41) and running an MD simulation with GROMACS to obtain root mean square deviation (RMSD) and root mean square fluctuation (RMSF) data for the original data, writing the results to .xvg files, plotting the corresponding RMSD and RMSF figures with a Python script, and saving the figures;
S43) cleaning the data in the file obtained in step S41), then computing and analyzing interactions from physical and geometric properties to obtain an XML interaction-type result file, and saving the corresponding data.
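The patent does not give the schema of the XML interaction result file produced in step S43). The sketch below assumes a hypothetical layout in which each interaction is an element named after its type, with `restype` and `resnr` child elements, and reads it with the standard library's `xml.etree.ElementTree`. The tag names, the function `parse_interactions` and the sample document are all assumptions.

```python
import xml.etree.ElementTree as ET

# Assumed element names for the eight interaction types listed in the text.
INTERACTION_TAGS = (
    "hydrophobic_interaction", "hydrogen_bond", "water_bridge", "salt_bridge",
    "pi_stack", "pi_cation_interaction", "halogen_bond", "metal_complex",
)


def parse_interactions(xml_text):
    """Return (type, residue name, residue number) tuples from one XML report."""
    root = ET.fromstring(xml_text)
    found = []
    for tag in INTERACTION_TAGS:
        for node in root.iter(tag):
            found.append((tag, node.findtext("restype"), int(node.findtext("resnr"))))
    return found


# Hypothetical single-frame report.
SAMPLE = """<interactions frame="1">
  <hydrophobic_interaction><restype>ILE</restype><resnr>78</resnr></hydrophobic_interaction>
  <hydrogen_bond><restype>TYR</restype><resnr>88</resnr></hydrogen_bond>
</interactions>"""

records = parse_interactions(SAMPLE)
```

Collecting these tuples for every analyzed frame yields the per-frame interaction lists needed for the radar charts and the heat map.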
In step S41), it is first determined whether the stored input is a single file containing only coordinate data or a pair of files containing coordinate data and trajectory data. A single coordinate file is converted to pdb format; for a file pair, the trajectory data are aligned to the coordinate data and then trimmed according to the analysis interval parameter set in step S1).
In some specific embodiments, the pdb file in step S41) is obtained from trajectory data as follows: the number of frames in the trajectory is counted, the frames are sampled at equal intervals according to the parameter value parsed by the command parameter presetting module (when the interval parameter -s is 1, no sampling is performed and all original frames are used), and the sampled frames are combined into a pdb file. Files in gro, xtc and dcd format are converted into pdb format with the Python analysis library MDAnalysis.
In some specific embodiments, step S41) first determines the amount of uploaded data and then the file format. Structural coordinate data are converted directly into the unified pdb format. For input containing both structural coordinates and trajectory data, the number of frames in the trajectory is counted, the trajectory is sampled at equal intervals, and finally a file containing the 50 frames to be calculated and analyzed is synthesized.
In some specific embodiments, in step S43) the data file is cleaned: water molecules and the serial numbers of the synthesized models are removed, the start and end positions of each frame are located, and the irrelevant first and last lines of each frame are removed. Interactions are then calculated and analyzed from physical and geometric properties to obtain an XML interaction result file, whose path is stored in the database row of the corresponding input file.
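Combining the sampled frames into one pdb file amounts to wrapping each frame's atom records in MODEL/ENDMDL records. The pure-Python sketch below assumes the ATOM/HETATM lines of each frame are already formatted; it writes the model serial number right-aligned in columns 11-14 of the MODEL record, as the PDB format specifies. The function name is hypothetical.

```python
def synthesize_multimodel_pdb(frames):
    """Join per-frame ATOM/HETATM line lists into one multi-model PDB text.

    Each frame becomes "MODEL n" ... "ENDMDL"; the file ends with "END".
    """
    out = []
    for i, atom_lines in enumerate(frames, start=1):
        out.append(f"MODEL     {i:4d}")  # serial number in columns 11-14
        out.extend(atom_lines)
        out.append("ENDMDL")
    out.append("END")
    return "\n".join(out) + "\n"
```

In practice the conversion is done with MDAnalysis, as the text states; its writer interface (for example `MDAnalysis.Writer(..., multiframe=True)` for PDB output) can produce an equivalent file, though the exact options should be checked against the documentation of the installed version.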
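The cleaning step can be sketched as splitting a multi-model PDB text into frames at its MODEL/ENDMDL boundaries (which are themselves dropped) and discarding water residues. Assumptions: frames are delimited by MODEL/ENDMDL records, the residue name sits in PDB columns 18-20, and water is named HOH, WAT or SOL. The function name and the water-name set are hypothetical.

```python
WATER = {"HOH", "WAT", "SOL"}  # assumed water residue names


def split_and_clean(pdb_text):
    """Split multi-model PDB text into frames and drop water atoms.

    MODEL/ENDMDL lines mark the start and end of each frame and are not
    kept; ATOM/HETATM lines whose residue name (columns 18-20) is a water
    name are removed. Lines outside MODEL blocks are ignored.
    """
    frames, current = [], None
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            current = []
        elif record == "ENDMDL":
            if current is not None:
                frames.append(current)
            current = None
        elif record in ("ATOM", "HETATM") and current is not None:
            if line[17:20].strip() not in WATER:
                current.append(line)
    return frames
```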
In some embodiments of the invention, the type of interaction is selected from the group consisting of hydrophobic interactions (hydrophobic interaction), hydrogen bonds (hydrogen bond), water bridges (water bridge), salt bridges (salt bridge), pi-pi stacks (pi_stack), cation-pi interactions (pi cation interaction), halogen bonds (halogen bond), metal complex interactions (metal complex).
S5) performing 2D visualization on the data obtained in step S4) and outputting and storing the results:
S51) drawing and displaying protein-small molecule interaction radar maps, 9 in total: a hydrophobic interaction (hydrophobic interaction) radar map, a hydrogen bond (hydrogen bond) interaction radar map, a water bridge (water bridge) interaction radar map, a salt bridge (salt bridge) interaction radar map, a pi-pi stack (pi_stack) interaction radar map, a cation-pi interaction (pi cation interaction) radar map, a halogen bond (halogen bond) interaction radar map, a metal complex interaction (metal complex) radar map, and a total interaction radar map;
S52) drawing and displaying a heat map of the interaction distribution of the trajectory data, in which the abscissa is the frame (Frame) of the trajectory, the ordinate is the amino acid sites of the protein that interact with the small molecule, and each cell represents the number and types of interactions between that amino acid site and the small molecule in that frame;
S53) displaying the RMSD/RMSF plots obtained in step S42);
S54) automatically combining all visualization results into a pdf with Python to generate a result document, which is output to the working directory.
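Combining the figures into one PDF can be done with matplotlib's `PdfPages` backend. The sketch below assumes the figures are matplotlib `Figure` objects; `figures_to_pdf` and the output path are hypothetical, and the text does not say which PDF library is used.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # file output only, no display needed
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages


def figures_to_pdf(figures, out_path) -> Path:
    """Write each figure as one page of a single PDF and return its path."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)  # e.g. the working directory
    with PdfPages(str(out_path)) as pdf:
        for fig in figures:
            pdf.savefig(fig)
            plt.close(fig)  # release memory once the page is written
    return out_path
```

For example, `figures_to_pdf([radar_fig, heatmap_fig, rmsd_fig], "results/report.pdf")`, where the three figure variables are placeholders for figures produced earlier.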
In some embodiments of the invention, each axis of a radar chart represents an amino acid of the protein at which an interaction with the small molecule occurs, and the scale is the frequency with which the corresponding interaction occurs.
In some embodiments of the invention, each axis of the total radar chart likewise represents an amino acid of the protein at which an interaction with the small molecule occurs, and the scale is the sum of the frequencies of all interactions at that site.
In some embodiments of the invention, each radar map expresses the frequency of one specific interaction: all interactions, hydrophobic interactions, hydrogen bonds, water bridges, salt bridges, pi-pi stacks, cation-pi interactions, halogen bonds, or metal complex interactions. "All interactions" is the weighted sum of the hydrophobic interactions, hydrogen bonds, water bridges, salt bridges, pi-pi stacks, cation-pi interactions, halogen bonds and metal complex interactions.
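The radar-chart scales can be sketched as follows. Several points are assumptions not stated in the text: frequency is taken as the fraction of analysed frames in which a residue shows the interaction, each frame is given as a dict from interaction type to residue labels, and the weighted sum for the total chart uses a weight of 1 for every type unless other weights are supplied. `radar_frequencies` and `plot_radar` are hypothetical helpers; the latter needs matplotlib.

```python
import math
from collections import defaultdict


def radar_frequencies(frames, weights=None):
    """Per-residue frequencies for the per-type radar charts and the total chart.

    frames: one dict per analysed frame, mapping an interaction type to the
    residue labels (e.g. "I78") showing that interaction in the frame.
    Returns (per_type, total): per_type[type][residue] is the fraction of
    frames with that interaction; total[residue] is the weighted sum over
    types (all weights default to 1).
    """
    n = len(frames)
    per_type = defaultdict(lambda: defaultdict(float))
    for frame in frames:
        for itype, residues in frame.items():
            for res in set(residues):  # count a residue once per frame and type
                per_type[itype][res] += 1.0 / n
    weights = weights or {}
    total = defaultdict(float)
    for itype, freqs in per_type.items():
        for res, freq in freqs.items():
            total[res] += weights.get(itype, 1.0) * freq
    return {t: dict(f) for t, f in per_type.items()}, dict(total)


def plot_radar(freqs, title):
    """Draw one radar chart: one axis per residue, radius = frequency."""
    import matplotlib.pyplot as plt  # needed only for drawing

    labels = list(freqs)
    values = [freqs[k] for k in labels]
    angles = [2 * math.pi * i / len(labels) for i in range(len(labels))]
    fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
    ax.plot(angles + angles[:1], values + values[:1])  # close the polygon
    ax.fill(angles + angles[:1], values + values[:1], alpha=0.25)
    ax.set_xticks(angles)
    ax.set_xticklabels(labels)
    ax.set_title(title)
    return fig
```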
In some embodiments of the present invention, each cell of the heat map encodes the number and types of interactions between the corresponding amino acid site and the small molecule in the corresponding frame: each interaction type has its own color, and when the site shows several interaction types in that frame, the cell is divided equally among the corresponding colors.
In some embodiments of the invention, a 2D dynamic trajectory interaction profile is drawn by recording each interaction together with the trajectory frame in which it occurs; the abscissa is the trajectory frame sequence and the ordinate is the specific interaction.
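The data behind the 2D dynamic trajectory interaction profile are essentially (frame, interaction) records. A hedged sketch of turning them into plot coordinates follows, with one row per residue/type pair ordered by residue number; `trajectory_profile`, `plot_profile` and the record layout `(frame, residue label, interaction type)` are assumptions for illustration.

```python
import re


def _resnum(label: str) -> int:
    """Residue number embedded in a label such as 'I78' (-1 if absent)."""
    match = re.search(r"\d+", label)
    return int(match.group()) if match else -1


def trajectory_profile(records):
    """records: (frame, residue label, interaction type) tuples.

    Returns the y-axis labels, one row per residue/type pair ordered by residue
    number, and the sorted (frame, row) points marking where each occurs.
    """
    pairs = sorted({(res, itype) for _, res, itype in records},
                   key=lambda p: (_resnum(p[0]), p[1]))
    row_of = {pair: row for row, pair in enumerate(pairs)}
    points = sorted((frame, row_of[(res, itype)]) for frame, res, itype in records)
    labels = [f"{res} {itype}" for res, itype in pairs]
    return labels, points


def plot_profile(ax, labels, points):
    """Scatter the points on a matplotlib Axes: x = frame, y = interaction row."""
    if points:
        frames, rows = zip(*points)
        ax.scatter(frames, rows, marker="s")
    ax.set_yticks(range(len(labels)))
    ax.set_yticklabels(labels)
    ax.set_xlabel("Frame")
    return ax
```

A row that is filled across most frames corresponds to a stable interaction; a sparse row corresponds to a transient one.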
In some embodiments of the invention, RMSD and RMSF plots are used to characterize the stability of the protein itself; they are calculated from the changes in atomic coordinates recorded in the trajectory data using the root mean square deviation (RMSD) and root mean square fluctuation (RMSF) equations. RMSD is calculated per frame: the squared differences between the atomic coordinates (x, y, z) in that frame and those of a reference structure are averaged over all atoms, and the square root of this mean is the RMSD value. RMSF is calculated per atom: the squared deviations of the atom's position from its average position over a time period T are averaged, and the square root is taken. RMSF therefore indicates the freedom of movement of individual atoms in the molecule.
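Written out, RMSD(t) = sqrt((1/N) * sum_i |r_i(t) - r_i,ref|^2) over the N selected atoms, and RMSF_i = sqrt((1/T) * sum_t |r_i(t) - <r_i>|^2) over the T frames, where <r_i> is the time-averaged position of atom i. In the claimed method GROMACS produces these values; the NumPy sketch below only illustrates the formulas and assumes the frames have already been superimposed on the reference, so the least-squares fitting step is omitted. The function names are hypothetical.

```python
import numpy as np


def rmsd_per_frame(coords: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """RMSD of every frame against a reference structure.

    coords: (n_frames, n_atoms, 3); ref: (n_atoms, 3). Frames are assumed to be
    pre-aligned to the reference; no fitting is performed here.
    """
    diff = coords - ref[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2).mean(axis=1))  # mean over atoms


def rmsf_per_atom(coords: np.ndarray) -> np.ndarray:
    """RMSF of every atom about its time-averaged position (mean over frames)."""
    mean_pos = coords.mean(axis=0)
    diff = coords - mean_pos[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2).mean(axis=0))
```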
In some embodiments of the present invention, because one amino acid site can take part in several interaction types in the same frame, the interactions are distributed evenly over the corresponding cell, each drawn in the color of its interaction type, and the whole heat map is drawn in this way.
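One way to realise the equally divided heat-map cells is to give each interaction in a cell an equal-width segment colored by its type, keeping segments of the same type adjacent, so that a cell shows both the number and the types of interactions. This interpretation, the helper names `split_cell` and `draw_heatmap`, and the `colors` mapping are assumptions; `draw_heatmap` needs matplotlib.

```python
def split_cell(types, x0=0.0, width=1.0):
    """Divide one heat-map cell equally among the interactions it holds.

    types: interaction types observed in the cell (repeats allowed).
    Returns (type, left edge, segment width) tuples; sorting keeps segments
    of the same type next to each other.
    """
    kinds = sorted(types)
    if not kinds:
        return []
    seg = width / len(kinds)
    return [(k, x0 + i * seg, seg) for i, k in enumerate(kinds)]


def draw_heatmap(ax, grid, colors):
    """grid maps (frame index, row index) to the interaction types in that cell."""
    from matplotlib.patches import Rectangle  # needed only for drawing

    for (frame, row), types in grid.items():
        for itype, left, width in split_cell(types, x0=frame):
            ax.add_patch(Rectangle((left, row), width, 1.0, facecolor=colors[itype]))
    ax.autoscale_view()
    return ax
```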
The following example shows the visualization results generated by uploading a structural coordinate file and a trajectory file for analysis:
1) The 2D radar charts are shown in Fig. 3: eight radar charts for the individual interaction types and one integrated (total) radar chart. Taking the second radar chart, hydrophobic interactions, as an example, there are 13 protein-small molecule hydrophobic interactions in total, at I78, F153, L133, L121, L118, F114, V111, V103, A99, L91, Y88, V87 and L84. In I78, I is the one-letter code of the amino acid (isoleucine) and 78 is its residue number in the data file; the other labels follow the same convention. The interactions with the highest frequencies are clearly I78, F153, L118 and A99, which are therefore the most likely to remain stable as the system moves.
2) The 2D dynamic trajectory interaction profile is shown in Fig. 4, with the trajectory frame sequence on the abscissa and the small molecule-protein interaction sites (their positions in the raw data) on the ordinate. It shows clearly which interactions persist stably throughout the trajectory and which are relatively unstable.
3) The RMSD and RMSF plots are shown in Figs. 5-7. Fig. 5 is an RMSD (Root Mean Square Deviation) plot of the protein over the whole trajectory, describing the degree of change in the protein structure; the protein is relatively stable between 2 and 700. Fig. 6 is an RMSF (Root Mean Square Fluctuation) plot of the protein over the whole trajectory, showing the freedom of movement of each atom in the molecule; the structure in the interval 0 to 160 is flexible, while the other parts are relatively stable. Fig. 7 is an RMSD plot of the protein-small molecule complex over the whole trajectory, which shows that the complex is relatively stable throughout.
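GROMACS writes its RMSD/RMSF output as `.xvg` text, in which lines beginning with `#` are comments and lines beginning with `@` are xmgrace plotting directives, including the axis labels. A minimal parser and plotting helper for such files follows, assuming two-column data (time or atom index, value); `read_xvg` and `plot_xvg` are hypothetical names, not the script used in the patent.

```python
import re

import numpy as np

_LABEL = re.compile(r'^@\s*([xy])axis\s+label\s+"(.*)"')


def read_xvg(text: str):
    """Parse .xvg text into (data array, {"x": label, "y": label})."""
    labels = {}
    rows = []
    for line in text.splitlines():
        s = line.strip()
        if not s or s.startswith("#"):
            continue  # blank line or comment
        if s.startswith("@"):
            match = _LABEL.match(s)  # keep axis labels, skip other directives
            if match:
                labels[match.group(1)] = match.group(2)
            continue
        rows.append([float(x) for x in s.split()])
    return np.array(rows), labels


def plot_xvg(path, ax):
    """Plot the first data column against the second on a matplotlib Axes."""
    with open(path) as fh:
        data, labels = read_xvg(fh.read())
    ax.plot(data[:, 0], data[:, 1])
    ax.set_xlabel(labels.get("x", ""))
    ax.set_ylabel(labels.get("y", ""))
    return ax
```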
Claims (10)
1. A method of analyzing large dynamic protein-small molecule interactions, the method comprising the steps of:
S1) setting preset parameter values for program execution and commands;
S2) checking the preset parameter values;
S3) acquiring the file information and storing the files locally;
S4) processing and analyzing the data obtained in step S3):
S41) processing the data stored in step S3);
S42) performing model construction and MD simulation with GROMACS on the data in the pdb-format file obtained in step S41) to obtain root mean square deviation (RMSD) and root mean square fluctuation (RMSF) data for the original data, generating result files in xvg format, plotting the corresponding RMSD and RMSF figures with a Python script, and storing the resulting figures;
S43) cleaning the data in the file obtained in step S41), then performing calculation and analysis based on physical and geometric properties to obtain an xml-format interaction-type result file, and storing the corresponding data;
S5) performing 2D visualization on the data obtained in step S4) and outputting and storing the results:
S51) drawing and displaying protein-small molecule interaction radar maps, 9 in total: a hydrophobic interaction radar map, a hydrogen bond interaction radar map, a water bridge interaction radar map, a salt bridge interaction radar map, a pi-pi stack interaction radar map, a cation-pi interaction radar map, a halogen bond interaction radar map, a metal complex interaction radar map, and a total interaction radar map;
S52) drawing and displaying a heat map of the interaction distribution of the trajectory data, in which the abscissa is the frame of the trajectory, the ordinate is the amino acid sites of the protein that interact with the small molecule, and each cell represents the number and types of interactions between that amino acid site and the small molecule in that frame;
S53) displaying the RMSD/RMSF figures obtained in step S42);
S54) automatically combining all visualization results into a pdf with Python to generate a result document, which is output to the working directory.
2. The method according to claim 1, wherein in step S1) the preset parameter values are user-defined and are selected from a coordinate file path parameter, a trajectory file path parameter and an analysis interval parameter;
the coordinate file path parameter provides the original coordinate file to be analyzed, and preferably the format of the original coordinate file is pdb or gro;
the trajectory file path parameter provides the original trajectory file to be analyzed, and preferably the format of the original trajectory file is at least one of xtc, dcd and gro;
the analysis interval parameter sets the frame interval to be analyzed.
3. The method according to claim 1, wherein the analysis interval parameter set in step S1) is applied to an original trajectory file with more than 500 frames, and its value is an integer not greater than the total number of frames of the original trajectory file.
4. The method according to claim 1, wherein in step S3) the structural coordinate data and the trajectory data are acquired and stored in a local processing and analysis directory, and a result storage directory is set; the accepted structural coordinate file format is pdb or gro, and the trajectory file format is at least one of xtc, dcd and gro.
5. The method according to claim 1, wherein in step S41) it is first determined whether the stored input is a single file containing only coordinate data or two files containing both coordinate data and trajectory data; a single file containing only coordinate data is converted into pdb format, and in the two-file case the trajectory data are first matched to the coordinate data and then trimmed according to the analysis interval parameter set in step S1).
6. The method of claim 1, wherein the interaction type is selected from the group consisting of hydrophobic interactions, hydrogen bonds, water bridges, salt bridges, pi-pi stacks, cation-pi interactions, halogen bonds and metal complex interactions;
each axis of a radar chart represents an amino acid of the protein at which an interaction with the small molecule occurs, and the scale is the frequency of the corresponding interaction;
preferably, each axis of the total radar chart represents an amino acid of the protein at which an interaction with the small molecule occurs, and the scale is the sum of the frequencies of the interactions at that site;
in the heat map, each cell represents the number and types of interactions between the corresponding amino acid site and the small molecule in the corresponding frame, with different colors representing different interaction types; when the site shows several interaction types in that frame, the cell is divided equally among the corresponding colors.
7. A computer readable carrier for analyzing large dynamic protein-small molecule interactions, the computer readable carrier storing program code for performing the method of analyzing protein-small molecule interactions according to any one of claims 1-6.
8. A system for analyzing large dynamic protein-small molecule interactions, wherein the system comprises a command parameter presetting module, a data preprocessing module, a data analysis module and a result generation module;
the command parameter presetting module is used by the user to set preset parameter values for program execution and commands, and checks the preset values that have been set;
the data preprocessing module is used for preprocessing and storing the data obtained by the data uploading module;
the data analysis module is used for pre-checking the data and for processing and analyzing the stored data; specifically, it: 1) processes the pre-checked and stored data; 2) performs model construction and MD simulation with GROMACS to obtain root mean square deviation (RMSD) and root mean square fluctuation (RMSF) data for the original data, generates result files in xvg format, plots the corresponding RMSD and RMSF figures with a Python script, and stores the resulting figures; 3) cleans the data of the file obtained in 1), then performs calculation and analysis based on physical and geometric properties to obtain an xml-format interaction result file, and stores the corresponding data;
the result generation module is used for performing 2D visualization on the data of the data processing and analysis module and for outputting and storing the results:
the 2D visualization comprises: 1) drawing and displaying a total of 9 protein-small molecule interaction radar maps, namely a hydrophobic interaction (hydrophobic interaction) radar map, a hydrogen bond (hydrogen bond) interaction radar map, a water bridge (water bridge) interaction radar map, a salt bridge (salt bridge) interaction radar map, a pi-pi stack (pi_stack) interaction radar map, a cation-pi interaction (pi cation interaction) radar map, a halogen bond (halogen bond) interaction radar map, a metal complex interaction (metal complex) radar map, and a total interaction radar map; 2) drawing and displaying a heat map of the interaction distribution of the trajectory data, in which the abscissa is the frame (Frame) of the trajectory, the ordinate is the amino acid sites of the protein that interact with the small molecule, and each cell represents the number and types of interactions between that amino acid site and the small molecule in that frame; 3) displaying the root mean square deviation (RMSD) and root mean square fluctuation (RMSF) results obtained by the data processing and analysis module;
the result output means that all visualization results are automatically combined into a pdf with Python, generating a result document that is output to the working directory.
9. The system of claim 8, wherein the data preprocessing module accepts a structural coordinate file in pdb or gro format and a trajectory file in at least one of xtc, dcd and gro formats.
10. The system of claim 8, wherein the data preprocessing module obtains the pdb-format file for the trajectory data as follows: first the number of frames in the trajectory data is counted, then the trajectory is sampled at equal intervals according to the analysis interval parameter set in the command parameter presetting module, and the sampled frames are assembled into a pdb file; files in gro, xtc and dcd formats are converted into pdb files with MDAnalysis, a Python analysis tool.
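Steps S1) and S2), together with claims 2, 3 and 9, amount to command-line parameter setting and checking. A sketch with `argparse` follows; the flag names `--coord`, `--traj` and `--interval`, the helper names, and the choice to ignore the interval for trajectories of 500 frames or fewer are illustrative assumptions. File existence is not checked here.

```python
import argparse
from pathlib import Path

COORD_FORMATS = {".pdb", ".gro"}         # accepted coordinate formats (claims 2, 9)
TRAJ_FORMATS = {".xtc", ".dcd", ".gro"}  # accepted trajectory formats (claims 2, 9)
LONG_TRAJECTORY = 500                    # claim 3: interval applies above 500 frames


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Protein-small molecule interaction analysis (illustrative sketch)")
    parser.add_argument("--coord", required=True, help="coordinate file (.pdb or .gro)")
    parser.add_argument("--traj", help="trajectory file (.xtc, .dcd or .gro)")
    parser.add_argument("--interval", type=int, help="frame interval to analyse")
    return parser


def check_params(args, n_frames=None) -> list:
    """Return the problems found in the preset parameters (empty list if valid)."""
    problems = []
    if Path(args.coord).suffix.lower() not in COORD_FORMATS:
        problems.append(f"unsupported coordinate format: {args.coord}")
    if args.traj and Path(args.traj).suffix.lower() not in TRAJ_FORMATS:
        problems.append(f"unsupported trajectory format: {args.traj}")
    if args.interval is not None:
        if args.interval < 1:
            problems.append("interval must be a positive integer")
        elif n_frames is not None and args.interval > n_frames:
            problems.append("interval exceeds the number of frames")
    return problems


def effective_interval(interval, n_frames: int) -> int:
    """Apply the user interval only to long trajectories; otherwise keep every frame."""
    if interval is None or n_frames <= LONG_TRAJECTORY:
        return 1
    return interval
```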
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210282538.0A CN116825180A (en) | 2022-03-22 | 2022-03-22 | Method, system and computer readable carrier for analyzing large dynamic protein-small molecule interactions |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116825180A (en) | 2023-09-29 |
Family ID: 88117169
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210282538.0A Pending CN116825180A (en) | 2022-03-22 | 2022-03-22 | Method, system and computer readable carrier for analyzing large dynamic protein-small molecule interactions |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |