CN116106461B - Method and device for predicting liquid chromatography retention time based on deep graph network

Publication number: CN116106461B
Application number: CN202211374166.0A
Authority: CN (China)
Legal status: Active (granted)
Other versions: CN116106461A
Original language: Chinese (zh)
Inventors: 蓝振忠, 康启越, 刘航
Current and original assignee: Westlake University
Application filed by Westlake University; priority to CN202211374166.0A

Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N 30/00: Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N 30/02: Column chromatography
    • G01N 30/86: Signal analysis
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30: Computing systems specially adapted for manufacturing


Abstract

The invention relates to a method and a device for predicting liquid chromatography retention time based on a deep graph network. The method comprises: obtaining molecular structure information of a chemical substance to be detected, and constructing graph network information from the molecular structure information, wherein the graph network information comprises node features, edge features and an adjacency matrix; and inputting the graph network information into a trained deep graph network model for liquid chromatography retention time prediction, and predicting the retention time with the deep graph network model. The deep graph network model comprises a graph network layer, a readout layer and a linear layer. The graph network layer introduces molecular edge information into the message-passing process, introduces residual connections, and increases model depth to improve prediction performance; the readout layer is based on an attention mechanism. The method for predicting liquid chromatography retention time based on a deep graph network can improve prediction accuracy.

Description

Method and device for predicting liquid chromatography retention time based on deep graph network
Technical Field
The invention belongs to the technical fields of liquid chromatography and information processing, and particularly relates to a method and a device for predicting liquid chromatography retention time based on a deep graph network.
Background
Over the last few decades, liquid chromatography-mass spectrometry (LC-MS) has become the most effective method for identifying small-molecule structures, owing to its high sensitivity and high selectivity. Although tandem mass spectrometry (MS/MS) information has proven useful for characterizing structures, relying on tandem mass spectrometry alone is insufficient to determine structures, because tandem mass spectrometry databases are extremely limited. Faced with this challenge, retention time has been used to aid compound identification. The retention time is the time from when a sample enters the column until it exits the column and is detected by the mass spectrometer. Because retention time provides orthogonal information beyond that obtained by tandem mass spectrometry, it can reduce the number of candidate structures during identification and is an important means of excluding false-positive identifications. How to accurately predict liquid chromatography retention time, including under different liquid-phase conditions, is the main problem addressed by the present invention.
At present, studies are limited, and conventional machine learning methods, such as Bayesian ridge regression and random forests, are used to predict retention time from molecular fingerprints or molecular descriptors. However, molecular fingerprints and descriptors represent only part of the nature of chemical molecules and cannot exploit information about the overall molecular structure.
Disclosure of Invention
To address the low prediction accuracy of existing conventional machine learning methods based on molecular fingerprints or molecular descriptors, the invention provides a method for predicting liquid chromatography retention time based on a deep graph network, thereby improving prediction accuracy.
The technical scheme adopted by the invention is as follows:
a method for predicting liquid chromatography retention time based on a deep graph network, comprising the steps of:
acquiring molecular structure information of a chemical substance to be detected, and constructing graph network information from the molecular structure information, wherein the graph network information comprises node features, edge features and an adjacency matrix;
and inputting the graph network information into a trained deep graph network model for predicting the liquid chromatography retention time, and predicting the liquid chromatography retention time by using the deep graph network model.
Further, the node features include: atom type, chiral center type, chirality, degree, formal charge, hybridization, aromaticity, whether the atom is a hydrogen-bond donor or acceptor, whether it is a heteroatom, whether it is in a ring, the number of hydrogen atoms, the number of radical electrons, the number of valence electrons, the Crippen LogP contribution, the Crippen molar refractivity contribution, the Gasteiger charge, the mass number, and the topological polar surface area contribution. The edge features include: bond type, whether the bond is conjugated, whether it is part of a ring, whether it is rotatable, and the stereochemical information of the chemical bond. The adjacency matrix is constructed from the molecular chemical bonds.
Further, the deep graph network model comprises a graph network layer, a readout layer and a linear layer; the graph network layer introduces molecular edge information into the message-passing process, introduces residual connections, and increases the model depth to improve the prediction performance.
Further, the processing procedure of the graph network layer comprises the following steps:
passing the edge information between a source node u and a target node v, together with the information of the source node u, to the target node v, and aggregating at the target node v with a softmax function to obtain the updated information m^l;
processing the updated information m^l with a linear layer followed by a nonlinear activation function σ, and finally adding the updated molecular information to the original molecular information h_v^l, i.e., performing the residual connection operation.
Further, the readout layer is an attention-based readout layer; the attention-based readout layer comprises a super virtual node connected to every atomic node in the molecule, and the code of the super virtual node is first obtained by summation and then updated using the following formulas:

e_i = concat(c, n_i) · W + b

α_i = softmax(e_i)

c_k = Σ_{i∈V} α_i · n_i

h_k, c_k = GRU(h_{k-1}, c_{k-1})

where c is the code of the super virtual node, n_i is the code of the i-th atomic node in the molecule, e_i is the weight after the linear layer, α_i is an importance coefficient normalized with softmax so that the coefficients sum to one, and V denotes all atomic nodes in the molecule; GRU is a gated recurrent unit, c_k is the code of the super virtual node computed by the k-th pass of the graph attention mechanism, and h_k is the molecular code after the k-th update.
Further, the linear layer comprises 2 linear layers, in which the hidden dimension of the first layer is 1024; after the first layer and a linear rectification (ReLU) activation, the second layer projects the dimension to 1 to predict the retention time.
Further, the training process of the deep graph network model comprises: selecting a retention time dataset, dividing it into a training set, a validation set and a test set, constructing the graph network information, and training the deep graph network model with a SmoothL1 loss function and the adaptive moment estimation (Adam) algorithm.
An apparatus for predicting liquid chromatography retention time based on a deep graph network, comprising:
a graph network information construction module, used for acquiring the molecular structure information of the chemical substance to be detected and constructing graph network information from it, wherein the graph network information comprises node features, edge features and an adjacency matrix; and
a retention time prediction module, used for inputting the graph network information into a trained deep graph network model for liquid chromatography retention time prediction, and predicting the liquid chromatography retention time with the deep graph network model.
The beneficial effects of the invention are as follows:
aiming at the problem that the existing traditional machine learning based on molecular fingerprints or molecular descriptors is low in prediction accuracy, the invention firstly proposes to introduce a deep map network to perform retention time prediction, and aiming at the problem of chemical substance retention time prediction, performs multiple optimization on a model, and further achieves the effect of improving prediction accuracy. Compared with the traditional machine learning method, the graph network model can use atomic level descriptors and meanwhile use structural information (graph network information) of chemical substances, so that a better prediction effect can be achieved.
The invention develops a deep graph convolutional network (DeepGCN-RT) model, which for the first time introduces residual connections into such a model, introduces the edge (chemical bond) information of the molecule, and introduces an attention-based graph network readout module, obtaining the model with the best prediction performance to date on the METLIN small-molecule retention time dataset (SMRT).
Furthermore, given that different studies typically use different liquid chromatography conditions, the present invention compares the performance of the developed model on other liquid chromatography datasets. The results show that, compared with models reported in the literature, the model developed by the invention significantly improves prediction accuracy on both the SMRT dataset and the transfer learning datasets. Finally, in LC-MS-based molecular identification on the RIKEN-PlaSMA dataset, DeepGCN-RT shows great advantages in reducing the number of candidate structures and improving top-k identification accuracy.
Drawings
Fig. 1. The model structure of the present invention.
Figure 2. Loss in training process of the present invention.
FIG. 3. Structure identification on the RIKEN-PlaSMA dataset. Panel (a) shows the average number of candidate structures under different identification modes: the abscissa distinguishes structure identification using the MSFinder software alone from identification using MSFinder together with the retention time prediction model developed in this study, and the ordinate is the average number of candidate structures per chromatographic peak (averaged over the 100 chromatographic peaks in total). Panel (b) shows identification accuracy: the abscissa indicates whether the top-1, top-2, top-5, top-10, top-15 and top-20 candidate structures contain the true structure, the ordinate is the proportion of correctly identified molecular structures, and identification type indicates the identification means used (MSFinder alone, or MSFinder together with DeepGCN-RT).
Fig. 4. Predicted effect of the model of the present invention on METLIN retention time dataset, with the abscissa being the experimentally determined true retention time and the ordinate being the predicted retention time of the present study development model.
Fig. 5 is a histogram of the prediction error of the inventive model on the METLIN retention time dataset, with the abscissa representing the prediction error and the ordinate representing the corresponding count (count).
Detailed Description
The present invention will be further described in detail with reference to the following examples and drawings, so that the above objects, features and advantages of the present invention can be more clearly understood.
The invention relates to a method for predicting retention time based on a deep graph network, which comprises the construction of chemical substance graph network information, including the construction of node features, edge features and an adjacency matrix. The chemical substance deep learning model adopted is a deep graph network model, which: introduces edge information into the message-passing process; uses residual connections to build a deep graph network; and improves the model's readout module by using an attention-based readout to achieve better prediction performance. The architecture of the model is shown in Fig. 1.
The specific scheme of the method for predicting the retention time based on the deep graph network is as follows:
1. construction of chemical substance graph network information
The construction of chemical graph network information includes constructing node features, edge features, and adjacency matrices.
The node features include: atom type, chiral center type, chirality, degree, formal charge, hybridization, aromaticity, whether the atom is a hydrogen-bond donor or acceptor, whether it is a heteroatom, whether it is in a ring, the number of hydrogen atoms, the number of radical electrons, the number of valence electrons, the Crippen LogP contribution, the Crippen molar refractivity contribution, the Gasteiger charge, the mass number (divided by 100), and the topological polar surface area contribution.
The adjacency matrix is built from the chemical bonds of the substance. In addition, the invention introduces edge features into the message-passing process, where the edge features include: bond type, whether the bond is conjugated, whether it is part of a ring, whether it is rotatable, and the stereochemical information of the chemical bond.
The above information is constructed into node features, edge features and an adjacency matrix, respectively, using the open-source software RDKit, and this information is input into the graph network to predict the retention time.
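As a concrete illustration of this construction step, the sketch below builds the three pieces of graph network information for a toy molecule from a hand-written atom/bond list. The patent itself derives a far richer feature set with RDKit, so the tiny features and helper names here (`node_features`, `edge_features`, `adjacency`) are illustrative assumptions only.

```python
# Illustrative sketch only: the patent builds these inputs with RDKit and a much
# richer feature set; here a hand-written atom/bond list for ethanol (CCO) is
# used so the three outputs (node features, edge features, adjacency matrix)
# are easy to inspect.

atoms = ["C", "C", "O"]          # heavy atoms of ethanol
bonds = [(0, 1), (1, 2)]         # single bonds between atom indices

ATOM_TYPES = ["C", "N", "O"]     # tiny stand-in for the patent's atom-type list

def node_features(symbol, degree):
    """One-hot atom type plus the node degree (a small subset of the listed node features)."""
    return [1.0 if symbol == t else 0.0 for t in ATOM_TYPES] + [float(degree)]

def edge_features(bond_order, in_ring):
    """Bond order plus a ring flag (a small subset of the listed edge features)."""
    return [float(bond_order), 1.0 if in_ring else 0.0]

def adjacency(n, bonds):
    """Symmetric adjacency matrix constructed from the molecular chemical bonds."""
    A = [[0] * n for _ in range(n)]
    for u, v in bonds:
        A[u][v] = A[v][u] = 1
    return A

degrees = [sum(1 for b in bonds if i in b) for i in range(len(atoms))]
X = [node_features(s, d) for s, d in zip(atoms, degrees)]
E = {b: edge_features(1, False) for b in bonds}   # both bonds: single, acyclic
A = adjacency(len(atoms), bonds)
```

The same three objects, with the full feature list above, are what the trained model consumes at prediction time.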
2. Construction of deep layer graph network model
As shown in FIG. 1, the DeepGCN-RT model of the present invention consists of a graph network layer (GNN Layer), a readout layer (GNN Readout), and a linear layer (Dense Layer).
1. Graph network layer (GNN Layer)
The graph network is a graph convolutional network, and the invention makes the following improvements on the basis of the GCN proposed by Kensert et al. (Kensert, A.; Bouwmeester, R.; Efthymiadis, K., et al., Graph convolutional networks for improved prediction and interpretability of chromatographic retention data. Anal Chem. 2021, 93(47), 15633-15641.): adding the edge (chemical bond) information of the molecule to the graph network model; adding residual connections to improve the model structure; and increasing the depth of the model to improve the prediction performance.
The GCN layer of Kensert et al. is as follows:

h_v^{l+1} = σ( Σ_{u∈N(v)} (1/c_uv) (h_u^l W^l + b^l) )    (1)

where u and v are the source node and the target node respectively, N(v) is the set of all source nodes of v, c_uv is the square root of the node degree, and σ is a nonlinear function. h_v^{l+1} is the molecular embedding of the target node v after l+1 updates, h_v^l is the molecular embedding of the target node v after l updates, l is the number of updates, b^l is the bias parameter of the l-th layer, and W^l is the weight parameter of the l-th layer.
The GCN layer first passes the edge information between u and v, together with the information of the source node u, to the target node v, where it is aggregated with a softmax function, as in formulas (2) and (3):

m_{u→v}^l = h_u^l + e_{uv}^l    (2)

m^l = Σ_{u∈N(v)} softmax_u(m_{u→v}^l) ⊙ m_{u→v}^l    (3)

where h_u^l and e_{uv}^l denote the information of the source node and the edge information respectively, and m^l denotes the updated information. The information of the source node refers to the node features described above, the edge information refers to the edge features described above, and the source and target nodes are determined by the adjacency matrix described above.

The updated information m^l is then processed with a linear layer (l is the number of updates, b^l is the bias parameter of the l-th layer, and W^l is the weight parameter of the l-th layer) and passed through a nonlinear activation function σ. Finally, the updated molecular information is added to the original molecular information h_v^l, i.e., the residual connection operation, as follows:

h_v^{l+1} = σ( m^l W^l + b^l ) + h_v^l    (4)
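The update described around formulas (2)-(4) can be sketched numerically as follows. The dimensions, the random weights, and the choice of tanh for σ are illustrative assumptions, not values from the patent.

```python
import numpy as np

# Sketch of one graph network layer update in the spirit of formulas (2)-(4):
# for each target node, edge information is added to the source-node information,
# the messages are aggregated with softmax weights, passed through a linear layer
# and a nonlinearity, and residually added to the original node code.
# Dimensions, weights and the tanh nonlinearity are illustrative assumptions.

rng = np.random.default_rng(0)
d = 4
h = rng.normal(size=(3, d))                      # node codes h^l for 3 atoms
e = rng.normal(size=(3, 3, d))                   # edge codes e_uv^l
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])  # adjacency matrix
W, b = rng.normal(size=(d, d)) * 0.1, np.zeros(d)

def layer_update(h, e, A, W, b):
    h_new = np.empty_like(h)
    for v in range(h.shape[0]):
        nbrs = np.nonzero(A[v])[0]
        msgs = h[nbrs] + e[nbrs, v]                      # source info + edge info
        w = np.exp(msgs) / np.exp(msgs).sum(axis=0)      # softmax over neighbours
        m = (w * msgs).sum(axis=0)                       # aggregated message m^l
        h_new[v] = np.tanh(m @ W + b) + h[v]             # linear layer, σ, residual
    return h_new

h1 = layer_update(h, e, A, W, b)
```

Because of the residual addition, stacking many such layers keeps gradients flowing, which is what allows the depth increase described above.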
2. readout layer (GNN Readout)
At present, graph-based readout mostly employs simple operations such as averaging and summation. To improve the prediction accuracy of the model, the invention adopts an attention-based readout layer. Specifically, after the message-passing process, a molecular code is obtained for each atomic node in the molecule. The invention first creates a "super virtual" node and connects it to every atomic node. The code of the "super virtual" node is first obtained by summation and then updated using the following formulas:
e_i = concat(c, n_i) · W + b    (5)

α_i = softmax(e_i)    (6)

c_k = Σ_{i∈V} α_i · n_i    (7)

h_k, c_k = GRU(h_{k-1}, c_{k-1})    (8)

where c is the code of the "super virtual" node, n_i is the code of the i-th atomic node in the molecule, and V denotes all atomic nodes in the molecule. e_i is the weight after the linear layer. α_i is an importance coefficient normalized with softmax; the coefficients sum to one. GRU is a gated recurrent unit. c_k is the code of the super virtual node computed by the k-th pass of the graph attention mechanism, and h_k is the molecular code after the k-th update.
The attention-based readout of the invention achieves better retention time prediction because the graph attention mechanism can effectively capture the information useful for the target task. In addition, the gated recurrent unit performs well at retaining information and filtering out invalid information. Combining the two achieves a better result in capturing the global features of chemical molecules.
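A minimal numerical sketch of the attention weighting at the heart of this readout follows. The GRU update of formula (8) is deliberately omitted, and the dimensions and random weights are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the attention step of the readout: a "super virtual" node
# code c, initialised by summation over the atomic node codes n_i, attends over
# those codes (formula (5), followed by softmax normalisation and a weighted
# sum). The GRU update of the molecular code is omitted here for brevity;
# dimensions and weights are illustrative assumptions.

rng = np.random.default_rng(1)
d = 4
n = rng.normal(size=(5, d))                 # codes of 5 atomic nodes
c = n.sum(axis=0)                           # super virtual node: summation init

W = rng.normal(size=(2 * d, 1)) * 0.1
b = 0.0
e = np.concatenate([np.tile(c, (5, 1)), n], axis=1) @ W + b   # e_i = concat(c, n_i)W + b
alpha = np.exp(e) / np.exp(e).sum()          # importance coefficients, sum to one
c_updated = (alpha * n).sum(axis=0)          # attention-weighted molecular summary
```

The weights `alpha` make explicit which atoms dominate the molecular summary, which is the interpretability benefit the attention readout brings over plain averaging or summation.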
3. Linear Layer (Dense Layer)
The code from the readout layer is input into the linear layer, which consists of 2 linear layers; the hidden dimension of the first layer is 1024. After the first layer and a linear rectification (ReLU) activation, the second layer projects the dimension to 1 for retention time prediction.
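The prediction head just described can be sketched as below; the 200-dimensional input matches the model hidden size quoted in the training example, but the random weights are placeholders, not trained parameters.

```python
import numpy as np

# Sketch of the prediction head described above: two linear layers with a
# 1024-dimensional hidden layer and a ReLU between them, projecting the readout
# code down to a single retention-time value. Weights are random placeholders,
# not trained parameters.

rng = np.random.default_rng(2)
d_in, d_hidden = 200, 1024
W1, b1 = rng.normal(size=(d_in, d_hidden)) * 0.01, np.zeros(d_hidden)
W2, b2 = rng.normal(size=(d_hidden, 1)) * 0.01, np.zeros(1)

def predict_rt(readout_code):
    hidden = np.maximum(readout_code @ W1 + b1, 0.0)   # first linear layer + ReLU
    return float((hidden @ W2 + b2)[0])                # projected to dimension 1

rt = predict_rt(rng.normal(size=d_in))
```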
3. Retention time prediction
Training phase: an existing dataset containing the structural information of chemical substances and experimentally measured retention times, such as the METLIN retention time dataset, is divided into a training set, a validation set and a test set; the graph network information is constructed as described in the graph network information construction section; and the DeepGCN-RT model is then trained with a SmoothL1 loss function using the adaptive moment estimation (Adam) algorithm.
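The SmoothL1 loss named in the training step has a standard piecewise definition, sketched here; the patent names the loss but does not spell out the formula, so this is the conventional form (quadratic for small residuals, linear beyond).

```python
# Standard SmoothL1 (Huber-style) loss: quadratic for residuals below 1 and
# linear beyond, which damps the influence of retention-time outliers during
# training. Conventional definition; the patent does not spell out the formula.

def smooth_l1(pred, target):
    x = abs(pred - target)
    return 0.5 * x * x if x < 1.0 else x - 0.5
```

The linear tail is what makes this loss more robust than plain mean squared error when a few retention times are measured far from the trend.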
Retention time prediction phase: the simplified molecular-input line-entry system (SMILES) string of the chemical substance to be detected is obtained; the descriptors and molecular structure information of the substance are extracted with the open-source software RDKit to complete the construction of the graph network information; the constructed graph network information (i.e., the node features, edge features and adjacency matrix) is input into the trained DeepGCN-RT model; and the model outputs the retention time prediction.
4. Examples
1. Model training
The METLIN retention time dataset, from the METLIN laboratory, containing the structural information and experimentally measured retention times of 80038 chemical substances, was selected for model training. The invention divides the dataset into a training set, a validation set and a test set, and on this basis constructs the graph network information as described in the graph network information construction section.
The model was trained on this dataset with a SmoothL1 loss function using the adaptive moment estimation (Adam) algorithm. The hidden dimension of the model was 200, the dense layer dimension was 1024, the dropout ratio was 0.1, and the batch size was 64. The training results are shown in Fig. 2, where train_loss is the training set loss during training, valid_mae is the mean absolute error on the validation set, and test_mae is the mean absolute error on the test set.
Fig. 4 shows the prediction performance of the model of the invention on the METLIN retention time dataset. Fig. 5 shows the prediction error of the model on the METLIN retention time dataset. As can be seen from Fig. 4 and Fig. 5, the prediction error of the model is small and the prediction accuracy is high.
2. Beneficial effects of the technical solution of the invention
The retention time prediction model developed by the invention performs far better than models reported in the literature.
2.1 Comparison of the model of the invention with prior models
Comparing the performance of the model of the invention with prior literature models, as shown in Table 1, the model of the invention has the lowest mean absolute error (MAE), and its median absolute error (MedAE) and mean absolute percentage error (MAPE) are lower than those of the models reported in the literature.
Table 1. Comparison of the model of the invention (DeepGCN-RT) with literature models

Model        MAE(s)↓   MedAE(s)↓   MAPE↓   R2↑    Reference
GCN          29.4      -           0.04    0.89   Kensert et al., Anal. Chem. 2021
DNNpwa       39.62     25.08       0.05    0.85   Ju et al., Anal. Chem. 2021
GNN-RT       39.87     25.24       0.05    0.85   Yang et al., Anal. Chem. 2021
DeepGCN-RT   26.46     12.39       0.03    0.89   -
Among them, the results of GCN, DNNpwa, GNN-RT are cited in the following documents:
Kensert, A.; Bouwmeester, R.; Efthymiadis, K., et al., Graph convolutional networks for improved prediction and interpretability of chromatographic retention data. Anal Chem. 2021, 93(47), 15633-15641.
Ju, R.; Liu, X.; Zheng, F., et al., Deep Neural Network Pretrained by Weighted Autoencoders and Transfer Learning for Retention Time Prediction of Small Molecules. Anal Chem. 2021, 93(47), 15651-15658.
Yang, Q.; Ji, H.; Lu, H., et al., Prediction of liquid chromatographic retention time with graph neural networks to assist in small molecule identification. Anal Chem. 2021, 93(4), 2200-2206.
Furthermore, the invention explores the effect of residual connections and model depth on prediction performance, as shown in Table 2. Overall, at the same number of layers, adding residual connections clearly improves performance; and with residual connections in place, performance gradually improves as the model depth increases.
Table 2. Effect of residual connections and model depth on model performance
In addition, the performance of different readouts is shown in Table 3, where DeepGCN-RT uses the attention-based readout. It can be seen that the average readout is better than the sum readout, while the attention-based readout introduced by the invention performs best.
Table 3. Performance of different readout layers
2.2 Transfer learning performance
Since different studies generally use different liquid-phase conditions, a model built on the SMRT dataset cannot be used directly on datasets acquired under other liquid-phase conditions. To test the generalization ability of the model, 7 reversed-phase liquid chromatography datasets and 2 hydrophilic interaction chromatography datasets were collected from the PredRet database (Stanstrup, J.; Neumann, S.; Vrhovsek, U., PredRet: prediction of retention time by direct mapping between multiple chromatographic systems. Anal Chem. 2015, 87(18), 9421-8.), and the model trained on SMRT was used for transfer learning to obtain the transfer learning model DeepGCN-RT-TL. The model performance is shown in Table 4:
Table 4. Comparison of transfer learning performance
It can be seen that the performance of the model of the invention is far better than that of the literature models. The results of DNNpwa-TL and GNN-RT-TL are cited from the following documents, respectively:
Ju, R.; Liu, X.; Zheng, F., et al., Deep Neural Network Pretrained by Weighted Autoencoders and Transfer Learning for Retention Time Prediction of Small Molecules. Anal Chem. 2021, 93(47), 15651-15658.
Yang, Q.; Ji, H.; Lu, H., et al., Prediction of liquid chromatographic retention time with graph neural networks to assist in small molecule identification. Anal Chem. 2021, 93(4), 2200-2206.
2.3 Application of model to small molecule structure identification
A retention time prediction model is ultimately built for the structure identification of compounds. The invention therefore selects the RIKEN-PlaSMA dataset from the MoNA database for compound structure identification. The dataset consists of 434 small-molecule compounds; 334 compounds were used to build the transfer learning model, and the other 100 compounds were used for structure identification. Structure identification was performed with the MSFinder software together with the transfer learning model of the invention, and the results are shown in Fig. 3. It can be seen that DeepGCN-RT shows great advantages in reducing the number of candidate structures and improving top-k identification accuracy: the average number of candidate structures is reduced from 50 to 35, and the top-k accuracy is also significantly improved.
In summary, the invention provides a method for predicting retention time based on a deep graph network, which performs better than all existing models reported in the literature.
Although the method of the invention is analyzed above using liquid chromatography as a case study, its application is not limited to liquid chromatography; the model of this study can also be applied to other chromatographic techniques, such as gas chromatography.
Based on the same inventive concept, another embodiment of the present invention provides an apparatus for predicting liquid chromatography retention time based on a deep graph network, comprising:
a graph network information construction module, used for acquiring the molecular structure information of the chemical substance to be detected and constructing graph network information from it, wherein the graph network information comprises node features, edge features and an adjacency matrix; and
a retention time prediction module, used for inputting the graph network information into a trained deep graph network model for liquid chromatography retention time prediction, and predicting the liquid chromatography retention time with the deep graph network model.
Wherein the specific implementation of each module is referred to the previous description of the method of the present invention.
Based on the same inventive concept, another embodiment of the present invention provides a computer device (a computer, server, smartphone, etc.) comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the steps of the method of the invention.
Based on the same inventive concept, another embodiment of the present invention provides a computer readable storage medium (e.g., ROM/RAM, magnetic disk, optical disk) storing a computer program which, when executed by a computer, implements the steps of the inventive method.
The embodiments of the invention disclosed above are intended to aid understanding of the contents of the invention and to enable its implementation. A person of ordinary skill in the art will understand that various substitutions, variations and modifications are possible without departing from the spirit and scope of the invention. The invention should not be limited to what has been disclosed in the embodiments of the specification, but should be defined by the scope of the claims.

Claims (6)

1. A method for predicting liquid chromatography retention time based on a deep map network, comprising the steps of:
acquiring molecular structure information of a chemical substance to be detected, and constructing graph network information according to the molecular structure information, wherein the graph network information comprises node characteristics, edge characteristics and adjacent matrixes;
inputting the graph network information into a trained deep graph network model for predicting the retention time of liquid chromatography, and predicting the retention time of the liquid chromatography by using the deep graph network model;
the node features include: atom type, chiral center type, chirality, atomic degree, formal charge, hybridization, aromaticity, whether the atom is a hydrogen donor or acceptor, whether it is a heteroatom, whether it is in a ring, the number of hydrogen atoms, the number of radical electrons, the number of valence electrons, the Crippen logP contribution, the Crippen molar refractivity contribution, the Gasteiger charge, the mass number, and the topological polar surface area contribution; the edge features include: the bond type, whether the bond is conjugated, whether it is part of a ring, whether it is rotatable, and the stereochemical information of the chemical bond; the adjacency matrix is constructed according to the chemical bonds of the molecule;
the deep graph network model comprises a graph network layer, a reading layer and a linear layer; the graph network layer introduces the chemical bond information of the molecule into the message-passing process and introduces residual connections, allowing the model depth to be increased to improve the prediction effect; the graph network is a graph convolutional network;
the processing procedure of the graph network layer comprises:
transmitting the edge information between a source node u and a target node v, together with the information of the source node u, to the target node v, and aggregating at the target node v with a softmax function to obtain the updated information m_l;
processing the updated information m_l with a linear layer; finally, the updated molecular information, after a nonlinear activation function σ, is added to the original molecular information, i.e., a residual connection operation is performed;
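The message-passing step described above can be sketched as follows. This is a minimal, illustrative numpy reconstruction, not the patented implementation: the function name `graph_layer`, the choice of tanh as the activation σ, and the additive combination of node and edge information are assumptions made for the example.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def graph_layer(h, edge_feat, edges, W, sigma=np.tanh):
    """One message-passing step in the spirit of the claim:
    edge information and source-node information are sent to each
    target node, aggregated with softmax weights, passed through a
    linear layer and an activation, and added back to the original
    node features (residual connection)."""
    messages = {v: [] for v in range(h.shape[0])}
    for (u, v), e_uv in zip(edges, edge_feat):
        messages[v].append(h[u] + e_uv)   # combine source-node and edge info
    h_new = h.copy()
    for v, msgs in messages.items():
        if not msgs:
            continue                       # isolated nodes receive no message
        M = np.stack(msgs)                 # (num_messages, d)
        w = softmax(M.sum(axis=1))         # softmax aggregation weights
        m_l = w @ M                        # aggregated message m_l
        h_new[v] = h[v] + sigma(m_l @ W)   # linear layer + sigma + residual
    return h_new
```

With a three-atom graph in which only node 1 receives messages, nodes 0 and 2 keep their original features through the residual path.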
the reading layer adopts a reading layer based on an attention mechanism; the reading layer based on the attention mechanism comprises a super virtual node, wherein the super virtual node is connected with each atomic node in the molecule, and the code of the super virtual node is first obtained by summation and then updated by using the following formulas:
e_i = concat(c, n_i) · W + b
α_i = softmax(e_i)
c_k = Σ_i α_i · n_i
h_k, c_k = GRU(h_{k-1}, c_{k-1})
wherein c is the code of the super virtual node, n_i is the code of the i-th atomic node in the molecule, e_i is the weight obtained after the linear layer (with parameters W and b), and α_i is an importance coefficient obtained by normalizing e_i with softmax, so that the coefficients sum to one; the summation runs over all atomic nodes in the molecule; GRU is a gated recurrent unit, c_k is the code of the super virtual node computed in the k-th pass of the graph attention mechanism, and h_k is the code of the molecule after the k-th update.
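The attention scores e_i, the normalized coefficients α_i, and the weighted sum over atomic nodes can be sketched in numpy as follows. This is an illustrative reconstruction under stated assumptions: W and b are the linear-layer parameters producing a scalar score per node, and the per-pass GRU update of h_k and c_k is omitted for brevity.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_readout(c, nodes, W, b):
    """Attention readout over a super virtual node:
    e_i = concat(c, n_i) @ W + b   (scalar score per atomic node)
    alpha_i = softmax(e_i)         (importance coefficients, sum to one)
    Returns the attention-weighted sum of node codes and the alphas."""
    e = np.array([np.concatenate([c, n]) @ W + b for n in nodes])
    alpha = softmax(e)
    return alpha @ nodes, alpha
```

With zero weights every node receives the same score, so α_i is uniform and the readout reduces to a plain average of the node codes.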
2. The method of claim 1, wherein the linear layer comprises two linear layers, the hidden dimension of the first layer being 1024; the input passes through the first layer, then through the linear rectification function (ReLU), and then through the second layer, which projects the dimension to 1 to predict the retention time.
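The two-layer prediction head of claim 2 can be sketched in numpy as follows; the input dimension `d_model` and the random initialization are assumptions made for the example, while the 1024-dim hidden layer, the ReLU, and the projection to 1 follow the claim.

```python
import numpy as np

def prediction_head(x, W1, b1, W2, b2):
    """Two linear layers: the first projects to a 1024-dim hidden space,
    followed by ReLU; the second projects to a single retention-time value."""
    hidden = np.maximum(x @ W1 + b1, 0.0)  # linear layer 1 + ReLU
    return hidden @ W2 + b2                # linear layer 2 -> dimension 1

rng = np.random.default_rng(0)
d_model = 256                                       # assumed input dimension
W1, b1 = 0.01 * rng.normal(size=(d_model, 1024)), np.zeros(1024)
W2, b2 = 0.01 * rng.normal(size=(1024, 1)), np.zeros(1)
y = prediction_head(rng.normal(size=(5, d_model)), W1, b1, W2, b2)  # shape (5, 1)
```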
3. The method of claim 1, wherein the training process of the deep graph network model comprises: selecting a retention time data set, dividing it into a training set, a validation set and a test set, constructing the graph network information, and training the deep graph network model with a SmoothL1 loss function and an adaptive moment estimation (Adam) optimization algorithm.
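The SmoothL1 loss named in claim 3 can be sketched as follows (numpy, with the conventional beta = 1 threshold as an assumption); the Adam optimizer itself is not reproduced here.

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss: quadratic for errors below beta, linear above,
    which makes training less sensitive to retention-time outliers."""
    diff = np.abs(pred - target)
    return np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta).mean()
```

For an error of 0.5 the loss is 0.5 · 0.5² = 0.125 (quadratic regime); for an error of 2.0 it is 2.0 − 0.5 = 1.5 (linear regime).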
4. An apparatus for predicting liquid chromatography retention time based on a deep graph network, comprising:
the graph network information construction module, which is used for acquiring the molecular structure information of the chemical substance to be detected and constructing graph network information according to the molecular structure information, wherein the graph network information comprises node features, edge features and an adjacency matrix;
the retention time prediction module, which is used for inputting the graph network information into a trained deep graph network model for liquid chromatography retention time prediction, and predicting the liquid chromatography retention time by using the deep graph network model;
the node features include: atom type, chiral center type, chirality, atomic degree, formal charge, hybridization, aromaticity, whether the atom is a hydrogen donor or acceptor, whether it is a heteroatom, whether it is in a ring, the number of hydrogen atoms, the number of radical electrons, the number of valence electrons, the Crippen logP contribution, the Crippen molar refractivity contribution, the Gasteiger charge, the mass number, and the topological polar surface area contribution; the edge features include: the bond type, whether the bond is conjugated, whether it is part of a ring, whether it is rotatable, and the stereochemical information of the chemical bond; the adjacency matrix is constructed according to the chemical bonds of the molecule;
the deep graph network model comprises a graph network layer, a reading layer and a linear layer; the graph network layer introduces the chemical bond information of the molecule into the message-passing process and introduces residual connections, allowing the model depth to be increased to improve the prediction effect; the graph network is a graph convolutional network;
the processing procedure of the graph network layer comprises:
transmitting the edge information between a source node u and a target node v, together with the information of the source node u, to the target node v, and aggregating at the target node v with a softmax function to obtain the updated information m_l;
processing the updated information m_l with a linear layer; finally, the updated molecular information, after a nonlinear activation function σ, is added to the original molecular information, i.e., a residual connection operation is performed;
the reading layer adopts a reading layer based on an attention mechanism; the reading layer based on the attention mechanism comprises a super virtual node, wherein the super virtual node is connected with each atomic node in the molecule, and the code of the super virtual node is first obtained by summation and then updated by using the following formulas:
e_i = concat(c, n_i) · W + b
α_i = softmax(e_i)
c_k = Σ_i α_i · n_i
h_k, c_k = GRU(h_{k-1}, c_{k-1})
wherein c is the code of the super virtual node, n_i is the code of the i-th atomic node in the molecule, e_i is the weight obtained after the linear layer (with parameters W and b), and α_i is an importance coefficient obtained by normalizing e_i with softmax, so that the coefficients sum to one; the summation runs over all atomic nodes in the molecule; GRU is a gated recurrent unit, c_k is the code of the super virtual node computed in the k-th pass of the graph attention mechanism, and h_k is the code of the molecule after the k-th update.
5. A computer device comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the method of any one of claims 1-3.
6. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by a computer, implements the method of any one of claims 1-3.
CN202211374166.0A 2022-11-03 2022-11-03 Method and device for predicting liquid chromatograph retention time based on deep graph network Active CN116106461B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211374166.0A CN116106461B (en) 2022-11-03 2022-11-03 Method and device for predicting liquid chromatograph retention time based on deep graph network


Publications (2)

Publication Number Publication Date
CN116106461A CN116106461A (en) 2023-05-12
CN116106461B CN116106461B (en) 2024-02-06

Family

ID=86258567



Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111899510A (en) * 2020-07-28 2020-11-06 南京工程学院 Intelligent traffic system flow short-term prediction method and system based on divergent convolution and GAT
CN113192559A (en) * 2021-05-08 2021-07-30 中山大学 Protein-protein interaction site prediction method based on deep map convolution network
CN113241128A (en) * 2021-04-29 2021-08-10 天津大学 Molecular property prediction method based on molecular space position coding attention neural network model
CN113241130A (en) * 2021-06-08 2021-08-10 西南交通大学 Molecular structure prediction method based on graph convolution network
CN113299354A (en) * 2021-05-14 2021-08-24 中山大学 Small molecule representation learning method based on Transformer and enhanced interactive MPNN neural network
CN114121178A (en) * 2021-12-07 2022-03-01 中国计量科学研究院 Chromatogram retention index prediction method and device based on graph convolution network
CN114565187A (en) * 2022-04-01 2022-05-31 吉林大学 Traffic network data prediction method based on graph space-time self-coding network
CN114629674A (en) * 2021-11-11 2022-06-14 北京计算机技术及应用研究所 Attention mechanism-based industrial control network security risk assessment method
CN114818515A (en) * 2022-06-24 2022-07-29 中国海洋大学 Multidimensional time sequence prediction method based on self-attention mechanism and graph convolution network
CN115148302A (en) * 2022-05-18 2022-10-04 上海天鹜科技有限公司 Compound property prediction method based on graph neural network and multi-task learning



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant