CN115966266B - Anti-tumor molecule strengthening method based on graph neural network - Google Patents
- Publication number
- CN115966266B (CN202310015687.5A)
- Authority
- CN
- China
- Prior art keywords
- molecules
- tumor
- molecular
- molecule
- graph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Image Processing (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention provides an antitumor molecule reinforcement learning method based on a graph neural network, comprising the following steps. Step 1: classify the molecules into antitumor (positive) and non-antitumor (negative) categories according to the molecular labels in the database. Step 2: input the obtained graph into the proposed anti-tumor molecule strengthening model and learn the implicit representation of the graph from the differing properties of positive and negative molecules, obtaining the local structural features of anti-tumor molecules for further generation and optimization of molecules. Step 3: apply constraints and perform target optimization to preserve the drug properties of molecules during the strengthening process. Step 4: substitute the obtained local molecular structure into the anti-tumor molecule to modify and optimize its structure. Step 5: judge synthesizability with an existing molecular-property plausibility checking tool, output reasonable molecules, and further reverse-optimize unreasonable molecules. Step 6: obtain reasonable novel anti-tumor molecules and end the task.
Description
Technical Field
The invention relates to a drug molecule strengthening technology based on a graph neural network, and belongs to the technical fields of anti-tumor drug molecular chemistry research and graph neural network reinforcement learning.
Background
The state of the art and the problems closest to the present invention are described.
Optimization of drug molecules and research into new drugs are important for the treatment of tumors. The goal of drug molecule optimization is to enhance the desired biological effects in a particular direction while ensuring biopharmaceutical acceptability. The current problem is the high time and monetary cost of developing a highly effective drug. In current pharmaceutical research, traditional strategies are based on screening existing compound libraries. However, because the structural diversity of existing compounds is limited and research institutions have already screened the library molecules many times, discovering innovative, highly potent drug molecules has become increasingly challenging.
With the development of machine learning, such methods play an increasing role in drug discovery. Compared with traditional screening strategies, methods that combine machine learning with molecular research are more effective and more scalable.
At present, practitioners in the field have developed the "de novo molecular design technique": atom-based, fragment-based, and reaction-based molecular design methods that generate new, highly effective molecules from scratch through analytical calculation. The design idea is to abandon the traditional coarse direct-screening approach, balance global and local considerations, explore key features, and construct target molecules from these clues. The drawback is that the method requires substantial manual effort to generate the molecular architecture from scratch; explicit design goals and standardized design principles are also necessary for the results to be interpretable. Moreover, because de novo molecular design is fragment-based, the quality of the resulting molecule depends heavily on the earlier operations. In summary, the outstanding challenges of current de novo design methods are high computational cost, low interpretability, and strong dependence of the results on pre-selection.
Disclosure of Invention
Technical problem: the invention provides an anti-tumor drug molecule reinforcement learning method based on a graph neural network. A graph neural network model for anti-tumor molecule strengthening is constructed, and feature extraction and molecular property classification are performed on drug molecules; key features are extracted by modifying features and observing the change in the classification result. Using the analysis of key features, chemical strengthening and modification are carried out on the basis of the original molecular structure, and new molecules with stronger target characteristics are finally obtained. This improves the efficiency and interpretability of molecule generation and reduces the research and development difficulty, development period, and cost of anti-tumor molecular drugs.
Technical scheme: to achieve the purpose of the invention, the technical scheme adopted is an antitumor molecule strengthening method based on a graph neural network, comprising the following steps:
Step 1: Molecules are classified into antitumor (positive) and non-antitumor (negative) categories according to the molecular labels in the database. Each input molecule is described as an undirected graph (graph matrix) in which nodes and edges correspond to atoms and chemical bonds, respectively. A graph neural network (GNN) model is constructed and pre-trained, with the chemical stability and drug properties of generated molecules as loss functions.
Step 2: The obtained graph is input into the proposed anti-tumor molecule strengthening model, and the implicit representation of the graph is learned from the differing properties of positive and negative molecules to obtain the local structural features of anti-tumor molecules, which are used for further generation and optimization of molecules.
Step 3: Constraints are applied and target optimization is performed to preserve the drug properties of molecules during the strengthening process.
Step 4: The obtained local molecular structure is substituted into the anti-tumor molecule to modify and optimize its structure.
Step 5: Synthesizability is judged using an existing molecular-property plausibility checking tool, and reasonable molecules are output. Unreasonable molecules are further reverse-optimized.
Step 6: Reasonable novel anti-tumor molecules are obtained and the task is ended.
Further, in step (1), the molecules are classified into antitumor (positive) and non-antitumor (negative) categories according to the molecular labels in the database. Each input molecule is described as an undirected graph (graph matrix) in which nodes and edges correspond to atoms and chemical bonds, respectively. The method for constructing and pre-training the graph neural network (GNN) model is as follows:
(101) The input molecule is represented as a graph $G = (V, E, X)$, where $G$ denotes the input molecule; $V$ denotes the atoms of the molecule, in $1 \times n$ one-hot encoding format; $E$ denotes the chemical bonds, as an $n \times n$ adjacency matrix; and $X$ denotes the features of each atom, as an $n \times n$ matrix. Each input molecule is thus described as an undirected graph (graph matrix), namely $G$.
(102) According to the molecular labels in the database, the molecules are classified into antitumor (positive) and non-antitumor (negative) categories, denoted $G^+$ and $G^-$ respectively.
(103) A graph neural network (GNN) model is constructed whose input is the original molecular undirected graph $G$ and whose output is a binary classification probability matrix $P$; the GNN model is pre-trained with the chemical stability and drug properties of generated molecules as loss functions.
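The graph representation in sub-step (101) can be illustrated with a minimal sketch. The atom vocabulary `ATOM_TYPES`, the helper names, and the toy three-atom molecule are assumptions made for illustration only; in particular, the sketch uses one-hot atom types as the per-atom features $X$, which is one possible choice and is not stated in the source.

```python
# Hypothetical atom vocabulary; the source does not specify one.
ATOM_TYPES = ["C", "N", "O"]


def one_hot(symbol, vocab=ATOM_TYPES):
    """One-hot encode an atom symbol against the vocabulary."""
    return [1 if symbol == s else 0 for s in vocab]


def build_graph(atoms, bonds):
    """Return (V, E, X) for an undirected molecular graph.

    V: node indices, one per atom
    E: n x n symmetric adjacency matrix (1 where a bond exists)
    X: per-atom feature rows (one-hot atom type, an assumption)
    """
    n = len(atoms)
    V = list(range(n))
    E = [[0] * n for _ in range(n)]
    for i, j in bonds:
        E[i][j] = 1
        E[j][i] = 1  # undirected: mirror every bond
    X = [one_hot(a) for a in atoms]
    return V, E, X


# Toy example: heavy atoms of ethanol, C-C-O (hydrogens omitted).
V, E, X = build_graph(["C", "C", "O"], [(0, 1), (1, 2)])
```

Because every bond is written in both directions, the adjacency matrix is symmetric, which is what makes the graph undirected.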
Further, in step (2), the obtained graph is input into the proposed anti-tumor molecule strengthening model, and the implicit representation of the graph is learned from the differing properties of positive and negative molecules to obtain the local structural features of anti-tumor molecules, which are used for further generation and optimization of molecules. The method is as follows:
(201) The obtained molecule $G$ is input into a feature extraction model $f: G \to F \in \mathbb{R}^{n \times h}$ to extract the implicit features $F$ of the molecule.
(202) The feature extraction model $f: G \to F \in \mathbb{R}^{n \times h}$ comprises:
(1) a message-passing stage [formulas not reproduced in the source text], described in terms of the representation obtained at layer $l$, the $j$-th row of the representation matrix, the neighbor nodes $N(v)$ of $v$, the per-layer update function UPDATE, the aggregation function AGG, and the readout function READOUT; after $l$ iterations the feature $h_G$ of the molecule is obtained;
(2) taking $h_G$ as input, the implicit feature $F$ of the molecule is obtained by an MLP model.
(203) The implicit feature $F$ is input into a classification prediction network $c: F \to P_{out} \in \mathbb{R}^{n \times 2}$ to obtain the classification probability matrix $P_u$.
(204) The molecular structure is partially transformed and passed through the feature extraction model to obtain the transformed implicit feature $F'$, which is input into the classification prediction network to obtain the classification probability matrix $P_n$.
(205) The probability results of steps (203) and (204) are input into the probability fluctuation function PFF, and the PFF result is passed to the MEAS (measurement) function for analysis:
$\mathrm{PFF} = \lVert P_u - P_n \rVert$
$S_{out} = \mathrm{MEAS}\{\mathrm{PFF}(P_u, P_n)\}$
The local structural features $S_{out}$ whose modification causes a large probability fluctuation are extracted and taken as the output.
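The feature extraction in sub-step (202) names UPDATE, AGG, and READOUT functions without the text reproducing their formulas. The following is a minimal sketch of one common message-passing arrangement, offered only as an illustration: the sum aggregation, the averaging update, and the mean readout are all assumptions, and the MLP that maps $h_G$ to $F$ is omitted.

```python
def agg_sum(neighbor_states, dim):
    """AGG (assumed): element-wise sum of neighbor state vectors."""
    out = [0.0] * dim
    for h in neighbor_states:
        for k, value in enumerate(h):
            out[k] += value
    return out


def update(h_v, m_v):
    """UPDATE (assumed): average of own state and aggregated message."""
    return [0.5 * a + 0.5 * b for a, b in zip(h_v, m_v)]


def message_passing(E, X, num_layers):
    """Run num_layers rounds of neighbor aggregation over adjacency E."""
    n, dim = len(X), len(X[0])
    H = [[float(x) for x in row] for row in X]
    for _ in range(num_layers):
        H_next = []
        for v in range(n):
            neighbors = [H[u] for u in range(n) if E[v][u]]  # states of N(v)
            H_next.append(update(H[v], agg_sum(neighbors, dim)))
        H = H_next
    return H


def readout(H):
    """READOUT (assumed): mean of node states, giving a graph-level h_G."""
    return [sum(h[k] for h in H) / len(H) for k in range(len(H[0]))]


# Toy 3-atom chain with one-hot atom features.
E = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
X = [[1, 0, 0], [1, 0, 0], [0, 0, 1]]
H = message_passing(E, X, num_layers=1)
h_G = readout(H)
```

After one layer the middle atom's state already mixes in information from both neighbors, which is the property the classification network relies on when a local substructure is modified.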
Further, in step (3), constraints are applied and target optimization is performed to preserve the drug properties of molecules during the strengthening process. The method is as follows:
(301) When training the anti-tumor molecule strengthening model, the QED value is used as a constraint in the MEAS function:
$\mathrm{MEAS}: S_{out} = \mathrm{RF}\{\mathrm{PFF}(P_u, P_n) + \gamma\,\mathrm{QED}\}$
where QED is calculated using RDKit. The RF function maps the QED-constrained value of the probability fluctuation function onto a molecular structure to obtain a local molecular structure, thereby ensuring the drug-likeness of the generated local features.
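The effect of the $\gamma\,\mathrm{QED}$ term can be seen in a small sketch. The candidate names, the numeric values, and the choice of `max` as a stand-in for the RF mapping are hypothetical; the source does not define RF beyond its role. QED values are passed in as plain numbers here; with RDKit installed they could be obtained from `rdkit.Chem.QED.qed(mol)`.

```python
def constrained_score(pff_value, qed_value, gamma=0.5):
    """PFF(P_u, P_n) + gamma * QED, the quantity inside RF{...}."""
    return pff_value + gamma * qed_value


def select_feature(candidates, gamma=0.5):
    """Pick the candidate with the highest constrained score.

    candidates: list of (name, pff_value, qed_value) tuples.
    Using max() as the selection rule is an assumption standing in for RF.
    """
    return max(candidates, key=lambda c: constrained_score(c[1], c[2], gamma))[0]


# Hypothetical candidates: frag_a fluctuates slightly more, frag_b is more drug-like.
candidates = [("frag_a", 0.60, 0.20), ("frag_b", 0.55, 0.80)]

best_with_qed = select_feature(candidates, gamma=0.5)
best_without_qed = select_feature(candidates, gamma=0.0)
```

With $\gamma = 0$ the selection depends on probability fluctuation alone and picks `frag_a`; with $\gamma = 0.5$ the drug-likeness term shifts the choice to `frag_b`.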
Further, in step (4), the obtained local molecular structure is substituted into the anti-tumor molecule to modify and optimize its structure. The method is as follows:
(401) An existing positive molecule is input as a graph $G = (V, E, X)$, where $G$ denotes the input molecule, $V$ denotes the atoms of the molecule, $E$ denotes the chemical bonds, and $X$ denotes the features of each atom.
(402) The molecule is strengthened using the local structural features obtained from the trained model: an automatic iterative optimization method repeatedly searches for the most probable atoms or chemical bonds to connect, modifying the structure of the molecule and applying the previously obtained local structural features to it, so that a molecule with better anti-tumor performance is gradually constructed.
Further, in step (5), synthesizability is judged using an existing molecular-property plausibility checking tool, and reasonable molecules are output. Unreasonable molecules are further reverse-optimized. The method is as follows:
(501) An existing molecular-property plausibility checking tool $p: G = (V, E, X) \to V \in \mathbb{R}$ is established to analyze the rationality of molecular properties and evaluate chemical feasibility.
(502) The model $p$ mainly comprises a state-action module a and a reorder module Q. At any time step $t$, the input of module a is a state and its output is an action, which is a tensor defined in the feature representation space of all the initial reactants. Module Q calculates the optimal value of the state through the Q network; the Actor iteratively updates the parameters of the policy function using this optimal value in order to select actions and obtain feedback and a new state. The environment takes the state, the optimal reaction template, and the action as references, and computes whether the round has ended. The evaluation value $V$ is finally output through the Softmax layer and represents the feasibility score.
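The final scoring stage of sub-step (502) passes through a Softmax layer. A minimal sketch of that stage follows; the two-class logit layout (index 0 as "feasible"), the threshold, and the molecule names are assumptions for illustration, and the actor and Q modules that would produce the logits are not modeled.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def feasibility_score(logits):
    """Probability of the 'feasible' class (index 0 by assumption)."""
    return softmax(logits)[0]


def triage(molecules, threshold=0.5):
    """Split (name, logits) pairs into accepted and rejected lists."""
    accepted, rejected = [], []
    for name, logits in molecules:
        if feasibility_score(logits) >= threshold:
            accepted.append(name)
        else:
            rejected.append(name)  # candidates for reverse optimization
    return accepted, rejected


# Hypothetical logits for two generated molecules.
accepted, rejected = triage([("mol_1", [2.0, 0.0]), ("mol_2", [-1.0, 1.0])])
```

Molecules in the rejected list correspond to the "unreasonable molecules" that step (5) sends back for further reverse optimization.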
Further, in step (6), reasonable novel anti-tumor molecules are obtained and the task is ended. The method is as follows:
(601) If feedback penalties are not included in the optimization, the anti-tumor molecule strengthening model may generate molecules that have strong anti-tumor characteristics but cannot be manufactured or exist stably in reality, and therefore cannot be made into usable drugs. This highlights the need for multi-objective rewards and penalties when reinforcement learning is used for anti-tumor molecule strengthening.
(602) The variable definitions are restated:
$x_v \in \mathbb{R}^{D_V}$: feature vector of node $v$
$h_v \in \mathbb{R}^{D_V}$: state vector of node $v$
$x_{(v_1, v_2)} \in \mathbb{R}^{D_E}$: feature vector of edge $(v_1, v_2)$
During node operation, a node state update function needs to be defined so that the node states stabilize iteratively. For node $v$, the transformation of its state vector is expressed by the state transformation function of the node [formula not reproduced in the source text].
(603) The back-propagation process is as follows:
In this step, the gradients of the parameters are obtained by back propagation, and the parameters are then optimized by gradient descent. The iterative process is as follows:
…
$\Delta w_{ij} = \eta \delta_j x_i$
Then, using back propagation as performed automatically by the PyTorch framework, the molecular structure is iteratively optimized so that the strengthened molecules generated by the model meet multiple requirements such as anti-tumor activity and molecular rationality.
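The update $\Delta w_{ij} = \eta \delta_j x_i$ can be illustrated on a single linear layer. The source elides the definition of $\delta_j$, so this sketch assumes the simplest case, a linear output with squared error, where $\delta_j$ is the target minus the prediction. The function names and the toy data are hypothetical, and plain Python is used instead of PyTorch to keep the arithmetic visible.

```python
def predict(W, x):
    """Linear layer: y_j = sum_i W[i][j] * x[i]."""
    return [sum(W[i][j] * x[i] for i in range(len(x))) for j in range(len(W[0]))]


def delta_rule_step(W, x, target, eta=0.1):
    """Return new weights after one update: w_ij += eta * delta_j * x_i.

    delta_j = target_j - y_j is an assumption (linear unit, squared error).
    """
    y = predict(W, x)
    delta = [t - yj for t, yj in zip(target, y)]
    return [
        [W[i][j] + eta * delta[j] * x[i] for j in range(len(target))]
        for i in range(len(x))
    ]


# Toy problem: two inputs, one output, weights start at zero.
x, target = [1.0, 2.0], [1.0]
W_first = delta_rule_step([[0.0], [0.0]], x, target)

W = [[0.0], [0.0]]
for _ in range(20):
    W = delta_rule_step(W, x, target)
```

Each step moves the prediction toward the target; in a framework with automatic differentiation the same gradient is computed by the backward pass rather than written out by hand.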
Beneficial effects: compared with the prior art, the technical scheme of the invention has the following beneficial technical effects:
1. The characteristics of drug molecules are learned by a graph neural network, and given molecules are strengthened using graph neural network and reinforcement learning methods. The work of strengthening molecules is handed to a machine, which greatly improves efficiency compared with traditional manual strengthening.
2. Compared with methods that generate molecules from scratch according to a given molecular model, strengthening existing molecules based on a graph neural network and reinforcement learning improves efficiency and accuracy.
3. The molecules generated by the method are more interpretable and easier to visualize, which helps in understanding the characteristics of the strengthened drug molecules and the reasons for the strengthening, and provides greater convenience for drug researchers.
4. Multiple constraints and property checks are applied during generation, ensuring the chemical rationality and drug properties of the generated molecules.
Drawings
FIG. 1 is a flow chart of the steps for implementing the present invention;
FIG. 2 is a diagram of an anti-tumor molecular reinforcement model framework;
FIG. 3 is an example molecular structural diagram;
FIG. 4 is a graph showing the results of an example molecular optimization structure.
Detailed Description
In order to enhance the understanding of the present invention, the present embodiment will be described in detail with reference to the accompanying drawings.
Example: the technical scheme of the invention is described in detail below, taking molecules from the ChEMBL database as an example, with reference to the drawings.
An antitumor molecule strengthening method based on a graph neural network, which comprises the following steps:
Step (1): Molecules are classified into antitumor (positive) and non-antitumor (negative) categories according to the molecular labels in the database. Each input molecule is described as an undirected graph (graph matrix) in which nodes and edges correspond to atoms and chemical bonds, respectively. The method for constructing the graph neural network (GNN) model and pre-training it, with the chemical stability and drug properties of generated molecules as loss functions, is as follows:
(101) The input molecule is represented as a graph $G = (V, E, X)$, where $G$ denotes the input molecule; $V$ denotes the atoms of the molecule, in $1 \times n$ one-hot encoding format; $E$ denotes the chemical bonds, as an $n \times n$ adjacency matrix; and $X$ denotes the features of each atom, as an $n \times n$ matrix. Each input molecule is thus described as an undirected graph (graph matrix), namely $G$.
(102) According to the molecular labels in the database, the molecules are classified into antitumor (positive) and non-antitumor (negative) categories, denoted $G^+$ and $G^-$ respectively.
(103) A graph neural network (GNN) model is constructed, with the chemical stability and drug properties of generated molecules as loss functions. The model structure is shown in FIG. 2: the input is the original molecular undirected graph $G$ and the output is a binary classification probability matrix $P$. The GNN model is then pre-trained.
Step (2): The obtained graph is input into the proposed anti-tumor molecule strengthening model, and the implicit representation of the graph is learned from the differing properties of positive and negative molecules to obtain the local structural features of anti-tumor molecules, which are used for further generation and optimization of molecules. The method is as follows:
(201) The obtained molecule $G$ is input into a feature extraction model $f: G \to F \in \mathbb{R}^{n \times h}$ to extract the implicit features $F$ of the molecule.
(202) The feature extraction model $f: G \to F \in \mathbb{R}^{n \times h}$ comprises:
(1) a message-passing stage [formulas not reproduced in the source text], described in terms of the representation obtained at layer $l$, the $j$-th row of the representation matrix, the neighbor nodes $N(v)$ of $v$, the per-layer update function UPDATE, the aggregation function AGG, and the readout function READOUT; after $l$ iterations the feature $h_G$ of the molecule is obtained;
(2) taking $h_G$ as input, the implicit feature $F$ of the molecule is obtained by an MLP model.
(203) The implicit feature $F$ is input into a classification prediction network $c: F \to P_{out} \in \mathbb{R}^{n \times 2}$ to obtain the classification probability matrix $P_u$.
(204) The molecular structure is partially transformed and passed through the feature extraction model to obtain the transformed implicit feature $F'$, which is input into the classification prediction network to obtain the classification probability matrix $P_n$.
(205) The probability results of steps (203) and (204) are input into the probability fluctuation function PFF, and the PFF result is passed to the MEAS (measurement) function for analysis:
$\mathrm{PFF} = \lVert P_u - P_n \rVert$
$S_{out} = \mathrm{MEAS}\{\mathrm{PFF}(P_u, P_n)\}$
The local structural features $S_{out}$ whose modification causes a large probability fluctuation are extracted and taken as the output.
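The probability fluctuation in sub-step (205) can be sketched as follows. The source does not state which norm is meant, so the Frobenius (element-wise L2) norm is assumed; ranking candidate modifications by their PFF value is likewise a hypothetical stand-in for the MEAS function, and the modification names and probabilities are invented for illustration.

```python
import math


def pff(P_u, P_n):
    """PFF = ||P_u - P_n||, using the Frobenius norm (an assumption)."""
    return math.sqrt(
        sum((a - b) ** 2 for row_u, row_n in zip(P_u, P_n) for a, b in zip(row_u, row_n))
    )


def rank_modifications(P_u, candidates):
    """Order modification names by descending PFF.

    candidates: dict mapping a modification name to its P_n matrix.
    This ranking is a hypothetical stand-in for MEAS.
    """
    return sorted(candidates, key=lambda name: pff(P_u, candidates[name]), reverse=True)


# Class probabilities before modification (n = 2 rows, 2 classes).
P_u = [[0.9, 0.1], [0.8, 0.2]]

# Hypothetical class probabilities after two different local modifications.
candidates = {
    "remove_ring": [[0.4, 0.6], [0.3, 0.7]],
    "swap_methyl": [[0.85, 0.15], [0.8, 0.2]],
}

ranking = rank_modifications(P_u, candidates)
```

The modification that flips the classifier's output the most ranks first, marking that local structure as influential for the predicted anti-tumor property.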
Step (3): Constraints are applied and target optimization is performed to preserve the drug properties of molecules during the strengthening process. The method is as follows:
(301) When training the anti-tumor molecule strengthening model, the QED value is used as a constraint in the MEAS function:
$\mathrm{MEAS}: S_{out} = \mathrm{RF}\{\mathrm{PFF}(P_u, P_n) + \gamma\,\mathrm{QED}\}$
where QED is calculated using RDKit. The RF function maps the QED-constrained value of the probability fluctuation function onto a molecular structure to obtain a local molecular structure, thereby ensuring the drug-likeness of the generated local features.
Step (4): The obtained local molecular structure is substituted into the anti-tumor molecule to modify and optimize its structure. The method is as follows:
(401) An existing positive molecule is input as a graph $G = (V, E, X)$, where $G$ denotes the input molecule, $V$ denotes the atoms of the molecule, $E$ denotes the chemical bonds, and $X$ denotes the features of each atom.
(402) The molecule is strengthened using the local structural features obtained from the trained model: an automatic iterative optimization method repeatedly searches for the most probable atoms or chemical bonds to connect, modifying the structure of the molecule and applying the previously obtained local structural features to it, so that a molecule with better anti-tumor performance is gradually constructed.
Step (5): Synthesizability is judged using an existing molecular-property plausibility checking tool, and reasonable molecules are output. Unreasonable molecules are further reverse-optimized. The method is as follows:
(501) An existing molecular-property plausibility checking tool $p: G = (V, E, X) \to V \in \mathbb{R}$ is established to analyze the rationality of molecular properties and evaluate chemical feasibility.
(502) The model $p$ mainly comprises a state-action module a and a reorder module Q. At any time step $t$, the input of module a is a state and its output is an action, which is a tensor defined in the feature representation space of all the initial reactants. Module Q calculates the optimal value of the state through the Q network; the Actor iteratively updates the parameters of the policy function using this optimal value in order to select actions and obtain feedback and a new state. The environment takes the state, the optimal reaction template, and the action as references, and computes whether the round has ended. The evaluation value $V$ is finally output through the Softmax layer and represents the feasibility score.
Step (6): Reasonable novel anti-tumor molecules are obtained and the task is ended. The method is as follows:
(601) If feedback penalties are not included in the optimization, the anti-tumor molecule strengthening model may generate molecules that have strong anti-tumor characteristics but cannot be manufactured or exist stably in reality, and therefore cannot be made into usable drugs. This highlights the need for multi-objective rewards and penalties when reinforcement learning is used for anti-tumor molecule strengthening.
(602) The variable definitions are restated:
$x_v \in \mathbb{R}^{D_V}$: feature vector of node $v$
$h_v \in \mathbb{R}^{D_V}$: state vector of node $v$
$x_{(v_1, v_2)} \in \mathbb{R}^{D_E}$: feature vector of edge $(v_1, v_2)$
During node operation, a node state update function needs to be defined so that the node states stabilize iteratively. For node $v$, the transformation of its state vector is expressed by the state transformation function of the node [formula not reproduced in the source text].
(603) The back-propagation process is as follows:
In this step, the gradients of the parameters are obtained by back propagation, and the parameters are then optimized by gradient descent. The iterative process is as follows:
…
$\Delta w_{ij} = \eta \delta_j x_i$
Then, using back propagation as performed automatically by the PyTorch framework, the molecular structure is iteratively optimized so that the strengthened molecules generated by the model meet multiple requirements such as anti-tumor activity and molecular rationality. The optimization of a molecule from the ChEMBL database is shown in FIG. 4.
It should be noted that the above-mentioned embodiments are not intended to limit the scope of the present invention, and equivalent changes or substitutions made on the basis of the above-mentioned technical solutions fall within the scope of the present invention as defined in the claims.
Claims (5)
1. An anti-tumor molecule reinforcement learning method based on a graph neural network is characterized by comprising the following steps of:
step 1: dividing the molecules into anti-tumor, namely positive, non-anti-tumor, namely negative categories according to molecular labels in a database, describing each input molecule as an undirected graph, namely a graph matrix, wherein nodes and edges respectively correspond to atoms and chemical bonds, constructing a graph neural network (Graph neural networks) model, taking chemical stability of the generated molecules and molecular drug properties as loss functions, pre-training for node classification tasks,
step 2: inputting the obtained graph into a proposed anti-tumor molecule strengthening model, learning the implicit expression of the graph according to different properties of positive molecules and negative molecules, obtaining the local structural characteristics of the anti-tumor molecules for one-step generation and optimization of the molecules,
step 3: constraint is applied to perform target optimization, so as to ensure the drug properties of molecules in the molecular strengthening process,
step 4: substituting the obtained local molecular structure into the anti-tumor molecule to modify and optimize the structure,
step 5: the existing molecular property reasonable detection tool is used for judging the synthesizability, outputting reasonable molecules, and aiming at unreasonable molecules, further reversely optimizing,
step 6: obtaining reasonable novel anti-tumor molecules and ending the task;
in step 1, according to the molecular labels in the database, dividing the molecules into anti-tumor positive and non-anti-tumor negative categories, describing each input molecule as an undirected graph, namely a graph matrix, wherein nodes and edges respectively correspond to atoms and chemical bonds, constructing a graph neural network (Graph neural networks) model and pre-training, and specifically, the method comprises the following steps:
(101) Inputting molecules as graphs G = (V, E, X), wherein G represents an input molecule; V represents the atoms of the molecule, in a 1×n one-hot encoding format; E represents the chemical bonds of the molecule, as an n×n adjacency matrix; and X is the feature of each atom of the molecule, as an n×n matrix; each input molecule is thus described as an undirected graph, i.e., the graph matrix G;
(102) According to the molecular labels in the database, the molecules are classified into anti-tumor (positive) and non-anti-tumor (negative) categories, denoted by $G^{+}$ or $G^{-}$;
(103) Constructing a graph neural network (GNN) model whose input is the original molecular undirected graph G and whose output is a binary classification probability matrix P, and pre-training the GNN model;
in step 2, inputting the obtained graph into the proposed anti-tumor molecule strengthening model, and learning the implicit expression of the graph according to the different properties of positive molecules and negative molecules to obtain local structural features of the anti-tumor molecules, which are used for one-step generation and optimization of the molecules, specifically comprises the following steps:
(201) Inputting the obtained molecule G into a feature extraction model
$$f: G \to F \in \mathbb{R}^{n \times h}$$
to extract the implicit feature F of the molecule,
(202) The feature extraction model $f: G \to F \in \mathbb{R}^{n \times h}$ comprises formula (1), whose terms denote the feature of each molecule obtained at layer l and the content of the j-th row of the matrix; N(v) is the set of neighbor nodes of v, UPDATE is the update function of each layer, AGG is the aggregation function, and READOUT is the read-out function; after l iterations, the feature $h_G$ of the molecule is obtained by formula (2), and with $h_G$ as input, the implicit feature F of the molecule is obtained through an MLP model,
(203) Inputting the obtained implicit feature F into a classification prediction network
$$c: F \to P_{out} \in \mathbb{R}^{n \times 2}$$
to obtain a binary classification probability result matrix $P_u$,
(204) Partially transforming the obtained molecular structure, passing it through the feature extraction model to obtain the transformed implicit feature F', and inputting F' into the classification prediction network to obtain a classification probability result matrix $P_n$,
(205) Inputting the probability results obtained in step (203) and step (204) into a probability fluctuation function PFF (fluctuation probability function), and feeding the calculation result of the PFF into a MEAS (measurement) function for analysis:
$$PFF = \lVert P_u - P_n \rVert$$
$$S_{out} = MEAS\{PFF(P_u, P_n)\},$$
thereby extracting the molecular local structural features $S_{out}$ whose modification causes a large probability fluctuation, and taking them as the output.
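The perturbation analysis of steps (201) to (205) can be illustrated with a small, dependency-free Python sketch. It is not the patented model: the mean aggregation, the tanh update, the toy softmax classifier, the bond-removal perturbation and all function names are assumptions chosen only to make the idea concrete, and the READOUT and MLP stages are omitted because the classifier here works per node.

```python
import math

def mean_aggregate(adj, h, v):
    """AGG (assumed): average the states of v's neighbours N(v)."""
    nbrs = [u for u, bonded in enumerate(adj[v]) if bonded]
    dim = len(h[v])
    if not nbrs:
        return [0.0] * dim
    return [sum(h[u][k] for u in nbrs) / len(nbrs) for k in range(dim)]

def message_pass(adj, x, layers=2):
    """UPDATE (assumed): h_v <- tanh(h_v + AGG(neighbours)), repeated `layers` times."""
    h = [row[:] for row in x]
    for _ in range(layers):
        agg = [mean_aggregate(adj, h, v) for v in range(len(h))]
        h = [[math.tanh(a + b) for a, b in zip(h[v], agg[v])]
             for v in range(len(h))]
    return h

def classify(h):
    """Toy stand-in for c: F -> P, a per-node two-way softmax (n x 2)."""
    probs = []
    for row in h:
        s = sum(row)
        e_pos, e_neg = math.exp(s), math.exp(-s)
        probs.append([e_pos / (e_pos + e_neg), e_neg / (e_pos + e_neg)])
    return probs

def pff(p_u, p_n):
    """PFF = ||P_u - P_n||, taken here as the Frobenius norm."""
    return math.sqrt(sum((a - b) ** 2
                         for ru, rn in zip(p_u, p_n)
                         for a, b in zip(ru, rn)))

def remove_bond(adj, i, j):
    """Hypothetical local transformation: delete the bond between atoms i and j."""
    new = [row[:] for row in adj]
    new[i][j] = new[j][i] = 0
    return new

# Three-atom chain 0-1-2 with made-up two-dimensional atom features.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]

p_u = classify(message_pass(adj, x))  # prediction for the unmodified molecule
scores = {bond: pff(p_u, classify(message_pass(remove_bond(adj, *bond), x)))
          for bond in [(0, 1), (1, 2)]}
most_influential = max(scores, key=scores.get)  # plays the role of S_out
print(scores, most_influential)
```

In this sketch the bond whose removal shifts the predicted probabilities most is reported as the influential local structure; a real MEAS function would map such scores back to chemically meaningful substructures.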
2. The anti-tumor molecule reinforcement learning method based on a graph neural network according to claim 1, wherein in step 3, applying constraints to perform target optimization, so as to ensure the drug properties of the molecules during molecular strengthening, comprises the following step:
(301) When training the anti-tumor molecule strengthening model, a constraint is applied in the MEAS function through the QED (quantitative estimate of drug-likeness) value:
$$MEAS: S_{out} = RF\{PFF(P_u, P_n) + \gamma \, QED\}$$
The QED is calculated with RDKit, and the RF function maps the QED-constrained value of the probability fluctuation function onto the molecular structure to obtain a local molecular structure, thereby ensuring the drug-likeness of the generated local features.
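As a rough illustration of how the weight γ trades probability fluctuation against drug-likeness, the sketch below ranks two hypothetical fragments by PFF + γ·QED. The fragment names and numeric values are invented, `select_fragment` is only a stand-in for the RF mapping, and the QED values are passed in as plain numbers; in practice they could be computed with RDKit's `rdkit.Chem.QED.qed`, which returns a score between 0 and 1.

```python
def meas_score(pff_value, qed_value, gamma=0.5):
    """Constrained score PFF + gamma * QED (gamma is an assumed weight)."""
    if not 0.0 <= qed_value <= 1.0:
        raise ValueError("QED is expected to lie in [0, 1]")
    return pff_value + gamma * qed_value

def select_fragment(candidates, gamma=0.5):
    """Stand-in for RF: return the name of the highest-scoring fragment.

    `candidates` maps a fragment name to a (PFF, QED) pair.
    """
    return max(candidates,
               key=lambda name: meas_score(*candidates[name], gamma))

# Hypothetical fragments: (probability fluctuation, drug-likeness).
candidates = {
    "fragment_a": (0.40, 0.20),  # larger fluctuation, poor drug-likeness
    "fragment_b": (0.35, 0.80),  # slightly smaller fluctuation, drug-like
}

unconstrained = select_fragment(candidates, gamma=0.0)
constrained = select_fragment(candidates, gamma=0.5)
print(unconstrained, constrained)
```

With γ = 0 the fragment with the larger fluctuation is selected, whereas a positive γ shifts the choice toward the more drug-like fragment.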
3. The anti-tumor molecule reinforcement learning method based on a graph neural network according to claim 1, wherein in step 4, substituting the obtained local molecular structure into the anti-tumor molecule to modify and optimize the structure comprises the following steps:
(401) Inputting an existing positive molecule as G = (V, E, X), wherein G represents the input molecule, V represents the atoms of the molecule, E represents the chemical bonds of the molecule, and X is the feature of each atom of the molecule, so that the input molecule is obtained in graph form,
(402) Strengthening the molecule using the molecular local structural features obtained by the trained anti-tumor molecule strengthening model: through an automatic iterative optimization method, the most probable atoms or chemical bonds to connect are searched for continuously so as to modify the structure of the molecule, and the previously obtained molecular local structural features are applied to the molecule, so that a molecule with better anti-tumor performance is gradually constructed.
4. The anti-tumor molecule reinforcement learning method based on a graph neural network according to claim 1, wherein in step 5, using the existing molecular property rationality detection tool to judge synthesizability, outputting reasonable molecules, and further optimizing unreasonable molecules in reverse is specifically as follows:
(501) An existing molecular property rationality detection tool $p: G = (V, E, X) \to V \in \mathbb{R}$ is established to analyze the rationality of the molecular properties and evaluate chemical feasibility,
(502) The model p mainly comprises a state-action module A and a reward module Q; at any time step t, the input of module A is a state and its output is an action, the action being a tensor defined in the feature representation space of all initial reactants, and module Q calculates the optimal value of the state through a Q network; the Actor uses the optimal value to iteratively update the parameters of the policy function and then selects actions, obtaining feedback and new states; the environment takes the state, the optimal reaction template and the action as references to calculate and determine whether the round has ended; finally, an evaluation value V representing the feasibility score is output through a Softmax layer.
5. The anti-tumor molecule reinforcement learning method based on a graph neural network according to claim 1, wherein, after the existing molecular property rationality detection tool has been used to judge synthesizability, output reasonable molecules, and further optimize unreasonable molecules in reverse, obtaining reasonable novel anti-tumor molecules in step 6 comprises the following steps:
(601) In optimization with feedback-criterion penalties, some generated molecules clearly cannot become realistic, usable drugs; this shows that the anti-tumor molecule strengthening model can generate molecules with strong anti-tumor properties but cannot guarantee that they can be manufactured or exist stably in reality, which highlights the need to use multi-objective optimization as a penalty when strengthening anti-tumor molecules with reinforcement learning,
(602) Restating the variable definitions:
$X_v \in \mathbb{R}^{D_V}$: the feature vector of node v,
$h_v \in \mathbb{R}^{D_V}$: the state vector of node v,
$x_{v_1,v_2} \in \mathbb{R}^{D_E}$: the feature vector of edge $(v_1, v_2)$,
during node operation, a node state update function needs to be defined so that the node states stabilize iteratively; for the state transition function of the node, the transition of the state vector of node v can be expressed as:
(603) The back propagation process is emphasized here:
in this step, the gradient of the parameter is obtained according to the back propagation step, and then the gradient descent method is used for optimization, and the iterative process is as follows:
……
then, through backpropagation, using the automatic backpropagation provided by the PyTorch framework, the molecular structure is continuously and iteratively optimized, so that the generated strengthened molecular model meets the multiple requirements of anti-tumor activity and molecular rationality.
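The penalised gradient-descent loop described in steps (601) to (603) can be sketched on a one-parameter toy problem. Everything here is an assumption made for illustration: the quadratic "anti-tumor" term, the one-sided "rationality" penalty, the weights and the step size are invented, and the gradient is derived by hand where the claim relies on PyTorch's automatic backpropagation.

```python
def loss_and_grad(theta, target=2.0, limit=1.0, penalty_weight=4.0):
    """Return L(theta) and dL/dtheta for a toy penalised objective.

    (theta - target)^2 stands in for the anti-tumor objective;
    penalty_weight * max(0, theta - limit)^2 stands in for the
    rationality penalty, active only when theta exceeds `limit`.
    """
    violation = max(0.0, theta - limit)
    loss = (theta - target) ** 2 + penalty_weight * violation ** 2
    grad = 2.0 * (theta - target) + 2.0 * penalty_weight * violation
    return loss, grad

def gradient_descent(theta=0.0, lr=0.05, steps=500):
    """Plain gradient descent: theta <- theta - lr * dL/dtheta."""
    for _ in range(steps):
        _, grad = loss_and_grad(theta)
        theta -= lr * grad
    return theta

theta_star = gradient_descent()
print(round(theta_star, 6))
```

The unpenalised optimum of this toy objective is 2.0, while the penalised run settles near 1.2, between the two competing targets, which is the qualitative behaviour that a multi-objective penalty is meant to produce.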
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310015687.5A CN115966266B (en) | 2023-01-06 | 2023-01-06 | Anti-tumor molecule strengthening method based on graph neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115966266A (en) | 2023-04-14 |
CN115966266B (en) | 2023-11-17 |
Family
ID=87357842
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310015687.5A Active CN115966266B (en) | 2023-01-06 | 2023-01-06 | Anti-tumor molecule strengthening method based on graph neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115966266B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118280482B (en) * | 2024-06-04 | 2024-08-23 | 浙江大学 | Method and system for predicting antioxidant molecules based on deep learning |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111898730A (en) * | 2020-06-17 | 2020-11-06 | 西安交通大学 | Structure optimization design method for accelerating by using graph convolution neural network structure |
CN112820361A (en) * | 2019-11-15 | 2021-05-18 | 北京大学 | Drug molecule generation method based on confrontation and imitation learning |
CN113140267A (en) * | 2021-03-25 | 2021-07-20 | 北京化工大学 | Directional molecule generation method based on graph neural network |
CN113327651A (en) * | 2021-05-31 | 2021-08-31 | 东南大学 | Molecular diagram generation method based on variational self-encoder and message transmission neural network |
CN114822718A (en) * | 2022-03-25 | 2022-07-29 | 云南大学 | Human oral bioavailability prediction method based on graph neural network |
CN115274007A (en) * | 2022-08-02 | 2022-11-01 | 殷越铭 | Generalizable and interpretable depth map learning method for discovering and optimizing drug lead compound |
CN115526246A (en) * | 2022-09-21 | 2022-12-27 | 吉林大学 | Self-supervision molecular classification method based on deep learning model |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
BR112021015643A2 (en) * | 2019-02-08 | 2021-10-05 | Google Llc | SYSTEMS AND METHODS TO PREDICT THE OLFACTIVE PROPERTIES OF MOLECULES USING MACHINE LEARNING |
Non-Patent Citations (1)
Title |
---|
"A Review of Graph Neural Networks and Their Applications in Power Systems"; Wenlong Liao et al.; Journal of Modern Power Systems and Clean Energy; full text * |
Also Published As
Publication number | Publication date |
---|---|
CN115966266A (en) | 2023-04-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||