CN110600085A - Organic matter physicochemical property prediction method based on Tree-LSTM - Google Patents

Organic matter physicochemical property prediction method based on Tree-LSTM

Info

Publication number
CN110600085A
CN110600085A (application CN201910500140.8A; granted publication CN110600085B)
Authority
CN
China
Prior art keywords
tree
lstm
organic
molecular
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910500140.8A
Other languages
Chinese (zh)
Other versions
CN110600085B (en)
Inventor
申威峰
粟杨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University
Priority to CN201910500140.8A
Publication of CN110600085A
Application granted
Publication of CN110600085B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/30 Prediction of properties of chemical compounds, compositions or mixtures
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics

Abstract

A Tree-LSTM-based method for predicting the physicochemical properties of organic substances comprises a prediction-model generation part and a physicochemical property prediction part. The prediction-model generation part comprises the following steps: 1) normalizing and encoding the molecular structure of the organic substance and generating a tree-shaped data structure (the molecular feature descriptor); 2) training a Tree-LSTM model with the molecular feature descriptors and experimental physicochemical property data of organic substances to obtain a Tree-LSTM-based physicochemical property prediction model. The physicochemical property prediction part comprises: normalizing and encoding the molecular structure of the substance to be predicted and inputting it into the prediction model to obtain the predicted physicochemical properties of the organic substance. The invention enables a computer to extract the relationship between molecular structure and physicochemical properties automatically, is well suited to learning the structural information of a wide range of organic substances, and achieves better prediction results.

Description

Organic matter physicochemical property prediction method based on Tree-LSTM
Technical Field
The invention relates to the field of chemistry (C07), and in particular to a method for predicting quantitative structure-property relationships of chemical substances based on artificial intelligence techniques.
Background
Physicochemical properties are basic data closely tied to chemistry and chemical engineering. Critical properties, the boiling point, the heat of formation, the octanol-water partition coefficient and the like are essential to research and production practice in chemistry and chemical engineering, and scientifically reasonable predicted values for these properties can reduce experimental measurement work and save considerable manpower and material resources. Experimental data for physicochemical properties are often difficult to obtain because measurement conditions are harsh or the measured substance decomposes easily; at present such properties are mainly estimated with group contribution methods and topological index methods based on multiple linear regression. However, both approaches require molecular structure features to be extracted manually before prediction, which limits their range of application.
The Tree-LSTM recurrent neural network is an improvement on the LSTM (Long Short-Term Memory) recurrent neural network. It can learn dependency relationships more complex than sequential structures and can autonomously learn, from the input data, the contribution of the tree topology of a molecule to the predicted quantity. In particular, it overcomes the inability of other neural networks to reproduce the atom connectivity within a molecule and is therefore better suited to mining the implicit relationship between a molecular structure and its physicochemical properties. Existing group contribution methods must decompose a molecule into different groups (molecular substructure fragments) and then apply multivariate linear fitting to predict the physicochemical properties of organic substances. Different group contribution methods use different fragmentation schemes, and for some molecules no suitable scheme can be found, so the prediction is biased or cannot be completed at all. Existing topological index methods are limited by the complexity of computing topological indices and by their inability to represent local molecular structure intuitively, and therefore cannot predict a broader range of physicochemical properties. To date, no method has been reported that predicts the physicochemical properties of organic substances using a Tree-LSTM recurrent neural network alone.
Disclosure of Invention
The invention provides a Tree-LSTM-based method for predicting the physicochemical properties of organic substances, which addresses the technical problems of the prior art: a narrow prediction range, limited substance coverage and insufficient prediction accuracy.
In order to solve the problems, the invention adopts the following technical scheme:
the method comprises the following steps: step A, generating a prediction model; b, predicting two parts of physical and chemical properties;
the step A comprises the following steps:
A1: acquiring experimental physicochemical property data of organic substances and their molecular structure information, gathering a large amount of data from various databases using web crawler technology;
A2: normalizing the structure of each single organic molecule (with a graph canonicalization algorithm), traversing each atom of the molecule to generate a corresponding atom feature descriptor, sorting all the atom feature descriptors of the molecule in lexicographic order, and taking the smallest one as the molecular feature descriptor;
A3: generating, from all the organic molecular structures obtained in step A2, the molecular feature descriptor representing each normalized molecular structure graph and its corresponding linear code;
A4: splitting all organic molecules into their constituent chemical bonds, arranging the character strings representing the chemical bonds by molecule, and generating word vectors for these character strings with a word embedding algorithm;
A5: building a Tree-LSTM-based neural network model and loading the physicochemical data obtained in A1 and the molecular structure data processed in A2-A4, the Tree-LSTM automatically adapting to the topology of the normalized molecular structure graph; manually adjusting the hyper-parameters and training the model, and selecting the best-performing parameters during training to obtain the Tree-LSTM-based physicochemical property prediction model for organic substances;
the step B comprises the following steps:
B1: processing, with steps A2-A4, the molecular structure of an organic substance for which experimental data of a given physicochemical property are unavailable, loading the generated feature descriptor and code into the physicochemical property prediction model obtained in A5, and inputting the molecular feature descriptor to predict the unknown physicochemical property.
As a further refinement, said step a5 comprises the following:
a51: building a Tree-LSTM model under a Linux system or a Windows system;
a52: setting the input dimension of Tree-LSTM and the length of input data; a53: setting the data quantity proportion of the Tree-LSTM training set and the test set; a54: setting a Tree-LSTM model optimizer and a learning rate; a55: setting the width of hidden layer neuron; a56: setting the iteration times of the model; a57: and continuously adjusting parameters, checking the convergence degree of the model according to the model loss, and preferentially selecting high-convergence parameters to form a physical and chemical property prediction model based on Tree-LSTM.
Drawings
FIG. 1 is a flow chart of the process for predicting the physicochemical properties of organic substances according to the present invention;
FIG. 2 is a computational graph of the Tree-LSTM recurrent neural network when predicting the properties of acetaldoxime;
FIG. 3 is a graph of the prediction effect of the Tree-LSTM physicochemical property prediction model on the critical temperature of organic matters, wherein x represents a predicted value, and a straight line represents an actual value.
FIG. 4 illustrates the generation of the molecular feature descriptor, using acetaldoxime as an example.
FIG. 5 shows the coding rule of the molecular feature descriptor, illustrating the meaning of each position of the code.
Detailed Description
The invention will be described in detail with reference to the accompanying drawings and specific examples, it being understood that the examples described below are for the purpose of facilitating an understanding of the invention only and are not intended to limit the invention itself in any way.
The invention provides a Tree-LSTM-based method for predicting the physicochemical properties of organic substances which, as shown in FIG. 1, comprises two parts: step A, generating a prediction model; and step B, predicting physicochemical properties.
step A, generating a prediction model:
a1 obtains experimental data of the physicochemical properties of the organic matter and the molecular structure information of the organic matter, and captures a large amount of data from various databases by using a web crawler technology.
A11: the physicochemical properties of the organic substance mainly include: critical properties, normal boiling point, transport properties, autoignition point, flash point, toxicity, octanol-water partition coefficient, biochemical activity, and the like.
A12: the molecular structure information is mainly carried by SMILES expressions, SMARTS expressions, MOL files and SDF files.
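For reference, all four structure carriers can be read with an open-source cheminformatics toolkit. The snippet below is a minimal sketch assuming RDKit, with placeholder file names; it is not prescribed by the patent.

```python
# Minimal sketch: reading the four structure carriers named above with RDKit
# (RDKit is an assumed toolkit here; file names are placeholders).
from rdkit import Chem

mol_from_smiles = Chem.MolFromSmiles("CC=NO")            # SMILES expression (acetaldoxime)
oxime_pattern = Chem.MolFromSmarts("[CX3]=[NX2][OX2H]")  # SMARTS expression (oxime substructure)
mol_from_molfile = Chem.MolFromMolFile("compound.mol")   # MOL file
sdf_molecules = [m for m in Chem.SDMolSupplier("dataset.sdf") if m is not None]  # SDF file
```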
A2 standardizes the structure of a single organic molecule, traverses each atom in the single organic molecule and generates corresponding atom feature descriptors, sorts all the atom feature descriptors of the single organic molecule according to a lexicographic order, and takes the smallest atom feature descriptor as a molecule feature descriptor and encodes the molecule feature descriptor.
A21: generating a canonical graph from the two-dimensional topological graph of the organic molecule with a graph canonicalization algorithm from graph theory, so that molecular graphs can be compared for isomorphism; for example, the Nauty or Faulon graph canonicalization algorithms can be adopted.
A22: the encoding method is one of the following:
In the first method, the molecular feature descriptor output by the Faulon canonicalization algorithm is used directly as the code of the organic substance; an example is given in FIG. 4.
In the second method, the molecular feature descriptor is encoded in a linear code format; an example is given in Table 1. A sketch of the descriptor construction is given below.
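A minimal sketch of the descriptor construction of step A2, assuming RDKit. The bracketed string syntax is illustrative and does not reproduce the exact Faulon encoding; it only shows the tree expansion from each root atom and the selection of the lexicographically smallest atom feature descriptor.

```python
# Sketch of step A2: tree expansion per root atom, then lexicographic minimum
# (assumes RDKit; the string format is illustrative, not the Faulon code).
from rdkit import Chem

BOND_MARK = {Chem.BondType.SINGLE: "", Chem.BondType.DOUBLE: "=",
             Chem.BondType.TRIPLE: "#", Chem.BondType.AROMATIC: ":"}

def atom_descriptor(mol, atom_idx, height, visited=frozenset()):
    """Expand the molecule as a tree rooted at atom_idx down to `height`,
    recording each atom symbol and the bond type leading to it."""
    atom = mol.GetAtomWithIdx(atom_idx)
    if height == 0:
        return atom.GetSymbol()
    visited = visited | {atom_idx}
    branches = []
    for bond in atom.GetBonds():
        nbr = bond.GetOtherAtomIdx(atom_idx)
        if nbr in visited:
            continue
        sub = atom_descriptor(mol, nbr, height - 1, visited)
        branches.append(BOND_MARK.get(bond.GetBondType(), "~") + sub)
    branches.sort()   # deterministic ordering of child branches
    return atom.GetSymbol() + "".join("(" + b + ")" for b in branches)

def molecular_descriptor(smiles, height=2):
    """One atom feature descriptor per root atom; the lexicographically
    smallest one is taken as the molecular feature descriptor."""
    mol = Chem.MolFromSmiles(Chem.MolToSmiles(Chem.MolFromSmiles(smiles)))  # canonical atom order
    descriptors = [atom_descriptor(mol, i, height) for i in range(mol.GetNumAtoms())]
    return min(descriptors)

print(molecular_descriptor("CC=NO"))   # acetaldoxime, expanded to height 2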
A3: generating the molecular feature descriptor and corresponding code of each molecule from the molecular structure information of all organic substances obtained in step A2.
A4 splits all organic molecules into various chemical bonds, arranges character strings representing the chemical bonds according to each molecule, and generates word vectors by using a word embedding algorithm for the character strings.
A5: building a Tree-LSTM-based neural network model, loading the physicochemical data obtained in A1 and the molecular structure data processed in A2-A4, continuously adjusting the parameters, and selecting the best-performing parameters to obtain the Tree-LSTM-based physicochemical property prediction model for organic substances.
The step B comprises the following steps:
B1: processing, with steps A2-A4, the molecular structure of an organic substance that lacks experimental data for a given physicochemical property, and loading the generated feature descriptors and codes into the physicochemical property prediction model to obtain the unknown physicochemical property data;
step a4 further includes the following:
a41: traversing each molecule in the database, traversing the connected chemical bond and atom with each atom in each molecule as a starting point, forming a character string like 'A-B', and recording to form original data. Description of the drawings: "A" represents the symbol of the element of the atom A, "B" represents the symbol of the element of the atom B, and "-" represents the type of chemical bond between the atom A and the atom B.
A42: splitting a character string in the form of 'A-B' in original data to form a sub-character string set of three combination modes: the combination is as follows: "A" and "-B", in combination two: "A-" and "B", in combination three: "A", "-", and "B".
A43: and (3) building a neural network based on a skip-gram algorithm under a Linux system or a Windows system, and obtaining an embedded vector representing each character string in the character string set obtained by A42.
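A minimal sketch of steps A41-A43, assuming RDKit for the molecule handling and gensim's skip-gram Word2Vec for the embedding. The bond-mark symbols, the single-letter-element simplification and the small training corpus are illustrative choices, not the patent's exact tokenization.

```python
# Sketch of steps A41-A43 (assumes RDKit and gensim; names are illustrative).
from rdkit import Chem
from gensim.models import Word2Vec

BOND_MARK = {Chem.BondType.SINGLE: "-", Chem.BondType.DOUBLE: "=",
             Chem.BondType.TRIPLE: "#", Chem.BondType.AROMATIC: ":"}

def bond_strings(smiles):
    """A41: one 'A-B' token per (atom, neighbour) pair in the molecule."""
    mol = Chem.MolFromSmiles(smiles)
    tokens = []
    for atom in mol.GetAtoms():
        for bond in atom.GetBonds():
            nbr = mol.GetAtomWithIdx(bond.GetOtherAtomIdx(atom.GetIdx()))
            tokens.append(atom.GetSymbol() + BOND_MARK[bond.GetBondType()] + nbr.GetSymbol())
    return tokens

def split_token(tok):
    """A42: split 'A-B' into the three sub-string combinations described above
    (single-letter elements assumed for brevity)."""
    a, mark, b = tok[0], tok[1], tok[2:]
    return [[a, mark + b], [a + mark, b], [a, mark, b]]

corpus = []
for smi in ["CC=NO", "CCO", "c1ccccc1"]:   # illustrative training molecules
    for tok in bond_strings(smi):
        corpus.extend(split_token(tok))

# A43: skip-gram (sg=1) word embedding over the sub-string corpus
model = Word2Vec(corpus, vector_size=50, sg=1, window=2, min_count=1)
print(model.wv["C"].shape)                  # one 50-dimensional embedding per symbol
```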
As a further refinement, said step a5 comprises the following:
a51: building a Tree-LSTM model under a Linux system or a Windows system;
a52: the feature descriptors or linear encodings of each numerator are parsed into a tree-like data structure and a corresponding embedded vector obtained by a4 is matched for each node in the tree-like structure (for each atom in the numerator).
A52: setting the input dimension of Tree-LSTM and the length of input data; the input dimension in the present invention is 1 and the length is 50.
A53: setting the data quantity proportion of the Tree-LSTM training set and the test set; the ratio in the present invention is 4: 1.
A54: setting a Tree-LSTM model optimizer and a learning rate; the method adopts an Adam algorithm optimizer, and the learning rate is 0.001:
a55: setting the width of each hidden layer neuron;
a56: setting the iteration times of the model;
a57: and adjusting the number of the cryptomelanic ganglion points under the same iteration number, adjusting the iteration number under the same number of the cryptomelanic ganglion points, checking the convergence degree of the model according to the overall loss and the iteration loss of the model, and preferentially selecting a high-convergence parameter to form a physical and chemical property prediction model based on Tree-LSTM.
The structure of the Tree-LSTM neural network is shown in FIG. 2.
Tree-LSTM has two mathematical models: the child-node summation model and the child-node independent model.
The core of Tree-LSTM is the control of the cell state c; the controls comprise the forget gate f_j, the input gate i_j and the output gate o_j. For the current node j, the forget gate f_j controls how much of each child node's cell state is kept in the current cell state c_j; the input gate i_j controls how much of the current node's instantaneous state is written into the current cell state c_j; the current input cell state u_j controls how much new node information is added to the output; and the output gate o_j controls how much of the current cell state c_j is emitted as the hidden-layer output h_j of the current node. For a node j whose set of child nodes is C(j), the formulas of the child-node summation model are:

\tilde{h}_j = \sum_{k \in C(j)} h_k   (1)

f_{jk} = \sigma(W^{(f)} x_j + U^{(f)} h_k + b^{(f)})   (2)

i_j = \sigma(W^{(i)} x_j + U^{(i)} \tilde{h}_j + b^{(i)})   (3)

u_j = \tanh(W^{(u)} x_j + U^{(u)} \tilde{h}_j + b^{(u)})   (4)

o_j = \sigma(W^{(o)} x_j + U^{(o)} \tilde{h}_j + b^{(o)})   (5)

c_j = i_j \odot u_j + \sum_{k \in C(j)} f_{jk} \odot c_k   (6)

h_j = o_j \odot \tanh(c_j)   (7)

where W^{(f)}, W^{(i)}, W^{(o)} are the weight matrices of the forget gate, input gate and output gate, U^{(f)}, U^{(i)}, U^{(o)} are the corresponding weight matrices acting on the child hidden states, b^{(f)}, b^{(i)}, b^{(o)} are the corresponding bias terms, and \sigma is the sigmoid function. In the child-node independent model the gates are computed analogously, but each of the N child positions l has its own weight matrices acting on the child hidden state h_{jl} rather than on their sum; its input cell state, cell state and hidden output are:

u_j = \tanh(W^{(u)} x_j + \sum_{l=1}^{N} U^{(u)}_{l} h_{jl} + b^{(u)})   (12)

c_j = i_j \odot u_j + \sum_{l=1}^{N} f_{jl} \odot c_{jl}   (14)

h_j = o_j \odot \tanh(c_j)   (15)

The difference between the two models is whether the child hidden states h_{jl} are summed: the child-node independent model assigns a separate parameter matrix to the h_{jl} of each child node, whereas the child-node summation model provides parameters for the sum of the child nodes' h_{jl}.
The structure of the Tree-LSTM recurrent neural network unit is shown in FIG. 2. The inputs of the unit include the cell states c_{jl} of the child nodes, the hidden-layer output values h_{jl} of the child nodes, and the input value x_j of the current node; its outputs include the current cell state c_j and the current hidden-layer output value h_j.
The current input cell state u_j is computed from the input x_j of the current node and the hidden-layer output values h_{jl} of the child nodes (for the child-node summation model, from the sum of the h_{jl}), as shown in formula (4) or (12).
Here W^{(u)} is the weight matrix of the input cell state, b^{(u)} is its bias term, and tanh is the hyperbolic tangent function. The current cell state c_j is jointly determined by the forget gates (which act on the child cell states c_{jl} through the per-child forget gates f_{jl}), the input gate i_j and the current input cell state u_j, as shown in formula (6) or (14), where ⊙ denotes element-wise multiplication. The hidden-layer output value h_j of the current node is computed from the output gate o_j and the current cell state c_j, as shown in formula (7) or (15).
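The child-node summation cell defined by formulas (1)-(7) can be written compactly in code. The following is a minimal PyTorch sketch; the class and parameter names are illustrative, batching of whole molecular trees and leaf-node handling are omitted, and it is not the patent's own implementation.

```python
# Sketch of the child-node summation Tree-LSTM cell, formulas (1)-(7)
# (assumes PyTorch; names and the toy usage are illustrative).
import torch
import torch.nn as nn

class ChildSumTreeLSTMCell(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.W_iou = nn.Linear(input_dim, 3 * hidden_dim)               # W^(i), W^(o), W^(u) and biases
        self.U_iou = nn.Linear(hidden_dim, 3 * hidden_dim, bias=False)  # U^(i), U^(o), U^(u)
        self.W_f = nn.Linear(input_dim, hidden_dim)                     # W^(f) and b^(f)
        self.U_f = nn.Linear(hidden_dim, hidden_dim, bias=False)        # U^(f)

    def forward(self, x_j, child_h, child_c):
        # x_j: (input_dim,); child_h, child_c: (num_children, hidden_dim)
        h_tilde = child_h.sum(dim=0)                                    # (1) sum of child hidden states
        i, o, u = (self.W_iou(x_j) + self.U_iou(h_tilde)).chunk(3, dim=-1)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)     # i (3), o (5), u (4)
        f = torch.sigmoid(self.W_f(x_j) + self.U_f(child_h))            # (2) one forget gate per child
        c_j = i * u + (f * child_c).sum(dim=0)                          # (6)
        h_j = o * torch.tanh(c_j)                                       # (7)
        return h_j, c_j

# Toy usage: one node with two children, 50-dimensional input, 128 hidden units
cell = ChildSumTreeLSTMCell(input_dim=50, hidden_dim=128)
h_j, c_j = cell(torch.randn(50), torch.randn(2, 128), torch.randn(2, 128))
print(h_j.shape, c_j.shape)   # torch.Size([128]) torch.Size([128])
```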
The final output of the Tree-LSTM neural network is produced by a single-layer or multi-layer neural network; taking a single-layer neural network as the output layer, the calculation is:

p_i = w \cdot h_j + b   (16)

where the property p_i of the i-th component is obtained from the Tree-LSTM hidden output h_j of the root node of the tree structure represented by that component's molecular feature descriptor, and w and b are trainable parameters.
In the invention, the mean square error (MSE) or the mean absolute error (MAE) is used as the loss function:

\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(x_{\mathrm{exp},i} - x_{\mathrm{pred},i}\right)^2, \qquad \mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|x_{\mathrm{exp},i} - x_{\mathrm{pred},i}\right|

where N is the number of samples, x_exp is the observed (experimental) value and x_pred is the predicted value.
Examples of the experiments
The effect of the Tree-LSTM-based physicochemical property prediction method is illustrated below, taking the critical temperature of organic substances as an example. This property serves as basic data for various thermodynamic models and physical property estimation models, so predicting it is of practical and representative significance.
Experimental critical temperature data and the molecular structure information of the corresponding substances were obtained for 1759 organic substances in total; 1407 substances were used as the training set and 352 substances as the test set.
The construction principle of the molecular feature descriptor is illustrated with the substance acetaldoxime from the sample, as detailed in FIG. 4. The molecular feature descriptor is a data structure that stores molecular structure information; it is produced by selecting an atom of the molecule as the starting point and expanding the molecule as a tree. For acetaldoxime in this example, the root atom is the carbon atom numbered zero. Starting from this root atom C0, the tree is searched downwards to a predetermined distance (height), and the atoms encountered on each path, together with the types of the chemical bonds connecting them, are recorded to capture the features of the molecule. Traversing all atoms of the molecule as root atoms yields one atom feature descriptor per atom; different root atoms generate different atom feature descriptors, these are sorted in lexicographic order, and the smallest one is taken as the molecular feature descriptor. FIG. 4 uses acetaldoxime as the example and shows: (A) the molecular structure; (B) the tree expansion of the molecular structure and its atomic features; (C) the atom feature descriptors at height 0 and height 1. Child atoms of an atom are indicated by nested brackets; when the type of a chemical bond is not written, the atoms in the atom feature descriptor are joined by a single bond, otherwise the bond is marked as follows: "=" is a double bond, "#" is a triple bond, and ":" is an aromatic bond.
To store the molecular feature descriptors conveniently, the invention develops a linear code that represents the tree-expansion structure of a molecule. The linear codes of the molecular feature descriptors of acetaldoxime at various depths are shown in Table 1; each atom in the character string is separated by an "I", and the meaning of each number and letter within an atom's code is shown in FIG. 5. The first atom is the root atom with current depth 0 and is denoted by "S"; since it has no parent atom, the parent atom is encoded as "S", and since there is no chemical bond to a parent atom, that bond is also encoded as "S".
The 1759 organic substances were converted into molecular feature descriptors and linearly encoded. Before being input into the neural network, each substance is parsed into a tree structure and every node (atom) is associated with the embedding vector obtained in step A43. For each molecule in the sample, every atom corresponds to a node of the Tree-LSTM neural network, and the embedding vector of the atom is the input vector of that node. With 300 iterations, the number of output (hidden-layer) nodes was adjusted repeatedly, and 128 nodes were finally adopted as a good value in this example. The Tree-LSTM neural network structure is determined by the molecule of each organic substance; it is a dynamic neural network that adapts to the topology of different molecules. In this example, the learning rate was 0.008 for the first 300 training iterations and was then reduced to 0.00001 for a further 5000 iterations. To prevent overfitting, the computation is stopped early when the loss function value no longer decreases. The prediction results in Table 3 were finally obtained; the closer the experimental and predicted values in the table agree, the better the prediction effect. Table 2 gives the statistical evaluation parameters of the Tree-LSTM neural network for training and prediction of the critical temperature of organic substances. In FIG. 3, the × marks represent predicted values and the straight line represents the experimental values; it can be seen that for most data points the invention achieves a good prediction effect using Tree-LSTM.
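A condensed sketch, assuming PyTorch, of the training schedule just described (300 iterations at learning rate 0.008, then up to 5000 iterations at 0.00001, stopping early once the loss no longer decreases); `model`, `trees`, `targets` and the `patience` threshold are placeholders rather than part of the patented method.

```python
# Sketch of the two-stage training schedule from this worked example
# (assumes PyTorch; `model`, `trees`, `targets` and `patience` are placeholders).
import torch

def run_epoch(model, trees, targets, optimizer, loss_fn):
    optimizer.zero_grad()
    preds = torch.stack([model(tree) for tree in trees])   # one scalar prediction per molecule
    loss = loss_fn(preds, targets)                          # MSE loss, as in the loss formula above
    loss.backward()
    optimizer.step()
    return loss.item()

def train(model, trees, targets, patience=20):
    loss_fn = torch.nn.MSELoss()
    best, stalled = float("inf"), 0
    # Stage 1: 300 iterations at learning rate 0.008
    opt = torch.optim.Adam(model.parameters(), lr=0.008)
    for _ in range(300):
        run_epoch(model, trees, targets, opt, loss_fn)
    # Stage 2: up to 5000 iterations at 0.00001, with early stopping
    opt = torch.optim.Adam(model.parameters(), lr=1e-5)
    for _ in range(5000):
        loss = run_epoch(model, trees, targets, opt, loss_fn)
        if loss < best:
            best, stalled = loss, 0
        else:
            stalled += 1
            if stalled >= patience:   # loss no longer decreases -> stop to avoid overfitting
                break
    return best
```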
Table 1 example of linear encoding of molecular feature descriptors
TABLE 2 statistical parameters for organic matter critical temperature training and prediction
TABLE 3 partial prediction of critical temperature of organic substances
The present invention was compared with two representative group contribution methods, the Joback method and the Constantinou-Gani (CG) method, on the same list of substances; the results are shown in Table 4:
TABLE 4 comparison of the prediction ability of the present invention with classical group contribution method
The list of substances used for the comparison in Table 4 contains 460 substances, of which only 352 can be predicted by the Joback method; when the prediction method of the present invention is applied to those 352 substances, it outperforms the Joback method. The number of substances predictable by the CG method is also smaller than that of the present invention, though only slightly. When the present invention is applied to all the substances in the list, it can cover 452 of them while achieving acceptable accuracy. The superscript a denotes all predictable substances, and the superscript b denotes substances with more than 3 carbon atoms.

Claims (2)

1. A Tree-LSTM-based method for predicting the physicochemical properties of organic substances, characterized in that the molecular graph of an organic substance is converted into a canonical graph that a computer can recognize and learn from, so that the computer can capture the structural features of the molecule, correlate those features with the physical or chemical properties of the substance, and finally predict the properties of the substance; the process comprises two parts: step A, generating a prediction model; and step B, predicting physicochemical properties;
the step A comprises the following steps:
A1: acquiring experimental physicochemical property data of organic substances and their molecular structure information, gathering a large amount of data from various databases using web crawler technology;
A2: normalizing the structure of each single organic molecule (with a graph canonicalization algorithm), traversing each atom of the molecule to generate a corresponding atom feature descriptor, sorting all the atom feature descriptors of the molecule in lexicographic order, and taking the smallest one as the molecular feature descriptor;
A3: generating, from all the organic molecular structures obtained in step A2, the molecular feature descriptor representing each normalized molecular structure graph and its corresponding linear code;
A4: splitting all organic molecules into their constituent chemical bonds, arranging the character strings representing the chemical bonds by molecule, and generating word vectors for these character strings with a word embedding algorithm;
A5: building a Tree-LSTM-based neural network model and loading the physicochemical data obtained in A1 and the molecular structure data processed in A2-A4, the Tree-LSTM automatically adapting to the topology of the normalized molecular structure graph; manually adjusting the hyper-parameters and training the model, and selecting the best-performing parameters during training to obtain the Tree-LSTM-based physicochemical property prediction model for organic substances;
the step B comprises the following steps:
B1: processing, with steps A2-A4, the molecular structure of an organic substance for which experimental data of a given physicochemical property are unavailable, loading the generated feature descriptor and code into the physicochemical property prediction model obtained in A5, and inputting the molecular feature descriptor to predict the unknown physicochemical property.
2. The method for predicting the physicochemical property of organic substances based on Tree-LSTM according to claim 1, wherein the step A5 comprises the following steps:
a51: building a Tree-LSTM-based neural network under a Linux system or a Windows system; a52: setting the input dimension of Tree-LSTM and the length of input data; a53: setting the data quantity proportion of the Tree-LSTM training set and the test set; a54: setting a Tree-LSTM model optimizer and a learning rate; a55: setting the width of hidden layer neuron; a56: setting the iteration times of the model; a57: and continuously adjusting parameters, checking the convergence degree of the model according to the model loss, and preferentially selecting high-convergence parameters to form a physical and chemical property prediction model based on Tree-LSTM.
CN201910500140.8A 2019-06-01 2019-06-01 Tree-LSTM-based organic matter physicochemical property prediction method Active CN110600085B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910500140.8A CN110600085B (en) 2019-06-01 2019-06-01 Tree-LSTM-based organic matter physicochemical property prediction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910500140.8A CN110600085B (en) 2019-06-01 2019-06-01 Tree-LSTM-based organic matter physicochemical property prediction method

Publications (2)

Publication Number Publication Date
CN110600085A true CN110600085A (en) 2019-12-20
CN110600085B CN110600085B (en) 2024-04-09

Family

ID=68852617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910500140.8A Active CN110600085B (en) 2019-06-01 2019-06-01 Tree-LSTM-based organic matter physicochemical property prediction method

Country Status (1)

Country Link
CN (1) CN110600085B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111524557A (en) * 2020-04-24 2020-08-11 腾讯科技(深圳)有限公司 Inverse synthesis prediction method, device, equipment and storage medium based on artificial intelligence
CN111710375A (en) * 2020-05-13 2020-09-25 中国科学院计算机网络信息中心 Molecular property prediction method and system
CN111899807A (en) * 2020-06-12 2020-11-06 中国石油天然气股份有限公司 Molecular structure generation method, system, equipment and storage medium
CN111899814A (en) * 2020-06-12 2020-11-06 中国石油天然气股份有限公司 Method, equipment and storage medium for calculating physical properties of single molecule and mixture
CN115171807A (en) * 2022-09-07 2022-10-11 合肥机数量子科技有限公司 Molecular coding model training method, molecular coding method and molecular coding system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150017694A1 (en) * 2008-11-06 2015-01-15 Kiverdi, Inc. Engineered CO2-Fixing Chemotrophic Microorganisms Producing Carbon-Based Products and Methods of Using the Same
US20180137389A1 (en) * 2016-11-16 2018-05-17 Facebook, Inc. Deep Multi-Scale Video Prediction
CN108108836A (en) * 2017-12-15 2018-06-01 清华大学 A kind of ozone concentration distribution forecasting method and system based on space-time deep learning
US20180211156A1 (en) * 2017-01-26 2018-07-26 The Climate Corporation Crop yield estimation using agronomic neural network
CN109033738A (en) * 2018-07-09 2018-12-18 湖南大学 A kind of pharmaceutical activity prediction technique based on deep learning
CN109476721A (en) * 2016-04-04 2019-03-15 英蒂分子公司 CD8- specificity capturing agent, composition and use and preparation method
US20190114320A1 (en) * 2017-10-17 2019-04-18 Tata Consultancy Services Limited System and method for quality evaluation of collaborative text inputs

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150017694A1 (en) * 2008-11-06 2015-01-15 Kiverdi, Inc. Engineered CO2-Fixing Chemotrophic Microorganisms Producing Carbon-Based Products and Methods of Using the Same
CN109476721A (en) * 2016-04-04 2019-03-15 英蒂分子公司 CD8- specificity capturing agent, composition and use and preparation method
US20180137389A1 (en) * 2016-11-16 2018-05-17 Facebook, Inc. Deep Multi-Scale Video Prediction
US20180211156A1 (en) * 2017-01-26 2018-07-26 The Climate Corporation Crop yield estimation using agronomic neural network
US20190114320A1 (en) * 2017-10-17 2019-04-18 Tata Consultancy Services Limited System and method for quality evaluation of collaborative text inputs
CN108108836A (en) * 2017-12-15 2018-06-01 清华大学 A kind of ozone concentration distribution forecasting method and system based on space-time deep learning
CN109033738A (en) * 2018-07-09 2018-12-18 湖南大学 A kind of pharmaceutical activity prediction technique based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
秦琦枫 et al.: "深度神经网络在化学中的应用研究" [Research on the application of deep neural networks in chemistry], 《江西化工》 [Jiangxi Chemical Industry] *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111524557A (en) * 2020-04-24 2020-08-11 腾讯科技(深圳)有限公司 Inverse synthesis prediction method, device, equipment and storage medium based on artificial intelligence
CN111524557B (en) * 2020-04-24 2024-04-05 腾讯科技(深圳)有限公司 Inverse synthesis prediction method, device, equipment and storage medium based on artificial intelligence
CN111710375A (en) * 2020-05-13 2020-09-25 中国科学院计算机网络信息中心 Molecular property prediction method and system
CN111710375B (en) * 2020-05-13 2023-07-04 中国科学院计算机网络信息中心 Molecular property prediction method and system
CN111899807A (en) * 2020-06-12 2020-11-06 中国石油天然气股份有限公司 Molecular structure generation method, system, equipment and storage medium
CN111899814A (en) * 2020-06-12 2020-11-06 中国石油天然气股份有限公司 Method, equipment and storage medium for calculating physical properties of single molecule and mixture
CN115171807A (en) * 2022-09-07 2022-10-11 合肥机数量子科技有限公司 Molecular coding model training method, molecular coding method and molecular coding system
CN115171807B (en) * 2022-09-07 2022-12-06 合肥机数量子科技有限公司 Molecular coding model training method, molecular coding method and molecular coding system

Also Published As

Publication number Publication date
CN110600085B (en) 2024-04-09

Similar Documents

Publication Publication Date Title
CN110600085B (en) Tree-LSTM-based organic matter physicochemical property prediction method
Zhang et al. An end-to-end deep learning architecture for graph classification
Peel et al. Detecting change points in the large-scale structure of evolving networks
CN113299354B (en) Small molecule representation learning method based on transducer and enhanced interactive MPNN neural network
Burnaev et al. Efficient design of experiments for sensitivity analysis based on polynomial chaos expansions
CN113722877A (en) Method for online prediction of temperature field distribution change during lithium battery discharge
Chandra et al. A multivariate time series clustering approach for crime trends prediction
Li et al. Four Methods to Estimate Minimum Miscibility Pressure of CO2‐Oil Based on Machine Learning
CN114861928B (en) Quantum measurement method and device and computing equipment
Wang et al. Time-weighted kernel-sparse-representation-based real-time nonlinear multimode process monitoring
WO1997025676A1 (en) Time-series signal predicting apparatus
CN115759461A (en) Internet of things-oriented multivariate time sequence prediction method and system
Tuli et al. FlexiBERT: Are current transformer architectures too homogeneous and rigid?
Li et al. Deep reliability learning with latent adaptation for design optimization under uncertainty
CN116894180B (en) Product manufacturing quality prediction method based on different composition attention network
CN113674807A (en) Molecular screening method based on deep learning technology qualitative and quantitative model
CN116302088B (en) Code clone detection method, storage medium and equipment
CN107220483B (en) Earth temperature mode prediction method
Hellström et al. High-dimensional neural network potentials for atomistic simulations
Li et al. Using modified lasso regression to learn large undirected graphs in a probabilistic framework
McWilliams et al. A PRESS statistic for two-block partial least squares regression
Ihme et al. On the optimization of artificial neural networks for application to the approximation of chemical systems
CN116884536B (en) Automatic optimization method and system for production formula of industrial waste residue bricks
CN111563623B (en) Typical scene extraction method and system for wind power system planning
CN113779884B (en) Detection method for service life of recovered chip

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant