CN111798935A - Universal compound structure-property correlation prediction method based on neural network - Google Patents

Universal compound structure-property correlation prediction method based on neural network

Info

Publication number
CN111798935A
Authority
CN
China
Prior art keywords
neural network
prediction method
molecular
layer
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910280668.9A
Other languages
Chinese (zh)
Inventor
王晓华
杨民民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Pharmablock Sciences (nanjing) Inc
Original Assignee
Pharmablock Sciences (nanjing) Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pharmablock Sciences (nanjing) Inc
Priority to CN201910280668.9A
Publication of CN111798935A

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16CCOMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/30Prediction of properties of chemical compounds, compositions or mixtures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16CCOMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70Machine learning, data mining or chemometrics

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Chemical & Material Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a universal compound structure-property correlation prediction method based on a neural network, comprising the following steps: step 1, transforming molecular descriptors into feature-vector form to construct a data set; step 2, dividing the data set into a training set and a test set, feeding the training set into a fully-connected neural network for training, and determining the parameters of a fully convolutional model; step 3, transforming the molecular descriptors to be predicted into feature-vector form and predicting with the fully convolutional model. The prediction method achieves higher accuracy in solubility prediction.

Description

Universal compound structure-property correlation prediction method based on neural network
Technical Field
The invention relates to a construction method of a prediction model, in particular to a universal compound structure-property correlation prediction method based on a neural network.
Background
Solubility is an essential property of compounds, particularly small-molecule compounds. In general, different compounds have different solubilities in the same solution under the same conditions because of their structure and spatial arrangement. Determining solubility plays an important role in the chemical industry, for example in chemical processes, process preparation, and the migration of chemical substances in medicines and in the environment.
However, because there are a great many compounds and different solutions require different storage and measurement conditions, it is impractical to measure the solubility of every compound experimentally. There is therefore an urgent need for a universal method, based on existing data, that determines solubility accurately, reliably, and quickly.
Such methods are collectively referred to as quantitative structure-property correlation prediction (hereinafter abbreviated as QSPR). QSPR is currently the latest approach to solubility calculation and prediction: a fitted model is built from the quantitative relationship between the calculated molecular structure parameters (molecular descriptors) of a compound and a specific property (such as solubility), and the model is then used for prediction. QSPR research is generally divided into three steps:
(1) calculating a molecular descriptor;
(2) establishing a prediction model;
(3) analyzing the prediction accuracy.
Disclosure of Invention
The invention aims to provide a universal compound structure-property correlation prediction method based on a neural network, which has higher accuracy in the prediction of solubility.
In order to achieve the above purpose, the solution of the invention is:
a universal compound structure-property correlation prediction method based on a neural network comprises the following steps:
step 1, transforming a molecular descriptor into a characteristic vector form to construct a data set;
step 2, dividing the data set into a training set and a testing set, sending the training set into a fully-connected neural network for training, and determining parameters of a fully-convolutional model;
step 3, transforming the molecular descriptors to be predicted into a characteristic vector form, and predicting according to the full convolution model.
The specific content of step 1 is as follows: the molecular descriptor is divided into 2 parts, namely the molecular fingerprint calculated from the molecular structural formula and the general descriptor, and the 2 parts are concatenated to form a characteristic vector.
In step 2, the architecture of the fully-connected neural network is as follows: the first layer is an input layer, then a plurality of convolution layers are arranged, finally, a full convolution network with the convolution kernel size of [1,1] is used as a classification network, and the mean value of each layer is used for representing the category represented by the layer.
In the step 2, a back propagation algorithm or a gradient descent algorithm is adopted to train the data set.
In the step 2, the training set and the test set are divided in the following manner: 90% of the data set was taken as the training set and 10% of the data set was taken as the test set.
With this scheme, descriptors can be designed, calculated, and combined freely, so the descriptors are highly general and different description methods can be used together. Because the prediction model uses a convolutional neural network based on deep learning, it can learn from and predict on inputs of indefinite length, which makes the proposed model broadly applicable.
The invention has the following characteristics:
(1) SMILES, currently the most common notation for molecular structures, may be of unlimited length, so the method is general;
(2) descriptor data can be added freely, provided that the same descriptors are added to every record of a data set, so the method is general;
(3) descriptor features are extracted automatically by the convolutional neural network and trained together with the labels, so the method is simple;
(4) the convolutional neural network is carefully designed and reaches very high accuracy within a very short training period, so the method is highly practical;
(5) the method is applied to QSPR for the first time, and its accuracy reaches a world-leading level.
Drawings
FIG. 1 is a schematic diagram of the structure of building molecular descriptors;
FIG. 2 is a schematic diagram of a fully connected layer classifier;
FIG. 3 is a diagram of a similar molecule structure;
FIG. 4 is a schematic diagram of a full convolution neural network architecture;
FIG. 5 is a schematic diagram of a full convolution neural network structure with parameters.
Detailed Description
1. Molecular descriptors
Molecular descriptors are the basis of QSPR. They are properties and measures of a molecule, covering one or more aspects, that can be represented numerically so that a computer can process and analyze them. A molecular descriptor can be a direct numerical representation of a physicochemical property of the molecule, or it can be calculated from several data indices by a particular algorithm. The former includes physicochemical indices of the compound such as boiling point and melting point; the latter relates more to the outer energy of molecules, the distribution of outer electronic charge between bonds, and the like.
There are many methods for calculating molecular descriptors. With the software and software packages currently available, almost six thousand physicochemical parameters covering the characteristic properties and structural features of compounds can be calculated.
RDKit is free, open-source cheminformatics and machine learning software that provides C++ and Python APIs. It includes its own molecular descriptor calculation methods, and the calculation converts a molecular structural formula into a vector of 279 features.
As shown in FIG. 1, the molecular descriptor in the model is divided into 2 parts: the molecular fingerprint calculated from the molecular structural formula, and the general descriptor. The two parts are then joined by a concatenation operation to form the feature vector that represents the molecule.
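The concatenation described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function name `build_feature_vector` is hypothetical, and the fingerprint bits and descriptor values are made-up placeholders rather than values computed for a real molecule.

```python
from typing import List, Sequence


def build_feature_vector(fingerprint: Sequence[int],
                         general_descriptors: Sequence[float]) -> List[float]:
    """Concatenate fingerprint bits and general descriptors into one vector."""
    if any(bit not in (0, 1) for bit in fingerprint):
        raise ValueError("fingerprint entries must be 0 or 1")
    # Connection operation: fingerprint part first, general descriptors second.
    return [float(bit) for bit in fingerprint] + [float(v) for v in general_descriptors]


# Placeholder values for illustration only (assumption, not real data).
example_fingerprint = [1, 0, 0, 1, 1, 0, 0, 0]
example_descriptors = [46.07, 0.03, 1.0]
feature_vector = build_feature_vector(example_fingerprint, example_descriptors)
print(len(feature_vector))
```

Every record in a data set would need the same fingerprint length and the same descriptors in the same order, so that position i of the vector always means the same thing.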
2. Full convolution neural network model
2.1 fully-connected neural networks
A typical neural network is built from basic modules such as convolutional layers, activation layers, and fully connected layers. The convolutional layer is responsible for feature extraction; the convolution kernel is its key component and essentially extracts features from signals at each level. Convolution kernels together form a convolutional layer, which is connected through the neurons of its kernels to the previous convolutional layer and to the next convolutional or fully connected layer. The features of the previous layer are convolved with a learnable convolution kernel and passed through an activation function to produce the corresponding feature map; each output feature map may combine the convolutions of several input feature maps.
$$x_j^{l} = F\Big(\sum_{i} x_i^{l-1} * k_{ij}^{l} + b_j^{l}\Big) \qquad (1)$$
In formula (1), $x_i^{l-1}$ is the output of a convolution kernel of the previous convolutional layer; it is convolved with the convolution kernel $k_{ij}^{l}$ of the current convolutional layer, and the bias $b_j^{l}$ is then added. The function F is generally called the activation function and is usually a tanh or ReLU function. The resulting structure is shown in FIG. 2.
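As a rough illustration of the layer computation just described (convolve the previous layer's outputs with the current kernels, add a bias, apply the activation function F), here is a one-dimensional sketch in plain Python. The function names and the toy numbers are assumptions made for this example; like most deep learning frameworks, the sliding product is implemented without flipping the kernel.

```python
import math
from typing import Callable, List, Sequence


def conv1d_valid(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Slide the kernel over the signal, keeping only fully overlapping positions."""
    n, k = len(signal), len(kernel)
    return [sum(signal[p + q] * kernel[q] for q in range(k)) for p in range(n - k + 1)]


def relu(value: float) -> float:
    return max(0.0, value)


def conv_layer_output(prev_maps: Sequence[Sequence[float]],
                      kernels: Sequence[Sequence[float]],
                      bias: float,
                      activation: Callable[[float], float] = math.tanh) -> List[float]:
    """One output feature map: F(sum_i conv(prev_maps[i], kernels[i]) + bias).

    All previous maps must share one length and all kernels one size.
    """
    partial = [conv1d_valid(x, k) for x, k in zip(prev_maps, kernels)]
    summed = [sum(values) + bias for values in zip(*partial)]
    return [activation(v) for v in summed]


# Toy inputs (made up): two feature maps from the previous layer.
prev_maps = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
kernels = [[1.0, -1.0], [0.5, 0.5]]
out = conv_layer_output(prev_maps, kernels, bias=1.0, activation=relu)
print(out)  # [0.5, 0.5, 0.5]
```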
The fully connected layer is used to classify the data. The features extracted by the previous layer are flattened into a one-dimensional vector, which is connected to a number of fully connected units chosen from the designer's experience, and the result is computed as a matrix product. A softmax or sigmoid function is then applied as the activation function to produce the final classification scores, and the final class is output according to the resulting probabilities.
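A fully connected classifier of this kind can be sketched as a matrix-vector product followed by softmax. The weights, biases, and feature values below are arbitrary placeholders chosen for the example, and the helper names `dense` and `softmax` are not taken from the patent.

```python
import math
from typing import List, Sequence


def dense(features: Sequence[float],
          weights: Sequence[Sequence[float]],
          biases: Sequence[float]) -> List[float]:
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, biases)]


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw scores into probabilities; subtracting the max avoids overflow."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


flattened = [1.0, 2.0]                              # one-dimensional feature vector
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]      # 3 output units (classes)
biases = [0.0, 0.0, 0.0]

scores = dense(flattened, weights, biases)          # [1.0, 2.0, 3.0]
probs = softmax(scores)
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(predicted_class)  # 2
```

Note that `dense` only works when `features` has exactly as many entries as each weight row, which is the fixed-input-size restriction discussed in section 2.2.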
In addition, to help train the neural network, downsampling layers and regularization layers are generally added between convolutional layers to process the activations, which improves the fit, reduces overfitting, and speeds up training.
2.2 full convolution neural network architecture
In a traditional neural network, classification is performed by a final fully connected layer, which maps the features extracted by the convolutional layers to a specific label space. The advantage is that a softmax or sigmoid function can be applied conveniently after a single matrix calculation. However, this produces a large number of redundant parameters and makes poor reuse of the data. Another significant drawback of the fully connected layer is that when the input data is flattened into a one-dimensional vector, the structure of the data is lost. In addition, because a fully connected layer is computed as a matrix product, all input vectors must have the same dimension.
A conventional convolutional neural network obtains rotation invariance in image recognition through spatial pooling. In chemistry, however, the properties of a molecule depend strongly on where its structures are positioned: as shown in FIG. 3, even a slight difference in structure can lead to quite different properties.
This document introduces a new neural network, based on full convolution, to classify molecular descriptor feature vectors. Starting from the feature extraction of the convolutional layers, the feature map with the highest response is obtained, the features of the corresponding layer are fitted by global pooling, and the strongest feature is taken as the classification result. As shown in FIGS. 4 and 5, the specific architecture is as follows: the first layer is an input layer, which receives the molecule converted into a molecular fingerprint; it is followed by several convolutional layers; finally, a full convolution network with a convolution kernel size of [1,1] is used as the classification network, and the mean value of each layer is used to represent the category represented by that layer.
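The classification head described above (a [1,1] convolution followed by taking the mean of each resulting map) can be sketched as follows. This is one reading of the text, assuming one output map per class; the names and numbers are illustrative only. The point of the sketch is that the same weights accept inputs of different lengths, which a fully connected layer cannot do.

```python
from typing import List, Sequence, Tuple


def pointwise_conv(feature_maps: Sequence[Sequence[float]],
                   weights: Sequence[Sequence[float]],
                   biases: Sequence[float]) -> List[List[float]]:
    """Kernel-size-1 convolution: mix channels independently at every position."""
    length = len(feature_maps[0])
    return [[sum(w * fmap[p] for w, fmap in zip(row, feature_maps)) + b
             for p in range(length)]
            for row, b in zip(weights, biases)]


def global_average(class_maps: Sequence[Sequence[float]]) -> List[float]:
    """Mean of each class map, giving one score per class."""
    return [sum(m) / len(m) for m in class_maps]


def classify(feature_maps: Sequence[Sequence[float]],
             weights: Sequence[Sequence[float]],
             biases: Sequence[float]) -> Tuple[int, List[float]]:
    scores = global_average(pointwise_conv(feature_maps, weights, biases))
    return max(range(len(scores)), key=scores.__getitem__), scores


# Two input channels, two classes; weights are arbitrary placeholders.
weights = [[1.0, 0.0], [0.0, 1.0]]
biases = [0.0, 0.0]

short_input = [[1.0, 3.0], [0.0, 2.0]]                       # length 2
long_input = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]    # length 4

print(classify(short_input, weights, biases))  # (0, [2.0, 1.0])
print(classify(long_input, weights, biases))   # (1, [0.0, 1.0])
```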
2.3 training of full convolution neural networks
Training a convolutional neural network means searching an assumed data space for a set of optimal parameters that minimizes the value of the objective function (loss function). In theory the data space is infinite and so is the number of possible solutions, so an optimal set cannot be chosen by hand.
The common neural network training methods are the back propagation (BP) algorithm and the gradient descent (GD) algorithm, and the same holds for the full convolution neural network herein. The input data is propagated forward, the error between the predicted and actual values is calculated through a loss function, the error is propagated backwards layer by layer, the partial derivative of the error with respect to each convolution kernel value is calculated, and the weights and biases are updated according to these partial derivatives.
[Equation image: Figure BDA0002021543740000051]
Here conv is the convolution operation; the symbol shown in Figure BDA0002021543740000052 denotes the parameters of the convolution kernel, and the symbol shown in Figure BDA0002021543740000053 denotes the output of the previous convolution kernel. The current convolution kernel rotated by 180 degrees is used as the weight multiplier.
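To make the forward pass, error calculation, partial derivatives, and update cycle concrete, the sketch below trains a single one-dimensional convolution kernel by gradient descent on a squared-error loss. It is a simplified stand-in rather than the patent's formula: there is no activation function and only one layer, and the data, learning rate, and step count are assumptions chosen so that the toy problem converges.

```python
from typing import List, Sequence, Tuple


def forward(x: Sequence[float], kernel: Sequence[float], bias: float) -> List[float]:
    """Valid 1-D sliding product plus bias (no activation, for simplicity)."""
    k = len(kernel)
    return [sum(x[p + q] * kernel[q] for q in range(k)) + bias
            for p in range(len(x) - k + 1)]


def loss_and_grads(x, target, kernel, bias) -> Tuple[float, List[float], float]:
    """Squared-error loss and its partial derivatives w.r.t. kernel and bias."""
    y = forward(x, kernel, bias)
    err = [yp - tp for yp, tp in zip(y, target)]
    loss = 0.5 * sum(e * e for e in err)
    # dL/dk[q] = sum_p err[p] * x[p + q]; dL/db = sum_p err[p]
    grad_kernel = [sum(err[p] * x[p + q] for p in range(len(err)))
                   for q in range(len(kernel))]
    grad_bias = sum(err)
    return loss, grad_kernel, grad_bias


def train(x, target, kernel, bias, learning_rate=0.01, steps=200):
    """Plain gradient descent: move each parameter against its gradient."""
    losses = []
    for _ in range(steps):
        loss, grad_kernel, grad_bias = loss_and_grads(x, target, kernel, bias)
        losses.append(loss)
        kernel = [k - learning_rate * g for k, g in zip(kernel, grad_kernel)]
        bias -= learning_rate * grad_bias
    return kernel, bias, losses


x = [1.0, 2.0, 3.0, 4.0, 5.0]
target = forward(x, [2.0, -1.0], 0.5)   # targets generated by a known kernel
kernel, bias, losses = train(x, target, kernel=[0.0, 0.0], bias=0.0)
print(losses[0], losses[-1])            # the loss shrinks as training proceeds
```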
2.4 Discussion of some details of the full convolution neural network
The neural network used herein is a full convolution neural network. Current research has identified 3 main factors that affect a convolutional neural network: the number of convolutional layers, the number of convolution kernels, and the organization of the network. In practice, ResNet, from Microsoft Research, adds the input of a block to its residual output, which solves the vanishing-gradient problem that arises as the number of convolutional layers increases. Google's Inception and its subsequent versions design a network with a good local topology: several convolution or pooling operations are applied to the input image in parallel and all outputs are concatenated into a very deep feature map, which greatly increases the number of convolution kernels without greatly increasing the number of parameters.
The fully convolutional neural network proposed herein changes the organization of the conventional neural network by replacing the final fully connected classification layer with convolutional layers, which is why it is called a "fully convolutional" neural network.
3. QSPR model training based on full convolution neural network
3.1 data Structure and data conversion
SMILES (Simplified Molecular Input Line Entry System) is a specification for describing a molecular structure unambiguously using ASCII character strings. SMILES was developed by Arthur Weininger and David Weininger in the late 1980s and was later modified and extended by others, notably Daylight Chemical Information Systems Inc.
TABLE 1
Figure BDA0002021543740000061
Figure BDA0002021543740000071
Table 1 shows SMILES strings and the corresponding compound structures as displayed by the software. Different SMILES strings correspond to different molecular structures, and the corresponding descriptors can be calculated from a SMILES string with suitable software (here RDKit), as shown in Table 2.
TABLE 2
Figure BDA0002021543740000072
For a typical single SMILES string, the generated molecular descriptor is a vector of dimension [1,200], which stands in for the molecule and is used as the input data of the model. Solubility itself is a continuous value, so it is manually divided into classes and encoded with one-hot coding; for simplicity, 10 classes are used herein.
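The conversion of a continuous solubility value into one of 10 one-hot classes might look like the sketch below. The text does not say how the class boundaries are chosen, so equal-width bins over a fixed range are an assumption here, as are the range limits of -10 to 0 and the function names.

```python
from typing import List


def to_class_index(value: float, low: float, high: float, num_classes: int = 10) -> int:
    """Map a continuous value to an equal-width bin index in [0, num_classes - 1]."""
    if high <= low:
        raise ValueError("high must be greater than low")
    clipped = min(max(value, low), high)           # out-of-range values go to the edge bins
    index = int((clipped - low) / (high - low) * num_classes)
    return min(index, num_classes - 1)             # value == high belongs to the last bin


def one_hot(index: int, num_classes: int = 10) -> List[int]:
    """Label vector with a single 1 at the class position."""
    return [1 if i == index else 0 for i in range(num_classes)]


# Hypothetical log-solubility value and range, for illustration only.
label = one_hot(to_class_index(-3.2, low=-10.0, high=0.0))
print(label)  # [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
```

With this encoding the model is scored on whether it picks the right bin, so the reported accuracy depends on how wide the bins are.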
3.2 specific design and parameters of the full convolution model
To address the problem of weak features during training, the network uses multiple dense connections: during feature extraction in the convolutional layers, a direct connection is established between any two layers, so the input of each layer is the union of the outputs of all preceding layers. All feature information extracted by a layer is likewise passed on to the following layers, up to the final global convolutional layer. The global convolutional layer maps the extracted features spatially, mapping the most significant features at each spatial level so as to determine the corresponding category. After sufficient extraction and mapping, each input value is thus assigned to a specific target interval.
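The dense connection pattern can be illustrated with a short sketch in which every layer receives the concatenation of the block input and the outputs of all earlier layers. This reading (each layer sees all preceding outputs) is an interpretation of the text, and the two toy layers stand in for real convolutional layers, where the concatenation would run along the channel axis.

```python
from typing import Callable, List, Sequence

Layer = Callable[[List[float]], List[float]]


def dense_block(x: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    """Feed each layer the concatenation of the input and all earlier outputs."""
    collected: List[List[float]] = [list(x)]
    for layer in layers:
        layer_input = [v for part in collected for v in part]
        collected.append(layer(layer_input))
    # The block output carries every intermediate feature forward.
    return [v for part in collected for v in part]


def sum_layer(features: List[float]) -> List[float]:
    """Toy layer (placeholder for a convolution): emits the sum of its inputs."""
    return [sum(features)]


def max_layer(features: List[float]) -> List[float]:
    """Toy layer (placeholder for a convolution): emits the largest input."""
    return [max(features)]


block_output = dense_block([1.0, 2.0], [sum_layer, max_layer])
print(block_output)  # [1.0, 2.0, 3.0, 3.0]
```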
3.3, full convolution model QSPR experimental results and analysis
3.3.1 Experimental data set
There are 3 data sets used herein, in order:
1) Abraham octanol solubility data set
2) Delaney water solubility data set
3) Tox21 toxicity data set
The Abraham and Delaney data sets have 283 and 1144 records, respectively; structures are given as SMILES and the solubility values are calculated logarithms.
Tox21 comes from the Tox21 program of the NIH Chemical Genomics Center (NCGC) in Rockville, Maryland, USA. Twelve groups of data were selected (nr-ahr, nr-ar, nr-ar-lbd, nr-aromatase, nr-er, nr-er-lbd, nr-ppar-gamma, sr-are, sr-atad5, sr-hse, sr-mmp, sr-p53), each with approximately 8000 records.
Every data set was divided into a training set (90% of the records) and a test set (10% of the records).
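A 90%/10% split can be sketched as below. Whether the records are shuffled before splitting, and with what seed, is not stated in the text, so the seeded shuffle here is an assumption; the function name is likewise hypothetical. The record count of 1144 matches the Delaney data set mentioned above, with integers standing in for the actual records.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(records: Sequence[T],
                  train_fraction: float = 0.9,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the records and cut it into training and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)      # seeded so the split is repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train_set, test_set = split_dataset(range(1144))
print(len(train_set), len(test_set))  # 1029 115
```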
3.3.2 Experimental results and analysis
The full convolution neural network model was implemented with the TensorFlow library as the basic framework. The main hardware was two NVIDIA 1070 graphics cards used as graphics processors. The batch size was set to 50, the number of training iterations was not limited, the learning rate was 0.0001, and the optimizer was the gradient descent optimizer (GradientDescentOptimizer).
In each training iteration, the parameters of all convolutional layers take part in the calculation and are updated; all of the parameters are convolution filter parameters. The model computes the accuracy on the training set and the test set at the same time, and training stops when the accuracy on the training set exceeds 0.9999. The verification results are shown in the following table:
TABLE 3
Data set    SVM     Logistic regression    Full convolution model
Abraham     0.38    0.17                   0.79+
Tox21       0.76    0.21                   0.9999+
Delaney     0.51    0.15                   0.92+
As can be seen from Table 3, the full convolution model proposed herein achieves very good accuracy on each data set. On the Tox21 data set in particular, the average of the 12 verification results reaches essentially 100% accuracy, which is important for toxicity verification. The results for the Abraham and Delaney data sets are less ideal than those for Tox21, most likely because these data sets are too small to cover all the required training points.
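The stopping rule used in these experiments (train without a fixed iteration count and stop once training-set accuracy exceeds 0.9999) can be sketched as a generic loop. The callbacks `train_step` and `evaluate` are hypothetical hooks, the `max_steps` safety cap is an addition not present in the text, and the toy model below only imitates accuracy rising over time.

```python
from typing import Callable, List, Sequence, Tuple


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match their labels."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def train_until_threshold(train_step: Callable[[], None],
                          evaluate: Callable[[], Tuple[float, float]],
                          threshold: float = 0.9999,
                          max_steps: int = 100000) -> Tuple[int, float, float]:
    """Run training steps until training accuracy exceeds the threshold."""
    train_acc = test_acc = 0.0
    for step in range(1, max_steps + 1):
        train_step()
        train_acc, test_acc = evaluate()
        if train_acc > threshold:
            return step, train_acc, test_acc
    return max_steps, train_acc, test_acc      # cap reached without meeting the threshold


# Toy stand-in for a model: one more label is predicted correctly per step.
labels: List[int] = [1, 0, 1, 1]
state = {"correct": 0}


def toy_train_step() -> None:
    state["correct"] = min(state["correct"] + 1, len(labels))


def toy_evaluate() -> Tuple[float, float]:
    n = state["correct"]
    predictions = labels[:n] + [1 - y for y in labels[n:]]
    acc = accuracy(predictions, labels)
    return acc, acc                            # toy: the same value is reused for both sets


steps, train_acc, test_acc = train_until_threshold(toy_train_step, toy_evaluate)
print(steps, train_acc)  # 4 1.0
```

Because the rule looks only at training accuracy, the test accuracy returned alongside it is the number to watch when judging whether the model generalizes.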
4. Conclusion
Experiments were performed on different sets of molecular activity data using a fully convolutional neural network. The experiments show that, compared with traditional machine learning tools, the fully convolutional neural network obtains the best accuracy on small data sets without a noticeable reduction in training speed.
The above embodiments are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and any modifications made on the basis of the technical scheme according to the technical idea of the present invention fall within the protection scope of the present invention.

Claims (5)

1. A universal compound structure-property correlation prediction method based on a neural network is characterized by comprising the following steps:
step 1, transforming a molecular descriptor into a characteristic vector form to construct a data set;
step 2, dividing the data set into a training set and a testing set, sending the training set into a fully-connected neural network for training, and determining parameters of a fully-convolutional model;
step 3, transforming the molecular descriptors to be predicted into a characteristic vector form, and predicting according to the full convolution model.
2. The prediction method of claim 1, wherein: the specific content of step 1 is as follows: the molecular descriptor is divided into 2 parts, namely the molecular fingerprint calculated from the molecular structural formula and the general descriptor, and the 2 parts are concatenated to form a characteristic vector.
3. The prediction method of claim 1, wherein: in step 2, the architecture of the fully-connected neural network is as follows: the first layer is an input layer, then a plurality of convolution layers are arranged, finally, a full convolution network with the convolution kernel size of [1,1] is used as a classification network, and the mean value of each layer is used for representing the category represented by the layer.
4. The prediction method of claim 1, wherein: in the step 2, a back propagation algorithm or a gradient descent algorithm is adopted to train the data set.
5. The prediction method of claim 1, wherein: in step 2, the training set and the test set are divided in the following manner: 90% of the data set was taken as the training set and 10% of the data set was taken as the test set.
CN201910280668.9A 2019-04-09 2019-04-09 Universal compound structure-property correlation prediction method based on neural network Pending CN111798935A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910280668.9A CN111798935A (en) 2019-04-09 2019-04-09 Universal compound structure-property correlation prediction method based on neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910280668.9A CN111798935A (en) 2019-04-09 2019-04-09 Universal compound structure-property correlation prediction method based on neural network

Publications (1)

Publication Number Publication Date
CN111798935A true CN111798935A (en) 2020-10-20

Family

ID=72805083

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910280668.9A Pending CN111798935A (en) 2019-04-09 2019-04-09 Universal compound structure-property correlation prediction method based on neural network

Country Status (1)

Country Link
CN (1) CN111798935A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112634993A (en) * 2020-12-30 2021-04-09 中国科学院生态环境研究中心 Prediction model and screening method for activation activity of estrogen receptor of chemicals
CN113362905A (en) * 2021-06-08 2021-09-07 浙江大学 Asymmetric catalytic reaction enantioselectivity prediction method based on deep learning
CN113380346A (en) * 2021-06-08 2021-09-10 河南大学 Coupling reaction yield intelligent prediction method based on attention convolution neural network
CN113380337A (en) * 2021-06-08 2021-09-10 浙江大学 Organic fluorescent small molecule optical property prediction method based on deep neural network
CN113674807A (en) * 2021-08-10 2021-11-19 南京工业大学 Molecular screening method based on deep learning technology qualitative and quantitative model
CN113903409A (en) * 2021-12-08 2022-01-07 北京晶泰科技有限公司 Molecular data processing method, model construction and prediction method and related device
CN115062181A (en) * 2022-08-16 2022-09-16 苏州创腾软件有限公司 Polymer glass transition temperature prediction method based on convolutional neural network

Citations (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030028330A1 (en) * 2001-07-13 2003-02-06 Ailan Cheng System and method for aqueous solubility prediction
CN101329699A (en) * 2008-07-31 2008-12-24 四川大学 Method for predicting medicament molecule pharmacokinetic property and toxicity based on supporting vector machine
CN101339180A (en) * 2008-08-14 2009-01-07 南京工业大学 Organic compound explosive characteristic prediction method based on support vector machine
CN101477597A (en) * 2009-01-15 2009-07-08 浙江大学 Natural product active ingredient computation and recognition method based compound characteristic
KR20120085153A (en) * 2011-10-05 2012-07-31 주식회사 켐에쎈 Multiple linear regression-artificial neural network hybrid model predicting lower flammability limit volume percent of organic compound
KR20120085148A (en) * 2011-10-05 2012-07-31 주식회사 켐에쎈 Multiple linear regression-artificial neural network model predicting standard state enthalpy of formation of pure organic compound
KR20120085147A (en) * 2011-10-05 2012-07-31 주식회사 켐에쎈 Multiple linear regression-artificial neural network hybrid model predicting solubility index of organic compound
KR20120085144A (en) * 2011-10-05 2012-07-31 주식회사 켐에쎈 Multiple linear regression-artificial neural network hybrid model predicting water solubility of pure organic compound
WO2012177108A2 (en) * 2011-10-04 2012-12-27 주식회사 켐에쎈 Model, method and system for predicting, processing and servicing online physicochemical and thermodynamic properties of pure compound
CN102980972A (en) * 2012-11-06 2013-03-20 南京工业大学 Method for determining hot dangerousness of self-reactive chemical substance
CN103235901A (en) * 2013-05-07 2013-08-07 黑龙江中医药大学 Method for determining phase transition boundary of microemulsion drug carrier of tanshinone IIA
CN104376221A (en) * 2014-11-21 2015-02-25 环境保护部南京环境科学研究所 Method for predicating skin permeability coefficients of organic chemicals
US20150134315A1 (en) * 2013-09-27 2015-05-14 Codexis, Inc. Structure based predictive modeling
CN106777986A (en) * 2016-12-19 2017-05-31 南京邮电大学 Ligand molecular fingerprint generation method based on depth Hash in drug screening
CN106777922A (en) * 2016-11-30 2017-05-31 华东理工大学 A kind of CTA hydrofinishings production process agent model modeling method
US20170161635A1 (en) * 2015-12-02 2017-06-08 Preferred Networks, Inc. Generative machine learning systems for drug design
CN106874688A (en) * 2017-03-01 2017-06-20 中国药科大学 Intelligent lead compound based on convolutional neural networks finds method
CN107239803A (en) * 2017-07-21 2017-10-10 国家海洋局第海洋研究所 Utilize the sediment automatic classification method of deep learning neutral net
CN107563496A (en) * 2017-08-07 2018-01-09 北京工业大学 A kind of deep learning mode identification method of vectorial core convolutional neural networks
US20180012124A1 (en) * 2016-07-05 2018-01-11 International Business Machines Corporation Neural network for chemical compounds
CN107871054A (en) * 2016-09-23 2018-04-03 中国石油天然气股份有限公司 Oil refining atmospheric and vacuum distillation unit nertralizer Selection Method and nertralizer composition based on AHP Field Using Fuzzy Comprehensive Assessments
CN107886491A (en) * 2017-11-27 2018-04-06 深圳市唯特视科技有限公司 A kind of image combining method based on pixel arest neighbors
US10146914B1 (en) * 2018-03-01 2018-12-04 Recursion Pharmaceuticals, Inc. Systems and methods for evaluating whether perturbations discriminate an on target effect
CN108985001A (en) * 2017-06-05 2018-12-11 欧阳德方 Pharmaceutical preparation prediction method
CN109033738A (en) * 2018-07-09 2018-12-18 湖南大学 Drug activity prediction method based on deep learning
CN109376798A (en) * 2018-11-23 2019-02-22 东南大学 Classification method for titanium dioxide lattice phases based on convolutional neural networks
CN109461475A (en) * 2018-10-26 2019-03-12 中国科学技术大学 Molecular attribute prediction method based on artificial neural network
US20190080057A1 (en) * 2017-09-12 2019-03-14 Michael Stanley Toxicity or adverse effect of a substance predicting automated system and method of training thereof
WO2019050902A1 (en) * 2017-09-05 2019-03-14 Adaptive Phage Therapeutics, Inc. Methods to determine the sensitivity profile of a bacterial strain to a therapeutic composition

Patent Citations (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030028330A1 (en) * 2001-07-13 2003-02-06 Ailan Cheng System and method for aqueous solubility prediction
CN101329699A (en) * 2008-07-31 2008-12-24 四川大学 Method for predicting drug molecule pharmacokinetic properties and toxicity based on support vector machine
CN101339180A (en) * 2008-08-14 2009-01-07 南京工业大学 Organic compound explosive characteristic prediction method based on support vector machine
CN101477597A (en) * 2009-01-15 2009-07-08 浙江大学 Natural product active ingredient computation and recognition method based on compound characteristics
WO2012177108A2 (en) * 2011-10-04 2012-12-27 주식회사 켐에쎈 Model, method and system for predicting, processing and servicing online physicochemical and thermodynamic properties of pure compound
KR20120085153A (en) * 2011-10-05 2012-07-31 주식회사 켐에쎈 Multiple linear regression-artificial neural network hybrid model predicting lower flammability limit volume percent of organic compound
KR20120085147A (en) * 2011-10-05 2012-07-31 주식회사 켐에쎈 Multiple linear regression-artificial neural network hybrid model predicting solubility index of organic compound
KR20120085144A (en) * 2011-10-05 2012-07-31 주식회사 켐에쎈 Multiple linear regression-artificial neural network hybrid model predicting water solubility of pure organic compound
KR20120085148A (en) * 2011-10-05 2012-07-31 주식회사 켐에쎈 Multiple linear regression-artificial neural network model predicting standard state enthalpy of formation of pure organic compound
CN102980972A (en) * 2012-11-06 2013-03-20 南京工业大学 Method for determining thermal hazard of self-reactive chemical substance
CN103235901A (en) * 2013-05-07 2013-08-07 黑龙江中医药大学 Method for determining phase transition boundary of microemulsion drug carrier of tanshinone IIA
US20150134315A1 (en) * 2013-09-27 2015-05-14 Codexis, Inc. Structure based predictive modeling
CN104376221A (en) * 2014-11-21 2015-02-25 环境保护部南京环境科学研究所 Method for predicting skin permeability coefficients of organic chemicals
US20170161635A1 (en) * 2015-12-02 2017-06-08 Preferred Networks, Inc. Generative machine learning systems for drug design
US20180012124A1 (en) * 2016-07-05 2018-01-11 International Business Machines Corporation Neural network for chemical compounds
CN107871054A (en) * 2016-09-23 2018-04-03 中国石油天然气股份有限公司 Neutralizer selection method for oil refining atmospheric and vacuum distillation units based on AHP fuzzy comprehensive evaluation, and neutralizer composition
CN106777922A (en) * 2016-11-30 2017-05-31 华东理工大学 Surrogate model modeling method for CTA hydrorefining production process
CN106777986A (en) * 2016-12-19 2017-05-31 南京邮电大学 Ligand molecular fingerprint generation method based on deep hashing in drug screening
CN106874688A (en) * 2017-03-01 2017-06-20 中国药科大学 Intelligent lead compound discovery method based on convolutional neural networks
CN108985001A (en) * 2017-06-05 2018-12-11 欧阳德方 Pharmaceutical preparation prediction method
CN107239803A (en) * 2017-07-21 2017-10-10 国家海洋局第海洋研究所 Automatic sediment classification method using a deep learning neural network
CN107563496A (en) * 2017-08-07 2018-01-09 北京工业大学 Deep learning pattern recognition method using vector-kernel convolutional neural networks
WO2019050902A1 (en) * 2017-09-05 2019-03-14 Adaptive Phage Therapeutics, Inc. Methods to determine the sensitivity profile of a bacterial strain to a therapeutic composition
US20190080057A1 (en) * 2017-09-12 2019-03-14 Michael Stanley Toxicity or adverse effect of a substance predicting automated system and method of training thereof
CN107886491A (en) * 2017-11-27 2018-04-06 深圳市唯特视科技有限公司 Image synthesis method based on pixel nearest neighbors
US10146914B1 (en) * 2018-03-01 2018-12-04 Recursion Pharmaceuticals, Inc. Systems and methods for evaluating whether perturbations discriminate an on target effect
CN109033738A (en) * 2018-07-09 2018-12-18 湖南大学 Drug activity prediction method based on deep learning
CN109461475A (en) * 2018-10-26 2019-03-12 中国科学技术大学 Molecular attribute prediction method based on artificial neural network
CN109376798A (en) * 2018-11-23 2019-02-22 东南大学 Classification method for titanium dioxide lattice phases based on convolutional neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DRUGAL: "基于神经网络的溶解度预测" [Solubility prediction based on neural networks], Retrieved from the Internet <URL:https://blog.csdn.net/u012325865/article/details/82725777> *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112634993A (en) * 2020-12-30 2021-04-09 中国科学院生态环境研究中心 Prediction model and screening method for activation activity of estrogen receptor of chemicals
CN113362905A (en) * 2021-06-08 2021-09-07 浙江大学 Asymmetric catalytic reaction enantioselectivity prediction method based on deep learning
CN113380346A (en) * 2021-06-08 2021-09-10 河南大学 Coupling reaction yield intelligent prediction method based on attention convolution neural network
CN113380337A (en) * 2021-06-08 2021-09-10 浙江大学 Organic fluorescent small molecule optical property prediction method based on deep neural network
CN113362905B (en) * 2021-06-08 2022-07-08 浙江大学 Asymmetric catalytic reaction enantioselectivity prediction method based on deep learning
CN113674807A (en) * 2021-08-10 2021-11-19 南京工业大学 Molecular screening method based on deep learning technology qualitative and quantitative model
CN113903409A (en) * 2021-12-08 2022-01-07 北京晶泰科技有限公司 Molecular data processing method, model construction and prediction method and related device
CN113903409B (en) * 2021-12-08 2023-07-07 北京晶泰科技有限公司 Molecular data processing method, model construction and prediction method and related devices
CN115062181A (en) * 2022-08-16 2022-09-16 苏州创腾软件有限公司 Polymer glass transition temperature prediction method based on convolutional neural network

Similar Documents

Publication Publication Date Title
CN111798935A (en) Universal compound structure-property correlation prediction method based on neural network
CN113707235B (en) Small-molecule drug property prediction method, device and equipment based on self-supervised learning
WO2022083624A1 (en) Model acquisition method, and device
CN110532417B (en) Image retrieval method and device based on deep hashing and terminal equipment
CN110163258A (en) Zero-shot learning method and system based on a semantic attribute attention reassignment mechanism
Guo et al. A centroid-based gene selection method for microarray data classification
CN107577605A (en) Feature clustering selection method for software defect prediction
CN112199532B (en) Zero-shot image retrieval method and device based on hash coding and graph attention mechanism
EP4322031A1 (en) Recommendation method, recommendation model training method, and related product
EP4273754A1 (en) Neural network training method and related device
Feng et al. Dual-graph convolutional network based on band attention and sparse constraint for hyperspectral band selection
Günen et al. Analyzing the contribution of training algorithms on deep neural networks for hyperspectral image classification
CN116417093A (en) Drug-target interaction prediction method combining Transformer and graph neural network
Kong et al. Deep PLS: A lightweight deep learning model for interpretable and efficient data analytics
Liao et al. Class-wise graph embedding-based active learning for hyperspectral image classification
CN110348287A (en) Unsupervised feature selection method and device based on dictionary and sample similarity graph
Cui et al. Dual-triple attention network for hyperspectral image classification using limited training samples
CN116580848A (en) Multi-head attention mechanism-based method for analyzing multi-omics data of cancers
CN107480441B (en) Modeling method and system for children septic shock prognosis prediction
CN116798652A (en) Anticancer drug response prediction method based on multi-task learning
CN112086144A (en) Molecule generation method, molecule generation device, electronic device, and storage medium
CN113257357B (en) Protein residue contact map prediction method
Kothari et al. Potato leaf disease detection using deep learning
CN113724195B (en) Quantitative analysis model and establishment method of protein based on immunofluorescence image
CN115861902A (en) Unsupervised action migration and discovery methods, systems, devices, and media

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination