CN112086139A - Multi-source transfer learning method and device for virtual screening of small molecule drugs


Info

Publication number
CN112086139A
Authority
CN
China
Prior art keywords
data set
virtual screening
ligand
module
source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010854924.3A
Other languages
Chinese (zh)
Inventor
袁露
吴建盛
胡海峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-08-24
Filing date
2020-08-24
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202010854924.3A
Publication of CN112086139A
Withdrawn (current legal status)

Classifications

    • G16C20/50 Molecular design, e.g. of drugs
    • G06N3/045 Combinations of networks
    • G16C20/30 Prediction of properties of chemical compounds, compositions or mixtures
    • G16C20/64 Screening of libraries
    • G16C20/70 Machine learning, data mining or chemometrics

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Medicinal Chemistry (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Library & Information Science (AREA)
  • Biomedical Technology (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Medicines That Contain Protein Lipid Enzymes And Other Medicines (AREA)

Abstract

The invention provides a multi-source transfer learning method and device for virtual screening of small molecule drugs. The method comprises the following steps: acquiring homologous source data sets and sampling them to obtain sampled homologous source data sets; inputting the SMILES strings of ligand molecules and their biological activity values and training a neural network to obtain a virtual screening model; training the virtual screening model on the sampled homologous source data sets to obtain model parameters; and predicting the biological activity value of ligand molecules binding to the drug target.

Description

Multi-source transfer learning method and device for virtual screening of small molecule drugs
Technical Field
The invention relates to a learning method and device, in particular to a multi-source transfer learning method and device for virtual screening of small molecule drugs.
Background
Virtual screening is a computational drug discovery technique that searches small-molecule libraries for the structures most likely to bind a drug target. It narrows the candidate set and greatly reduces the number of compounds that must be screened experimentally, shortening the development cycle and reducing cost.
Virtual screening falls into two categories: receptor-based and ligand-based. Receptor-based virtual screening starts from the three-dimensional structure of the target protein, studies the properties of its binding site and how that site interacts with small-molecule compounds, scores protein-compound binding with an affinity scoring function related to binding energy, and finally selects, from a large pool of compounds, those with a plausible binding mode and a high predicted score for subsequent bioactivity testing. Ligand-based virtual screening generally starts from small-molecule compounds of known activity, searches a compound database for structures that match them by shape similarity or a pharmacophore model, and then screens the selected compounds experimentally.
The number of compounds with drug-like properties is enormous. Machine learning can help search huge chemical libraries and use algorithms to catalog, characterize and compare the properties of massive numbers of compounds, helping researchers find the best drug candidates quickly and economically. It may also help select safer candidates with a lower failure rate in clinical trials, and help discover new classes of drugs by exploring chemical spaces that are unexplored or were previously dismissed.
At present, drug development for many known targets is approaching saturation, and new drug development requires the discovery of new drug targets. However, new drug targets are not yet well studied, so virtual screening for them often suffers from too few training samples, which makes it difficult to build a good virtual screening model. Existing research shows that transfer learning can improve virtual screening for a drug target when training samples are insufficient. In addition, a new drug target often has homologous or similar target proteins, some of which have far more known ligand data; such proteins tend to bind similar compounds, with more similar interaction modes and mechanisms.
Disclosure of Invention
Purpose of the invention: to address virtual screening of small molecule drugs for a new target with only a small number of samples, the invention provides an effective multi-source transfer learning method for virtual screening of small molecule drugs, and a corresponding multi-source transfer learning device based on that method.
Technical solution: the invention provides a multi-source transfer learning method for virtual screening of small molecule drugs, comprising the following steps:
(1) acquiring homologous source data sets and sampling them to obtain sampled homologous source data sets;
(2) inputting the SMILES strings of ligand molecules and their biological activity values, and training a neural network to obtain a virtual screening model;
(3) training the virtual screening model on the sampled homologous source data sets to obtain model parameters;
(4) predicting the biological activity value of ligand molecules binding to the drug target.
Step (1) comprises:
(1.1) selecting homologous drug targets;
(1.2) obtaining the required initial data set for each homologous drug target, the data set containing the SMILES strings of its ligand molecules and the activity values of the ligands;
(1.3) randomly sampling, with replacement, the data set of each homologous drug target at a set sampling ratio, repeating this several times to obtain the sampled sub-homologous source data sets.
Preferably, step (2) comprises:
(2.1) obtaining the initial data set of the target drug target, $T=\{(x_1,y_1),\dots,(x_i,y_i),\dots,(x_N,y_N)\}$,
where $x_i$ is the SMILES string of the i-th ligand molecule,
$y_i$ is the activity value of the i-th ligand acting on the drug target,
$N$ is the number of ligand molecules in the data set;
that is, the data set contains the ligand SMILES strings and ligand activity values for the target drug target;
(2.2) using the formula
[equation image not reproduced]
generating a molecular fingerprint of the ligand molecule by a convolution operation, denoted f,
where $m_j$ is the attribute vector of the j-th atom;
$N_I$ is the neighborhood of atom j;
$A_{ij}$ is associated with the edge connecting atoms i and j;
[symbol image not reproduced] is a weight matrix;
$b$ is a bias vector;
[equation image not reproduced]
(2.3) using the formula
[equation image not reproduced]
generating a weighted molecular fingerprint of the ligand molecule, denoted F,
where $f_i$ is the molecular fingerprint of the i-th unit;
$w$ is a parameter of the weight layer;
(2.4) predicting the bioactivity value from the generated molecular fingerprint through two fully connected layers,
[equation image not reproduced]
where
$\hat{y}_i$ is the predicted biological activity value for the binding of the i-th ligand molecule;
$o_{ms}$ are parameters of the fully connected layer;
$F_j$ is the weighted molecular fingerprint of the j-th ligand molecule.
Further, step (3) comprises:
(3.1) acquiring the multiple source data sets generated in step (1), each containing ligand SMILES strings and ligand activity values;
(3.2) training the small-molecule virtual screening model generated in step (2) on each sub-source-domain data set to obtain model parameters, which the application denotes
[equation image not reproduced]
(3.3) inputting the target-domain data set into the virtual screening model, replacing the model's original parameters with the parameters obtained in the previous step, and obtaining the predicted biological activity values of the target domain.
Preferably, step (4) comprises:
(4.1) comparing the reliability of the target-domain biological activity values obtained by training on the multiple sub-source domains, measured by a correlation coefficient;
(4.2) selecting several sub-source domains with the largest correlation coefficients and averaging their biological activity values to obtain the final predicted biological activity value;
(4.3) comparing the final predicted biological activity value with the actual activity value, using the correlation coefficient $r^2$ to measure the reliability of the prediction:
[equation image not reproduced]
The invention also provides a multi-source transfer learning device for virtual screening of small molecule drugs, comprising the following modules:
a homologous data set generation module, for acquiring the sub-homologous data sets;
a virtual screening module, for constructing a virtual screening model;
a multi-source transfer module, for using the ligand information of homologous drug targets to help construct the virtual screening model of the target drug target;
and an activity value prediction module, for predicting the activity value of a ligand molecule bound to the drug target and evaluating the performance of the virtual screening model.
The homologous data set generation module downloads drug target data sets from the UniProt database, each containing the SMILES strings of ligand molecules and the activity values of the ligands acting on the drug targets; it samples each data set with replacement and outputs the sampled data sets.
The virtual screening module predicts the bioactivity values between ligand molecules and the drug target (input: compounds in SMILES format; output: the biological activity values against the drug target), which are applied to drug design for that target.
Preferably, the multi-source transfer module comprises a transfer module, a virtual screening module and a prediction module. The transfer module transfers ligand information of the homologous drug targets: it takes a sampled data set as input and uses a demo module to output transfer parameters. The virtual screening module constructs the small-molecule virtual screening model and predicts the biological activity value of a ligand molecule bound to the drug target: the SMILES strings of the ligand molecules in a data set are input into the demo module, and the initial parameters are replaced with the transfer parameters to obtain the biological activity values against the drug target. The prediction module predicts the activity value of a ligand molecule bound to the drug target and evaluates model performance by comparing the predicted biological activity values with the actual values using a reliability index.
The activity value prediction module uses the reliability index to select the most reliable predicted data sets, averages their activity values to obtain the final predicted biological activity value, and evaluates the reliability of that final prediction with the reliability index.
Beneficial effects: the abundant compound data of homologous target proteins is used to help build a virtual screening model for a drug target with insufficient samples. Through multi-source transfer learning, several homologous or similar drug targets serve as source domains and the target drug target serves as the target domain, and compound information from the source domains is transferred to the target domain to help construct the virtual screening model. A model with strong generalization ability can therefore be built from a small sample, improving the accuracy of virtual screening.
Drawings
FIG. 1 is a schematic flow chart of the multi-source transfer learning method for virtual screening of small molecule drugs according to the present application;
FIG. 2 is a flow chart of step 101 in an embodiment of the method of the present application;
FIG. 3 is a flow chart of step 102 in an embodiment of the method of the present application;
FIG. 4 is a flow chart of step 302 in an embodiment of the method of the present application;
FIG. 5 is a flow chart of step 103 in an embodiment of the method of the present application;
FIG. 6 is a flow chart of step 104 in an embodiment of the method of the present application;
FIG. 7 is a schematic flow chart of a multi-source transfer learning apparatus for virtual screening of small molecule drugs according to the present application;
FIG. 8 is a schematic structural diagram of module 601 in an embodiment of the apparatus of the present application;
FIG. 9 is a schematic structural diagram of module 603 in an embodiment of the apparatus of the present application;
FIG. 10 is a schematic structural diagram of module 604 in an embodiment of the apparatus of the present application.
Detailed Description
The present invention will be further explained with reference to the following embodiments.
Fig. 1 shows a schematic diagram of the multi-source transfer learning method for virtual screening of small molecule drugs in this example, which may include the following steps:
Step 101: construct the homologous data sets by sampling with replacement;
specifically, referring to fig. 2, which is a flowchart of the step 101 in practical application, the step 101 specifically includes:
Step 201: select homologous drug targets according to the target drug target. The target drug target is P46093; its homologous targets were searched, and four homologous drug targets were selected, as shown in Table 1:
TABLE 1
[table image not reproduced]
Step 202: acquire the required homologous data sets, containing the ligand SMILES strings and the associated biological activity values. Taking Table 2 as an example, a homologous data set includes:
Canonical SMILES: used to generate the molecular features of the ligand;
Standard value: the activity value of the corresponding ligand;
TABLE 2
CANONICAL SMILES STANDARD VALUE
CCCCC(C(=O)NC(CC1CCCCC1)C(=O) 0.78
Step 203: sample the P25106, P25106, P47900 and P3248 data sets separately with replacement, with the sampling ratio set to 0.5, and repeat the sampling three times, obtaining 12 sub-data sets: D1, D2, D3, D4, D5, D6, D7, D8, D9, D10, D11 and D12;
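The sampling in steps 201 to 203 can be sketched in Python as below. This is a minimal illustration, not the authors' code: the function names `bootstrap_subsets` and `build_source_domains` and the target IDs are hypothetical, each record is assumed to be a `(smiles, activity)` pair, and "sampling ratio 0.5" is interpreted as drawing half as many records as the original set, with replacement.

```python
import random

def bootstrap_subsets(records, ratio=0.5, repeats=3, seed=None):
    """Draw `repeats` subsets of size round(ratio * len(records)), with replacement."""
    rng = random.Random(seed)
    size = max(1, round(ratio * len(records)))
    return [rng.choices(records, k=size) for _ in range(repeats)]

def build_source_domains(target_sets, ratio=0.5, repeats=3, seed=0):
    """Return (target_id, subset) pairs: len(target_sets) * repeats source domains."""
    rng = random.Random(seed)
    domains = []
    for target_id, records in target_sets.items():
        # Give each homologous target its own reproducible sub-seed.
        for subset in bootstrap_subsets(records, ratio, repeats, seed=rng.randrange(2**32)):
            domains.append((target_id, subset))
    return domains

# Hypothetical toy input: four homologous targets, ten (smiles, activity) records each.
toy_sets = {f"homolog_{t}": [(f"SMILES_{t}_{i}", float(i)) for i in range(10)]
            for t in range(4)}
domains = build_source_domains(toy_sets)
print(len(domains))  # 12
```

With four homologous targets, a ratio of 0.5 and three repeats, this yields 12 source domains, matching D1 to D12 above.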
step 102: establishing a virtual screening model based on a graph neural network;
specifically, referring to fig. 3, which is a flowchart of the step 102 in practical application, the step 102 specifically includes:
Step 301: obtain the data set T of the target drug target P46093, as shown in Table 3:
TABLE 3
[table image not reproduced]
Step 302: generating a molecular fingerprint;
specifically, referring to fig. 4, as a flowchart of step 302 in practical application, step 302 may specifically include:
Step 302-1: input the target data set $T=\{(x_1,y_1),\dots,(x_i,y_i),\dots,(x_N,y_N)\}$, where $x_i$ is the SMILES string of the i-th ligand molecule, $y_i$ is the activity value of the i-th ligand acting on the drug target, and $N$ is the number of ligand molecules.
Molecular fingerprint generation may comprise L units, each consisting of a convolutional layer and a summation layer. The following operations are performed for each unit:
Step 302-2: process the input $x_i$ with RDKit; if the molecule contains $A_i$ atoms, each atom of $x_i$ is represented by a 62-dimensional attribute vector $m_j$ ($j=1,\dots,A_i$);
Step 302-3: initialize the parameters $C_l, N_l, E_l, b_l$ for $l\in[1,L]$, and set $f$ and $F$ to 0;
Step 302-4: randomly select $N_S$ samples from the data set T to form a new sample set
[equation image not reproduced]
Step 302-5: for the l-th unit, perform the following operations:
the convolutional layer outputs, for each atom,
[equation image not reproduced]
where $m_j$ is the attribute vector of the j-th atom;
$N_I$ is the neighborhood of atom j;
$A_{ij}$ is associated with the edge connecting atoms i and j;
[symbol image not reproduced] is a weight matrix;
$b$ is a bias vector;
[equation image not reproduced]
Step 302-6: all atoms pass through a summation layer, whose output is $f = f + z_i$.
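Because the convolution formula survives only as an image, the NumPy sketch below of steps 302-2 to 302-6 assumes one common graph-convolution form: each unit computes $z_j=\mathrm{ReLU}(C_l h_j + N_l\sum_{i\in N(j)} h_i + E_l\sum_{i\in N(j)} A_{ij} + b_l)$, sums $z$ over atoms to give the unit fingerprint, and feeds $z$ to the next unit. The update rule, the ReLU activation, the edge-feature size and the names `init_units` and `unit_fingerprints` are all assumptions. In the patent the 62-dimensional atom vectors come from RDKit; here they are random toy inputs.

```python
import numpy as np

def init_units(in_dim, hidden_dim, edge_dim, num_units, seed=0):
    """Random parameters C_l, N_l, E_l, b_l for each of the L units (assumed shapes)."""
    rng = np.random.default_rng(seed)
    units, d = [], in_dim
    for _ in range(num_units):
        units.append({
            "C": rng.normal(0.0, 0.1, (hidden_dim, d)),         # self term
            "N": rng.normal(0.0, 0.1, (hidden_dim, d)),         # neighbour term
            "E": rng.normal(0.0, 0.1, (hidden_dim, edge_dim)),  # edge (A_ij) term
            "b": np.zeros(hidden_dim),                           # bias b_l
        })
        d = hidden_dim
    return units

def unit_fingerprints(atom_feats, adjacency, edge_feats, units):
    """Return per-unit fingerprints f_1..f_L for one molecule (assumed update rule).

    atom_feats: (A, in_dim) atom attribute vectors m_j
    adjacency:  (A, A) 0/1 bond matrix defining each neighbourhood
    edge_feats: (A, A, edge_dim) bond features A_ij, zero where there is no bond
    """
    h, fps = atom_feats, []
    edge_sum = edge_feats.sum(axis=1)  # sum of A_ij over the neighbours of each atom
    for u in units:
        neigh_sum = adjacency @ h      # sum of neighbour vectors
        z = np.maximum(0.0, h @ u["C"].T + neigh_sum @ u["N"].T
                       + edge_sum @ u["E"].T + u["b"])  # assumed ReLU convolution
        fps.append(z.sum(axis=0))      # summation layer over all atoms
        h = z                          # output feeds the next unit
    return fps

# Toy 3-atom chain with random 62-dimensional atom vectors.
rng = np.random.default_rng(1)
x = rng.normal(size=(3, 62))
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
edges = np.zeros((3, 3, 4))
edges[0, 1, 0] = edges[1, 0, 0] = edges[1, 2, 0] = edges[2, 1, 0] = 1.0  # single bonds
units = init_units(62, 16, 4, num_units=3)
fps = unit_fingerprints(x, adj, edges, units)
```

Summing over atoms makes each fingerprint independent of atom ordering, which is why a sum (rather than, say, concatenation) suits molecules of different sizes.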
Step 303: generate the weighted molecular fingerprint
[equation image not reproduced]
Step 304: connect the weighted molecular fingerprint generated in step 303 to two fully connected layers, which output:
[equation image not reproduced]
[equation image not reproduced]
where $p_{jm}$ is the weight connecting neuron j to neuron m;
$o_{m,s}$ is the weight connecting neuron m to neuron s;
$\hat{y}_i$ is the predicted biological activity value of the i-th ligand binding to the drug target.
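Steps 303 and 304 can be sketched as follows. The sketch assumes the weight layer forms $F=\sum_l w_l f_l$ from the per-unit fingerprints $f_l$, that the first fully connected layer ($p_{jm}$) uses a ReLU activation, and that the second ($o_{m,s}$) is linear with a single output neuron. These forms and the function names are assumptions, since the original formulas are images.

```python
import numpy as np

def weighted_fingerprint(unit_fps, w):
    """Weight layer (assumed form): F = sum_l w[l] * f_l."""
    return sum(w_l * f_l for w_l, f_l in zip(w, unit_fps))

def predict_activity(F, P, c, o, d):
    """Two fully connected layers (assumed ReLU hidden layer, linear scalar output).

    P: (hidden, fp_dim) weights p_jm; o: (hidden,) weights o_ms to the single output.
    """
    hidden = np.maximum(0.0, P @ F + c)  # first fully connected layer
    return float(o @ hidden + d)          # second fully connected layer -> y_hat

F = weighted_fingerprint([np.ones(4), 2 * np.ones(4)], w=[0.5, 0.25])
y_hat = predict_activity(F, P=np.eye(4), c=np.zeros(4), o=np.ones(4), d=0.1)
print(F, y_hat)  # [1. 1. 1. 1.] 4.1
```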
Step 305: optimize the error function by iteratively updating the parameters θ, where θ is the set of all model parameters:
[equation image not reproduced]
Step 306: determine whether the optimized model meets the expected criterion; if not, return to step 304.
Step 307: return the predictions $\hat{y}$ and all model parameters.
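The training loop of steps 302-4 and 305 to 307 can be sketched as below: draw a mini-batch of $N_S$ samples, update θ to reduce the error, stop once a criterion is met, then return predictions and parameters. To stay short, the sketch assumes a mean-squared-error loss and trains only a linear read-out on fixed feature vectors by mini-batch gradient descent. The full model would instead back-propagate through all units and both fully connected layers. The loss, learning rate, tolerance and names are illustrative assumptions.

```python
import numpy as np

def mse(y_true, y_pred):
    """Assumed error function: mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))

def train_readout(X, y, n_s=8, lr=0.05, tol=1e-4, max_iter=5000, seed=0):
    """Mini-batch gradient descent on the MSE of a linear read-out y_hat = X @ theta + b."""
    rng = np.random.default_rng(seed)
    theta, b = np.zeros(X.shape[1]), 0.0
    for _ in range(max_iter):
        idx = rng.choice(len(X), size=min(n_s, len(X)), replace=False)  # draw N_S samples
        err = X[idx] @ theta + b - y[idx]
        theta -= lr * 2.0 * X[idx].T @ err / len(idx)  # gradient step on theta
        b -= lr * 2.0 * float(err.mean())              # gradient step on the bias
        if mse(y, X @ theta + b) < tol:                # stopping criterion met
            break
    return X @ theta + b, theta, b                     # predictions and parameters

# Toy, noise-free data: features X and activities from a known linear rule.
rng = np.random.default_rng(3)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.3
y_hat, theta, b = train_readout(X, y)
```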
Step 103: constructing a multi-source transfer learning model based on parameter transfer;
specifically, FIG. 5 is a flowchart of step 103 in practical application;
step 103 may specifically include:
Step 401: acquire the 12 homologous sub-data sets generated in step 101, which contain ligand SMILES strings and ligand activity values;
TABLE 4
Serial number    Source target ID
1 P25106
2 P25106
3 P25106
4 P25566
5 P21556
6 P21556
7 P47900
8 P47900
9 P47900
10 P32246
11 P32466
12 P32466
Step 402: train the virtual screening model generated in step 102 on each source-domain data set and obtain all model parameters;
Step 403: input the target data set into the virtual screening model of step 102, replace the model's original parameters with the parameters obtained in the previous step, and obtain the biological activity values predicted for the target domain.
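Steps 401 to 403 amount to training one model per source domain and reusing its parameters on the target set. The sketch below uses closed-form ridge regression on fingerprint vectors as a hypothetical stand-in for the graph neural network, so that the parameter-transfer logic fits in a few lines. `fit_ridge`, `predict` and `transfer_predictions` are illustrative names, not from the patent, and the toy data are invented.

```python
import numpy as np

def fit_ridge(X, y, alpha=1e-3):
    """Stand-in 'training': closed-form ridge regression with a bias column."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.linalg.solve(Xb.T @ Xb + alpha * np.eye(Xb.shape[1]), Xb.T @ y)

def predict(params, X):
    """Apply (transferred) parameters to new inputs."""
    return np.hstack([X, np.ones((len(X), 1))]) @ params

def transfer_predictions(source_domains, X_target):
    """Train on each source domain, then reuse its parameters on the target set."""
    return [predict(fit_ridge(Xs, ys), X_target) for Xs, ys in source_domains]

# Toy data: 12 source domains and one target that share the same (hypothetical) rule.
rng = np.random.default_rng(7)
w_true = np.array([0.8, -0.4, 1.5])
sources = []
for _ in range(12):
    Xs = rng.normal(size=(40, 3))
    sources.append((Xs, Xs @ w_true + 1.0))
X_target = rng.normal(size=(10, 3))
preds = transfer_predictions(sources, X_target)  # one prediction vector per source
```

Each source domain yields its own target-domain prediction vector; step 104 then decides which of these to trust.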
Step 104: constructing an activity value prediction model based on ensemble learning;
specifically, referring to fig. 6, as a flowchart of step 104 in practical application, step 104 may specifically include:
Step 501: calculate the reliability of the activity values obtained by training on the multiple source domains using the correlation coefficient $r^2$, which ranges from 0 to 1; the closer it is to 1, the higher the reliability:
[equation image not reproduced]
where $y_i$ denotes the actual values;
$\hat{y}_i$ denotes the predicted values.
Step 502: select the 5 sub-source domains with the largest $r^2$;
Step 503: average their predicted biological activity values to obtain the final predicted biological activity value.
Compare the final predicted biological activity values with the actual values, using $r^2$ and RMSE to measure the reliability of the prediction.
[equation image not reproduced]
$y_i$: the activity value of the i-th ligand binding to the target;
$\hat{y}_i$: the predicted activity value of the i-th ligand binding to the target;
$\bar{y}$: the average of the activity values of ligands binding to the target;
$\bar{\hat{y}}$: the average of the predicted activity values of ligands binding to the target.
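Steps 501 to 503 can be sketched in plain Python. Since the formula image is missing, `r_squared` assumes the squared Pearson correlation between actual and predicted values, which matches the symbols defined above ($y_i$, $\hat{y}_i$, $\bar{y}$, $\bar{\hat{y}}$) and the stated 0-to-1 range. `k=5` follows step 502. All function names are hypothetical.

```python
import math

def r_squared(y, y_hat):
    """Squared Pearson correlation between actual and predicted values (assumed form)."""
    n = len(y)
    my, mp = sum(y) / n, sum(y_hat) / n
    cov = sum((a - my) * (b - mp) for a, b in zip(y, y_hat))
    vy = sum((a - my) ** 2 for a in y)
    vp = sum((b - mp) ** 2 for b in y_hat)
    return cov ** 2 / (vy * vp) if vy > 0 and vp > 0 else 0.0

def rmse(y, y_hat):
    """Root-mean-square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def ensemble_predict(y_true, domain_preds, k=5):
    """Keep the k source domains with the highest r^2 and average their predictions."""
    ranked = sorted(domain_preds, key=lambda p: r_squared(y_true, p), reverse=True)
    top = ranked[:k]
    final = [sum(col) / len(top) for col in zip(*top)]
    return final, r_squared(y_true, final), rmse(y_true, final)

y_actual = [1.0, 2.0, 3.0, 4.0]
domain_preds = [[2.0, 2.0, 2.0, 2.0], [1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0]]
final, r2, err = ensemble_predict(y_actual, domain_preds, k=2)
print(final, r2, err)  # [1.5, 1.5, 3.5, 3.5] 0.8 0.5
```

As described, the same actual values are used both to select the top domains and to score the final ensemble, so the reported $r^2$ may be optimistic; selecting on a held-out split of the target data would avoid that.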
Corresponding to the above embodiment of the multi-source transfer learning method for virtual screening of small molecule drugs, the present application also provides an embodiment of a multi-source transfer learning apparatus for virtual screening of small molecule drugs. Referring to FIG. 7, in this example the apparatus may include:
a homologous data set generation module 601, configured to obtain the homologous data sets;
referring to FIG. 8, a schematic diagram of the homologous data set generation module, which specifically includes:
homologous target selection module 701: selects the drug targets used to help build the model;
initial data set module 702: obtains the initial data sets of ligand SMILES strings and ligand activity values;
sampling-with-replacement module 703: sets the sampling ratio and repeats sampling with replacement several times to generate the final source-domain data sets;
virtual screening module 602: constructs the virtual screening model; this graph-neural-network-based module predicts the bioactivity values between ligand molecules and drug targets for use in new drug design against those targets (input: compounds in SMILES format; output: the biological activity values against these drug targets);
multi-source transfer learning module 603: uses the ligand information of homologous drug targets to help construct the virtual screening model of the target drug target;
referring to FIG. 9, a schematic structural diagram of the parameter-transfer-based multi-source transfer learning module 603, which specifically includes:
homologous data set selection module 801: obtains the information of the homologous drug targets;
transfer module 802: transfers the ligand information of the homologous drug targets;
virtual screening module 803: constructs the small-molecule virtual screening model and predicts the biological activity value of a ligand molecule bound to the target drug;
prediction module 804: predicts the activity value of a ligand molecule bound to the drug target and evaluates model performance.
An activity value prediction module 604;
referring to FIG. 10, a schematic structural diagram of the ensemble-learning-based activity value prediction module 604, which specifically includes:
optimal data set selection module 901: selects the data with the most reliable predicted values;
mean predicted activity value module 902: obtains the final predicted activity value by averaging.

Claims (10)

1. A multi-source transfer learning method for virtual screening of small molecule drugs, characterized by comprising the following steps:
(1) acquiring homologous source data sets and sampling them to obtain sampled homologous source data sets;
(2) inputting the SMILES strings of ligand molecules and their biological activity values, and training a neural network to obtain a virtual screening model;
(3) training the virtual screening model on the sampled homologous source data sets to obtain model parameters;
(4) predicting the biological activity value of ligand molecules binding to the drug target.
2. The multi-source transfer learning method for virtual screening of small molecule drugs according to claim 1, wherein step (1) comprises:
(1.1) selecting homologous drug targets;
(1.2) acquiring the required homologous drug target data sets, each containing the SMILES strings of the ligand molecules and the ligand activity values;
(1.3) randomly sampling, with replacement, the data set of each homologous drug target at a set sampling ratio, repeating several times to obtain the sampled sub-homologous source data sets.
3. The multi-source transfer learning method for virtual screening of small molecule drugs according to claim 1, wherein step (2) comprises:
(2.1) acquiring a target drug target data set T, the data set containing the desired ligand SMILES strings and ligand activity values;
(2.2) using the formula
[equation image not reproduced]
generating a molecular fingerprint of the ligand molecule by a convolution operation, denoted f;
(2.3) using the formula
[equation image not reproduced]
generating a weighted molecular fingerprint of the ligand molecule, denoted F;
(2.4) predicting the bioactivity value from the generated molecular fingerprint through two fully connected layers,
[equation image not reproduced]
4. The multi-source transfer learning method for virtual screening of small molecule drugs according to claim 1, wherein step (3) comprises:
(3.1) acquiring the multiple source data sets generated in step (1), each containing the ligand SMILES strings and ligand activity values;
(3.2) training the small-molecule virtual screening model generated in step (2) on each sub-source-domain data set to obtain model parameters;
(3.3) inputting the target-domain data set into the virtual screening model, replacing the model's original parameters with the parameters obtained in the previous step, and obtaining the predicted biological activity values of the target domain.
5. The multi-source transfer learning method for virtual screening of small molecule drugs according to claim 1, wherein step (4) comprises:
(4.1) comparing the reliability of the target-domain biological activity values obtained by training on the multiple sub-source domains, measured by a correlation coefficient;
(4.2) selecting several sub-source domains with the largest correlation coefficients and averaging their biological activity values to obtain the final predicted biological activity value;
(4.3) comparing the final predicted biological activity value with the actual activity value, using the correlation coefficient $r^2$ to measure the reliability of the prediction:
[equation image not reproduced]
6. A multi-source transfer learning device for virtual screening of small molecule drugs, characterized by comprising the following modules:
a homologous data set generation module, for acquiring the sub-homologous data sets;
a virtual screening module, for constructing a virtual screening model;
a multi-source transfer module, for using the ligand information of homologous drug targets to help construct the virtual screening model of the target drug target;
and an activity value prediction module, for predicting the activity value of a ligand molecule bound to the drug target and evaluating the performance of the virtual screening model.
7. The multi-source migration learning device for virtual screening of small molecule drugs according to claim 6, wherein the homology data set generation module comprises: downloading a drug target data set from an uniprot database, wherein the obtained data set comprises smiles molecular formula of ligand molecules and activity value of the action of the ligand molecules and drug targets; and sampling the data set by using a put-back sampling mode, and outputting and obtaining the sampled data set.
8. The multi-source transfer learning device for virtual screening of small molecule drugs according to claim 6, wherein the virtual screening module predicts the biological activity value between a ligand molecule and the drug target; input: a compound in SMILES format; output: the biological activity value of its action on the drug target.
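The claim only fixes the interface of the virtual screening module: a SMILES string goes in and a biological activity value comes out. The sketch below shows that interface with deliberately simple, hypothetical internals: a hashed character-bigram count vector as the molecular representation and a linear model as the predictor. Neither is the patent's model; in practice a cheminformatics fingerprint (for example from RDKit) and a trained network would take their place. `N_FEATURES`, `featurize` and `predict_activity` are illustrative names.

```python
import zlib

N_FEATURES = 64  # hypothetical length of the hashed feature vector


def featurize(smiles, n_features=N_FEATURES):
    """Toy representation: count each character bigram of the SMILES string in a hashed bucket."""
    vec = [0.0] * n_features
    for i in range(len(smiles) - 1):
        bigram = smiles[i:i + 2]
        # crc32 is deterministic across runs, unlike Python's salted built-in str hash
        vec[zlib.crc32(bigram.encode()) % n_features] += 1.0
    return vec


def predict_activity(smiles, weights, bias):
    """Stand-in screening model: SMILES in, predicted activity value out (linear in the features)."""
    x = featurize(smiles, len(weights))
    return bias + sum(w * xi for w, xi in zip(weights, x))


score = predict_activity("CCO", [0.0] * N_FEATURES, 3.5)  # zero weights: prediction equals the bias
```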
9. The multi-source transfer learning device for virtual screening of small molecule drugs according to claim 6, wherein the multi-source transfer module comprises a transfer module, a virtual screening module and a prediction module; the transfer module transfers information from the ligands of homologous drug targets, taking the data set to be sampled as input and using the demo module to output transfer parameters; the virtual screening module constructs a virtual screening model for small molecule drugs and predicts the biological activity value of a ligand molecule bound to the drug target, in that the SMILES strings of the ligand molecules in the data set are input into the demo module, whose initial parameters are replaced with the transfer parameters, to obtain the biological activity value acting on the drug target; and the prediction module predicts the activity value of the ligand molecule bound to the drug target and evaluates the performance of the model by comparing the predicted biological activity value with the actual biological activity value using the reliability index.
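The core of the transfer module is parameter transfer: a model trained on the (larger) homologous-target data supplies the starting parameters for the target-domain model, replacing its usual initial values, before fine-tuning on the few target ligands. The sketch below assumes a plain linear model trained by full-batch gradient descent on mean squared error as a stand-in for the unspecified "demo module"; `train_linear`, the learning rates, epoch counts and toy feature vectors are all hypothetical.

```python
def train_linear(X, y, weights=None, bias=0.0, lr=0.01, epochs=200):
    """Fit y = X w + b by full-batch gradient descent on mean squared error.

    If `weights` is given, training starts from those transferred parameters
    instead of zeros; this is the "replace the initial parameters" step.
    """
    n_features = len(X[0])
    w = list(weights) if weights is not None else [0.0] * n_features
    b = bias
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            for j in range(n_features):
                grad_w[j] += 2 * err * xi[j] / n
            grad_b += 2 * err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Source domain: ligands of a homologous target (toy features and activities).
X_src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y_src = [2.0, 1.0, 3.0, 5.0]
w_src, b_src = train_linear(X_src, y_src, lr=0.05, epochs=2000)

# Target domain: only a few labelled ligands; start from the transferred parameters.
X_tgt = [[1.0, 1.0], [2.0, 0.0]]
y_tgt = [3.2, 4.1]
w_tgt, b_tgt = train_linear(X_tgt, y_tgt, weights=w_src, bias=b_src, lr=0.05, epochs=50)
```

With a deep model the same idea applies layer by layer (copy the source weights, then fine-tune some or all layers); the linear case just keeps the mechanics visible.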
10. The multi-source transfer learning device for virtual screening of small molecule drugs, wherein the activity value prediction module uses the reliability index to select the most reliable predicted data sets, averages their activity values to obtain the final predicted biological activity value, and evaluates the reliability of that final predicted value with the reliability index.
CN202010854924.3A 2020-08-24 2020-08-24 Multi-source transfer learning method and device for virtual screening of small molecule drugs Withdrawn CN112086139A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010854924.3A CN112086139A (en) 2020-08-24 2020-08-24 Multi-source transfer learning method and device for virtual screening of small molecule drugs

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010854924.3A CN112086139A (en) 2020-08-24 2020-08-24 Multi-source transfer learning method and device for virtual screening of small molecule drugs

Publications (1)

Publication Number Publication Date
CN112086139A (en) 2020-12-15

Family

ID=73728500

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010854924.3A Withdrawn CN112086139A (en) 2020-08-24 2020-08-24 Multi-source transfer learning method and device for virtual screening of small molecule drugs

Country Status (1)

Country Link
CN (1) CN112086139A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113192572A (en) * 2021-04-29 2021-07-30 南京邮电大学 Drug virtual screening method and device based on molecular similarity and semi-supervised learning
CN113192571A (en) * 2021-04-29 2021-07-30 南京邮电大学 Small molecule drug hERG toxicity prediction method and device based on graph attention mechanism transfer learning
CN113192571B (en) * 2021-04-29 2022-08-23 南京邮电大学 Small molecule drug hERG toxicity prediction method and device based on graph attention mechanism transfer learning
CN113192572B (en) * 2021-04-29 2022-08-23 南京邮电大学 Drug virtual screening method and device based on molecular similarity and semi-supervised learning
CN115240762A (en) * 2021-07-23 2022-10-25 杭州钛石科技有限公司 Multi-scale small molecule virtual screening method and system
CN115240762B (en) * 2021-07-23 2023-07-18 杭州生奥信息技术有限公司 Multi-scale small molecule virtual screening method and system
CN113808683A (en) * 2021-09-02 2021-12-17 深圳市绿航星际太空科技研究院 Method and system for virtual screening of drugs based on receptors and ligands
CN114220497A (en) * 2021-12-14 2022-03-22 中国科学院过程工程研究所 Ionic liquid type antibiotic drug property prediction method based on transfer learning and graph neural network and high-throughput screening platform

Similar Documents

Publication Publication Date Title
CN112086139A (en) Multi-source transfer learning method and device for virtual screening of small molecule drugs
Gao et al. Hierarchical graph learning for protein–protein interaction
CN107862173B (en) Virtual screening method and device for lead compound
CN113327644B (en) Drug-target interaction prediction method based on deep embedding learning of graph and sequence
Nikkilä et al. Analysis and visualization of gene expression data using self-organizing maps
Jiang et al. Predicting protein function by multi-label correlated semi-supervised learning
CN113393911B (en) Ligand compound rapid pre-screening method based on deep learning
CN113744799B (en) Method for predicting interaction and affinity of compound and protein based on end-to-end learning
CN113299346B (en) Classification model training and classifying method and device, computer equipment and storage medium
CN114333986A (en) Method and device for model training, drug screening and affinity prediction
CN112086146A (en) Small molecule drug virtual screening method and device based on deep parameter transfer learning
CN110890130B (en) Biological network module marker identification method based on multi-type relationship
US20020072887A1 (en) Interaction fingerprint annotations from protein structure models
Yuan et al. Protein-ligand binding affinity prediction model based on graph attention network
Wang et al. A novel stochastic block model for network-based prediction of protein-protein interactions
Hu et al. Cancer gene selection with adaptive optimization spiking neural p systems and hybrid classifiers
US20030124548A1 (en) Method for association of genomic and proteomic pathways associated with physiological or pathophysiological processes
CN115881232A (en) ScRNA-seq cell type annotation method based on graph neural network and feature fusion
CN115458045A (en) Drug pair interaction prediction method based on heterogeneous information network and recommendation system
CN115064207A (en) Spatial proteomics deep learning prediction method for protein subcellular localization
Wang et al. DPLA: prediction of protein-ligand binding affinity by integrating multi-level information
Zenbout et al. Prediction of cancer clinical endpoints using deep learning and rppa data
CN112086143B (en) Small molecule drug virtual screening method and device based on unsupervised domain adaptation
Gopal et al. TEXTAL™: Artificial Intelligence Techniques for Automated Protein Structure Determination.
Cai et al. Application and research progress of machine learning in Bioinformatics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20201215