CN113782114B - Automatic excavating method of oligopeptide medicine lead based on machine learning - Google Patents

Automatic excavating method of oligopeptide medicine lead based on machine learning

Info

Publication number
CN113782114B
CN113782114B CN202111094052.6A CN202111094052A CN113782114B CN 113782114 B CN113782114 B CN 113782114B CN 202111094052 A CN202111094052 A CN 202111094052A CN 113782114 B CN113782114 B CN 113782114B
Authority
CN
China
Prior art keywords
oligopeptide
oligopeptides
candidate
amino acid
machine learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111094052.6A
Other languages
Chinese (zh)
Other versions
CN113782114A (en)
Inventor
张永彪
肖百川
王晓刚
马超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN202111094052.6A
Publication of CN113782114A
Application granted
Publication of CN113782114B

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Medical Informatics (AREA)
  • Chemical & Material Sciences (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Medicines That Contain Protein Lipid Enzymes And Other Medicines (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention discloses a machine-learning-based method for automatically mining oligopeptide drug leads, comprising the following steps: acquiring a functional protein set and extracting its intrinsically disordered regions (IDRs); constructing an N-Gram model based on a deep neural network; learning the semantic distribution pattern of the IDRs with the N-Gram model to obtain context probability vectors for the amino acids of potentially druggable oligopeptides; simulating, by a Monte Carlo method and according to the amino acid context probability vectors, the extension of an oligopeptide from scratch to obtain candidate oligopeptides; and scoring and ranking the candidate oligopeptides and selecting several top-ranked candidates for functional verification. By combining an N-Gram model with a Monte Carlo method, the invention mines potentially druggable functional oligopeptides from a functional protein set that is positively associated with the treatment of the relevant disease, and the method is general.

Description

Automatic excavating method of oligopeptide medicine lead based on machine learning
Technical Field
The invention relates to the technical field of computer-aided drug design, and in particular to a machine-learning-based method for automatically mining oligopeptide drug leads.
Background
Polypeptide drugs are highly selective and potent, and offer good safety and tolerability. However, traditional polypeptide drug design relies heavily on accurate protein structures and functional annotation, which makes drug development costly and slow. To reduce the cost and duration of drug development, various machine learning and statistical analysis methods have been applied to assist it, with good progress.
In recent years, almost all commonly used machine learning methods, such as deep neural networks, support vector machines, k-nearest neighbors (KNN), random forests, gradient boosting machines (GBMs), logistic regression, discriminant analysis, and hidden Markov models, have been used for AI-assisted drug development. In terms of application, this work has focused mainly on fields with mature data resources: antimicrobial peptides (AMPs), anticancer peptides (ACPs), and tumor cell neoantigens.
Based on the features used, these algorithms fall into two categories. The first is deep-learning-based methods, which achieve high accuracy without hand-designed features but are data-hungry and have an opaque decision process. The second is traditional machine learning based on feature engineering, which has lower model capacity than deep learning but, given high-quality hand-crafted features, can yield more accurate results when data are scarce. Common hand-crafted features fall into two classes. One class is built from the elemental composition of the primary sequence, for example: the N- and C-terminal residues or the number of amino acid residues of the holotoxin; the pseudo amino acid composition (PseAAC) method; sequence-order-based methods; and evolutionary feature construction (EFC) methods based on non-local correlations between motifs. The other class is based on the physicochemical properties of the natural amino acids and takes as features the averages of physicochemical indices over the whole polypeptide sequence or over the amino acids at its termini. Taking antimicrobial peptides as an example, 56 primary-sequence-based physicochemical indices are currently in common use, comprising 47 peptide-fragment features and 9 global features, and including well-known structure-activity indices such as t-scale and u-polarity.
However, these methods, which work well for polypeptide drug development, are difficult to apply to oligopeptide drugs. On the one hand, far fewer data are available for oligopeptide drugs than for polypeptide drugs such as ACPs and AMPs. To date, 28 oligopeptide drugs have been approved by the FDA and 55 are in experimental stages, and most of them are modifications or derivatives of the same oligopeptides, which severely limits supervised learning methods such as deep learning. On the other hand, because oligopeptide drugs contain few amino acid residues, the hand-crafted features developed for polypeptide drugs discriminate poorly on oligopeptides, so the features are difficult to transfer. The lack of prior information and the length constraint make designing dedicated hand-crafted features for oligopeptide drugs both difficult and important.
Therefore, a machine-learning-based automatic design method for oligopeptide drugs is urgently needed.
Disclosure of Invention
In view of the above, the invention provides a machine-learning-based method for automatically mining oligopeptide drug leads. It combines an N-Gram model with a Monte Carlo method to mine potentially druggable functional oligopeptides from a set of functional proteins positively associated with the treatment of the relevant disease, and it is general.
To achieve the above purpose, the invention adopts the following technical scheme:
A machine-learning-based method for automatically mining oligopeptide drug leads, comprising the following steps:
S1, acquiring a functional protein set and extracting the IDRs of the functional protein set;
S2, constructing an N-Gram model based on a deep neural network;
S3, learning the semantic distribution pattern of the IDRs with the N-Gram model to obtain context probability vectors for the amino acids of potentially druggable oligopeptides;
S4, simulating, by a Monte Carlo method and according to the amino acid context probability vectors, the extension of an oligopeptide from scratch to obtain candidate oligopeptides;
S5, scoring and ranking the candidate oligopeptides, and selecting several top-ranked candidate oligopeptides for functional verification.
Preferably, in the above method, the expression of the N-Gram model is:
wherein F denotes the deep neural network, θ denotes the parameters of F to be learned, the index term denotes the sequence number of the k-th word ω_k, and v(context(ω_k)) denotes the word vector of context(ω_k), the context of the character ω_k.
Preferably, in the above method, S4 comprises the following steps:
S41, selecting any amino acid as the initial amino acid;
S42, inferring, with the N-Gram model, the context probability vector of the linking amino acid of the oligopeptide to be extended;
S43, sampling the linking amino acid by the Monte Carlo method according to the context probability vector inferred in S42;
S44, joining the linking amino acid to the current oligopeptide to be extended to obtain a new oligopeptide to be extended;
S45, repeating S42 to S44, extending the oligopeptide by one amino acid per round, until a preset end condition is met, thereby obtaining a candidate oligopeptide.
Preferably, in the above method, the preset end conditions in S45 are: the length of the extended oligopeptide reaches 10; and the probabilities of all potential linking amino acids of the current oligopeptide are smaller than the random probability.
Preferably, in the above method, S5 comprises:
grouping and clustering the candidate oligopeptides according to their lengths;
scoring the recommendation degree of the oligopeptides in each cluster of each group, and selecting several top-scoring candidate oligopeptides for functional verification.
Preferably, in the above method, in S5, if the functional verification result of one or more of the selected top-ranked candidate oligopeptides meets the preset requirement, functional verification continues on the remaining candidate oligopeptides in the cluster containing the oligopeptides whose results met the requirement.
Preferably, in the above method, in S5, the product of the context probabilities of the linking amino acids over all rounds of oligopeptide extension is used as the recommendation score of a candidate oligopeptide, and the candidate oligopeptides are ranked by this score.
Preferably, in the above method, the deep neural network architecture of the N-Gram model consists of an input layer, a projection layer, a hidden layer, an output layer and a SoftMax layer.
Compared with the prior art, the invention discloses a machine-learning-based method for automatically mining oligopeptide drug leads. Because protein IDRs are the structural basis of protein phase transitions, and phase transitions are strongly correlated with the occurrence of disease, the invention uses IDRs as feature regions, which circumvents the lack of data sets to some extent and improves the success rate of developing oligopeptide drugs from small samples.
The invention also takes into account the difficulty of hand-designing oligopeptide descriptors and adopts a deep learning method to avoid manual feature design. In addition, because oligopeptides have no long-range semantic patterns and the functional protein set (i.e., the model training set) is usually small, the most basic natural language processing model, N-Gram, is used to mine the semantic patterns of the IDRs and so learn the amino acid distribution pattern of potentially druggable oligopeptides. The N-Gram model is essentially a conditional probability model. It works similarly to a conventional naive Bayes model, but it computes the conditional probabilities between words with a deep neural network, so its model capacity is larger than that of a traditional machine learning model and no hand-crafted features are needed. The model is simple in principle, does not rely on large amounts of training data, and exposes the decision probability of every step, which makes it suitable for oligopeptide drug development. Furthermore, the invention simulates the extension of the oligopeptide from scratch with a Monte Carlo method, so that the de novo design of the oligopeptide drug more closely resembles the natural process.
In general, the invention fills a research gap in the field by using machine learning for fully automatic mining of oligopeptide drug leads. The invention is also highly general: for any application scenario (indication), potentially druggable functional oligopeptides can be extracted simply by providing a set of functional proteins positively associated with the treatment of the disease.
Drawings
To illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below are evidently only embodiments of the invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of the machine-learning-based method for automatically mining oligopeptide drug leads;
FIG. 2 is a flow chart of the overall process of mining therapeutic oligopeptides from a functional protein set;
FIG. 3 is a flow chart of obtaining candidate oligopeptides by combining the N-Gram model and the Monte Carlo method;
FIG. 4 (A) - (E) show the candidate oligopeptides and the experimental verification results.
Detailed Description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art from these embodiments without inventive effort fall within the scope of protection of the invention.
As shown in FIG. 1, the embodiment of the invention discloses a machine-learning-based method for automatically mining oligopeptide drug leads, comprising the following steps:
S1, acquiring a functional protein set and extracting the IDRs of the functional protein set;
S2, constructing an N-Gram model based on a deep neural network;
S3, learning the semantic distribution pattern of the IDRs with the N-Gram model to obtain context probability vectors for the amino acids of potentially druggable oligopeptides;
S4, simulating, by a Monte Carlo method and according to the amino acid context probability vectors, the extension of an oligopeptide from scratch to obtain candidate oligopeptides;
S5, scoring and ranking the candidate oligopeptides, and selecting several top-ranked candidate oligopeptides for functional verification.
The above steps are further described below.
S1, acquiring a functional protein set and extracting the IDRs of the functional protein set.
Proteins contain hot-spot regions called intrinsically disordered regions (IDRs). IDRs generally interact with the domains of other proteins through peptide motifs (conserved linear peptide fragments fewer than 10 residues long) within the region, producing allosteric events. According to existing studies, the phase separation caused by such allosteric events is strongly correlated with the occurrence of disease, so protein IDRs are targets of great interest in drug development.
As shown in FIG. 2, the invention extracts the IDRs of the functional protein set as feature regions, which circumvents the lack of data sets to some extent and improves the success rate of developing oligopeptide drugs from small samples.
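The extraction in S1 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that a disorder predictor has already produced one score per residue, and the function name `extract_idrs`, the 0.5 cutoff and the minimum segment length are assumptions introduced here.

```python
from typing import List, Sequence


def extract_idrs(sequence: str, scores: Sequence[float],
                 threshold: float = 0.5, min_len: int = 2) -> List[str]:
    """Return maximal runs of residues whose disorder score is >= threshold.

    `threshold` and `min_len` are illustrative assumptions, not values
    taken from the patent text.
    """
    if len(sequence) != len(scores):
        raise ValueError("sequence and scores must have the same length")
    idrs: List[str] = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i                          # a disordered run begins
        elif score < threshold and start is not None:
            if i - start >= min_len:
                idrs.append(sequence[start:i])
            start = None                       # the run has ended
    if start is not None and len(sequence) - start >= min_len:
        idrs.append(sequence[start:])          # run reaches the C-terminus
    return idrs


# Toy sequence and made-up scores, for illustration only.
seq = "MKTAYIAKQR"
scr = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3, 0.6, 0.7, 0.8, 0.9]
print(extract_idrs(seq, scr))
```

The returned segments would then serve as the training text for the N-Gram model described in S2 and S3.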
S2, constructing an N-Gram model based on a deep neural network.
In S2, an N-Gram model based on a deep neural network is constructed. As an unsupervised deep learning model, it can learn semantic patterns from the IDRs of the functional proteins. The N-Gram model is expressed as follows:
wherein F denotes the deep neural network, θ denotes the parameters of F to be learned, the index term denotes the sequence number of the k-th word ω_k, and v(context(ω_k)) denotes the word vector of context(ω_k), the context of the character ω_k.
Specifically, the deep neural network architecture of the N-Gram model consists of an input layer, a projection layer, a hidden layer, an output layer and a SoftMax layer, wherein:
1) Input layer: each residue is mapped to a word vector of length m. The word vectors are randomly initialized before training and updated iteratively during training.
2) Projection layer: the word vectors are mapped into a higher-dimensional space to increase the representation capacity of the model.
3) Hidden layer: the tanh function is used as the activation to extract deep features.
4) Output layer: the output of the hidden layer is mapped to a low-dimensional feature vector whose dimension equals the number of possible results.
5) SoftMax layer: the output-layer results are normalized to obtain the probability of each result.
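The five layers listed above can be illustrated with a forward pass in plain Python. This is a sketch under stated assumptions: the class name `TinyNGramNet`, the layer sizes, the two-residue context and the omission of bias terms are all choices made here for brevity, and the weights are random and untrained, so the output only demonstrates the data flow.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard residues


def rand_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these values."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(logits):
    peak = max(logits)                              # subtract max for stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TinyNGramNet:
    """Hypothetical layer sizes; bias terms are omitted for brevity."""

    def __init__(self, n_context=2, m=8, proj_dim=32, hidden_dim=16, seed=0):
        rng = random.Random(seed)
        self.n_context = n_context
        self.embed = rand_matrix(len(AMINO_ACIDS), m, rng)           # input layer
        self.w_proj = rand_matrix(proj_dim, n_context * m, rng)      # projection layer
        self.w_hid = rand_matrix(hidden_dim, proj_dim, rng)          # hidden layer
        self.w_out = rand_matrix(len(AMINO_ACIDS), hidden_dim, rng)  # output layer

    def forward(self, context):
        if len(context) != self.n_context:
            raise ValueError("context must contain n_context residues")
        # 1) input layer: look up and concatenate the word vectors
        x = [v for aa in context for v in self.embed[AMINO_ACIDS.index(aa)]]
        # 2) projection layer: map to a higher-dimensional space
        projected = matvec(self.w_proj, x)
        # 3) hidden layer with tanh activation
        hidden = [math.tanh(h) for h in matvec(self.w_hid, projected)]
        # 4) output layer: one value per possible next residue
        logits = matvec(self.w_out, hidden)
        # 5) SoftMax layer: normalize to probabilities
        return softmax(logits)


net = TinyNGramNet()
probs = net.forward("SP")
print(len(probs), round(sum(probs), 6))
```

Training such a network would additionally require a loss (for example cross-entropy over the next residue) and a gradient-based optimizer, which the text does not detail and which are left out here.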
S3, learning the semantic distribution pattern of the IDRs with the N-Gram model to obtain context probability vectors for the amino acids of potentially druggable oligopeptides.
The invention learns the semantic distribution pattern (context probability vectors) of the IDRs of the functional protein set with the N-Gram model. The semantic distribution pattern refers to the relative positional relationships between characters in a text or sentence; it is represented concretely by a context probability vector, which gives the probability of occurrence of each possible character in a particular context.
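To make the notion of a context probability vector concrete, the sketch below estimates one by simple counting. Note the substitution: this is a classical count-based n-gram estimate used as a stand-in, not the neural N-Gram model of the invention, and the function name `context_probabilities` and the toy sequences are assumptions.

```python
from collections import Counter, defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def context_probabilities(idrs, n=2):
    """Map each (n-1)-residue context to a 20-dimensional probability vector.

    Count-based stand-in for the neural model: P(next | context) is the
    relative frequency of `next` after `context` in the IDR sequences.
    """
    counts = defaultdict(Counter)
    for seq in idrs:
        for i in range(len(seq) - n + 1):
            context, nxt = seq[i:i + n - 1], seq[i + n - 1]
            if nxt in AMINO_ACIDS:             # ignore non-standard residues
                counts[context][nxt] += 1
    table = {}
    for context, counter in counts.items():
        total = sum(counter.values())
        table[context] = [counter[aa] / total for aa in AMINO_ACIDS]
    return table


# Toy IDR fragments, for illustration only.
table = context_probabilities(["SPSPSA", "SPG"], n=2)
print(table["S"][AMINO_ACIDS.index("P")])
```

In this toy data the residue after "S" is "P" three times and "A" once, so the vector for context "S" puts probability 0.75 on "P" and 0.25 on "A".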
S4, simulating, by a Monte Carlo method and according to the amino acid context probability vectors, the extension of an oligopeptide from scratch to obtain candidate oligopeptides.
After the context probability vectors for the amino acids of potentially druggable oligopeptides are obtained from the N-Gram model, the invention introduces Monte Carlo simulation to mimic the natural extension of an oligopeptide. The Monte Carlo method uses the probability vector output by the SoftMax layer as the probability distribution of the simulator (similar to a random seed) to simulate the extension of the oligopeptide from scratch.
Overall, starting from a single amino acid residue, the N-Gram model first computes the context probability vector of that character (called character 1), and the Monte Carlo method then samples the next candidate character (called character 2). Character 1 and the generated character 2 are concatenated to form the input of the next round (i.e., character 1 of the next round). This procedure repeats until the final output length is reached (by the definition of an oligopeptide, iteration terminates when the length reaches 10).
Specifically, as shown in FIG. 3, candidate oligopeptides are generated by combining the N-Gram model and the Monte Carlo method as follows:
S41, selecting any amino acid as the initial amino acid; in this embodiment, the 10 amino acids with the highest frequency in the functional protein IDRs are selected as initial amino acids;
S42, inferring, with the N-Gram model, the context probability vector of the linking amino acid of the oligopeptide to be extended;
S43, sampling the linking amino acid by the Monte Carlo method according to the context probability vector inferred in S42;
S44, joining the linking amino acid to the current oligopeptide to be extended to obtain a new oligopeptide to be extended;
S45, repeating S42 to S44, extending the oligopeptide by one amino acid per round, until a preset end condition is met, thereby obtaining a candidate oligopeptide.
There are two preset end conditions. Condition one: the length of the extended oligopeptide reaches 10. Condition two: the probability of the linking amino acid of the current oligopeptide is less than the random probability, namely 1/20.
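Steps S41 to S45 can be sketched as a sampling loop. Several points are assumptions made for illustration: the model is passed in as a function `next_probs` (here a hypothetical toy distribution rather than a trained network), either end condition alone is taken to stop the extension, and condition two is read as applying to the sampled linking amino acid, which is then not appended.

```python
import random
from typing import Callable, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MAX_LEN = 10                        # condition one: oligopeptide length limit
RANDOM_P = 1 / len(AMINO_ACIDS)     # condition two: random probability, 1/20


def grow_oligopeptide(start: str,
                      next_probs: Callable[[str], Sequence[float]],
                      rng: random.Random) -> Tuple[str, List[float]]:
    """Extend `start` one residue per round; return the peptide and the
    context probability of each residue that was appended."""
    peptide, step_probs = start, []
    while len(peptide) < MAX_LEN:
        probs = next_probs(peptide)                              # S42
        idx = rng.choices(range(len(AMINO_ACIDS)),
                          weights=probs, k=1)[0]                 # S43
        if probs[idx] < RANDOM_P:      # assumed reading of condition two
            break
        peptide += AMINO_ACIDS[idx]                              # S44
        step_probs.append(probs[idx])
    return peptide, step_probs                                   # S45


def toy_next_probs(peptide: str) -> List[float]:
    """Hypothetical stand-in model that favours alternating S and P."""
    favoured = "P" if peptide[-1] == "S" else "S"
    return [0.62 if aa == favoured else 0.02 for aa in AMINO_ACIDS]


candidate, step_probs = grow_oligopeptide("S", toy_next_probs, random.Random(0))
print(candidate, step_probs)
```

Running the loop many times from each initial amino acid, with different random states, would yield the pool of candidate oligopeptides that S5 scores and ranks.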
S5, scoring and ranking the candidate oligopeptides, and selecting several top-ranked candidate oligopeptides for functional verification.
After the candidate oligopeptides are obtained, they are grouped and clustered according to their lengths, and the oligopeptides in each cluster of each group are then given a recommendation score, which is the product of the context probabilities of the linking amino acids over all rounds of extension. Finally, functional verification is carried out on several top-scoring candidate oligopeptides, and it continues on the remaining candidate oligopeptides in the clusters containing the oligopeptides whose verification results meet the requirement.
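The recommendation score and the ranking can be sketched as below. The function names and the toy candidates are assumptions. The text does not specify the clustering algorithm used within each length group, so this sketch only groups by length and ranks by score.

```python
import math
from collections import defaultdict


def recommendation_score(step_probs):
    """Product of the context probabilities of all appended residues."""
    return math.prod(step_probs)


def rank_by_length(candidates, top_k=3):
    """Group (peptide, step_probs) pairs by peptide length and keep the
    top_k highest-scoring peptides of each group."""
    groups = defaultdict(list)
    for peptide, step_probs in candidates:
        groups[len(peptide)].append((peptide, recommendation_score(step_probs)))
    return {length: sorted(items, key=lambda item: item[1], reverse=True)[:top_k]
            for length, items in groups.items()}


# Made-up candidates and probabilities, for illustration only.
candidates = [("SPS", [0.5, 0.4]), ("SPA", [0.5, 0.1]), ("SP", [0.5])]
for length, ranked in sorted(rank_by_length(candidates).items()):
    print(length, ranked)
```

Because the score is a product of probabilities, it shrinks as the peptide grows, which is one reason comparing scores only within a length group is a sensible design.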
The method of the invention is verified in the following specific example:
In practical application the invention is implemented in three parts. First, a set of functional proteins positively associated with the treatment of a given disease is retrieved from the UniProt website (https://www.uniprot.org/). Next, the IDRs of these functional proteins are extracted with IUPred2A (https://iupred2a.elte.hu/). Finally, the IDRs are input into a deep learning model that implements the N-Gram model and the Monte Carlo method to obtain the required candidate oligopeptides. This example illustrates the mining of oligopeptides for the treatment of osteoporosis (osteogenesis):
1. On UniProt, 171 related functional protein sequences were retrieved with 4 keywords: "ossification", "osseogenesis", "osteoblast development", and "osteoblast differentiation".
2. The IDRs of these functional proteins were predicted with IUPred2A.
3. The protein IDR sequences were input into the deep learning model to obtain candidate oligopeptides.
4. The candidate oligopeptides were grouped and clustered according to their lengths, and the 3 highest-scoring oligopeptides of each resulting cluster were selected for functional verification; if the top 3 oligopeptides of a cluster performed well experimentally, the remaining oligopeptides in that cluster were also verified in cell experiments.
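The selection rule in step 4 can be sketched as follows, taking the clusters as given. The function names, cluster labels, peptides and scores are hypothetical, and "performed well" is modelled simply as membership in a set of peptides that passed the experiment, with one passing peptide being enough to trigger follow-up.

```python
def select_for_verification(clusters, top_k=3):
    """Split each cluster into the top_k peptides to test first and the rest.

    clusters: dict mapping a cluster id to a list of (peptide, score) pairs.
    """
    first_round, reserve = {}, {}
    for cluster_id, members in clusters.items():
        ranked = sorted(members, key=lambda m: m[1], reverse=True)
        first_round[cluster_id] = [p for p, _ in ranked[:top_k]]
        reserve[cluster_id] = [p for p, _ in ranked[top_k:]]
    return first_round, reserve


def follow_up(first_round, reserve, passed):
    """Return the remaining peptides of every cluster in which at least one
    first-round peptide passed functional verification."""
    extra = []
    for cluster_id, peptides in first_round.items():
        if any(p in passed for p in peptides):
            extra.extend(reserve[cluster_id])
    return extra


# Hypothetical clusters and scores, for illustration only.
clusters = {
    "c1": [("SPSP", 0.30), ("SPSA", 0.20), ("SPSG", 0.10), ("SPST", 0.05)],
    "c2": [("GGGG", 0.40), ("GGGA", 0.10)],
}
first_round, reserve = select_for_verification(clusters)
print(first_round)
print(follow_up(first_round, reserve, passed={"SPSA"}))
```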
As shown in FIG. 4 (A), a number of oligopeptides with good osteogenesis-promoting effects were finally obtained. Cell experiments were performed on 28 oligopeptides generated by the algorithm, and osteogenesis was assessed from the color value of Alizarin Red S (ARS) staining: the darker the color, the stronger the osteogenic function. Animal experiments were then performed on the oligopeptide with the best cell-experiment effect (named AIB 5P). FIG. 4 (B) shows hypercalciferin and xylenol orange double labeling of the femur in the animal experiment; the scale bar is 100 μm. FIG. 4 (C) shows von Kossa staining of the femur; the scale bar is 200 μm. FIG. 4 (D) shows anti-DMP1 immunohistochemical staining. FIG. 4 (E) shows representative micro-CT images of a mouse femur: the upper part is a mid-axis longitudinal section scan, with a scale bar of 1 mm, and the lower part is the trabecular bone below the growth plate, with a scale bar of 500 μm. The results show that the oligopeptide has a good bone-formation-promoting effect.
In this specification, the embodiments are described progressively; each embodiment focuses on its differences from the others, and identical or similar parts of the embodiments may be understood by reference to one another. Since the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief, and the relevant points can be found in the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (7)

1. A machine-learning-based method for automatically mining oligopeptide drug leads, characterized by comprising the following steps:
S1, acquiring a functional protein set and extracting the IDRs of the functional protein set;
S2, constructing an N-Gram model based on a deep neural network;
S3, learning the semantic distribution pattern of the IDRs with the N-Gram model to obtain context probability vectors for the amino acids of potentially druggable oligopeptides;
S4, simulating, by a Monte Carlo method and according to the amino acid context probability vectors, the extension of an oligopeptide from scratch to obtain candidate oligopeptides, which specifically comprises the following steps:
S41, selecting any amino acid as the initial amino acid;
S42, inferring, with the N-Gram model, the context probability vector of the linking amino acid of the oligopeptide to be extended;
S43, sampling the linking amino acid by the Monte Carlo method according to the context probability vector inferred in S42;
S44, joining the linking amino acid to the current oligopeptide to be extended to obtain a new oligopeptide to be extended;
S45, repeating S42 to S44, extending the oligopeptide by one amino acid per round, until a preset end condition is met, thereby obtaining a candidate oligopeptide;
S5, scoring and ranking the candidate oligopeptides, and selecting several top-ranked candidate oligopeptides for functional verification.
2. The machine-learning-based method for automatically mining oligopeptide drug leads according to claim 1, wherein the expression of the N-Gram model is:
wherein F denotes the deep neural network, θ denotes the parameters of F to be learned, the index term denotes the sequence number of the k-th word ω_k, and v(context(ω_k)) denotes the word vector of context(ω_k), the context of the character ω_k.
3. The machine-learning-based method for automatically mining oligopeptide drug leads according to claim 1, wherein the preset end conditions in S45 are: the length of the extended oligopeptide reaches 10; and the probabilities of all potential linking amino acids of the current oligopeptide are smaller than the random probability.
4. The machine-learning-based method for automatically mining oligopeptide drug leads according to claim 1, wherein S5 comprises:
grouping and clustering the candidate oligopeptides according to their lengths;
scoring the recommendation degree of the oligopeptides in each cluster of each group, and selecting several top-scoring candidate oligopeptides for functional verification.
5. The machine-learning-based method for automatically mining oligopeptide drug leads according to claim 4, wherein in S5, if the functional verification result of one or more of the selected top-ranked candidate oligopeptides meets the preset requirement, functional verification continues on the remaining candidate oligopeptides in the cluster containing the oligopeptides whose results met the requirement.
6. The machine-learning-based method for automatically mining oligopeptide drug leads according to claim 1, wherein in S5, the product of the context probabilities of the linking amino acids over all rounds of oligopeptide extension is used as the recommendation score of a candidate oligopeptide, and the candidate oligopeptides are ranked by this score.
7. The machine-learning-based method for automatically mining oligopeptide drug leads according to claim 1, wherein the deep neural network architecture of the N-Gram model consists of an input layer, a projection layer, a hidden layer, an output layer and a SoftMax layer.
CN202111094052.6A 2021-09-17 2021-09-17 Automatic excavating method of oligopeptide medicine lead based on machine learning Active CN113782114B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111094052.6A CN113782114B (en) 2021-09-17 2021-09-17 Automatic excavating method of oligopeptide medicine lead based on machine learning

Publications (2)

Publication Number Publication Date
CN113782114A (en) 2021-12-10
CN113782114B (en) 2024-02-09

Family

ID=78851947

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111094052.6A Active CN113782114B (en) 2021-09-17 2021-09-17 Automatic excavating method of oligopeptide medicine lead based on machine learning

Country Status (1)

Country Link
CN (1) CN113782114B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1771337A (en) * 2004-02-20 2006-05-10 Samsung Electronics Co., Ltd. A polynucleotide associated with a colon cancer comprising single nucleotide polymorphism, microarray and diagnostic kit comprising the same and method for diagnosing a colon cancer using the polynucle
CN101663668A (en) * 2007-03-13 2010-03-03 Sanofi-Aventis Method for producing peptide libraries and use thereof
CN103413067A (en) * 2013-07-30 2013-11-27 Zhejiang University of Technology Abstract convex lower-bound estimation based protein structure prediction method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10255273B2 (en) * 2017-06-15 2019-04-09 Microsoft Technology Licensing, Llc Method and system for ranking and summarizing natural language passages

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
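Joseph M. Cunningham, et al. Biophysical prediction of protein-peptide interactions and signaling networks using machine learning. Nature Methods. 2020, full text. *
Twitter text sentiment analysis based on convolutional neural networks; 王煜涵; 张春云; 赵宝林; 袭肖明; 耿蕾蕾; 崔超然; 数据采集与处理 (Journal of Data Acquisition and Processing) (No. 05); full text *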

Also Published As

Publication number Publication date
CN113782114A (en) 2021-12-10

Similar Documents

Publication Publication Date Title
Teney et al. Tips and tricks for visual question answering: Learnings from the 2017 challenge
CN112119412A (en) Neural network of map with attention
CN110070909B (en) Deep learning-based multi-feature fusion protein function prediction method
Sakakibara et al. Stochastic context-free grammars for tRNA modeling
CN113936735A (en) Method for predicting binding affinity of drug molecules and target protein
CN112331273B (en) Multi-dimensional information-based drug small molecule-protein target reaction prediction method
US11651841B2 (en) Drug compound identification for target tissue cells
Wu et al. Convolutional reconstruction-to-sequence for video captioning
CN112599187B (en) Method for predicting drug and target protein binding fraction based on double-flow neural network
CN114913916A (en) Drug relocation method for predicting new coronavirus adaptive drugs
CN115376704A (en) Medicine-disease interaction prediction method fusing multi-neighborhood correlation information
CN110299194B (en) Similar case recommendation method based on comprehensive feature representation and improved wide-depth model
CN115985520A (en) Medicine disease incidence relation prediction method based on graph regularization matrix decomposition
Manzoor et al. Protein encoder: An autoencoder-based ensemble feature selection scheme to predict protein secondary structure
CN115862747A (en) Sequence-structure-function coupled protein pre-training model construction method
CN113782114B (en) Automatic excavating method of oligopeptide medicine lead based on machine learning
Du et al. Species tree and reconciliation estimation under a duplication-loss-coalescence model
Zhou et al. Compositional diversity in visual concept learning
CN115964475A (en) Dialogue abstract generation method for medical inquiry
Poursoltani Disclosing AI inventions
CN114139531A (en) Medical entity prediction method and system based on deep learning
Roussel et al. Mapping of morpho-electric features to molecular identity of cortical inhibitory neurons
KR102187594B1 (en) Multi-omics data processing apparatus and method for discovering new drug candidates
Cao et al. Learning functional embedding of genes governed by pair-wised labels
EP4182928A1 (en) Method, system and computer program product for determining presentation likelihoods of neoantigens

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant