CN114649054A - Antigen affinity prediction method and system based on deep learning - Google Patents

Publication number
CN114649054A
Authority
CN
China
Prior art keywords
oligomer
vector
affinity
immune molecule
immune
Prior art date
Legal status
Pending
Application number
CN202011506001.5A
Other languages
Chinese (zh)
Inventor
刘宇轩
李京宇
刘耿
李波
Current Assignee
Shenzhen Jinuoyin Biotechnology Co ltd
Original Assignee
Shenzhen Jinuoyin Biotechnology Co ltd
Priority date
Filing date
Publication date

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/30 Drug targeting using structural data; Docking or binding prediction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Chemical & Material Sciences (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Medicinal Chemistry (AREA)
  • Bioethics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Public Health (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention belongs to the field of artificial intelligence and discloses a deep learning-based antigen affinity prediction method and prediction model. The method comprises the following steps: (1) obtaining a training data set of oligomer-immune molecule binding structures, including affinities; (2) for each oligomer-immune molecule group, representing the sequence of the oligomer, the monomer positions of the oligomer and the immune molecule as high-dimensional vectors respectively, and fusing the three high-dimensional vectors into a vector of the oligomer-immune molecule binding structure; (3) training a deep neural network with the vectors of the oligomer-immune molecule binding structures in the training data set and the affinity data to establish an oligomer-immune molecule affinity prediction model; (4) inputting the vector of the binding structure of the oligomer and immune molecule to be tested into the antigen affinity prediction model, and predicting their binding affinity with the trained deep neural network.

Description

Antigen affinity prediction method and system based on deep learning
Technical Field
The invention belongs to the field of artificial intelligence, and particularly relates to an antigen affinity prediction method and system based on deep learning.
Background
An epitope, also called an antigenic determinant (AD), is a specific chemical group in an antigen molecule that determines the specificity of the antigen. The antigen binds to the corresponding antigen receptor on the lymphocyte surface through its epitopes, thereby activating the lymphocyte and triggering an immune response. The nature, number and spatial configuration of the epitopes determine the specificity of the antigen. The size of an epitope matches the antigen-binding site of the corresponding antibody. In general, a polypeptide epitope contains 5-6 amino acid residues, a polysaccharide epitope contains 5-7 monosaccharides, and a nucleic acid hapten epitope contains 6-8 nucleotides. The specificity of an epitope is determined by all of the residues that make it up, but some residues contribute more to antibody binding than others; these are referred to as immunodominant groups.
T cell epitopes are immunogenic polypeptide fragments that must be processed by antigen-presenting cells into small peptides and then bound to major histocompatibility complex (MHC) molecules before T cells can recognize them. T cell antigen receptors (TCRs) recognize only small polypeptides of about 10-20 amino acids. These epitopes consist of amino acids linked in sequence, are located mainly in the hydrophobic regions of the antigen molecule, and are called linear epitopes or sequential epitopes. Because T cells recognize only processed epitopes, they generally do not recognize conformational epitopes of native antigens.
There are many methods for analyzing epitopes, including chemical cleavage, enzymatic digestion, nuclear magnetic resonance (NMR) spectroscopy, surface plasmon resonance (SPR), hybrid peptide cleavage, polypeptide library construction, and theoretical prediction. With the development of computer technology, and especially the spread of artificial intelligence, directly screening and predicting epitopes from large-scale biological data has become the epitope analysis method with the highest throughput, lowest cost and shortest turnaround time.
For T cell epitopes, the conventional theoretical prediction scheme learns from MHC-antigen peptide binding data, extracting key features such as antigen peptide sequence, length, expression level and thermostability to build a machine learning model, which is then combined with high-throughput sequencing to predict potential antigen peptides of unknown proteins or genes. Several organizations in China and abroad have developed antigen prediction software, such as NetMHCpan from the Technical University of Denmark, EDGE from Gritstone Oncology, and EPIP from Wuhan Huada Jinuoyin Biotechnology Co., Ltd.
With the continuous development and mutual promotion of modern bioinformatics, molecular biology and molecular immunology, epitope research and application thereof have made great progress and show great application potential. The application of the antigen epitope is mainly embodied in three aspects, namely disease diagnosis, vaccine development and disease treatment.
In disease diagnosis, the effectiveness of epitope-based diagnostic methods depends on sensitivity and specificity. The epitope is the basic unit that stimulates an immune response: a single epitope stimulates a single specific immune response, whereas multiple epitopes often stimulate a mixture of immune responses, producing nonspecific mixed antibodies, sensitized lymphocytes or effectors. Research on epitope peptides in disease diagnosis therefore focuses on selecting specific epitope peptides to achieve better diagnostic performance.
In vaccine development, a conventional vaccine contains a large number of epitopes, including protective, inhibitory and ineffective epitopes. The vaccine achieves the desired protective effect only if the immune response induced by the protective epitopes dominates. In antiviral vaccine research, a technical difficulty is therefore how to obtain protein epitopes that are strongly immunogenic, highly conserved in sequence, and play a key role in viral invasion.
In disease treatment, the immune response induced by an epitope is highly specific and targeted, and can be used in immunotherapy for tumors, infectious diseases and autoimmune diseases. Immunotherapy, which activates a cytotoxic response to an antigen by enhancing the patient's own immune system, has proven to be an effective strategy in recent years. This strategy exploits the many antigens on the cell surface that are produced when mutant proteins are cleaved by the intracellular proteasome. The resulting polypeptides bind to HLA molecules, forming polypeptide-HLA complexes that are presented to T cell receptors (TCRs). If a TCR recognizes the polypeptide-HLA complex, cytotoxic T lymphocytes (CTLs) can be activated. CTLs are a subset of leukocytes: specialized T cells that secrete various cytokines to take part in the immune response, kill certain viruses, tumor cells and other antigenic targets, and together with natural killer cells form an important line of defense against viruses and tumors. The first step in cytotoxic T lymphocyte therapy is to predict the binding affinity of an antigen to an HLA molecule. Neoantigen therapy, now developing rapidly in tumor treatment, is a good example of applying epitopes to disease therapy. Companies abroad, such as BioNTech, Neon Therapeutics and Gritstone Oncology, have studied neoantigen-based treatment of malignant tumors and have entered the clinical trial stage.
At present, there are four kinds of methods for predicting the affinity between an antigen and an HLA molecule: structure-based methods, machine learning-based methods, position-specific scoring matrix (PSSM)-based methods, and combined methods. Machine learning-based methods learn a high-dimensional classification surface from known binding and non-binding peptides to predict polypeptide binding affinity. They can accurately predict the affinity of polypeptides for specific HLA alleles, such as HLA-A*0201, HLA-A*0101 and HLA-B*0702 [1,2], and are therefore frequently used in many studies [3-5]. The most prevalent of these are pan-specific affinity algorithms, which take both the amino acid sequence of the HLA molecule and the amino acid sequence of the polypeptide as input and output an affinity prediction. The most common algorithm in the industry is NetMHCpan [6]. It characterizes an HLA molecule by a 34-residue pseudo sequence, preprocesses the pseudo sequence and the short peptide sequence, uses the preprocessed results side by side as the input features, and predicts polypeptide-HLA affinity with a back-propagation (BP) neural network. This approach models each polypeptide-HLA pair as a unique input sequence, so that a model can learn the mapping between the sequence and the polypeptide-HLA affinity value; it is called a pan-specific training strategy (Pan Model). The NetMHCpan work shows that a single BP model does not perform well, so to obtain the best results NetMHCpan typically trains a large number of models and aggregates them.
In recent years, several deep learning-based affinity prediction models have appeared. A representative algorithm is DeepSeqPan [7], which learns the features of the polypeptide and the HLA molecule with two independent convolutional networks, combines the features, and predicts affinity with a neural network. MHCSeqNet [8] follows a similar idea but uses gated recurrent units (GRUs) instead of convolutional neural networks in order to learn from polypeptides of variable length, while AI-MHC [9] handles polypeptides of different lengths by padding, so that variable-length input can be learned with more efficient convolutional neural networks. ACME [10] is also similar; it differs from DeepSeqPan in that a polypeptide characterization vector is concatenated at each HLA computation layer. Besides natural language processing techniques, convMHC [11] models affinity with an image-processing approach: the position-wise physicochemical data of the polypeptide and the HLA molecule are arranged into several two-dimensional data matrices, and affinity is predicted with a two-dimensional convolutional neural network.
The main disadvantages of the prior art are as follows. 1) There is no effective way to model the polypeptide-HLA complex. In the above methods, the sequence features of the polypeptide and the HLA molecule are learned by independent neural networks and then directly concatenated; the structure of the complex formed once the polypeptide is presented by the HLA allele is not considered in the model design. 2) Polypeptide sequences are short, so deep networks applied directly perform poorly, and related work mostly uses shallow networks: [6] aggregates a large number of conventional neural networks to compensate for the weak learning ability of a single model, and [7,9,10] all use shallow convolutional networks to learn polypeptide features. 3) Some methods do not take the diversity of polypeptide lengths into account: although [11] can use a deep network for polypeptide learning, its design handles only fixed lengths and cannot accommodate variable lengths; the same problem occurs in [6].
Accordingly, there is a need for an improved deep learning-based affinity prediction model.
Disclosure of Invention
Based on the problems in the prior art, the invention aims to provide an improved affinity prediction model based on deep learning and a construction method thereof.
Accordingly, in a first aspect, the present invention provides an antigen affinity prediction method based on deep learning, the method comprising:
(1) obtaining a training data set of oligomer-immune molecule binding structures, the training data set comprising the affinity of oligomer-immune molecule binding;
(2) for the oligomer and immune molecule to be tested, and for each group in the training data set of oligomer-immune molecule binding structures, representing the sequence of the oligomer, the monomer positions of the oligomer and the immune molecule as high-dimensional vectors respectively, and fusing the three high-dimensional vectors into a vector of the oligomer-immune molecule binding structure;
(3) training a deep neural network with the vectors of the oligomer-immune molecule binding structures in the training data set and the affinity data to establish an oligomer-immune molecule affinity prediction model;
(4) inputting the vector of the binding structure of the oligomer and immune molecule to be tested into the antigen affinity prediction model, and predicting their binding affinity with the trained deep neural network.
In one embodiment, the oligomer is selected from the group consisting of: polypeptides, polysaccharides, and nucleic acid haptens.
In one embodiment, the oligomers are of variable length.
In one embodiment, the immune molecule is a major histocompatibility complex molecule, such as a human leukocyte antigen, including but not limited to class I and class II human leukocyte antigens.
In one embodiment, the affinity of oligomer-immune molecule binding is expressed as the IC50 value of oligomer-immune molecule binding.
In one embodiment, in (2), a monomer vector representation that reflects the relationship between any two monomers is obtained by word embedding.
In one embodiment, the input oligomer sequence in (2) is $(x_1, x_2, \ldots, x_n)$, where each $x_i$ is an amino acid character, $x_i \in \{A, C, D, \ldots, Y\}$. The mapping converts each $x_i$ into an $m$-dimensional vector $z_i \in \mathbb{R}^m$, where $\mathbb{R}$ is the real number space, so the sequence $(x_1, x_2, \ldots, x_n)$ is converted into $(z_1, z_2, \ldots, z_n)$.
In one embodiment, the monomer position characterization vector in (2) maps the label of each amino acid position of the polypeptide to a high-dimensional continuous numerical space: the input is uniformly the vector $(1, 2, \ldots, n)$, and each position $i$ is mapped by a fully connected neural network into a vector $p_i \in \mathbb{R}^m$, where $\mathbb{R}$ is the real number space, so the output for $(1, 2, \ldots, n)$ is $(p_1, p_2, \ldots, p_n)$.
In one embodiment, the high-dimensional vectors of the oligomer sequence and of the oligomer's monomer positions in (2) are added to form a characterization vector of the oligomer.
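The sequence and position mappings described above can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the names `AMINO_ACIDS`, `embed_oligomer`, `token_table` and `position_table` are hypothetical, the tables are filled with random values instead of learned parameters, and a lookup table is used in place of the fully connected mapping (a table lookup is equivalent to a bias-free fully connected layer applied to a one-hot input).

```python
import numpy as np

# Hypothetical alphabet: the 20 standard amino acid characters.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def embed_oligomer(seq, token_table, position_table):
    """Return an (n, m) matrix whose row i is z_i + p_i for position i of seq."""
    idx = np.array([AA_INDEX[aa] for aa in seq])
    z = token_table[idx]              # (n, m): monomer vectors z_1..z_n
    p = position_table[:len(seq)]     # (n, m): position vectors p_1..p_n
    return z + p                      # element-wise sum, as in the text

rng = np.random.default_rng(0)
m, max_len = 8, 15                    # assumed embedding size and maximum length
token_table = rng.normal(size=(len(AMINO_ACIDS), m))   # stands in for learned weights
position_table = rng.normal(size=(max_len, m))         # stands in for learned weights

short = embed_oligomer("SIINFEKL", token_table, position_table)
longer = embed_oligomer("GILGFVFTLV", token_table, position_table)
print(short.shape, longer.shape)
```

Because both tables are indexed per position, the same code handles oligomers of different lengths without padding, up to the assumed `max_len`.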
In one embodiment, the input for the vector of the immune molecule in (2) is $(y_1, y_2, \ldots, y_k)$, $y_i \in \{A, C, D, \ldots, Y\}$, and the output is $(z'_1, z'_2, \ldots, z'_n)$, $z'_i \in \mathbb{R}^m$; that is, the original amino acid vector of length $k$ is mapped to an $n \times m$ vector in the same format as the oligomer.
In one embodiment, in (2), the characterization vector of the oligomer and the vector of the immune molecule are fused by tensor multiplication, tensor addition, or an attention operation.
In one embodiment, in (3), the deep neural network comprises a deep convolutional layer and a multi-layer fully-connected layer, the deep convolutional layer is used for extracting feature vectors of oligomer-immune molecules in the training data set, the feature vectors are input into the multi-layer fully-connected layer, the multi-layer fully-connected layer maps the feature vectors into affinity values, and network parameters are obtained through a back propagation algorithm.
In one embodiment, for each deep convolutional layer, the input is copied three times. Two copies are passed through different convolutional networks, and a Sigmoid normalization is applied to one of the convolution outputs, yielding two vectors A and B, which are multiplied element-wise to give the residual A × B. The third copy is added to the residual A × B.
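A minimal NumPy sketch of such a gated residual layer is given below, assuming a one-dimensional convolution along the sequence axis with zero padding so that the output keeps the input shape. The helper names `conv1d_same` and `gated_residual_block`, the kernel size and the channel counts are illustrative assumptions and are not taken from the patent.

```python
import numpy as np

def conv1d_same(x, w, b):
    """1-D convolution (cross-correlation) along the sequence axis, zero 'same' padding.
    x: (n, c_in), w: (k, c_in, c_out) with odd k, b: (c_out,)."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.empty((x.shape[0], w.shape[2]))
    for i in range(x.shape[0]):
        window = xp[i:i + k]                                   # (k, c_in)
        out[i] = np.tensordot(window, w, axes=([0, 1], [0, 1])) + b
    return out

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def gated_residual_block(x, w_a, b_a, w_b, b_b):
    """Copy 1 -> conv (A); copy 2 -> conv + sigmoid (B); copy 3 -> skip connection."""
    a = conv1d_same(x, w_a, b_a)
    g = sigmoid(conv1d_same(x, w_b, b_b))
    return x + a * g                                           # X + (A x B), element-wise

rng = np.random.default_rng(0)
x = rng.normal(size=(9, 4))                                    # 9 positions, 4 channels
w_a, w_b = rng.normal(size=(3, 4, 4)), rng.normal(size=(3, 4, 4))
b_a, b_b = np.zeros(4), np.zeros(4)
y = gated_residual_block(x, w_a, b_a, w_b, b_b)
print(y.shape)
```

Keeping the channel count unchanged lets the skip connection be a plain addition, so several such layers can be stacked even on short sequences.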
In one embodiment, a global pooling layer $\theta: \mathbb{R}^{n \times p} \to \mathbb{R}^{p}$ is added after the last convolutional layer, where $p$ is the output dimension of the last layer, to obtain a feature vector.
In one embodiment, in the multi-layered fully-connected layer, the feature vectors obtained from the deep convolutional layer are mapped to affinity values by the fully-connected layer as follows:
1) applying a linear transformation to the feature vector x output by the deep convolutional layer, which is the input of the fully connected layer, to obtain $y = Wx + b$;
2) applying a nonlinear transformation to the result of the linear transformation with the linear rectification function:
$$\mathrm{relu}(y) = \max(0, y)$$
Steps 1) and 2) together form one layer of the mapping network, and the affinity prediction result is output after multiple such layers.
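The pooling and mapping steps can be illustrated with a small worked example. The sketch below makes several assumptions that the text does not fix: the global pooling is taken as a maximum over the sequence axis, the linear rectification function is taken as max(0, y), the final layer is left linear so that it can output an unconstrained affinity value, and the weights are hand-picked toy numbers. The names `global_max_pool` and `mapping_network` are hypothetical.

```python
import numpy as np

def relu(y):
    # Linear rectification function, taken here as max(0, y).
    return np.maximum(0.0, y)

def global_max_pool(h):
    """theta: R^{n x p} -> R^p, here a maximum over the n sequence positions."""
    return h.max(axis=0)

def mapping_network(x, layers):
    """Apply y = W x + b followed by relu for each hidden layer; last layer is linear."""
    for i, (W, b) in enumerate(layers):
        y = W @ x + b
        x = relu(y) if i < len(layers) - 1 else y
    return x

# Toy convolutional output with n = 2 positions and p = 2 channels.
h = np.array([[1.0, -2.0],
              [0.5,  3.0]])
feature = global_max_pool(h)                       # [1.0, 3.0]

layers = [
    (np.array([[1.0, 0.0], [0.0, -1.0]]), np.zeros(2)),   # hidden layer
    (np.array([[2.0, 5.0]]), np.array([0.5])),            # output layer
]
prediction = mapping_network(feature, layers)
print(feature, prediction)
```

Because the pooled vector always has dimension p, the fully connected layers receive a fixed-size input regardless of the oligomer length n.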
In one embodiment, in (4), the trained deep neural network comprises a deep convolutional layer and a multi-layer fully-connected layer, wherein the deep convolutional layer is used for extracting the feature vector of the oligomer-immune molecule to be detected, and inputting the feature vector into the multi-layer fully-connected layer to predict the binding affinity of the oligomer-immune molecule to be detected.
In a second aspect, the present invention provides a method for building an antigen affinity prediction model based on deep learning, the method comprising building an oligomer-immune molecule affinity prediction model by (1) to (3) of the method of the first aspect of the present invention.
In a third aspect, the present invention provides an antigen affinity prediction model established using the method of the second aspect of the invention, the antigen affinity prediction model comprising: a data acquisition module, an input vector establishing module and a model establishing module,
the data acquisition module is used for acquiring a data set of the oligomer-immune molecule combination structure, wherein the data set comprises the affinity of oligomer-immune molecule combination;
the input vector establishing module is used for mapping the sequence of the oligomer, the monomer position of the oligomer and the immune molecule into high-dimensional vectors respectively by using a data set of the oligomer-immune molecule combination structure, and fusing the three high-dimensional vectors into a vector of the oligomer-immune molecule combination structure;
the model building module is used for training a deep neural network by using the vector of the oligomer-immune molecule combination structure and affinity data to build an antigen affinity prediction model.
In one embodiment, in the model building module, the deep neural network comprises a deep convolutional layer and a multi-layer fully-connected layer, the deep convolutional layer is used for extracting feature vectors of oligomer-immune molecules in a training data set, the feature vectors are input into the multi-layer fully-connected layer, the multi-layer fully-connected layer maps the feature vectors into affinity values, and network parameters are obtained through a back propagation algorithm.
In one embodiment, for each deep convolutional layer, the input is copied three times. Two copies are passed through different convolutional networks, and a Sigmoid normalization is applied to one of the convolution outputs, yielding two vectors A and B, which are multiplied element-wise to give the residual A × B. The third copy is added to the residual A × B.
In one embodiment, a global pooling layer $\theta: \mathbb{R}^{n \times p} \to \mathbb{R}^{p}$ is added after the last convolutional layer, where $p$ is the output dimension of the last layer, to obtain a feature vector.
In one embodiment, in the multi-layered fully-connected layer, the feature vectors obtained from the deep convolutional layer are mapped to affinity values by the fully-connected layer by the following method:
1) applying a linear transformation to the feature vector x output by the deep convolutional layer, which is the input of the fully connected layer, to obtain $y = Wx + b$;
2) applying a nonlinear transformation to the result of the linear transformation with the linear rectification function:
$$\mathrm{relu}(y) = \max(0, y)$$
Steps 1) and 2) together form one layer of the mapping network, and the affinity prediction result is output after multiple such layers.
According to the invention, the oligomer-immune molecule complex structure is modeled with deep learning, which enhances the expressive power of the model; and by improving the deep learning technique, a deep model for short oligomer sequences is realized, which increases the complexity and learning capacity of a single neural network.
Drawings
Fig. 1 shows the overall structure of an antigen affinity prediction model according to one embodiment of the present invention.
Fig. 2 shows an input network structure of an antigen affinity prediction model according to an embodiment of the present invention.
FIG. 3 shows a computational network structure of an antigen affinity prediction model according to one embodiment of the invention.
Detailed Description
The two most important design steps in deep learning are the design of the input feature representation layer (embedding layer) and the design of the deep network structure of the computation layers. The invention takes the complex into account in designing the representation layer and makes the polypeptide computation layers deep by effective means, which improves the learning ability of the model. As a result, fewer models need to be ensembled than in the scheme used by NetMHCpan, which is common in the industry, and efficiency is improved. The invention is also not limited to fixed-length sequences and therefore applies to a wider range of scenarios.
In the present invention, the deep residual network model is applicable to the prediction of affinity, antigen presentation and immunogenicity for all polypeptide sequences; it is also suitable for other short gene sequence scenarios, including but not limited to MHC molecule sequence analysis and the analysis of DNA fragments from high-throughput sequencing. The polypeptide sequence is typically 9 or 10 amino acids long, but other lengths are also possible.
In the present invention, tensor addition and attention operations can also be used to integrate the polypeptide vector and the immune molecule vector into one vector. Tensor addition, however, is less effective than tensor multiplication. The attention operation replaces the tensor product of the polypeptide and the HLA allele with an inner product, applies a nonlinear normalization to obtain an attention weight matrix, and then multiplies the attention weight matrix by the polypeptide/HLA tensor; attention operations are common in natural language processing models such as Google's BERT [12]. The polypeptide and HLA molecule sequences can also be combined three-dimensionally, or simply concatenated.
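One possible reading of the attention-based fusion is sketched below. It is an assumption-laden illustration rather than the patented operation: softmax is used as the nonlinear normalization, the inner products are scaled by the square root of the embedding dimension as in common attention implementations, and the weights are applied to the HLA vectors. The function names `softmax` and `attention_fuse` are hypothetical.

```python
import numpy as np

def softmax(s, axis=-1):
    s = s - s.max(axis=axis, keepdims=True)     # subtract the max for numerical stability
    e = np.exp(s)
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(pep, hla):
    """pep, hla: (n, m) matrices. Returns the fused (n, m) matrix and the (n, n) weights."""
    scores = pep @ hla.T / np.sqrt(pep.shape[1])  # inner products between positions
    weights = softmax(scores, axis=-1)            # nonlinear normalization per row
    fused = weights @ hla                         # weighted combination of HLA vectors
    return fused, weights

rng = np.random.default_rng(0)
pep = rng.normal(size=(9, 16))                    # polypeptide characterization vectors
hla = rng.normal(size=(9, 16))                    # HLA vectors in the same n x m format
fused, weights = attention_fuse(pep, hla)
print(fused.shape, weights.sum(axis=1))
```

Compared with the tensor product, this keeps the fused representation at n × m rather than n × m × m, which is a design trade-off between expressiveness and size.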
The method for establishing the antigen affinity prediction model based on deep learning comprises the following steps: (1) obtaining a training data set of oligomer-immune molecule binding structures, the training data set comprising the affinity of oligomer-immune molecule binding; (2) for each group in the training data set of oligomer-immune molecule binding structures, representing the sequence of the oligomer, the monomer positions of the oligomer and the immune molecule as high-dimensional vectors respectively, and fusing the three high-dimensional vectors into a vector of the oligomer-immune molecule binding structure; (3) training a deep neural network with the vectors of the oligomer-immune molecule binding structures in the training data set and the affinity data to establish an oligomer-immune molecule affinity prediction model.
Preferably, the deep neural network includes a deep convolutional layer and a multi-layer fully-connected layer, the deep convolutional layer is used for extracting feature vectors of oligomer-immune molecules in a training data set, the feature vectors are input into the multi-layer fully-connected layer, the multi-layer fully-connected layer maps the feature vectors into affinity values, and network parameters are obtained through a back propagation algorithm.
In one example, in the multi-layer convolutional layers, each convolutional network learns a residual with respect to the previous layer: denoting the input of the i-th layer by X and the i-th convolutional network by $\mathrm{conv}_i$, the output of the i-th layer is $X + \mathrm{conv}_i(X)$. Specifically, the input is copied three times. Two copies are passed through different convolutional networks, and a Sigmoid normalization is applied to one of the convolution outputs, yielding two vectors A and B, which are multiplied element-wise to give the residual A × B. The third copy is added to the residual A × B.
In one example, in a multi-layered fully-connected layer, feature vectors learned by a computational network are mapped to affinity values over the fully-connected network by the following method:
1) applying a linear transformation to the input of the mapping network (namely the output of the computing network), converting the input vector x of the mapping layer into $y = Wx + b$;
2) applying a nonlinear transformation to the result of the linear transformation with the linear rectification function (relu):
$$\mathrm{relu}(y) = \max(0, y)$$
Steps 1) and 2) together form one layer of the mapping network, and the affinity prediction result is output after multiple such layers.
The invention is exemplified below by the modeling and application of polypeptide-HLA molecules. The polypeptide-HLA molecule dataset used in the present invention is derived from IEDB published data.
In the invention, a Pan-specific Model (Pan Model) needs to map a polypeptide and HLA molecule sequence text into a unique vector, so that a neural network Model establishes a function mapping relation between the vector and a prediction index; in addition, the mapping process requires modeling of the binding structure between the polypeptide and the HLA molecule.
The main body of the deep neural network model of the invention consists of three parts (see Fig. 1): an input network, a computing network and a mapping network, which together form the complete neural network. During training, the network parameters are obtained by the back-propagation algorithm from vector data of oligomer-immune molecule binding structures with known affinity. During testing/prediction, the vector of the oligomer-immune molecule binding structure to be predicted is fed into the input network, and the predicted affinity value is output after passing through the computing network and the mapping network. The specific functions are described below.
Input network:
The input network builds high-dimensional vectors from the polypeptide and the HLA molecule through a neural network, models the polypeptide-HLA binding structure, and provides the pan-specific input of the model. Different polypeptide + HLA molecule pairs are mapped to different vectors, and a high dimension is used so that subtle differences between input sequences are fully captured. The inventors use the HLA molecule pseudo sequence as input, which is more accurate.
The specific design is shown in FIG. 2. Each amino acid of the polypeptide sequence and of the HLA molecule sequence is mapped to a higher-dimensional vector by a fully connected network; the polypeptide positions are labeled 1 to the polypeptide length, and this position sequence is mapped to the same dimension by another network; the vectors are then expanded into the tensors shown in the figure, and tensor operations yield the final input tensor, which mixes the information of the polypeptide and the HLA molecule. FIG. 2 uses a polypeptide length of 15 and a mapping output dimension of 128 as an example. The fully connected diagrams in FIG. 2, drawn with 4 circles on top and 3 circles at the bottom, represent the mapping of each amino acid from the character space A-Z to a high-dimensional continuous numerical space. This is the usual depiction of a fully connected layer: the top circles represent the input dimension, i.e., the amino acid character space, and the bottom circles represent the output dimension, i.e., the higher-dimensional vector. Taking a polypeptide of length 15 as an example, each position is mapped from an amino acid character to a continuous high-dimensional vector whose dimension, denoted m below, is 128 here. The shape 15 × 128 × 1 results from expanding the vectors into a tensor so that the tensor addition and tensor multiplication in the figure can be carried out. In these tensor operations the last two dimensions are treated as a matrix and the leading dimension as a batch size: the sum of two 15 × 128 × 1 tensors is still 15 × 128 × 1, and its product with a 15 × 1 × 128 tensor is 15 × 128 × 128.
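The shape arithmetic described for FIG. 2 can be checked with a small sketch. It uses plain nested lists rather than a tensor library, and the helper names (`shape`, `batch_add`, `batch_matmul`) and the constant fill values are illustrative assumptions, not part of the patent's software.

```python
# Sketch: batched tensor addition and multiplication with nested lists.
# Shapes follow the example in the text (15 positions, 128 dimensions).

def shape(t):
    """Return the shape of a regular nested list as a tuple."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return tuple(dims)

def batch_add(a, b):
    """Element-wise addition of two tensors with identical 3-D shape."""
    return [[[x + y for x, y in zip(ra, rb)] for ra, rb in zip(ma, mb)]
            for ma, mb in zip(a, b)]

def batch_matmul(a, b):
    """Multiply the last two dimensions as matrices; the first is the batch."""
    out = []
    for ma, mb in zip(a, b):            # one matrix pair per batch element
        cols = list(zip(*mb))           # columns of the right-hand matrix
        out.append([[sum(x * y for x, y in zip(row, col)) for col in cols]
                    for row in ma])
    return out

n, m = 15, 128
z = [[[1.0] for _ in range(m)] for _ in range(n)]   # 15 x 128 x 1 (amino acids)
p = [[[2.0] for _ in range(m)] for _ in range(n)]   # 15 x 128 x 1 (positions)
h = [[[0.5] * m] for _ in range(n)]                 # 15 x 1 x 128 (HLA)

s = batch_add(z, p)       # still 15 x 128 x 1
t = batch_matmul(s, h)    # 15 x 128 x 128
print(shape(s), shape(t))
```

Each of the 15 positions is handled independently, which is why the leading dimension behaves like a batch size.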
A language model [12-15] can be used for the word embedding of text sequences; here the original input data are mapped into three high-dimensional vectors through neural networks. Taking a polypeptide of length n, an HLA molecule of length k, and an m-dimensional output as an example:
1) Polypeptide characterization vector: the polypeptide is treated as a variable-length sequence, and a fully connected neural network maps each of its amino acids from the A-Z character space to a high-dimensional continuous numerical space. The input polypeptide sequence is $(x_1, x_2, \ldots, x_n)$, where $x_i$ is an amino acid character, $x_i \in \{A, C, D, \ldots, Y\}$. The mapping converts each $x_i$ into an m-dimensional vector $z_i \in \mathbb{R}^m$, where m is a parameter tuned by cross-validation and $\mathbb{R}$ is the real number space, so that the sequence $(x_1, x_2, \ldots, x_n)$ is converted into $(z_1, z_2, \ldots, z_n)$;
2) Peptide position characterization vector: the label of each amino acid position of the polypeptide is mapped to a high-dimensional (e.g., m-dimensional) continuous numerical space by a fully connected neural network. The input is uniformly the vector $(1, 2, \ldots, n)$, and each position i is mapped by the fully connected neural network to a vector $p_i \in \mathbb{R}^m$, where $\mathbb{R}$ is the real number space; that is, the output for $(1, 2, \ldots, n)$ is $(p_1, p_2, \ldots, p_n)$;
3) HLA molecule characterization vector: the HLA molecule corresponding to each polypeptide must also be mapped to a high-dimensional continuous numerical space. Because every position of both the HLA molecule and the polypeptide is an amino acid with the same meaning, the neural network used for this mapping is the same one used for the polypeptide characterization vector. The input HLA molecule vector is $(y_1, y_2, \ldots, y_k)$, with $y_i \in \{A, C, D, \ldots, Y\}$, and the output is $(z'_1, z'_2, \ldots, z'_n)$, $z'_i \in \mathbb{R}^m$; that is, the k-dimensional original amino acid vector is mapped into an n × m-dimensional vector with the same format as the polypeptide. The process is as follows:
a) Using the neural network of 1), map $(y_1, y_2, \ldots, y_k)$ to $(y'_1, y'_2, \ldots, y'_k)$, $y'_j \in \mathbb{R}^m$;
b) Average $(y'_1, y'_2, \ldots, y'_k)$ to obtain
$$\bar{y}' = \frac{1}{k} \sum_{j=1}^{k} y'_j,$$
and replicate it n times to give $(z'_1, z'_2, \ldots, z'_n)$, where
$$z'_i = \bar{y}', \quad i = 1, 2, \ldots, n.$$
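The three characterization vectors can be illustrated with a minimal sketch. It makes several assumptions: random lookup tables stand in for the trained fully connected networks (a one-hot input multiplied by a weight matrix reduces to a table lookup), m = 4 is chosen only for readability, and the peptide and HLA strings are made-up toy sequences rather than real data.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acid characters

def make_table(keys, m, rng):
    """Random stand-in for a trained fully connected mapping: key -> R^m."""
    return {key: [rng.uniform(-1.0, 1.0) for _ in range(m)] for key in keys}

def embed_residues(seq, table):
    """Map each amino acid character x_i to its m-dimensional vector z_i."""
    return [table[ch] for ch in seq]

def embed_positions(n, table):
    """Map the position labels 1..n to m-dimensional vectors p_1..p_n."""
    return [table[i] for i in range(1, n + 1)]

def embed_hla(hla_seq, n, table):
    """Embed the HLA sequence, average over its k positions, replicate n times."""
    vecs = embed_residues(hla_seq, table)
    k = len(vecs)
    mean = [sum(col) / k for col in zip(*vecs)]
    return [list(mean) for _ in range(n)]

rng = random.Random(0)
m = 4                                               # illustrative dimension
residue_table = make_table(AMINO_ACIDS, m, rng)     # shared by peptide and HLA
position_table = make_table(range(1, 27), m, rng)   # hypothetical maximum length 26

peptide = "ACDEFGHIK"     # toy 9-mer, not a real epitope
hla = "YFAMYQENVAHT"      # toy string, not a real HLA pseudo sequence
n = len(peptide)

z = embed_residues(peptide, residue_table)
p = embed_positions(n, position_table)
z_prime = embed_hla(hla, n, residue_table)
print(len(z), len(p), len(z_prime), len(z_prime[0]))
```

Averaging and replicating gives the HLA molecule the same n × m layout as the polypeptide regardless of k, so the two can be combined position by position.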
Two mechanisms can be used to fuse the three vectors. One is to add the three vectors, as in a traditional language model. However, unlike language or genomic sequences, polypeptides and HLA molecules have a spatial binding structure that must be captured by a more complex mechanism, so another mechanism can be used:
1) Calculate the characterization vector of the polypeptide, $A = (z_1 + p_1, z_2 + p_2, \ldots, z_n + p_n)$;
2) Perform tensor expansion (Kronecker product) on each element $a_i$ of A: $\mathbb{R}^m \to \mathbb{R}^{m \times 1}$;
3) Denote the characterization vector of the HLA molecule as B, and perform tensor expansion (Kronecker product) on each element $b_i$ of B: $\mathbb{R}^m \to \mathbb{R}^{1 \times m}$;
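4) Take the position-wise tensor product of A and B:
$$A \otimes B = (a_1 \otimes b_1, a_2 \otimes b_2, \ldots, a_n \otimes b_n)$$
where:
$$a_i \otimes b_i = \begin{pmatrix} a_{i1} b_{i1} & \cdots & a_{i1} b_{im} \\ \vdots & \ddots & \vdots \\ a_{im} b_{i1} & \cdots & a_{im} b_{im} \end{pmatrix} \tag{1}$$
5) Perform a flatten operation on $A \otimes B$: for each of the n positions, the element $a_i \otimes b_i$, i.e., the result of formula (1), is converted into the $m^2$-dimensional vector $(a_{i1} b_{i1}, \ldots, a_{i1} b_{im}, \ldots, a_{im} b_{i1}, \ldots, a_{im} b_{im})$. This vector is the final input to the computing network.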
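The five fusion steps above can be sketched as follows. The function name `fuse` and the tiny 2-dimensional inputs are illustrative assumptions; the arithmetic follows the steps as described (add, expand, multiply per position, flatten in row-major order).

```python
def fuse(z, p, h):
    """Fuse peptide (z), position (p) and HLA (h) vectors, each n x m.

    Returns n vectors of length m*m: the flattened outer product of
    a_i = z_i + p_i with b_i = h_i.
    """
    fused = []
    for z_i, p_i, b_i in zip(z, p, h):
        a_i = [zv + pv for zv, pv in zip(z_i, p_i)]       # step 1: a_i = z_i + p_i
        outer = [[a * b for b in b_i] for a in a_i]       # steps 2-4: (m x 1)(1 x m)
        fused.append([v for row in outer for v in row])   # step 5: flatten to m^2
    return fused

# Toy input with n = 2 positions and m = 2 dimensions.
z = [[1.0, 2.0], [0.0, 1.0]]
p = [[0.5, 0.5], [1.0, 1.0]]
h = [[2.0, 3.0], [2.0, 3.0]]   # the same HLA vector replicated at each position

x = fuse(z, p, h)
print(x)  # [[3.0, 4.5, 5.0, 7.5], [2.0, 3.0, 4.0, 6.0]]
```

Unlike plain addition, the outer product keeps every pairwise product between a polypeptide dimension and an HLA dimension, at the cost of growing the per-position input from m to m² values.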
The polypeptide and the HLA molecule are mapped into the same high-dimensional space by the neural network, and the tensor product produces a mapping of the spatial structure formed by the two sequences, thereby providing the input for an efficient pan-specific affinity model.
For training data, learning must be carried out with the computing network and the mapping network combined as a whole. The vectors of the oligomer-immune molecule binding structures and their affinity values are fed through the deep convolutional neural network of the computing network and the fully connected network of the mapping network for training, thereby learning the parameters of both networks.
Computing network:
The computing network comprises a multilayer convolutional neural network; its function is to extract short-sequence features effectively through a deep convolutional neural network and obtain feature vectors. Because the input polypeptides and HLA molecules are very short (polypeptides for HLA class I are typically shorter than 15 residues, and those for HLA class II typically shorter than 26 residues), such low-resolution data can cause a deep convolutional neural network to overfit and perform poorly. To allow the complexity of the deep convolutional neural network to be increased, strengthening learning and making the deep network effective, a residual neural network is used in combination with Gated Linear Unit activation (see [15]). This significantly improves the generalization ability of a deep convolutional neural network (for example, at least 5 layers, preferably 10 layers or more, for example more than 15 layers or even more than 20 layers), so that complexity can be increased and the learning ability of the algorithm strengthened. The specific design is shown in FIG. 3. The input is copied into 3 parts. Two of them are learned through different convolutional networks, and a Sigmoid normalization is applied to one of the convolution outputs; the resulting two vectors A and B are multiplied element-wise to obtain the residual (A × B in the figure). The third copy is added to the residual A × B. With this residual network design (FIG. 3), each convolutional layer learns a residual on top of the previous layer's output: denoting the input of the i-th layer as X and the convolutional network of the i-th layer as $\mathrm{conv}_i$, the output of the i-th layer is $X + \mathrm{conv}_i(X)$.
Learning the residual rather than the input itself effectively reduces the overfitting caused by adding layers to the convolutional neural network. The final computing network comprises 10 convolutional layers and uses gated linear units as the activation mechanism. The overall process is as follows:
1) The input $X = (X_1, X_2, \ldots, X_n)$ is fed into two convolutional neural networks $\mathrm{conv}_A$ and $\mathrm{conv}_B$, whose output dimensions are the same as that of the input X;
2) The two outputs $\mathrm{conv}_A(X)$ and $\mathrm{conv}_B(X)$ are computed, and a Sigmoid mapping $\sigma: \mathbb{R} \to [0, 1]$ is applied element-wise to the output of one of them, for example $\mathrm{conv}_B$:
$$\sigma(\mathrm{conv}_B(X)_i) = \frac{1}{1 + e^{-\mathrm{conv}_B(X)_i}}$$
3) $\mathrm{conv}_A(X)_i$ and $\sigma(\mathrm{conv}_B(X)_i)$ are multiplied element-wise to obtain $(\mathrm{conv}_A(X)_1 \times \sigma(\mathrm{conv}_B(X)_1), \mathrm{conv}_A(X)_2 \times \sigma(\mathrm{conv}_B(X)_2), \ldots, \mathrm{conv}_A(X)_n \times \sigma(\mathrm{conv}_B(X)_n))$, which is added as a vector to $X = (X_1, X_2, \ldots, X_n)$ to give the output of the computing layer; this output passes through a Dropout layer and is then passed to the next layer;
Considering that the polypeptide length is variable in practice, so that the output length after all convolutional layers is also variable, a global pooling layer $\theta: \mathbb{R}^{n \times p} \to \mathbb{R}^p$ is added after the last convolutional layer, where p is the output dimension of the last layer, to obtain the feature vector.
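A minimal sketch of one gated residual layer and the global pooling step is given below. Several details are assumptions not stated in the text: the kernel width of 3, zero padding to keep the length, random untrained weights, max pooling as the global pooling operation, and the omission of Dropout (as at prediction time). All function names are hypothetical.

```python
import math
import random

def conv1d_same(x, weights, bias):
    """1-D convolution over positions with zero padding (length preserved).

    x: n x c_in, weights: width x c_in x c_out, bias: c_out.
    """
    n, width = len(x), len(weights)
    c_in, c_out = len(weights[0]), len(bias)
    half = width // 2
    out = []
    for i in range(n):
        row = list(bias)
        for k in range(width):
            j = i + k - half
            if 0 <= j < n:                       # positions outside count as zero
                for ci in range(c_in):
                    for co in range(c_out):
                        row[co] += x[j][ci] * weights[k][ci][co]
        out.append(row)
    return out

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gated_residual_layer(x, conv_a, conv_b):
    """Return X + conv_A(X) * sigmoid(conv_B(X)), computed element-wise."""
    a = conv1d_same(x, *conv_a)
    b = conv1d_same(x, *conv_b)
    return [[xv + av * sigmoid(bv) for xv, av, bv in zip(xr, ar, br)]
            for xr, ar, br in zip(x, a, b)]

def global_max_pool(x):
    """Collapse an n x p output to a length-p vector (max pooling assumed)."""
    return [max(col) for col in zip(*x)]

def random_conv(width, channels, rng):
    """Untrained convolution parameters, for illustration only."""
    w = [[[rng.gauss(0.0, 0.1) for _ in range(channels)]
          for _ in range(channels)] for _ in range(width)]
    return w, [0.0] * channels

rng = random.Random(0)
channels = 4
layers = [(random_conv(3, channels, rng), random_conv(3, channels, rng))
          for _ in range(10)]                    # 10 layers, as in the text

lengths = {}
for n in (8, 11):                                # two different peptide lengths
    x = [[rng.uniform(-1.0, 1.0) for _ in range(channels)] for _ in range(n)]
    for conv_a, conv_b in layers:
        x = gated_residual_layer(x, conv_a, conv_b)
    features = global_max_pool(x)
    lengths[n] = len(features)
print(lengths)
```

Both peptide lengths yield a feature vector of the same size, which is what lets the fully connected mapping network accept variable-length inputs.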
For the deep convolutional neural network model of amino acid sequences, having each convolutional layer learn a residual rather than the output itself improves the generalization of the deep convolutional residual network structure, so that a single model achieves good accuracy and the efficiency of the overall process is markedly improved.
Mapping network:
The mapping network comprises multiple layers; its function is to map the feature vectors learned by the computing network into affinity values through a fully connected network. The overall process is as follows:
1) Apply a linear transformation to the input of the mapping network (i.e., the feature vector output by the computing network), converting the input vector x into $y = Wx + b$;
2) Apply a nonlinear transformation to the result of the linear transformation using the rectified linear unit (relu):
$$\mathrm{relu}(y) = \max(0, y)$$
Steps 1) and 2) form one layer of the mapping network; the affinity prediction result is output after multiple such layers.
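The two steps of a mapping layer can be sketched in a few lines. The weights below are hand-picked toy values and the two-layer depth is an assumption for illustration; in the described method these parameters would be learned by back propagation.

```python
def relu(v):
    """Rectified linear unit: max(0, v)."""
    return max(0.0, v)

def dense(x, W, b):
    """Linear transformation y = Wx + b."""
    return [sum(w * xv for w, xv in zip(row, x)) + bv for row, bv in zip(W, b)]

def mapping_network(x, layers):
    """Apply (linear, relu) layers in turn and return the final output."""
    for W, b in layers:
        x = [relu(v) for v in dense(x, W, b)]
    return x

# Toy parameters: a 2 -> 2 layer followed by a 2 -> 1 layer.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]),
    ([[2.0, 1.0]], [0.5]),
]

print(mapping_network([3.0, 1.0], layers))  # [5.5]
print(mapping_network([1.0, 3.0], layers))  # [1.5]
```

In the second call the first hidden unit is negative before the relu and is clipped to zero, so only the second unit contributes to the output.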
For test/prediction data, the polypeptide + HLA molecule to be tested/predicted is fed into the deep convolutional neural network of the trained computing network, which extracts the feature vector of the oligomer-immune molecule pair according to the network parameters. The feature vector is then fed into the mapping network, whose multiple fully connected layers map it to an affinity value according to the mapping network parameters. In one example, the affinity value can be IC$_{50}$.
The effect of the antigen affinity prediction model of the present invention is verified by examples below.
Examples
The algorithm does not require graphics card resources, but the following software environment must be configured:
Python:3.6.5
Anaconda:4.5.4
Tensorflow:1.3.0
Cuda:8.0
Cudnn:6.0
To use the software, the inventors execute the prediction script (.py) provided with the software on the path of the file to be predicted, which generates a prediction result for each experimental record. The file to be predicted must be arranged in "mhc, peptide" format. The inventors have placed an example file in the code, test_batch.csv under the data folder, as follows:
Figure BDA0002844965650000131
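As an illustration of the "mhc, peptide" input format, the sketch below reads such a file with Python's standard `csv` module. It assumes a header row naming the two columns and uses made-up rows; the actual layout of test_batch.csv is not reproduced in the text, and `read_pairs` is a hypothetical helper rather than part of the provided software.

```python
import csv
import io

VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")

def read_pairs(handle):
    """Yield (mhc, peptide) pairs from a CSV with 'mhc' and 'peptide' columns."""
    reader = csv.DictReader(handle, skipinitialspace=True)
    for row in reader:
        mhc = row["mhc"].strip()
        peptide = row["peptide"].strip().upper()
        if not peptide or set(peptide) - VALID_RESIDUES:
            raise ValueError(f"invalid peptide: {peptide!r}")
        yield mhc, peptide

# Made-up example content; with a real file, open it and pass the handle instead.
example = io.StringIO("mhc,peptide\nA0201,ACDEFGHIK\nA0101, LMNPQRSTVW\n")
pairs = list(read_pairs(example))
print(pairs)  # [('A0201', 'ACDEFGHIK'), ('A0101', 'LMNPQRSTVW')]
```

Validating the peptide characters up front gives a clear error for malformed rows before any sequence reaches the model.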
The test data is a quantitative pMHC (9- and 10-mer) affinity data set from the IEDB database (http://www.iedb.org/), verified by MHC ligand assays; it covers 113 HLA types such as A0101, A0201, and A2402. Using this script will generate a predict. This file records the polypeptide, the HLA type, and the software-predicted IC$_{50}$ value of each sample. The results generated from the IEDB test data are as follows:
Figure BDA0002844965650000132
The inventors compared the results of the algorithm of the present invention with those of NetMHCPan on this data. The IC$_{50}$ values are converted into negative and positive polypeptide labels, which are of greater concern in actual development, so the evaluation metrics are the area under the ROC curve (AUC) and the F1 score, both commonly used for classification problems in machine learning. The present model uses only a single neural network model, whereas NetMHCPan fuses 200 models [netpan3(2)]. The comparative results are as follows:
Figure BDA0002844965650000133
Figure BDA0002844965650000141
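For reference, the two evaluation metrics can be computed as in the sketch below. The 500 nM cutoff for calling a polypeptide positive is an assumption (a commonly used convention; the text does not state the threshold), and the IC$_{50}$ values are made-up numbers, not results from the patent.

```python
def binarize(ic50_values, threshold=500.0):
    """Label a sample positive (1) when its IC50 in nM is below the threshold."""
    return [1 if v < threshold else 0 for v in ic50_values]

def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both positive and negative samples")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def f1_score(labels, predicted):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for l, p in zip(labels, predicted) if l == 1 and p == 1)
    fp = sum(1 for l, p in zip(labels, predicted) if l == 0 and p == 1)
    fn = sum(1 for l, p in zip(labels, predicted) if l == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

measured = [50.0, 300.0, 800.0, 5000.0, 20000.0]     # made-up measured IC50 (nM)
predicted = [80.0, 700.0, 400.0, 9000.0, 15000.0]    # made-up predicted IC50 (nM)

labels = binarize(measured)
predicted_labels = binarize(predicted)
scores = [-v for v in predicted]    # lower IC50 means stronger binding

print(auc(labels, scores))
print(f1_score(labels, predicted_labels))  # 0.5
```

AUC is computed from the continuous predictions and so does not depend on where the predicted values are cut, whereas the F1 score depends on the chosen threshold.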
The results show that a single model achieves the accuracy that NetMHCPan obtains with 200 models, and that the model runs efficiently: on an NVIDIA P6000 graphics card, training on about 110,000 records takes about 30 minutes, and predicting the roughly 24,000 records of the test file test_batch.csv takes about 1 minute. The polypeptide lengths in the data set range from 8 to 11 residues, and the HLA distributions of the training set and the test set differ.
This embodiment shows that the method of the present invention has the following four advantages: 1) by designing a characterization layer for the short peptide-HLA molecule complex and a deep residual network for short peptides, the method improves the affinity prediction accuracy of a single neural network model; 2) by improving the accuracy of a single model, the method also reduces the execution time of the model: on the inventors' machine (NVIDIA P6000 graphics card), training on 116,559 records takes only half an hour; 3) the method allows input polypeptides of variable length; 4) because the invention develops a pan-specific algorithm, it can also give predictions for new types, i.e., rare HLA types never seen in the training set.
References:
[1] Nielsen M, Lundegaard C, Worning P, et al. Reliable prediction of T-cell epitopes using neural networks with novel sequence representations. Protein Sci 2003;12:1007-17.
[2] Zhang GL, Ansari HR, Bradley P, et al. Machine learning competition in immunology - prediction of HLA class I binding peptides. J Immunol Meth 2011;374:1-4.
[3] Carreno BM, Magrini V, Becker-Hapak M, et al. Cancer immunotherapy. A dendritic cell vaccine increases the breadth and diversity of melanoma neoantigen-specific T cells. Science 2015;348:803-8.
[4] Walter S, Weinschenk T, Stenzl A, et al. Multipeptide immune response to cancer vaccine IMA901 after single-dose cyclophosphamide associates with longer patient survival. Nat Med 2012;18:1254-61.
[5] Yadav M, Jhunjhunwala S, Phung QT, et al. Predicting immunogenic tumour mutations by combining mass spectrometry and exome sequencing. Nature 2014;515:572-76.
[6] Nielsen M, Andreatta M. NetMHCpan-3.0; improved prediction of binding to MHC class I molecules integrating information from multiple receptor and peptide length datasets. Genome Med 8, 33 (2016). doi:10.1186/s13073-016-0288-x.
[7] Liu Z, Cui Y, Xiong Z, et al. DeepSeqPan, a novel deep convolutional neural network model for pan-specific class I HLA-peptide binding affinity prediction. Sci Rep 9, 794 (2019). doi:10.1038/s41598-018-37214-1.
[8] Phloyphisut P, Pornputtapong N, Sriswasdi S, et al. MHCSeqNet: a deep neural network model for universal MHC binding prediction. BMC Bioinformatics 20, 270 (2019). doi:10.1186/s12859-019-2892-4.
[9] Sidhom J-W, Pardoll D, Baras A. AI-MHC: an allele-integrated deep learning framework for improving Class I & Class II HLA-binding predictions. bioRxiv, p. 318881 (2018).
[10] Yan Hu, Ziqiang Wang, Hailin Hu, Fangping Wan, Lin Chen, Yuanpeng Xiong, Xiaoxia Wang, Dan Zhao, Weiren Huang, Jianyang Zeng. ACME: pan-specific peptide-MHC class I binding prediction through attention-based deep neural networks. Bioinformatics.
[11] Han Y, Kim D. Deep convolutional neural networks for pan-specific peptide-MHC class I binding prediction. BMC Bioinformatics 18, 585 (2017). doi:10.1186/s12859-017-1997-x.
[12] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding [J]. arXiv preprint arXiv:1810.04805, 2018.
[13] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need [C] // Advances in Neural Information Processing Systems. 2017: 5998-6008.
[14] Gehring J, Auli M, Grangier D, et al. Convolutional sequence to sequence learning [C] // Proceedings of the 34th International Conference on Machine Learning - Volume 70. JMLR.org, 2017: 1243-1252.
[15] Dauphin Y N, Fan A, Auli M, et al. Language modeling with gated convolutional networks [C] // Proceedings of the 34th International Conference on Machine Learning - Volume 70. JMLR.org, 2017: 933-941.

Claims (16)

1. A method of predicting the affinity of an oligomer for an immune molecule based on deep learning, the method comprising:
(1) obtaining a training data set of oligomer-immune molecule binding structures, the training data set comprising the affinity of oligomer-immune molecule binding;
(2) for the oligomer to be tested and the immune molecule to be tested, and for each oligomer-immune molecule binding structure in the training data set, representing the sequence of the oligomer, the monomer positions of the oligomer, and the immune molecule as high-dimensional vectors respectively, and fusing the three high-dimensional vectors into a vector of the oligomer-immune molecule binding structure;
(3) training a deep neural network using the vectors of the oligomer-immune molecule binding structures in the training data set and the affinity data, and establishing an oligomer-immune molecule affinity prediction model;
(4) inputting the vector of the binding structure of the oligomer to be tested and the immune molecule to be tested into the antigen affinity prediction model, and predicting the binding affinity of the oligomer to be tested and the immune molecule using the trained deep neural network.
2. The method of claim 1, the oligomer being selected from the group consisting of: polypeptides, polysaccharides and nucleic acid haptens.
3. The method of claim 1 or 2, wherein the immune molecule is a major histocompatibility complex molecule, such as a human leukocyte antigen, including but not limited to class I and class II human leukocyte antigens.
4. The method of any one of claims 1 to 3, wherein in (2) a representation of a monomer vector reflecting the relationship between any two monomers is obtained by word embedding.
5. The method of any one of claims 1-4, wherein in (2) the input oligomer sequence is $(x_1, x_2, \ldots, x_n)$, where $x_i$ is an amino acid character, $x_i \in \{A, C, D, \ldots, Y\}$, and each $x_i$ is mapped into an m-dimensional vector $z_i \in \mathbb{R}^m$, $\mathbb{R}$ being the real number space, so that the sequence $(x_1, x_2, \ldots, x_n)$ is converted to $(z_1, z_2, \ldots, z_n)$.
6. The method of any one of claims 1-5, wherein in (2) the monomer position characterization vector is obtained by mapping the label of each amino acid position of the polypeptide to a high-dimensional continuous numerical space: the input is uniformly the vector $(1, 2, \ldots, n)$, and each position i is mapped by the fully-connected neural network into a vector $p_i \in \mathbb{R}^m$, $\mathbb{R}$ being the real number space, i.e., the output for $(1, 2, \ldots, n)$ is $(p_1, p_2, \ldots, p_n)$.
7. The method of any one of claims 1-6, wherein in (2) the sequence of oligomers and the high dimensional vector of monomer positions of the oligomers are added to form a characterization vector for the oligomers.
8. The method of any one of claims 1-7, wherein in (2) the input of the immune molecule vector is $(y_1, y_2, \ldots, y_k)$, $y_i \in \{A, C, D, \ldots, Y\}$, and the output is $(z'_1, z'_2, \ldots, z'_n)$, $z'_i \in \mathbb{R}^m$, the k-dimensional original amino acid vector being mapped to an n × m-dimensional vector in the same format as the oligomer.
9. The method of claim 8, wherein the characterization vector of the oligomer and the vector of the immune molecule are subjected to tensor multiplication, tensor addition, or attention operations in (2).
10. The method according to any one of claims 1 to 9, wherein in (3), the deep neural network comprises a deep convolutional layer and a multi-layered fully-connected layer, the deep convolutional layer is used for extracting feature vectors of oligomer-immune molecules in the training data set, the feature vectors are input into the multi-layered fully-connected layer, the multi-layered fully-connected layer maps the feature vectors into affinity values, and network parameters are obtained through a back propagation algorithm.
11. The method of claim 10, wherein for each deep convolutional layer the input is copied into 3 parts; two parts are learned through different convolutional networks, and Sigmoid normalization is applied to one of the convolution outputs, giving two vectors A and B that are multiplied element-wise to obtain a residual A × B; the remaining part is added to the residual A × B.
12. The method of claim 11, wherein a global pooling layer $\theta: \mathbb{R}^{n \times p} \to \mathbb{R}^p$ is added after the last convolutional layer, p being the output dimension of the last layer, to obtain a feature vector.
13. The method according to any one of claims 10-12, wherein the multi-layered fully-connected layer maps the feature vectors obtained from the deep convolutional layer to affinity values by:
1) performing a linear transformation on the feature vector x output by the deep convolutional layer, taken as the input vector of the fully-connected layer, to give $y = Wx + b$;
2) performing a nonlinear transformation on the result of the linear transformation using the rectified linear function:
$$\mathrm{relu}(y) = \max(0, y)$$
steps 1) and 2) form one layer of the mapping network, and the affinity prediction result is output through multiple layers of the mapping network.
14. The method according to any one of claims 10-13, wherein in (4), the trained deep neural network comprises a deep convolutional layer and a plurality of fully-connected layers, wherein the deep convolutional layer is used for extracting the feature vector of the oligomer-immune molecule to be detected, and the feature vector is input into the plurality of fully-connected layers to predict the binding affinity of the oligomer-immune molecule to be detected.
15. A method of building a model for predicting the affinity of an oligomer to an immune molecule based on deep learning, the method comprising building a model for predicting the affinity of an oligomer to an immune molecule by (1) - (3) of the method of any one of claims 1-14.
16. An antigen affinity prediction model established by the method of claim 15, the antigen affinity prediction model comprising: a data acquisition module, an input vector establishing module and a model establishing module,
the data acquisition module is used for acquiring a data set of the oligomer-immune molecule combination structure, wherein the data set comprises the affinity of oligomer-immune molecule combination;
the input vector establishing module is used for mapping the sequence of the oligomer, the monomer position of the oligomer and the immune molecule into high-dimensional vectors respectively by using a data set of the oligomer-immune molecule combination structure, and fusing the three high-dimensional vectors into a vector of the oligomer-immune molecule combination structure;
the model building module is used for training a deep neural network by using the vector of the oligomer-immune molecule combination structure and affinity data to build an antigen affinity prediction model.
CN202011506001.5A 2020-12-18 2020-12-18 Antigen affinity prediction method and system based on deep learning Pending CN114649054A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011506001.5A CN114649054A (en) 2020-12-18 2020-12-18 Antigen affinity prediction method and system based on deep learning

Publications (1)

Publication Number Publication Date
CN114649054A true CN114649054A (en) 2022-06-21

Family

ID=81991026

Country Status (1)

Country Link
CN (1) CN114649054A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109671469A (en) * 2018-12-11 2019-04-23 浙江大学 Method for predicting the binding relationship and binding affinity between polypeptides and HLA class I molecules based on a recurrent neural network
CN110689965A (en) * 2019-10-10 2020-01-14 电子科技大学 Drug target affinity prediction method based on deep learning
WO2020046587A2 (en) * 2018-08-20 2020-03-05 Nantomice, Llc Methods and systems for improved major histocompatibility complex (mhc)-peptide binding prediction of neoepitopes using a recurrent neural network encoder and attention weighting
CN111951887A (en) * 2020-07-27 2020-11-17 深圳市新合生物医疗科技有限公司 Leukocyte antigen and polypeptide binding affinity prediction method based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
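王远强; 丁元; 徐东海; 刘跃辉; 张娅; 韩英子; 罗兴燕; 林治华: "Quantitative structure-activity relationship modeling of MHC class I epitopes based on amino acid structural information" (基于氨基酸结构信息的MHCⅠ类抗原表位的定量构效关系建模), 免疫学杂志 (Immunological Journal), vol. 27, no. 10, 1 October 2011 (2011-10-01), pages 829 - 832 *
高敬鹏 et al.: "Deep Learning: Convolutional Neural Network Technology and Practice" (《深度学习 卷积神经网络技术与实践》), 31 July 2020, 机械工业出版社 (China Machine Press), pages: 72 - 73 *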

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115588462A (en) * 2022-09-15 2023-01-10 哈尔滨工业大学 Polypeptide and major histocompatibility complex protein molecule combination prediction method based on transfer learning
CN116836231A (en) * 2023-06-30 2023-10-03 深圳大学总医院 New antigen peptide of t (8; 21) AML and application thereof
CN116836231B (en) * 2023-06-30 2024-02-13 深圳大学总医院 New antigen peptide of t (8; 21) AML and application thereof
CN116994644A (en) * 2023-07-28 2023-11-03 天津大学 Medicine target affinity prediction method based on pre-training model
CN116994644B (en) * 2023-07-28 2024-02-02 天津大学 Medicine target affinity prediction method based on pre-training model
CN117095825A (en) * 2023-10-20 2023-11-21 鲁东大学 Human immune state prediction method based on multi-instance learning
CN117095825B (en) * 2023-10-20 2024-01-05 鲁东大学 Human immune state prediction method based on multi-instance learning
CN118016158A (en) * 2024-02-05 2024-05-10 常州大学 TCR-epitope combination prediction method and system based on transfer learning
CN118248218A (en) * 2024-05-30 2024-06-25 北京航空航天大学杭州创新研究院 Method for developing high-affinity capture antibody EpCAM based on AI algorithm
CN118248218B (en) * 2024-05-30 2024-07-30 北京航空航天大学杭州创新研究院 Method for developing high-affinity capture antibody EpCAM based on AI algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination